WO2022206498A1 - A model training method and computing node based on federated transfer learning - Google Patents

A model training method and computing node based on federated transfer learning

Info

Publication number
WO2022206498A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
computing node
parameter value
model parameter
aggregation
Prior art date
Application number
PCT/CN2022/082380
Other languages
English (en)
French (fr)
Inventor
詹德川
施意
李新春
宋绍铭
邵云峰
李秉帅
钱莉
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2022206498A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Definitions

  • The present application relates to the field of artificial intelligence, and in particular, to a model training method and computing node based on federated transfer learning.
  • Federated learning (FL), also known as federated machine learning, can effectively help multiple computing nodes carry out data usage and machine-learning modeling while meeting the requirements of user privacy protection, data security and government regulations.
  • Transfer learning (TL) takes a model developed for task A as the starting point and reuses it in the process of developing a model for task B; that is, the knowledge learned by a model trained on an existing task is transferred to the new task to help retrain the model.
  • A typical federated learning method is federated averaging (FedAvg).
  • The FedAvg architecture generally includes a server and a number of clients.
  • The technical process mainly consists of model distribution and model aggregation.
  • In model distribution, a client downloads the model from the server, trains it on local data, and uploads the model to the server once it has been trained to a certain extent.
  • In model aggregation, the server collects the models uploaded by the clients and performs model fusion.
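To make the FedAvg aggregation step concrete, the sketch below shows a weighted parameter average over client models. This is a minimal Python/PyTorch illustration; the function name, the use of state dicts and the sample-count weighting are assumptions, not details from this publication.

```python
# Minimal FedAvg-style aggregation sketch (illustrative; names are assumptions).
from typing import Dict, List
import torch

def fedavg_aggregate(client_states: List[Dict[str, torch.Tensor]],
                     client_weights: List[float]) -> Dict[str, torch.Tensor]:
    """Weighted average of client model parameters, e.g. weighted by local sample count."""
    total = float(sum(client_weights))
    return {
        name: sum((w / total) * state[name]
                  for state, w in zip(client_states, client_weights))
        for name in client_states[0]
    }
```

The server would then redistribute the aggregated parameters to the clients for the next round.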
  • Another related technique is adversarial discriminative domain adaptation (ADDA).
  • A characteristic of ADDA is that the features extracted from the source-domain data and the features extracted from the target-domain data are merged together as the features of the training data.
  • The structure generally includes a feature extractor and a discriminator.
  • The feature extractor is responsible for extracting the features of the training data, and the aim is for the extracted features to confuse the discriminator, so that the discriminator cannot distinguish whether a feature belongs to the source domain or the target domain; the discriminator, in turn, tries its best to distinguish which domain the features produced by the feature extractor belong to.
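The sketch below illustrates this adversarial interplay: the discriminator is trained to separate source features from target features, while the feature extractor is trained so that target features confuse it. It is a simplified PyTorch illustration of ADDA-style training, not the patent's implementation; the module sizes, optimizers and the label convention are assumptions.

```python
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
discriminator = nn.Sequential(nn.Linear(64, 1))       # predicts source (1) vs target (0)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(x_src, x_tgt, opt_d):
    """Teach the discriminator to tell source features from target features."""
    with torch.no_grad():                              # features are fixed in this step
        f_src, f_tgt = feature_extractor(x_src), feature_extractor(x_tgt)
    logits = torch.cat([discriminator(f_src), discriminator(f_tgt)])
    labels = torch.cat([torch.ones(len(f_src), 1), torch.zeros(len(f_tgt), 1)])
    loss = bce(logits, labels)
    opt_d.zero_grad(); loss.backward(); opt_d.step()
    return loss.item()

def extractor_step(x_tgt, opt_g):
    """Update the feature extractor so target features are judged as 'source'."""
    logits = discriminator(feature_extractor(x_tgt))
    loss = bce(logits, torch.ones_like(logits))
    opt_g.zero_grad(); loss.backward(); opt_g.step()
    return loss.item()
```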
  • The embodiments of the present application provide a model training method based on federated transfer learning and a computing node, which use the first data set on the first computing node to assist the second data set on the second computing node in training a model and realize domain alignment; only model parameter values are passed between computing nodes, and no data or data features are passed, which fully protects the privacy of user data.
  • The co-training of the model improves the performance of the model.
  • The embodiments of the present application first provide a model training method based on federated transfer learning, which can be used in the field of artificial intelligence, for example the field of computer vision.
  • The method includes: first, the first computing node keeps the first model parameter value (denoted G_s) of the first model (e.g., a feature extractor) on the first computing node and the second model parameter value (denoted T_s) of the second model (e.g., a classifier) on the first computing node fixed, and uses the first data set on the first computing node to train the third model (e.g., a domain discriminator, or discriminator for short) on the first computing node, obtaining the third model parameter value (denoted D_s) of the third model on the first computing node; the first model parameter value G_s is the model parameter value obtained after the first computing node trains the first model, and the second model parameter value T_s is the model parameter value obtained after the first computing node trains the second model.
  • The first model is used to perform feature extraction on the input data; the second model is used to perform a target task based on the features extracted by the first model, where the target task may be, for example, a classification task (e.g., a target detection task, semantic segmentation task, speech recognition task, etc.) or a regression task, which is not limited here; the third model is used to identify the source domain of the features extracted by the first model.
  • That is, the computing node where the input data corresponding to a feature is located can be distinguished according to the data distribution of the source domain, for example to determine whether an acquired feature comes from a source-domain device or a target-domain device.
  • The first computing node then receives the first aggregated parameter value (denoted D-all), which is obtained based on the third model parameter value D_s and a fourth model parameter value (denoted D_t); the fourth model parameter value D_t is the model parameter value of the third model on the second computing node, and the third model on the second computing node is trained by the second computing node using the data set on the second computing node (which may be referred to as the second data set).
  • The first computing node may also perform the above steps repeatedly until an iteration termination condition is reached; the iteration termination condition may be reaching a preset number of training rounds, convergence of the loss function, or another preset training termination condition, which is not specifically limited here.
  • The first data set on the first computing node may be a labeled data set.
  • The second data set on the second computing node is an unlabeled or sparsely labeled data set.
  • There may be one or multiple first computing nodes; when there are multiple first computing nodes, the operations performed on each of them are similar, and details are not repeated here.
  • The above embodiment specifically describes the process of training each model on the side of the first computing node and obtaining the model parameter values of each model.
  • The first computing node and the second computing node exchange only model parameter values, without passing data or data features, which protects data privacy; and, based on the passed model parameter values, the distribution difference between the features extracted from the first data set by the first model on the first computing node and the features extracted from the second data set by the first model on the second computing node is gradually reduced during the iterative training process, enabling domain alignment. Therefore, the above-mentioned embodiments of the present application realize the collaborative training of the model while taking into account both domain alignment and data privacy.
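As a concrete illustration of the source-side step summarized above (fix G_s and T_s, train only the discriminator on the first data set), the following sketch shows one possible local routine. The function name, optimizer and the domain-label convention (source = 1) are assumptions for illustration only, not the patent's implementation.

```python
import torch
import torch.nn as nn

def train_source_discriminator(feature_extractor, discriminator, loader, epochs=1, lr=0.01):
    """First computing node: G_s is frozen; only the third model (discriminator) is trained."""
    for p in feature_extractor.parameters():
        p.requires_grad_(False)                           # keep the first model fixed
    opt = torch.optim.SGD(discriminator.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, _ in loader:                               # class labels unused in this step
            logits = discriminator(feature_extractor(x))
            loss = bce(logits, torch.ones_like(logits))   # source-domain label = 1
            opt.zero_grad(); loss.backward(); opt.step()
    return {k: v.detach().clone() for k, v in discriminator.state_dict().items()}  # D_s
```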
  • In one possible implementation, the process by which the first computing node receives the first aggregated parameter value D-all, obtained based on the third model parameter value D_s and the fourth model parameter value D_t, may be: first, the first computing node sends the third model parameter value D_s to the second computing node, so that the second computing node aggregates the third model parameter value D_s and the fourth model parameter value D_t to obtain the first aggregated parameter value D-all; after that, the first computing node receives the first aggregated parameter value D-all sent by the second computing node.
  • It should be noted that if there is one first computing node, the first aggregated parameter value D-all is obtained by aggregating one third model parameter value D_s with the fourth model parameter value D_t; if there are multiple first computing nodes, the first aggregated parameter value D-all is obtained by aggregating the multiple third model parameter values (that is, each first computing node has a corresponding third model parameter value, denoted D_1, ..., D_n) with the fourth model parameter value D_t.
  • In the above implementation of the present application, the aggregation of the third model parameter value D_s and the fourth model parameter value D_t is performed by the second computing node. In this case, there is no need to deploy a new computing node, which saves costs and broadens the scenarios in which the model training method provided in this application can be applied.
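A minimal sketch of how the second computing node might form D-all from the received D_s values and its own D_t is shown below. The patent does not prescribe a specific aggregation rule; a plain parameter average over state dicts is assumed here.

```python
from typing import Dict, List
import torch

def aggregate_discriminators(d_sources: List[Dict[str, torch.Tensor]],
                             d_target: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    """Average D_1..D_n with D_t to obtain the first aggregated parameter value D-all."""
    states = d_sources + [d_target]
    return {name: torch.stack([s[name] for s in states]).mean(dim=0) for name in d_target}
```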
  • In one possible implementation, the process of obtaining the second aggregated parameter value G-all based on the first model parameter value G_s can be divided into two cases, depending on whether there are one or multiple first computing nodes: 1) where there is one first computing node, that node sends its first model parameter value G_s (that is, a single G_s) to the second computing node; in this case, the second aggregated parameter value G-all is essentially the first model parameter value G_s; 2) where there are multiple first computing nodes (assuming n, n ≥ 2), each first computing node sends its own first model parameter value (that is, multiple values G_s, denoted G_1, ..., G_n) to the second computing node, and the second computing node aggregates the received first model parameter values G_1, ..., G_n to obtain the second aggregated parameter value G-all.
  • In one possible implementation, the method further includes: the first computing node sends the updated first model parameter value and the updated second model parameter value to the second computing node. If the first computing node repeatedly performs the above steps until the iteration termination condition is reached, then after that condition is reached the method further includes: the first computing node sends the first model parameter value obtained in the last update and the second model parameter value obtained in the last update to the second computing node.
  • It should be noted that, since there may be one or multiple first computing nodes, sending the second model parameter value obtained in the last update to the second computing node can be divided into two cases: 1) where there is one first computing node, the first computing node sends the second model parameter value T_s obtained in the last update to the second computing node, so that the second computing node performs the target task according to the first model on the second computing node and the second model on the second computing node, where the model parameter value of the first model on the second computing node is the second aggregated parameter value G-all obtained in the last update, and the model parameter value of the second model on the second computing node is the second model parameter value obtained in the last update; 2) where there are multiple first computing nodes (assuming n, n ≥ 2), each first computing node sends the second model parameter value obtained in its last update (that is, multiple values T_s, denoted T_1, ..., T_n) to the second computing node, so that the second computing node aggregates the last-updated second model parameter values T_1, ..., T_n to obtain the third aggregated parameter value (denoted Ts-all), and then performs the target task according to the first model on the second computing node and the second model on the second computing node, where the model parameter value of the first model on the second computing node is the second aggregated parameter value G-all obtained in the last update, and the model parameter value of the second model on the second computing node is the third aggregated parameter value Ts-all.
  • In the above implementation of the present application, without deploying a new computing node, after the first computing node completes the iterative training it sends the second model parameter value obtained in the last update to the second computing node, so that the second computing node performs the target task based on the latest model parameter values of the first model and the latest model parameter values of the second model. Since the final model parameter values of the first model and the second model on the second computing node are obtained through joint training by the first computing node and the second computing node, the model performance is improved.
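The final step described above amounts to loading the last aggregated parameter values into the second computing node's own first and second models and running the target task. A minimal, classification-flavoured sketch follows; the function and parameter names are assumptions.

```python
import torch

def run_target_task(feature_extractor, classifier, g_all, t_final, x):
    """Second computing node: first model <- G-all, second model <- T_s or Ts-all, then infer."""
    feature_extractor.load_state_dict(g_all)
    classifier.load_state_dict(t_final)
    feature_extractor.eval(); classifier.eval()
    with torch.no_grad():
        return classifier(feature_extractor(x)).argmax(dim=-1)
```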
  • In one possible implementation, the process by which the first computing node receives the first aggregated parameter value D-all, obtained based on the third model parameter value D_s and the fourth model parameter value D_t, may also be: the first computing node sends the third model parameter value D_s to a third computing node, and the second computing node also sends the fourth model parameter value D_t to the third computing node, so that the third computing node aggregates the third model parameter value D_s and the fourth model parameter value D_t from the second computing node to obtain the first aggregated parameter value D-all.
  • In the above implementation of the present application, the aggregation of the third model parameter value D_s and the fourth model parameter value D_t can be performed by an additionally deployed third computing node, which reduces the computing overhead of the second computing node and improves its computing speed.
  • In one possible implementation, the process of obtaining the second aggregated parameter value G-all based on the first model parameter value G_s can be divided into two cases, depending on whether there are one or multiple first computing nodes: 1) where there is one first computing node, that node sends its first model parameter value G_s (that is, a single G_s) to the third computing node, which then sends it to the second computing node; in this case, the second aggregated parameter value G-all is essentially the first model parameter value G_s; 2) where there are multiple first computing nodes (assuming n, n ≥ 2), each first computing node sends its own first model parameter value (that is, multiple values G_s, denoted G_1, ..., G_n) to the third computing node, so that the third computing node aggregates the received first model parameter values G_1, ..., G_n to obtain the second aggregated parameter value G-all, which the third computing node then sends to the second computing node.
  • The above implementation of the present application describes how the second aggregated parameter value G-all is obtained when a new computing node (i.e., a third computing node) is deployed, for the cases of one or multiple first computing nodes, which provides flexibility.
  • In one possible implementation, the first computing node sends the updated first model parameter value and the updated second model parameter value to the third computing node. If the first computing node repeatedly performs the above steps until the iteration termination condition is reached, then after that condition is reached the method further includes: the first computing node sends the first model parameter value obtained in the last update and the second model parameter value obtained in the last update to the third computing node.
  • It should be noted that, since there may be one or multiple first computing nodes, sending the second model parameter value obtained in the last update to the third computing node can be divided into two cases: 1) where there is one first computing node, the first computing node sends the second model parameter value T_s obtained in the last update to the third computing node, which sends it on to the second computing node, so that the second computing node performs the target task according to the first model on the second computing node and the second model on the second computing node, where the model parameter value of the first model on the second computing node is the second aggregated parameter value G-all obtained in the last update, and the model parameter value of the second model on the second computing node is the second model parameter value T_s obtained in the last update; 2) where there are multiple first computing nodes (assuming n, n ≥ 2), each first computing node sends the second model parameter value obtained in its last update (that is, multiple values T_s, denoted T_1, ..., T_n) to the third computing node, which aggregates them to obtain the third aggregated parameter value Ts-all and sends it to the second computing node, so that the second computing node performs the target task according to the first model on the second computing node and the second model on the second computing node, where the model parameter value of the first model on the second computing node is the second aggregated parameter value G-all obtained in the last update, and the model parameter value of the second model on the second computing node is the third aggregated parameter value Ts-all.
  • In the above implementation of the present application, the second model parameter value obtained in the last update is sent to the third computing node, which forwards it directly, or forwards it after aggregation, to the second computing node, so that the second computing node performs the target task based on the latest model parameter values of the first model and the latest model parameter values of the second model.
  • Since the final model parameter values of the first model and the second model on the second computing node are obtained through co-training by the first computing node and the second computing node using their local data sets, the model performance is improved.
  • Second, an embodiment of the present application further provides a model training method based on federated transfer learning, which can be used in the field of artificial intelligence, for example the field of computer vision. The method includes: first, the second computing node obtains the second aggregated parameter value G-all, which is obtained based on the first model parameter values G_s of the first models trained on one or more first computing nodes, where each first computing node trains the first model on itself using the first data set on itself.
  • The first data set may be a labeled data set, and the first model is used to perform feature extraction on the input data.
  • After that, the second computing node uses the second data set on the second computing node to train the third model on the second computing node, obtaining the fourth model parameter value D_t of the third model on the second computing node, where the third model is used to identify the source domain of the features extracted by the first model.
  • That is, the computing node where the input data is located can be distinguished according to the data distribution of the source domain, for example to determine whether an acquired feature comes from a source-domain device or a target-domain device.
  • The second computing node may also perform the above steps repeatedly until an iteration termination condition is reached; the iteration termination condition may be reaching a preset number of training rounds, convergence of the loss function, or another preset training termination condition, which is not specifically limited here.
  • It should be noted that when there is one first computing node, the first model parameter value can be directly denoted G_s and the second model parameter value directly denoted T_s; when there are multiple first computing nodes (assuming n, n ≥ 2), each first computing node obtains its corresponding first model parameter value (that is, multiple values G_s, denoted G_1, ..., G_n) and its corresponding second model parameter value (that is, multiple values T_s, denoted T_1, ..., T_n).
  • The above embodiment specifically describes the process of training each model on the second computing node side and obtaining the model parameter values of each model.
  • The second computing node and the first computing node exchange only model parameter values, without passing data or data features, which protects data privacy; and, based on the passed model parameter values, the distribution difference between the features extracted from the second data set by the first model on the second computing node and the features extracted from the first data set by the first model on the first computing node is gradually reduced during the iterative training process, thereby achieving domain alignment. Therefore, the above-mentioned embodiments of the present application realize the collaborative training of the model while taking into account both domain alignment and data privacy, and can obtain a model with excellent performance on the target task.
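One possible shape of the target-side step summarized above (load G-all into the first model, keep it fixed, and train the discriminator on the unlabeled second data set) is sketched below; the routine and the domain-label convention (target = 0) are assumptions.

```python
import torch
import torch.nn as nn

def train_target_discriminator(feature_extractor, discriminator, g_all, loader, epochs=1, lr=0.01):
    """Second computing node: first model set to G-all and frozen; train the third model."""
    feature_extractor.load_state_dict(g_all)
    for p in feature_extractor.parameters():
        p.requires_grad_(False)
    opt = torch.optim.SGD(discriminator.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x in loader:                                   # second data set: no labels required
            logits = discriminator(feature_extractor(x))
            loss = bce(logits, torch.zeros_like(logits))   # target-domain label = 0
            opt.zero_grad(); loss.backward(); opt.step()
    return {k: v.detach().clone() for k, v in discriminator.state_dict().items()}  # D_t
```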
  • In one possible implementation, the method may further include: the second computing node may further acquire the first aggregated parameter value D-all, which is obtained based on the third model parameter value D_s (one or more) and the fourth model parameter value D_t, where each third model parameter value D_s is the model parameter value obtained by a first computing node training the third model on itself using the first data set on itself.
  • After that, the second computing node updates the fourth model parameter value D_t to the first aggregated parameter value D-all and, while keeping the first aggregated parameter value D-all unchanged (i.e., fixing D-all), uses the second data set to train the first model on the second computing node and the second model on the second computing node, obtaining the seventh model parameter value of the first model on the second computing node (denoted G_t') and the eighth model parameter value of the second model on the second computing node (denoted T_t').
  • In the above implementation of the present application, the second computing node can also fix the first aggregated parameter value D-all and train the first model and the second model on the second computing node; that is, adversarial training is performed not only on the first computing node but also on the second computing node, so that the features extracted from the first data set and the features extracted from the second data set achieve domain alignment faster and better, improving training speed and performance.
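The adversarial counterpart on the second computing node (fix D-all, update the first model so its target features confuse the discriminator) could look roughly like the sketch below. With unlabeled target data only the adversarial term drives the update; any labeled or pseudo-labeled target samples would additionally update the second model, which is omitted here. All names and the loss form are assumptions.

```python
import torch
import torch.nn as nn

def target_adversarial_update(feature_extractor, discriminator, d_all, loader, epochs=1, lr=0.01):
    """Second computing node: third model fixed at D-all; train the first model adversarially."""
    discriminator.load_state_dict(d_all)
    for p in discriminator.parameters():
        p.requires_grad_(False)
    opt = torch.optim.SGD(feature_extractor.parameters(), lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x in loader:
            logits = discriminator(feature_extractor(x))
            # push target features toward the 'source' side of the fixed discriminator
            loss = bce(logits, torch.ones_like(logits))
            opt.zero_grad(); loss.backward(); opt.step()
    return feature_extractor.state_dict()                  # G_t'
```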
  • In one possible implementation, the process by which the second computing node obtains the first aggregated parameter value D-all, obtained based on the third model parameter value D_s and the fourth model parameter value D_t, may be: first, the second computing node receives the third model parameter value D_s (one or more) sent by the one or more first computing nodes, and then aggregates the fourth model parameter value D_t with each third model parameter value D_s to obtain the first aggregated parameter value D-all.
  • In one possible implementation, the process by which the second computing node obtains the second aggregated parameter value G-all, obtained based on the first model parameter values G_s of the first models trained on one or more first computing nodes, may be: the second computing node receives the updated first model parameter value G_s (one or more) sent by the one or more first computing nodes, and aggregates the seventh model parameter value G_t' (that is, the updated model parameter value of the first model on the second computing node) with each updated first model parameter value G_s to obtain the second aggregated parameter value G-all.
  • Since there may be one or multiple first computing nodes, this can be divided into two cases: 1) if there is one first computing node, the second computing node aggregates the seventh model parameter value G_t' with that one first model parameter value G_s to obtain the second aggregated parameter value G-all; 2) if there are multiple first computing nodes (assuming n, n ≥ 2), the second computing node aggregates the seventh model parameter value G_t' with each of the first model parameter values G_1, ..., G_n to obtain the second aggregated parameter value G-all.
  • In the above implementation of the present application, the aggregation to obtain the second aggregated parameter value is performed by the second computing node, which provides flexibility.
  • In one possible implementation, the method may further include: the second computing node updates the second aggregated parameter value based on the updated first model parameter values, receives the updated second model parameter values sent by the one or more first computing nodes, and aggregates each updated second model parameter value with the updated eighth model parameter value T_t' (that is, the updated model parameter value of the second model on the second computing node) to obtain the fourth aggregated parameter value (denoted T-all).
  • After that, the second computing node performs the target task according to the first model on the second computing node and the second model on the second computing node, where the model parameter value of the first model on the second computing node is the second aggregated parameter value G-all obtained in the last update, and the model parameter value of the second model on the second computing node is the fourth aggregated parameter value T-all.
  • In one possible implementation, the method may further include: the second computing node updates the second aggregated parameter value based on the first model parameter values obtained in the last update, receives the second model parameter values obtained in the last update sent by the one or more first computing nodes, and aggregates each last-updated second model parameter value with the last-updated eighth model parameter value T_t' (that is, the last-updated model parameter value of the second model on the second computing node) to obtain the fourth aggregated parameter value (denoted T-all).
  • Specifically: 1) where there is one first computing node, the second computing node receives the second model parameter value T_s obtained in the last update sent by the first computing node, and aggregates it with the eighth model parameter value T_t' obtained in the last update to obtain the fourth aggregated parameter value T-all; 2) where there are multiple first computing nodes, the second computing node receives the second model parameter values obtained in the last update sent by each first computing node (that is, multiple values T_s, denoted T_1, ..., T_n), and aggregates the last-updated second model parameter values T_1, ..., T_n with T_t' to obtain the fourth aggregated parameter value T-all.
  • After that, the second computing node performs the target task according to the first model on the second computing node and the second model on the second computing node, where the model parameter value of the first model on the second computing node is the second aggregated parameter value G-all obtained in the last update, and the model parameter value of the second model on the second computing node is the fourth aggregated parameter value T-all.
  • In the above implementation of the present application, without deploying a new computing node, it is specifically stated that after the second computing node also performs adversarial training, the second computing node receives the second model parameter value obtained in the last update sent by the first computing node, so that the second computing node performs the target task based on the latest model parameter values of the first model and the second model; since these final model parameter values are obtained through the co-training of the first computing node and the second computing node, the model performance is improved.
  • In one possible implementation, the process by which the second computing node obtains the second aggregated parameter value, obtained based on the first model parameter values of the first models trained on the one or more first computing nodes, may be: the second computing node receives the first model parameter value G_s sent by each first computing node, and aggregates the received first model parameter values to obtain the second aggregated parameter value G-all.
  • Since there may be one or multiple first computing nodes, this can be divided into two cases: 1) where there is one first computing node, the second computing node receives the first model parameter value G_s (that is, a single G_s) sent by that first computing node; in this case, the second aggregated parameter value G-all is essentially the first model parameter value G_s; 2) where there are multiple first computing nodes (assuming n, n ≥ 2), the second computing node receives the first model parameter value sent by each first computing node (that is, multiple values G_s, denoted G_1, ..., G_n) and aggregates the received first model parameter values G_1, ..., G_n to obtain the second aggregated parameter value G-all.
  • The above implementation of the present application describes, from the second computing node side, how the second aggregated parameter value G-all can be obtained by the second computing node when there are one or multiple first computing nodes, which provides flexibility.
  • In one possible implementation, the method further includes: 1) where there is one first computing node, the second computing node receives the second model parameter value T_s obtained in the last update sent by the first computing node, and performs the target task according to the first model on the second computing node and the second model on the second computing node, where the model parameter value of the first model on the second computing node is the second aggregated parameter value G-all obtained in the last update, and the model parameter value of the second model on the second computing node is the second model parameter value T_s obtained in the last update; 2) where there are multiple first computing nodes, the second computing node receives the second model parameter values obtained in the last update sent by each first computing node (that is, multiple values T_s, denoted T_1, ..., T_n), aggregates them to obtain the third aggregated parameter value Ts-all, and performs the target task according to the first model on the second computing node and the second model on the second computing node, where the model parameter value of the first model on the second computing node is the second aggregated parameter value G-all obtained in the last update, and the model parameter value of the second model on the second computing node is the third aggregated parameter value Ts-all.
  • It should be noted that the above process is performed after the second computing node repeatedly performs the above steps until the iteration termination condition is reached.
  • In the above implementation of the present application, without deploying a new computing node, after the iterative training is completed the second computing node receives the second model parameter value obtained in the last update sent by the first computing node, and performs the target task based on the latest model parameter values of the first model and the latest model parameter values of the second model. Since the final model parameter values of the first model and the second model on the second computing node are obtained through co-training by the first computing node and the second computing node using their respective local data sets, the model performance is improved while protecting data privacy.
  • In one possible implementation, the process by which the second computing node obtains the first aggregated parameter value D-all, obtained based on the third model parameter value D_s and the fourth model parameter value D_t, may also be: first, the second computing node sends the fourth model parameter value D_t to the third computing node; then the second computing node receives the first aggregated parameter value D-all from the third computing node, where the first aggregated parameter value D-all is obtained by the third computing node aggregating each third model parameter value D_s from the one or more first computing nodes with the fourth model parameter value D_t from the second computing node.
  • In the above implementation of the present application, the aggregation of the third model parameter value D_s and the fourth model parameter value D_t is performed by an additionally deployed third computing node, which reduces the computing overhead of the second computing node and improves its computing speed.
  • In one possible implementation, the process by which the second computing node obtains the second aggregated parameter value G-all, obtained based on the first model parameter values of the first models trained on one or more first computing nodes, may be: first, the second computing node sends the seventh model parameter value G_t' (that is, the updated model parameter value of the first model on the second computing node) to the third computing node; after that, the second computing node receives the second aggregated parameter value G-all from the third computing node, where G-all is obtained by the third computing node aggregating the seventh model parameter value G_t' with each updated first model parameter value G_s from the one or more first computing nodes.
  • Since there may be one or multiple first computing nodes, this can be divided into two cases: 1) where there is one first computing node, the second computing node receives the first model parameter value G_s forwarded by the third computing node (first sent by the first computing node to the third computing node, then forwarded by the third computing node to the second computing node); in this case, the second aggregated parameter value G-all is essentially the first model parameter value G_s; 2) where there are multiple first computing nodes, the second computing node receives the second aggregated parameter value G-all forwarded by the third computing node, where G-all is obtained by the third computing node aggregating the first model parameter values sent by the first computing nodes (that is, multiple values G_s, denoted G_1, ..., G_n); the third computing node aggregates the received first model parameter values G_1, ..., G_n to obtain the second aggregated parameter value G-all and may further send the obtained G-all to the second computing node.
  • In one possible implementation, the method may further include: the second computing node sends the updated eighth model parameter value T_t' (that is, the updated model parameter value of the second model on the second computing node) to the third computing node, each first computing node also sends its updated second model parameter value to the third computing node, and the third computing node aggregates each updated second model parameter value with the updated eighth model parameter value T_t' to obtain the fourth aggregated parameter value T-all.
  • Alternatively, the second computing node sends the eighth model parameter value T_t' obtained in the last update to the third computing node, each first computing node also sends the second model parameter value obtained in its last update to the third computing node, and the third computing node aggregates the last-updated second model parameter values with the last-updated eighth model parameter value T_t' to obtain the fourth aggregated parameter value T-all.
  • Specifically: 1) where there is one first computing node, the third computing node receives the second model parameter value T_s obtained in the last update sent by the first computing node and the eighth model parameter value T_t' obtained in the last update sent by the second computing node, and aggregates them to obtain the fourth aggregated parameter value T-all; 2) where there are multiple first computing nodes, the third computing node receives the second model parameter values obtained in the last update sent by the first computing nodes (that is, multiple values T_s, denoted T_1, ..., T_n), and aggregates T_1, ..., T_n with T_t' to obtain the fourth aggregated parameter value T-all.
  • After that, the second computing node receives the fourth aggregated parameter value T-all sent by the third computing node, and performs the target task according to the first model on the second computing node and the second model on the second computing node, where the model parameter value of the first model on the second computing node is the second aggregated parameter value G-all obtained in the last update, and the model parameter value of the second model on the second computing node is the fourth aggregated parameter value T-all.
  • In the above implementation of the present application, in the case where a new computing node (i.e., the third computing node) is deployed, it is specifically stated that after the second computing node also performs adversarial training, the second computing node receives the second model parameter value obtained in the last update sent by the first computing node, so that the second computing node performs the target task based on the latest model parameter value of the first model and the latest model parameter value of the second model; since the final model parameter values of the first model and the second model are obtained through the co-training of the first computing node and the second computing node, the performance of the model is improved.
  • In one possible implementation, the process by which the second computing node obtains the second aggregated parameter value G-all, obtained based on the first model parameter values of the first models trained on one or more first computing nodes, may be: the second computing node receives the second aggregated parameter value G-all from the third computing node, where G-all is obtained by the third computing node aggregating each first model parameter value G_s from the one or more first computing nodes.
  • Since there may be one or multiple first computing nodes, this can be divided into two cases: 1) where there is one first computing node, the second computing node receives the first model parameter value G_s forwarded by the third computing node (sent by the first computing node to the third computing node, then forwarded by the third computing node to the second computing node); in this case, the second aggregated parameter value G-all is essentially the first model parameter value G_s; 2) where there are multiple first computing nodes (assuming n, n ≥ 2), the second computing node receives the second aggregated parameter value G-all forwarded by the third computing node, where G-all is obtained by the third computing node aggregating the first model parameter values sent by the first computing nodes (that is, multiple values G_s, denoted G_1, ..., G_n); the third computing node aggregates the received first model parameter values G_1, ..., G_n to obtain the second aggregated parameter value G-all and sends it to the second computing node.
  • The above implementation of the present application describes, from the second computing node side, how the second aggregated parameter value G-all can be obtained by the third computing node when there are one or multiple first computing nodes, which provides flexibility.
  • In one possible implementation, the method further includes: 1) where there is one first computing node, the second computing node receives the second model parameter value T_s obtained in the last update sent by the third computing node (which the third computing node obtained from the first computing node), and performs the target task according to the first model on the second computing node and the second model on the second computing node, where the model parameter value of the first model on the second computing node is the second aggregated parameter value G-all obtained in the last update, and the model parameter value of the second model on the second computing node is the second model parameter value T_s obtained in the last update; 2) where there are multiple first computing nodes, the second computing node receives the third aggregated parameter value Ts-all sent by the third computing node and performs the target task according to the first model on the second computing node and the second model on the second computing node, where the third aggregated parameter value Ts-all is obtained by the third computing node aggregating the second model parameter values T_1, ..., T_n obtained in the last update received from the first computing nodes, the model parameter value of the first model on the second computing node is the second aggregated parameter value G-all obtained in the last update, and the model parameter value of the second model on the second computing node is the third aggregated parameter value Ts-all.
  • It should be noted that the above process is performed after the second computing node repeatedly performs the above steps until the iteration termination condition is reached.
  • In the above implementation of the present application, when a new computing node (i.e., the third computing node) is deployed, after the iterative training is completed the first computing node sends the second model parameter value obtained in the last update to the third computing node, which forwards it directly, or forwards it after aggregation, to the second computing node, so that the second computing node performs the target task based on the latest model parameter values of the first model and the latest model parameter values of the second model, and the model performance is improved.
  • Third, an embodiment of the present application further provides a model training method based on federated transfer learning, which can be used in the field of artificial intelligence, for example the field of computer vision.
  • The method includes: first, the first computing node (which may be one or multiple) keeps the first model parameter value (denoted G_s) of the first model (e.g., a feature extractor) on the first computing node and the second model parameter value (denoted T_s) of the second model (e.g., a classifier) on the first computing node fixed, and uses the first data set on the first computing node to train the third model (e.g., a domain discriminator, also referred to as a discriminator for short) on the first computing node, obtaining the third model parameter value (denoted D_s) of the third model on the first computing node.
  • The first model parameter value G_s is the model parameter value obtained after the first computing node trains the first model, and the second model parameter value T_s is the model parameter value obtained after the first computing node trains the second model; the first data set may be a labeled data set.
  • The first model is used to perform feature extraction on the input data; the second model is used to perform a target task based on the features extracted by the first model, where the target task may be, for example, a classification task (e.g., a target detection task, semantic segmentation task, speech recognition task, etc.) or a regression task, which is not limited here; the third model is used to identify the source domain of the features extracted by the first model.
  • That is, the computing node where the input data is located can be distinguished according to the data distribution of the source domain, for example to determine whether an acquired feature comes from a source-domain device or a target-domain device.
  • After that, the second computing node obtains the second aggregated parameter value (denoted G-all), which is obtained based on the first model parameter values G_s of the first models trained on the one or more first computing nodes; then, with the model parameter value of the first model on the second computing node set to the second aggregated parameter value G-all, the second computing node uses the second data set to train the third model on the second computing node, obtaining the fourth model parameter value D_t of the third model on the second computing node.
  • Next, the first computing node receives the first aggregated parameter value (denoted D-all), which is obtained based on the third model parameter value D_s and the fourth model parameter value D_t. After obtaining the first aggregated parameter value D-all, the first computing node updates the original third model parameter value D_s to the first aggregated parameter value D-all (that is, the model parameter value of the third model on the first computing node is updated to D-all) and, while keeping D-all fixed, trains the first model on the first computing node and the second model on the first computing node to obtain the fifth model parameter value of the first model on the first computing node (denoted G_s') and the sixth model parameter value of the second model on the first computing node (denoted T_s').
  • Finally, the first computing node uses the fifth model parameter value G_s' and the sixth model parameter value T_s' as the new first model parameter value and the new second model parameter value.
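For the source-side update just described (fix D-all, retrain the first and second models on the labeled first data set), one plausible realization combines the supervised task loss with an adversarial term, as sketched below. The weighting factor, optimizer and label conventions are assumptions rather than details from this publication.

```python
import torch
import torch.nn as nn

def source_adversarial_update(feature_extractor, classifier, discriminator, d_all,
                              loader, lambda_adv=0.1, epochs=1, lr=0.01):
    """First computing node: third model fixed at D-all; update the first and second models."""
    discriminator.load_state_dict(d_all)
    for p in discriminator.parameters():
        p.requires_grad_(False)                           # keep D-all fixed
    params = list(feature_extractor.parameters()) + list(classifier.parameters())
    opt = torch.optim.SGD(params, lr=lr)
    ce, bce = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in loader:                               # labeled first data set
            feats = feature_extractor(x)
            task_loss = ce(classifier(feats), y)          # supervised target task
            dom_logits = discriminator(feats)
            # adversarial term: make source features indistinguishable from target features
            adv_loss = bce(dom_logits, torch.zeros_like(dom_logits))
            loss = task_loss + lambda_adv * adv_loss
            opt.zero_grad(); loss.backward(); opt.step()
    return feature_extractor.state_dict(), classifier.state_dict()   # G_s', T_s'
```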
  • The above steps may also be performed repeatedly until the iteration termination condition is reached; the iteration termination condition may be reaching a preset number of training rounds, convergence of the loss function, or another preset training termination condition, which is not specifically limited here.
  • In the embodiments of the present application, the case in which the above steps are repeated until the iteration termination condition is reached is taken as an example for description, and this is not repeated below.
  • It should be noted that when there is one first computing node, the first model parameter value can be directly denoted G_s and the second model parameter value directly denoted T_s; when there are multiple first computing nodes (assuming n, n ≥ 2), each first computing node obtains its corresponding first model parameter value (that is, multiple values G_s, denoted G_1, ..., G_n) and its corresponding second model parameter value (that is, multiple values T_s, denoted T_1, ..., T_n).
  • The above embodiment specifically describes the process by which the system composed of the first computing node and the second computing node trains each model and obtains the model parameter values of each model.
  • Only model parameter values are passed between the first computing node and the second computing node, and no data or data features are passed, which protects data privacy; and, based on the passed model parameter values, the distribution difference between the features extracted from the first data set by the first model on the first computing node and the features extracted from the second data set by the first model on the second computing node is gradually reduced during the iterative training process, thereby realizing domain alignment. Therefore, the above-mentioned embodiments of the present application realize the collaborative training of the model while taking into account both domain alignment and data privacy.
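Putting the rounds together, the sketch below traces one iteration of the parameter flow between one or more first (source) computing nodes and the second (target) computing node, emphasizing that only parameter values cross node boundaries. The node objects and the `aggregate` callback are illustrative assumptions; the local training calls are the hypothetical routines sketched earlier, not the patent's implementation.

```python
def one_round(source_nodes, target_node, aggregate):
    """One communication round: only state dicts (parameter values) are exchanged."""
    # sources -> target: feature-extractor parameters G_1..G_n, aggregated into G-all
    g_all = aggregate([n.feature_extractor.state_dict() for n in source_nodes])
    # each side trains its own discriminator locally on its own data
    d_list = [train_source_discriminator(n.feature_extractor, n.discriminator, n.loader)
              for n in source_nodes]
    d_t = train_target_discriminator(target_node.feature_extractor,
                                     target_node.discriminator, g_all, target_node.loader)
    # discriminator parameters are aggregated into D-all and shared back with the sources
    d_all = aggregate(d_list + [d_t])
    # each source node fixes D-all and adversarially updates its G and T (-> G_s', T_s')
    for n in source_nodes:
        source_adversarial_update(n.feature_extractor, n.classifier, n.discriminator,
                                  d_all, n.loader)
    return g_all, d_all
```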
  • In one possible implementation, the method may further include: the second computing node may further obtain the first aggregated parameter value D-all, which is obtained based on the third model parameter value D_s (one or more) and the fourth model parameter value D_t, where each third model parameter value D_s is the model parameter value obtained by a first computing node training the third model on itself using the first data set on itself.
  • After that, the second computing node updates the fourth model parameter value D_t to the first aggregated parameter value D-all and, while keeping the first aggregated parameter value D-all unchanged (i.e., fixing D-all), uses the second data set to train the first model on the second computing node and the second model on the second computing node, obtaining the seventh model parameter value of the first model on the second computing node (denoted G_t') and the eighth model parameter value of the second model on the second computing node (denoted T_t').
  • In the above implementation of the present application, the second computing node may also train the first model and the second model on the second computing node with the first aggregated parameter value D-all fixed; that is, adversarial training is performed not only on the first computing node but also on the second computing node, so that the features extracted from the first data set and the features extracted from the second data set achieve domain alignment faster, improving training speed.
  • In one possible implementation, the process by which the first computing node receives the first aggregated parameter value D-all, obtained based on the third model parameter value D_s and the fourth model parameter value D_t, may be: first, the first computing node sends the third model parameter value D_s (one or more) to the second computing node, and the second computing node aggregates the third model parameter value D_s with the fourth model parameter value D_t to obtain the first aggregated parameter value D-all.
  • It should be noted that if there is one first computing node, the second computing node obtains the first aggregated parameter value D-all by aggregating the third model parameter value D_s from that first computing node with the fourth model parameter value D_t; if there are multiple first computing nodes, the second computing node obtains the first aggregated parameter value D-all by aggregating the respective third model parameter values (that is, each first computing node has a corresponding third model parameter value, denoted D_1, ..., D_n) with the fourth model parameter value D_t. Finally, the second computing node sends the aggregated first aggregated parameter value D-all to the first computing node.
  • In the above implementation of the present application, the aggregation of the third model parameter value D_s and the fourth model parameter value D_t is performed by the second computing node; in this case, there is no need to deploy a new computing node, which saves costs.
  • In one possible implementation, the process by which the first computing node receives the first aggregated parameter value D-all, obtained based on the third model parameter value D_s and the fourth model parameter value D_t, may also be: the first computing node sends the third model parameter value D_s to the third computing node, the second computing node also sends the fourth model parameter value D_t to the third computing node, and the third computing node aggregates the third model parameter value D_s with the fourth model parameter value D_t to obtain the first aggregated parameter value D-all; after that, the third computing node sends the first aggregated parameter value D-all to the first computing node.
  • It should be noted that if there is one first computing node, the third computing node obtains the first aggregated parameter value D-all by aggregating the third model parameter value D_s from that first computing node with the fourth model parameter value D_t; if there are multiple first computing nodes, the third computing node obtains the first aggregated parameter value D-all by aggregating the respective third model parameter values (that is, each first computing node has a corresponding third model parameter value, denoted D_1, ..., D_n) with the fourth model parameter value D_t. Finally, the third computing node sends the first aggregated parameter value D-all to the second computing node.
  • In the above implementation of the present application, the aggregation of the third model parameter value D_s and the fourth model parameter value D_t can be performed by an additionally deployed third computing node, which reduces the computing overhead of the second computing node and improves its computing speed.
  • In one possible implementation, the process by which the second computing node obtains the second aggregated parameter value G-all, obtained based on the first model parameter values G_s of the first models trained on one or more first computing nodes, may be: the second computing node receives the updated first model parameter value G_s (one or more) sent by the one or more first computing nodes, and aggregates the seventh model parameter value G_t' with each updated first model parameter value G_s to obtain the second aggregated parameter value G-all.
  • Since there may be one or multiple first computing nodes, this can be divided into two cases: 1) if there is one first computing node, the second computing node aggregates the seventh model parameter value G_t' with that one first model parameter value G_s to obtain the second aggregated parameter value G-all; 2) if there are multiple first computing nodes (assuming n, n ≥ 2), the second computing node aggregates the seventh model parameter value G_t' with each of the first model parameter values G_1, ..., G_n to obtain the second aggregated parameter value G-all.
  • In the above implementation of the present application, the aggregation to obtain the second aggregated parameter value is performed by the second computing node, which is achievable.
  • the method further includes: the first computing node converts the first model parameter value obtained by the last update and the value of the first model parameter obtained by the last update
  • the second model parameter value is sent to the second computing node.
  • the second computing node receives the first model parameter value and the second model parameter value obtained by the last update sent by the one or more first computing nodes, firstly, the second model parameter obtained by each last update is updated.
  • the value and the eighth model parameter value Tt' obtained by the last update are aggregated to obtain the fourth aggregated parameter value (which can be represented by T-all).
  • the second computing node receives the second model parameter value Ts obtained by the last update sent by the first computing node, and uses the second model parameter value Ts obtained by the last update
  • the parameter value Ts and the eighth model parameter value Tt' obtained from the last update are aggregated to obtain the fourth aggregated parameter value T-all
  • the second computing node receives each The second model parameter values Ts obtained by the last update sent by each of the first computing nodes (that is, multiple Ts, represented by T1, .
  • the parameter values T1, . . . , Tn and Tt' are aggregated to obtain a fourth aggregated parameter value T-all.
  • the second computing node executes the target task according to the first model on the second computing node and the second model on the second computing node, wherein the model parameter value of the first model on the second computing node is the second aggregation parameter value G-all obtained by the last update, and the model parameter value of the second model on the second computing node is the fourth aggregation parameter value T-all.
  • in this way, the second computing node receives the first model parameter value and the second model parameter value obtained by the last update and sent by the first computing node, so that the second computing node executes the target task based on the latest model parameter value of the first model and the latest model parameter value of the second model; because the final model parameter values of the first model and the second model are obtained through the co-training of the first computing node and the second computing node, the performance of the models is improved.
  • when there is one first computing node, the process by which the second computing node obtains the second aggregation parameter value (which is based on the first model parameter values of the trained first models on the one or more first computing nodes) may be: the second computing node receives the first model parameter value Gs (i.e., a single Gs) sent by the first computing node; in this case, the second aggregation parameter value G-all is essentially that first model parameter value Gs. When there are multiple first computing nodes (assuming n, n ≥ 2), the process may be: the second computing node receives the first model parameter values Gs sent by the first computing nodes respectively (that is, multiple Gs, denoted G1, ..., Gn respectively) and aggregates them to obtain the second aggregation parameter value G-all.
  • in this way, the second aggregation parameter value G-all can be obtained by the second computing node, which provides flexibility.
  • the method further includes: the first computing node sends the first model parameter value obtained by the last update and the second model parameter value obtained by the last update to the second computing node. Since there may be one or multiple first computing nodes, two cases are described: 1) when there is one first computing node, the second computing node receives the last-updated second model parameter value Ts sent by that first computing node and executes the target task according to the first model on the second computing node and the second model on the second computing node, wherein the model parameter value of the first model on the second computing node is the second aggregation parameter value G-all obtained by the last update, and the model parameter value of the second model on the second computing node is the second model parameter value Ts obtained by the last update; 2) when there are multiple first computing nodes, the second computing node receives the last-updated second model parameter values Ts sent by each first computing node (i.e., multiple Ts, denoted T1, ..., Tn respectively), aggregates the last-updated second model parameter values T1, ..., Tn to obtain the third aggregation parameter value Ts-all, and executes the target task according to the first model on the second computing node and the second model on the second computing node, wherein the model parameter value of the first model on the second computing node is the second aggregation parameter value G-all obtained by the last update and the model parameter value of the second model on the second computing node is the third aggregation parameter value Ts-all.
  • in this way, the second computing node receives the last-updated first model parameter value and second model parameter value sent by the first computing node and executes the target task based on the latest model parameter value of the first model and the latest model parameter value of the second model; because the final model parameter values of the first model and the second model on the second computing node are obtained through the co-training of the first and second computing nodes, the model performance is improved.
  • the process by which the second computing node obtains the first aggregation parameter value D-all (which is obtained based on the third model parameter value Ds and the fourth model parameter value Dt) may also be: first, the second computing node sends the fourth model parameter value Dt to the third computing node, and the first computing node also sends the third model parameter value Ds (one or more) to the third computing node; the third computing node then aggregates the third model parameter value Ds and the fourth model parameter value Dt to obtain the first aggregation parameter value D-all.
  • if there is one first computing node, the process by which the third computing node obtains the first aggregation parameter value D-all is: aggregate the third model parameter value Ds from that first computing node with the fourth model parameter value Dt to obtain the first aggregation parameter value D-all; if there are multiple first computing nodes, the process is: aggregate the respective third model parameter values Ds (that is, each first computing node has a corresponding third model parameter value Ds, which can be denoted D1, ..., Dn respectively) with the fourth model parameter value Dt to obtain the first aggregation parameter value D-all. Finally, the third computing node sends the aggregated first aggregation parameter value D-all to the second computing node.
  • in this way, the aggregation of the third model parameter value Ds and the fourth model parameter value Dt is performed by an additionally deployed third computing node, which reduces the computing overhead of the second computing node and improves its computing speed.
  • the process by which the second computing node obtains the second aggregation parameter value G-all (which is obtained based on the first model parameter values of the first models trained on one or more first computing nodes) may be as follows: first, the second computing node sends the seventh model parameter value Gt' to the third computing node; the third computing node then aggregates Gt' with each updated first model parameter value Gs from the one or more first computing nodes to obtain the second aggregation parameter value G-all, and sends the aggregated value G-all to the second computing node. Since there may be one or multiple first computing nodes, two cases can be distinguished: 1) when there is one first computing node, the second computing node receives the first model parameter value Gs forwarded by the third computing node (first sent by the first computing node to the third computing node, and then forwarded by the third computing node to the second computing node); in this case, the second aggregation parameter value G-all is essentially the first model parameter value Gs; 2) when there are multiple first computing nodes (assuming n, n ≥ 2), the second computing node receives the second aggregation parameter value G-all forwarded by the third computing node, where the second aggregation parameter value G-all is obtained by the third computing node aggregating each first model parameter value, and each first model parameter value is sent to the third computing node by the corresponding first computing node; that is, each first computing node sends its obtained first model parameter value Gs (i.e., multiple Gs, denoted G1, ..., Gn respectively) to the third computing node, and the third computing node aggregates the received first model parameter values G1, ..., Gn to obtain the second aggregation parameter value G-all.
  • the method further includes: the first computing node sends the first model parameter value obtained by the last update and the second model parameter value obtained by the last update to the third computing node.
  • the second computing node also sends the eighth model parameter value Tt' obtained by the last update to the third computing node, and the third computing node aggregates the second model parameter value obtained by the last update with the eighth model parameter value Tt' obtained by the last update to obtain the fourth aggregation parameter value T-all.
  • since there may be one or multiple first computing nodes: 1) if there is one first computing node, the third computing node receives the last-updated second model parameter value Ts sent by that first computing node and, at the same time, receives the last-updated eighth model parameter value Tt' sent by the second computing node, and aggregates the last-updated second model parameter value Ts with the last-updated eighth model parameter value Tt' to obtain the fourth aggregation parameter value T-all; 2) if there are multiple first computing nodes, the third computing node receives the last-updated second model parameter values Ts sent by each first computing node (i.e., multiple Ts, denoted T1, ..., Tn respectively) and aggregates the second model parameter values T1, ..., Tn with Tt' to obtain the fourth aggregation parameter value T-all.
  • the second computing node receives the fourth aggregation parameter value T-all sent by the third computing node and executes the target task according to the first model on the second computing node and the second model on the second computing node, wherein the model parameter value of the first model on the second computing node is the second aggregation parameter value G-all obtained by the last update, and the model parameter value of the second model on the second computing node is the fourth aggregation parameter value T-all.
  • the process by which the second computing node obtains the second aggregation parameter value G-all (which is obtained based on the first model parameter values of the first models trained on one or more first computing nodes) may be: the second computing node receives the second aggregation parameter value G-all from the third computing node, where the second aggregation parameter value G-all is obtained by the third computing node aggregating each first model parameter value Gs from the one or more first computing nodes.
  • since there may be one or multiple first computing nodes, two cases can be distinguished: 1) when there is one first computing node, the second computing node receives the first model parameter value Gs forwarded by the third computing node (sent by the first computing node to the third computing node and then forwarded by the third computing node to the second computing node); in this case, the second aggregation parameter value G-all is essentially the first model parameter value Gs; 2) when there are multiple first computing nodes (assuming n, n ≥ 2), the second computing node receives the second aggregation parameter value G-all forwarded by the third computing node, where the second aggregation parameter value is obtained by the third computing node aggregating each first model parameter value, and each first model parameter value is sent to the third computing node by the corresponding first computing node; that is, each first computing node sends its obtained first model parameter value Gs (i.e., multiple Gs, denoted G1, ..., Gn respectively) to the third computing node, the third computing node aggregates the received first model parameter values G1, ..., Gn to obtain the second aggregation parameter value G-all, and the third computing node sends the obtained second aggregation parameter value G-all to the second computing node.
  • the cases of one and of multiple first computing nodes are thus described from the second computing node side; the second aggregation parameter value G-all can be obtained by the third computing node, which provides flexibility.
  • the method further includes: the first computing node sends the first model parameter value obtained by the last update and the second model parameter value obtained by the last update to the third computing node. Since there may be one or multiple first computing nodes, two cases are described: 1) when there is one first computing node, the second computing node receives the last-updated second model parameter value Ts sent by the third computing node (the third computing node obtains it from the first computing node) and executes the target task according to the first model on the second computing node and the second model on the second computing node, wherein the model parameter value of the first model on the second computing node is the second aggregation parameter value G-all obtained by the last update, and the model parameter value of the second model on the second computing node is the second model parameter value Ts obtained by the last update; 2) when there are multiple first computing nodes, the second computing node receives the third aggregation parameter value Ts-all sent by the third computing node and executes the target task according to the first model on the second computing node and the second model on the second computing node, where the third aggregation parameter value Ts-all is obtained by the third computing node aggregating the last-updated second model parameter values T1, ..., Tn from the first computing nodes, the model parameter value of the first model on the second computing node is the second aggregation parameter value G-all obtained by the last update, and the model parameter value of the second model on the second computing node is the third aggregation parameter value Ts-all.
  • in this way, when a new computing node (i.e., the third computing node) is deployed, the first computing node sends the second model parameter value obtained by the last update to the third computing node after the iterative training is completed, and the third computing node forwards it directly, or aggregates it and forwards the result, to the second computing node.
  • the second computing node then executes the target task based on the latest model parameter values of the first model and the latest model parameter values of the second model.
  • because the respective final model parameter values of the first model and the second model on the second computing node are obtained through the co-training of the first computing node and the second computing node, the model performance is improved.
  • an embodiment of the present application further provides a data processing method, the method comprising: first, a computer device obtains input data to be processed, the input data is related to a target task to be executed, for example, when the target task is a classification task , then the input data refers to the data used for classification.
  • the computer device performs feature extraction on the input data through the trained first model to obtain a feature map, and processes the feature map through the trained second model to obtain output data, wherein the model parameter value of the trained first model and the model parameter value of the trained second model are obtained by training with the method of the first aspect or any possible implementation manner of the first aspect, or the method of the second aspect or any possible implementation manner of the second aspect, or the method of the third aspect or any possible implementation manner of the third aspect.
  • when the target task is a target detection task, which is generally aimed at detecting target objects in an image, the input data generally refers to an input image; the computer device first uses the trained first model to perform feature extraction on the input image, and then uses the trained second model to perform target detection on the extracted feature map to obtain the detection result, that is, the output data is the detection result.
  • when the target task is a classification task, the classification task may be performed on images; in this case, the input data refers to the input images, and the computer device first uses the trained first model to perform feature extraction on the input images and then uses the trained second model to classify the extracted feature map and output the classification result, that is, the output data is the classification result of the image.
  • the classification task may be performed not only on images, but also on text or audio; in that case, the input data refers to the corresponding text data or audio data, and the output data refers to the classification result of the text or the classification result of the audio.
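  • A minimal inference sketch of this data processing method follows (PyTorch-style; the module and variable names are illustrative assumptions, not interfaces defined by the patent):

```python
import torch

def run_target_task(feature_extractor: torch.nn.Module,
                    subtask_model: torch.nn.Module,
                    input_data: torch.Tensor) -> torch.Tensor:
    """Apply the trained first model, then the trained second model."""
    feature_extractor.eval()
    subtask_model.eval()
    with torch.no_grad():
        feature_map = feature_extractor(input_data)   # trained first model
        output_data = subtask_model(feature_map)      # trained second model
    return output_data  # e.g. class scores for a classification task
```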
  • a fifth aspect of the embodiments of the present application provides a computing node, where the computing node, as a first computing node, has a function of implementing the method of the first aspect or any possible implementation manner of the first aspect.
  • This function can be implemented by hardware or by executing corresponding software by hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • a sixth aspect of the embodiments of the present application provides a computing node, where the computing node, as a second computing node, has the function of implementing the method of the second aspect or any possible implementation manner of the second aspect.
  • This function can be implemented by hardware or by executing corresponding software by hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • a seventh aspect of an embodiment of the present application provides a computing system, where the computing system includes a first computing node and a second computing node, and the computing system has the function of implementing the method of the third aspect or any possible implementation manner of the third aspect .
  • This function can be implemented by hardware or by executing corresponding software by hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • An eighth aspect of an embodiment of the present application provides a computing node, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to call the program stored in the memory to execute the method of the first aspect or any possible implementation manner of the first aspect in the embodiments of the present application.
  • A ninth aspect of an embodiment of the present application provides a computing node, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to call the program stored in the memory to execute the method of the second aspect or any possible implementation manner of the second aspect in the embodiments of the present application.
  • a tenth aspect of an embodiment of the present application provides a computer device, which may include a memory, a processor, and a bus system, wherein the memory is used to store a program, and the processor is used to call the program stored in the memory to execute the fourth aspect of the embodiment of the present application or any one of the possible implementations of the fourth aspect.
  • An eleventh aspect of the embodiments of the present application provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium; when the instructions are run on a computer, the computer is enabled to execute the method of the first aspect or any possible implementation manner of the first aspect.
  • A twelfth aspect of the embodiments of the present application provides a computer program or computer program product including instructions; when the computer program or computer program product is run on a computer, the computer is made to execute the method of the first aspect or any possible implementation manner of the first aspect.
  • A thirteenth aspect of an embodiment of the present application provides a chip, where the chip includes at least one processor and at least one interface circuit, the interface circuit is coupled to the processor, the at least one interface circuit is configured to perform transceiving functions and send instructions to the at least one processor, and the at least one processor is used to run a computer program or instructions, which has the function of implementing the method of the first aspect or any possible implementation manner of the first aspect; this function can be implemented by hardware, by software, or by a combination of hardware and software, where the hardware or software includes one or more modules corresponding to the above functions.
  • the interface circuit is used to communicate with other modules outside the chip. For example, the interface circuit can send the model parameter values of each model trained on the chip to the target device.
  • FIG. 1 is a schematic structural diagram of an artificial intelligence main frame provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a federated transfer learning system provided by an embodiment of the present application
  • FIG. 3 is another schematic diagram of the federated transfer learning system provided by the embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a model training method based on federated transfer learning provided by an embodiment of the present application
  • FIG. 5 is another schematic flowchart of a model training method based on federated transfer learning provided by an embodiment of the present application
  • FIG. 6 is another schematic flowchart of a model training method based on federated transfer learning provided by an embodiment of the present application
  • FIG. 7 is another schematic flowchart of a model training method based on federated transfer learning provided by an embodiment of the present application.
  • FIG. 8 is another schematic flowchart of a model training method based on federated transfer learning provided by an embodiment of the present application.
  • FIG. 9 is another schematic flowchart of a model training method based on federated transfer learning provided by an embodiment of the present application.
  • FIG. 10 is another schematic flowchart of a model training method based on federated transfer learning provided by an embodiment of the present application.
  • FIG. 11 is another schematic flowchart of a model training method based on federated transfer learning provided by an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a first computing node provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a second computing node provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of a device provided by an embodiment of the present application.
  • the embodiments of the present application provide a model training method based on federated transfer learning and a computing node, which use the first data set on the first computing node to assist the second data set on the second computing node in training the model and realize domain alignment; only the model parameter values of the models are passed between computing nodes, no data or data features are passed, so user data privacy is fully protected, and the collaborative training of the models improves their performance.
  • the embodiments of the present application involve a lot of related knowledge of federated learning, transfer learning, model training, etc.
  • related terms and concepts that may be involved in the embodiments of the present application are first introduced below. It should be understood that the related concept interpretation may be limited due to the specific circumstances of the embodiments of the present application, but it does not mean that the present application can only be limited to the specific circumstances, and there may be differences in the specific circumstances of different embodiments. There is no specific limitation here.
  • a neural network can be composed of neural units, which can be specifically understood as a neural network with an input layer, a hidden layer, and an output layer.
  • the first layer is the input layer, the last layer is the output layer, and the layers in the middle are all hidden layers; each layer in the neural network can be called a neural network layer.
  • a neural network with many hidden layers is called a deep neural network (DNN).
  • the work of each layer in a neural network can be described mathematically; from a physical point of view, the work of each layer can be understood as completing a transformation from the input space to the output space (that is, from the row space of the matrix to its column space) through five operations on the input space (the set of input vectors).
  • here, "space" refers to the collection of all individuals of a given type of thing; W is the weight matrix of each layer of the neural network, and each value in the matrix represents the weight value of one neuron in that layer.
  • the matrix W determines the space transformation from the input space to the output space described above, that is, the W of each layer of the neural network controls how to transform the space.
  • the purpose of training the neural network is to finally get the weight matrix of all layers of the trained neural network. Therefore, the training process of the neural network is essentially learning the way to control the spatial transformation, and more specifically, learning the weight matrix.
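  • As a point of reference (a standard formulation, not a formula quoted from this application), the transformation performed by one layer on an input vector x can be written as:

$$\mathbf{y} = a\left(W\mathbf{x} + \mathbf{b}\right)$$

  where W is the layer's weight matrix, b is a bias vector, and a(·) is the activation function; training adjusts W (and b) for every layer.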
  • the learning models (which may also be referred to as learners, models, etc.) and other machine learning models involved in the embodiments of the present application are essentially neural networks.
  • the loss function is used to characterize the gap between the predicted category and the true category, and the cross entropy loss function is a commonly used loss function in classification tasks.
  • the error back propagation (BP) algorithm can be used to correct the values of the parameters in the initial neural network model, so that the reconstruction error loss of the neural network model becomes smaller and smaller; specifically, the input signal is propagated forward until the output, which produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges.
  • the back-propagation algorithm is a back-propagation movement dominated by error loss, aiming to obtain the parameters of the optimal neural network model, such as the weight matrix.
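  • The loss-then-back-propagation loop described above can be illustrated with a minimal supervised training step (PyTorch-style; the optimizer choice and all names here are illustrative assumptions):

```python
import torch

def supervised_step(model: torch.nn.Module,
                    batch_x: torch.Tensor,
                    batch_y: torch.Tensor,
                    optimizer: torch.optim.Optimizer) -> float:
    """One forward/backward pass: compute the loss, back-propagate, update the weights."""
    optimizer.zero_grad()
    predictions = model(batch_x)                                    # forward pass to the output layer
    loss = torch.nn.functional.cross_entropy(predictions, batch_y)  # gap between predicted and true labels
    loss.backward()                                                 # error back propagation (BP)
    optimizer.step()                                                # adjust the weight matrices
    return loss.item()
```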
  • a feature refers to the input variable, the x variable in simple linear regression, a simple machine learning task may use a single feature, while a more complex machine learning task may use millions of features.
  • the label is the y variable in simple linear regression, the label can be the future price of wheat, the species of animal/plant shown in the picture, the meaning of the audio clip, or anything.
  • a label may refer to the classification category of a picture. For example, given a picture of a cat, a person knows it is a cat, but the computing device does not; a label is therefore attached to the picture to indicate to the computing device that the information contained in the picture is "cat". The computing device then knows it is a cat and learns from this label, so that it can recognize all cats based on this one labeled cat. Therefore, labeling the data is telling the computing device what the multiple features of the input variable describe (i.e., y), and y can be called the label or the target (i.e., the target value).
  • a sample refers to a specific instance of data; a sample x represents an object, and d represents the dimension of x, that is, the number of features.
  • samples are divided into labeled samples and unlabeled samples: labeled samples contain both features and labels, while unlabeled samples contain features but no labels.
  • the task of machine learning is often to learn the latent patterns in the input d-dimensional training sample set (which can be referred to as the training set for short).
  • the learning models adopted for tasks based on machine learning are essentially neural networks.
  • the model defines the relationship between the feature and the label.
  • the application of the model generally includes two stages: training and inference.
  • the training stage is used to train the model according to the training set to obtain the model parameter values of the trained model (similar to the weight matrices of each layer of the neural network described above); in the embodiments of this application, data sets such as the first data set and the second data set are used as training sets to train each model involved in this application;
  • the trained model is used to perform label prediction on real unlabeled instances, and the prediction accuracy is one of the important indicators to measure the quality of a model training.
  • in the field of deep learning, since a neural network is composed of neural units, a neural network generally contains multiple neural network layers; therefore, if the neural network is divided according to the specific functions of the neural network layers, various modules with specific functions can be obtained.
  • several neural network modules involved in the embodiments of the present application are introduced here.
  • Feature extractor: the part of the neural network from the input layer to an intermediate layer, used to extract features from the input data (such as samples); specifically, it processes the original input data (such as pictures, text, etc.) to extract some important features.
  • the first model may be a feature extractor.
  • Classifier: according to the different tasks to be performed, some neural network layers after the feature extractor can have different functions; this part of the neural network layers can be called a subtask model, which is used to perform classification, regression or other downstream subtasks.
  • the downstream subtasks may be object detection tasks, classification tasks, speech recognition tasks, semantic segmentation tasks, and the like.
  • the following takes the subtask model used in the classification task as an example for description: when the subtask model is used in the classification task, it is used to classify the features extracted by the feature extractor to obtain the predicted label.
  • the second model may be a sub-task model, which is used to perform the target task based on the features extracted from the first model.
  • the second model may be a classifier.
  • the second model is taken as an example as a classifier for illustration.
  • Discriminator: structurally, a part of the neural network layers after the feature extractor, used to identify the domain to which the features extracted by the feature extractor belong; it can be understood as a domain classifier (a special classifier), except that it does not classify the input data itself but distinguishes which domain the input data comes from.
  • the third model may be a discriminator.
  • Adversarial training is an important way to enhance the robustness of neural networks.
  • the adversarial training in this case refers to adversarial training between the feature extractor and the discriminator: on the one hand, the discriminator needs to be trained to distinguish whether an extracted feature comes from the target domain or the source domain; on the other hand, the feature extractor needs to be trained to extract features that are sufficient to confuse the discriminator. In the process of confrontation between the two, both sides are effectively trained.
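  • A minimal sketch of one such adversarial round is shown below (PyTorch-style). It follows the domain-label convention used later in this document (source = 0, target = 1); the module names, optimizers, and single-logit discriminator output are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def discriminator_step(extractor, discriminator, x_src, x_tgt, opt_d) -> float:
    """Train the discriminator to tell source features (label 0) from target features (label 1)."""
    opt_d.zero_grad()
    f_src = extractor(x_src).detach()     # features are fixed from the discriminator's point of view
    f_tgt = extractor(x_tgt).detach()
    logits = torch.cat([discriminator(f_src), discriminator(f_tgt)]).squeeze(-1)
    labels = torch.cat([torch.zeros(len(x_src)), torch.ones(len(x_tgt))])
    loss = F.binary_cross_entropy_with_logits(logits, labels)
    loss.backward()
    opt_d.step()
    return loss.item()

def extractor_step(extractor, discriminator, x, inverted_label: float, opt_g) -> float:
    """Train the feature extractor so that its features confuse the (fixed) discriminator."""
    opt_g.zero_grad()
    logits = discriminator(extractor(x)).squeeze(-1)
    targets = torch.full((len(x),), inverted_label)   # e.g. 1.0 for source data, 0.0 for target data
    loss = F.binary_cross_entropy_with_logits(logits, targets)
    loss.backward()
    opt_g.step()                                      # only the extractor's parameters are updated
    return loss.item()
```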
  • Federated learning is a machine learning method used to protect user privacy.
  • due to limitations such as insufficient data features or a small number of samples on a single device, it is difficult to train a good machine learning model on one device alone, so the data of multiple devices needs to be fused together.
  • Federated learning came into being based on this requirement, which can effectively help multiple computing nodes to perform data usage and machine learning modeling while meeting the requirements of user privacy protection, data security and government regulations.
  • Transfer learning is a machine learning method that takes the model developed for task A as an initial point and reuses it in the process of developing a model for task B; that is to say, the knowledge learned by the model trained on the existing task (task A) is transferred to the new task (task B) to help retrain the model.
  • through transfer learning, the knowledge that has already been learned (contained in the model parameters) is shared with new tasks in some way to speed up and optimize the learning efficiency of the model, so that the model does not have to learn from scratch; for example, in an object detection task, using a model trained on the ImageNet dataset as the model for the new task can significantly improve training efficiency.
  • the source domain refers to the side where knowledge is transferred out
  • the target domain is the side where knowledge is transferred in.
  • Federated transfer learning is a machine learning method that combines federated learning and transfer learning, that is, multi-task collaborative training of models (or neural networks) without sharing private data.
  • IID means that the probability distribution of each variable in a set of random variables is the same, and these random variables are independent of each other.
  • a set of random variables that are independent and identically distributed does not imply that every event has the same probability in their sample space. For example, the sequence of outcomes from rolling non-uniform dice is IID, but the probability of rolling each face up is not the same.
  • IID means that all samples in the input space X obey an implicit unknown distribution, and all samples of training data are independently sampled from this distribution; non-IID means that the training data are not sampled from the same distribution, or the training data are not sampled independently.
  • the data on the source domain is generally labeled data, while the data on the target domain is generally unlabeled or sparsely labeled data.
  • due to the lack of labels in the existing data in the target domain, it is very difficult to complete the related machine learning tasks directly, and assistance from the source domain data is often required to improve the performance of the model and complete the related tasks. Since the data between different domains is often not independent and identically distributed, this distribution difference makes the direct transfer of knowledge less effective, so a certain method is often needed to align the source domain and the target domain. Generally speaking, domain alignment means aligning the data distributions between different domains, thereby improving the transfer effect of transfer learning; in the embodiments of the present application, domain alignment refers to aligning the distributions of the data features extracted from different domains.
  • Figure 1 shows a schematic structural diagram of the main frame of artificial intelligence.
  • the above-mentioned artificial intelligence theme framework is explained below from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • the "intelligent information chain" reflects a series of processes from data acquisition to processing; for example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data goes through the cycle of "data - information - knowledge - wisdom".
  • the "IT value chain" reflects the value brought by artificial intelligence to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology implementation) to the industrial ecological process of the system.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and provides support through the basic platform. Communication with the outside world is achieved through sensors; computing power is provided by smart chips (hardware acceleration chips such as CPU, NPU, GPU, ASIC, and FPGA); the basic platform includes distributed computing frameworks and network-related platform guarantees and support, which can include cloud storage and computing, interconnection networks, and so on. For example, sensors communicate with the outside to obtain data, and these data are provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
  • the data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as IoT data from traditional devices, including business data from existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human's intelligent reasoning method in a computer or intelligent system, using formalized information to carry out machine thinking and solving problems according to the reasoning control strategy, and the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
  • some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, image identification, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall artificial intelligence solution, the productization of intelligent information decision-making, and the realization of landing applications. Its application areas mainly include: intelligent terminals, intelligent manufacturing, Smart transportation, smart home, smart healthcare, smart security, autonomous driving, smart city, etc.
  • the embodiments of the present application can be applied to the optimization of training methods for various models in machine learning, and the models trained by the model training method based on federated transfer learning of the present application can be specifically applied to various sub-fields in the field of artificial intelligence , for example, the field of image processing in the field of computer vision.
  • the data acquired by the infrastructure in this embodiment of the present application may be a local data set on each computing node in this embodiment of the present application.
  • for example, the first data set on the first computing node, the second data set on the second computing node, etc.; the data in each data set may specifically be video data (for example, captured by a monitoring system), image data (for example, obtained from a mobile terminal), and so on, which is not limited here.
  • the first computing node serves as the source domain device, on which the first data set is a labeled data set; the second computing node serves as the target domain device, on which the second data set is unlabeled data or data with a small number of labels.
  • FIG. 2 is a schematic diagram of the federated transfer learning system provided by the embodiment of the present application.
  • the system may include n first computing nodes (which can be denoted as S 1 , S 2 ,...,S n ) and one second computing node (which can be denoted as T), where n is an integer greater than or equal to 1, That is, the first computing node may be one or multiple, which is not limited here.
  • the first computing node serves as a source domain device and the second computing node serves as a target domain device; each computing node has a local data set, where the local data set on each first computing node can be called the first data set and the local data set on the second computing node can be called the second data set.
  • each first data set is a labeled data set, while the second data set is an unlabeled or sparsely labeled data set.
  • the data set on each computing node also has its own data distribution D; as shown in Figure 2, the data distributions of the first data sets are D 1 , D 2 , ..., D n , and the data distribution of the second data set is D T .
  • each computing node has the same initial model structure, and domain alignment between different domains (including all source domain devices and target domain devices) is achieved through adversarial training; adversarial training may cause the model parameter values of the initial model on each computing node to differ.
  • the model parameter values of the model on each first computing node can be recorded as M 1 , M 2 , ..., M n respectively, and the model parameter values on the second computing node can be recorded as M T , as shown in Figure 2.
  • during training, the model parameter values of the corresponding models (such as feature extractors, discriminators, classifiers, etc.) on each domain are aggregated into M (there are many ways to aggregate, for example a simple average at the model parameter level, or weighted averages or other more complex aggregation methods), and the aggregated model parameter value M is then assigned to the model parameters of the corresponding model on all computing nodes; the whole process above is called one training iteration. After that, multiple rounds of iteration are carried out until a preset number of iteration rounds or another set training stopping condition is reached. Specifically, the training may include the following steps (a condensed sketch of one iteration is given after this list): Step 1, each first computing node uses its local data set to train its own first model and second model; Step 2, each first computing node keeps the model parameter values of the first model and the second model unchanged and uses its local data set to train its own third model, which can be a discriminator, while the second computing node likewise keeps the model parameter values of its own first model (based on the first models of the first computing nodes) unchanged and trains its own third model; Step 3, the model parameter values of the third models trained on the first computing nodes are aggregated with the model parameter value of the third model trained on the second computing node to form the aggregated parameter value of the third model; Step 4, the obtained aggregated parameter value of the third model is assigned to the model parameters of the third model on each of the first computing nodes, and each first computing node uses its local data set to train its first model and second model again; Step 5, steps 2-4 are iterated continuously until the iteration termination condition is reached.
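  • The sketch below condenses steps 2 to 4 into code (the node interfaces, the aggregate function, and the plain broadcast are illustrative assumptions; local training details are omitted):

```python
def federated_adversarial_round(source_nodes, target_node, aggregate):
    """One training iteration of the system in Figure 2 (steps 2-4 above)."""
    # Step 2: with the first/second model parameters fixed, every node trains
    # its own third model (discriminator) on its local data set.
    d_params = [node.train_discriminator() for node in source_nodes]
    d_params.append(target_node.train_discriminator())

    # Step 3: aggregate the discriminator parameter values from all domains.
    d_all = aggregate(d_params)

    # Step 4: assign D-all back to every node's discriminator; each source node
    # then retrains its first and second models against the fixed discriminator.
    for node in source_nodes:
        node.set_discriminator(d_all)
        node.train_extractor_and_classifier()
    target_node.set_discriminator(d_all)
```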
  • the model parameter values of the corresponding models (e.g., feature extractor, discriminator, classifier, etc.) on each domain need to be aggregated; besides a separately deployed computing node, the second computing node as the target domain device can also implement the function of aggregating the model parameter values of the corresponding models in each domain.
  • the federated transfer learning system may not include a third computing node.
  • FIG. 3 is another schematic diagram of the federated transfer learning system provided by the embodiment of the application.
  • the function of model parameter value aggregation is carried by the second computing node as the target domain device.
  • generally there is one second computing node serving as the target domain device; if there are multiple second computing nodes, the model on each second computing node is trained in turn in a similar manner, which is not repeated here.
  • the function of aggregating the model parameter values of the corresponding models in each domain may also be implemented by the first computing node as the source domain device.
  • the specific aggregation process is similar to that of the above-mentioned second computing node or third computing node, and will not be repeated here.
  • one of the first computing nodes may be selected arbitrarily as the execution body for the aggregation of model parameter values, or a first computing node that meets a condition may be selected by the user according to actual needs as the execution body of model parameter value aggregation; there are various selection methods, which are not specifically limited here.
  • depending on whether there is one first computing node or multiple first computing nodes, the model training method based on federated transfer learning provided by the embodiments of the present application differs slightly; the method flow likewise differs slightly between deploying a new third computing node to aggregate the model parameter values for model training and carrying the aggregation function on the second computing node for model training.
  • the following describes the model training method based on federated transfer learning provided by the embodiment of the present application from the perspective of one or more first computing nodes and whether a new third computing node is deployed.
  • the local data set on the first computing node is the first data set
  • the local data set on the second computing node is the second data set
  • each computing node uses its respective local data set to train each model, which will not be repeated below.
  • the subscript s represents the source domain, the subscript t represents the target domain, and the model parameters of the first model, the second model and the third model are denoted by G, T and D, respectively.
  • the data and labels from the respective first data sets of the n first computing nodes, and the corresponding data distribution and label distribution are shown in the following formula (1):
  • x t is the data on the second computing node, and D T is the data distribution of the second data set on the second computing node.
  • first model, the second model, and the third model are denoted as g( ⁇ ), c( ⁇ ), and d( ⁇ ), respectively.
  • the first model, the second model and the third model on the i-th first computing node are denoted analogously, with a subscript indicating that computing node;
  • the first model, the second model and the third model from the second computing node are respectively: g t ( ⁇ ), c t ( ⁇ ), d t ( ⁇ ).
  • the first computing node is one, and a new third computing node is deployed
  • FIG. 4 is a schematic flowchart of a model training method based on federated transfer learning provided by an embodiment of the present application.
  • This embodiment targets a scenario in which the local data and labels of a single source domain device (i.e., a single first computing node) are used to assist a target domain device (i.e., the second computing node) that has no or few labels on its local data, so as to improve model performance.
  • the method may include the following steps:
  • the first computing node locally trains the first model and the second model, and sends the model parameter value G s of the first model and the model parameter value T s of the second model obtained by training to the third computing node.
  • the first computing node uses its own first data set to perform supervised training on the first model on the first computing node and the second model on the first computing node, thereby obtaining the model parameter value G s of the first model (G s can be referred to as the first model parameter value) and the model parameter value T s of the second model (T s can be referred to as the second model parameter value), and sends the obtained model parameter value G s and model parameter value T s to the third computing node.
  • the first model is used to perform feature extraction on the input data, so the first model can also be referred to as a feature extractor, and the second model is used to perform the target task based on the features extracted by the first model (eg, object detection tasks, speech recognition tasks, semantic segmentation tasks, etc.), so the second model can be referred to as a subtask model (eg, a classifier in a classification task).
  • the first computing node first inputs the training data in the first data set into the first model, the first model extracts corresponding features from the training data, and the first model then passes the extracted features to the second model; taking a classification task as an example, the extracted features are input to the classifier for prediction to obtain the predicted class label, and the difference between the predicted class label and the real class label can then be characterized by the loss function.
  • a typical loss function is the cross entropy loss function, which can be expressed as the following formula (3):
  • Equation (3) is only an illustration of a loss function in the embodiment of the present application, and an appropriate loss function can be selected according to actual application requirements, which is not limited here.
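  • The concrete expression of formula (3) is not reproduced in this text. Following the subscript convention introduced above, with g s (·) denoting the first model (feature extractor) and c s (·) the second model (classifier) on the first computing node, a standard supervised cross-entropy loss of the kind referred to here could read (this concrete form is an assumption, not a quotation of the patent's formula):

$$\mathcal{L}_{cls} = -\,\mathbb{E}_{(x^{s},\,y^{s})\sim \mathcal{D}_{s}} \sum_{k=1}^{K} \mathbb{1}\!\left[k = y^{s}\right]\,\log c_{s}\!\left(g_{s}(x^{s})\right)_{k}$$

  where K is the number of classes.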
  • the first model and the second model may belong to different parts of the same neural network: the first model may serve as a feature extractor that performs feature extraction on the input data, and the second model may serve as a label classifier that identifies the features extracted by the first model. In this case, the first model and the second model can be trained together, so that the model parameter value of the first model and the model parameter value of the second model are fixed together, trained together, and uploaded together; in other application scenarios, the first model and the second model can also be trained separately, in which case the obtained model parameter values of the first model and the second model do not have to be fixed together, trained together, and uploaded together.
  • the first computing node may also just send the model parameter value G s of the first model obtained by training to the third computing node.
  • the first computing node sends the model parameter value T s of the second model obtained by the last update to the third computing node.
  • the first model on the second computing node may be initialized with the G t .
  • the first computing node trains the third model locally while keeping G s and T s unchanged (which may be referred to as fixed G s and T s ) to obtain the model parameter value D s of the third model, And send D s to the third computing node.
  • after the first computing node locally trains the first model and the second model on the first computing node, it locally trains the third model on the first computing node while keeping G s and T s unchanged, thereby obtaining the model parameter value D s of the third model on the first computing node (D s may be referred to as the third model parameter value), and sends D s to the third computing node.
  • the third model is used to identify the domain to which the features extracted by the first model belong, and can be understood as a domain classifier (a special classifier), except that it does not classify the input data itself but distinguishes which domain the input data comes from.
  • the goal of the third model is to try to distinguish whether the incoming features come from the source domain or the target domain.
  • the domain label of the source domain is 0 and the domain label of the target domain is 1, so the third model on the first computing node should try to output the predicted label 0.
  • a typical loss function can be represented by the following formula (4):
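  • The concrete expression of formula (4) is not reproduced here; with the source-domain label fixed to 0 and the discriminator d s (·) outputting the probability that a feature comes from the target domain, a typical binary cross-entropy loss of this kind would be (an assumption, not a quotation):

$$\mathcal{L}_{d}^{s} = -\,\mathbb{E}_{x^{s}\sim \mathcal{D}_{s}}\,\log\!\left(1 - d_{s}\!\left(g_{s}(x^{s})\right)\right)$$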
  • the second computing node locally trains the third model under the condition that the model parameters of the first model on the second computing node keep G t unchanged (which may be referred to as fixing G t ), so as to obtain the model parameter value D t of the third model, and sends D t to the third computing node.
  • specifically, after the second computing node receives the model parameter value G s (that is, G t ) sent by the third computing node, it also locally trains the third model on the second computing node, so as to obtain the model parameter value D t of the third model on the second computing node (D t may be referred to as the fourth model parameter value), and sends the model parameter value D t to the third computing node.
  • the goal of the third model is to try to distinguish whether the incoming features come from the source domain or the target domain.
  • the domain label of the source domain is 0 and the domain label of the target domain is 1, then the third model on the second computing node should try to output the predicted label 1.
  • a typical loss function can be expressed as the following formula (5):
  • equation (5) is only a schematic representation of a loss function in the embodiment of the present application, and an appropriate loss function can be selected according to actual application requirements, which is not limited here.
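  • As with formula (4), the exact expression of formula (5) is not reproduced here; under the same conventions (target-domain label 1), a typical counterpart on the second computing node would be (an assumption, not a quotation):

$$\mathcal{L}_{d}^{t} = -\,\mathbb{E}_{x^{t}\sim \mathcal{D}_{t}}\,\log d_{t}\!\left(g_{t}(x^{t})\right)$$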
  • step 403 and step 404 may also be executed simultaneously; the execution order is not specifically limited here.
  • the third computing node aggregates D s and D t to obtain a first aggregation parameter value D-all.
• After receiving the D s sent by the first computing node and the D t sent by the second computing node respectively, the third computing node aggregates D s and D t to obtain the first aggregation parameter value D-all. In this way, when the model parameter of the third model is assigned the first aggregation parameter value D-all, the third model has the ability to identify the data features on the first data set and the data features on the second data set at the same time.
  • the aggregation method is not specifically limited here. Since the present application aggregates model parameter values of a model, and only transmits model parameter values or aggregated parameter values, and does not involve the transmission of original data or data features, data privacy can be protected.
• the first computing node updates D s to D s ', retrains the first model and the second model locally while keeping the model parameter values of the third model on the first computing node unchanged (that is, fixing D s '), and sends the model parameter value G s ' of the first model and the model parameter value T s ' of the second model obtained by training to the third computing node.
• After receiving the first aggregation parameter value D-all (i.e. D s ') sent by the third computing node, the first computing node updates D s to D s ' (that is, updates the third model parameter value to the first aggregation parameter value), and, while keeping the model parameter values of the third model on the first computing node unchanged (that is, fixing D s '), retrains the first model and the second model locally, and sends the model parameter value G s ' of the first model obtained by training (G s ' can be referred to as the fifth model parameter value) and the model parameter value T s ' of the second model (T s ' can be referred to as the sixth model parameter value) to the third computing node.
• The first computing node fixes D s ' and retrains the first model and the second model locally for the purpose of allowing the first model to extract features that are sufficient to confuse the third model, that is, to align the features of the source domain and the target domain as much as possible. In this step, a typical loss function can be expressed as the following formula (6):
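• A plausible form of this loss, assuming a supervised classification term on the labeled first data set plus an inverted-label adversarial term with an assumed balancing weight λ, is:

$$\mathcal{L} = \mathcal{L}_{cls}\big(T_s(G_s(x)),\, y\big) \;-\; \lambda\,\mathbb{E}_{x \sim \mathcal{D}_{source}}\Big[\log D\big(G_s(x)\big)\Big]$$

Minimizing the second term drives the third model to predict label 1 (target domain) on source-domain features, i.e., the domain label is inverted.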
  • equation (6) is only a schematic representation of a loss function in the embodiment of the present application, and an appropriate loss function can be selected according to actual application requirements, which is not limited here.
  • this part is to invert the domain label, that is, 0 becomes 1, and 1 becomes 0. This is to confuse the third model so that it predicts the source domain as the target domain and the target domain as the source domain.
• the first computing node will further use G s ' and T s ' as the new G s and T s respectively (that is, use the fifth model parameter value and the sixth model parameter value as the new first model parameter value and the new second model parameter value), and repeat the above steps 402-407 until the iteration termination condition is reached.
• the iteration termination condition may be reaching a preset number of training rounds, making the loss function converge, or other set training termination conditions, which are not specifically limited here. It should be noted that, in this embodiment of the present application, it is not limited which computing node is the execution body for judging the iteration termination condition; for example, it may be the first computing node or the third computing node.
  • the execution body for judging the iteration termination condition may be the third computing node or the first computing node.
• For example, when the third computing node receives the G s and T s uploaded by the first computing node for the 100th time (for example, as counted by a counter deployed on the third computing node), the third computing node determines that the iteration termination condition is reached at this time. For another example, when the first computing node completes the training of the first model and the second model locally for the 100th time (similarly, as counted by a counter deployed on the first computing node), the first computing node determines that the iteration termination condition is reached at this time.
• After a computing node (e.g., the first computing node) determines that the iteration termination condition is reached, the computing node will further send the judgment result (that is, that the iteration termination condition has been reached) to other computing nodes (e.g., the third computing node).
  • the manner of how to determine whether the iteration termination condition is reached is similar to this, and details are not repeated below.
  • steps 402-407 are the adversarial training process, and the adversarial training process is repeated continuously until the iteration termination condition is reached, and finally the features of the source domain and the target domain are basically aligned.
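• As a concrete illustration of the adversarial loop in steps 402-407, the following minimal PyTorch sketch walks through one possible realization; the model architectures, optimizer settings, random data, and helper names (make_models, average_state_dicts) are illustrative assumptions rather than the implementation of the present application:

```python
# Hypothetical sketch of the adversarial loop in steps 402-407 (Fig. 4).
import torch
import torch.nn as nn

torch.manual_seed(0)

# First model (G): feature extractor; second model (T): task head; third model (D): domain discriminator.
def make_models():
    G = nn.Sequential(nn.Linear(16, 8), nn.ReLU())
    T = nn.Linear(8, 3)                              # e.g. a 3-class classification task
    D = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())
    return G, T, D

G_s, T_s, D_s = make_models()    # on the first computing node (source domain)
G_t, _,  D_t = make_models()     # on the second computing node (target domain)

x_s = torch.randn(32, 16); y_s = torch.randint(0, 3, (32,))   # labelled first data set
x_t = torch.randn(32, 16)                                     # unlabelled second data set
bce, ce = nn.BCELoss(), nn.CrossEntropyLoss()

def average_state_dicts(dicts):
    """Parameter-level simple average (one possible aggregation on the third node)."""
    return {k: torch.stack([d[k] for d in dicts]).mean(dim=0) for k in dicts[0]}

for rnd in range(5):                                  # termination: a preset number of rounds
    # Step 402: the first node trains G_s and T_s on labelled source data and uploads them.
    opt = torch.optim.SGD(list(G_s.parameters()) + list(T_s.parameters()), lr=0.1)
    opt.zero_grad(); ce(T_s(G_s(x_s)), y_s).backward(); opt.step()
    G_t.load_state_dict(G_s.state_dict())             # third node forwards G_s as G_t

    # Step 403: the first node trains D_s with G_s and T_s fixed (source domain label 0).
    opt = torch.optim.SGD(D_s.parameters(), lr=0.1)
    opt.zero_grad(); bce(D_s(G_s(x_s).detach()), torch.zeros(32, 1)).backward(); opt.step()

    # Step 404: the second node trains D_t with G_t fixed (target domain label 1).
    opt = torch.optim.SGD(D_t.parameters(), lr=0.1)
    opt.zero_grad(); bce(D_t(G_t(x_t).detach()), torch.ones(32, 1)).backward(); opt.step()

    # Steps 405/406: the third node aggregates D_s and D_t into D-all and sends it back.
    D_s.load_state_dict(average_state_dicts([D_s.state_dict(), D_t.state_dict()]))

    # Step 407: the first node retrains G_s and T_s with D fixed, using the inverted domain label.
    opt = torch.optim.SGD(list(G_s.parameters()) + list(T_s.parameters()), lr=0.1)
    opt.zero_grad()
    loss = ce(T_s(G_s(x_s)), y_s) + bce(D_s(G_s(x_s)), torch.ones(32, 1))  # label 0 -> 1
    loss.backward(); opt.step()
```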
• In some embodiments, the first computing node determines whether the iteration termination condition is reached based on the value of the loss function of the model (e.g., the first model or the second model). Suppose that in step 407 of the current training round (for example, the 60th round), the value of the corresponding loss function when the first computing node locally trains the first model and the second model is larger than that of the previous round (that is, the 59th round), indicating that the loss function of the model had already converged during the previous round of training. In this case, the model parameter value G s and the model parameter value T s obtained by the last update are not those obtained in the current training round; instead, the model parameter value G s and the model parameter value T s obtained in the previous training round are used as the last-updated G s-new and T s-new to be sent to the second computing node.
• In other embodiments where the first computing node determines whether the iteration termination condition is reached based on the value of the loss function of the model (e.g., the first model or the second model), similar operations are performed, and details are not repeated below.
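• The following minimal Python sketch illustrates the two kinds of termination check mentioned above; the function name and tolerance are illustrative assumptions:

```python
# Hypothetical sketch of the two termination checks described above.
def reached_termination(round_counter: int, max_rounds: int,
                        prev_loss: float, curr_loss: float, tol: float = 1e-4) -> bool:
    # Check 1: a preset number of training rounds has been reached
    # (e.g. counted by a counter deployed on the first or third computing node).
    if round_counter >= max_rounds:
        return True
    # Check 2: the loss function has converged, i.e. the loss of the current
    # round is no longer smaller than that of the previous round.
    return curr_loss >= prev_loss - tol
```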
• The second computing node uses G t-new and T t-new to execute the target task.
• After obtaining G t-new and T t-new , the second computing node will use G t-new and T t-new as the final model parameter values of the first model and the second model on the second computing node respectively (this is meaningful because the features of the source domain and the target domain have been aligned), and perform the target task according to the first model and the second model on the second computing node.
• The target task can be a target detection task, a classification task, a speech recognition task, a semantic segmentation task, etc.; any task that can be performed by a neural network can be used as the target task performed by the second computing node of the present application.
• In some other embodiments of the present application, steps 409 and 410 may also be omitted.
• In the first aspect, the above embodiments of the present application realize domain alignment through the adversarial training process in steps 402-407, which narrows the distribution difference between the data features extracted from the first data set and the data features extracted from the second data set. Compared with traditional federated learning without domain alignment, the first data set on the first computing node can better assist the second data set on the second computing node in training the model, and the performance of the model obtained by training will be better. In the second aspect, since the present application aggregates the model parameter values of the model, and only transmits model parameter values or aggregated parameter values, without involving the transmission of original data or data features, it is essentially different from traditional transfer learning and the existing federated transfer learning based on feature transfer, and can therefore protect data privacy.
  • the method provided by the embodiment of the present application realizes the collaborative training of the model and improves the performance of the model.
• In the above embodiment corresponding to FIG. 4, the adversarial training process is only performed on the first computing node.
• In some other embodiments of the present application, the adversarial training process can also be performed on the second computing node. Therefore, this embodiment of the present application also provides a model training method based on federated transfer learning; please refer to FIG. 5 for details.
• FIG. 5 is another schematic flowchart of the model training method based on federated transfer learning provided by the embodiment of the present application. The difference between the embodiment corresponding to FIG. 5 and the embodiment corresponding to FIG. 4 above is that the embodiment corresponding to FIG. 5 also adds an adversarial training part on the second computing node.
  • the method may include the following steps:
  • the first computing node locally trains the first model and the second model, and sends the model parameter value G s of the first model and the model parameter value T s of the second model obtained by training to the third computing node.
• the first computing node locally trains the third model while keeping G s and T s unchanged, to obtain the model parameter value D s of the third model, and sends D s to the third computing node.
• the second computing node locally trains the third model under the condition that the model parameters of the first model on the second computing node keep G t unchanged (which may be referred to as fixed G t ) to obtain the model parameter value D t of the third model, and sends D t to the third computing node.
  • the third computing node aggregates D s and D t to obtain a first aggregation parameter value D-all.
• the first computing node updates D s to D s ', retrains the first model and the second model locally while keeping the model parameter values of the third model on the first computing node unchanged (that is, fixing D s '), and sends the model parameter value G s ' of the first model and the model parameter value T s ' of the second model obtained by training to the third computing node.
  • Steps 501-507 are similar to the above-mentioned steps 401-407. For details, please refer to the above-mentioned steps 401-407, which will not be repeated here.
• the second computing node updates D t to D t ', and, under the condition that the model parameter values of the third model on the second computing node remain unchanged (that is, fixing D t '), locally trains the first model and the second model, and sends the model parameter value G t ' of the first model and the model parameter value T t ' of the second model obtained by training to the third computing node.
• After receiving the first aggregation parameter value D-all (i.e. D s ') sent by the third computing node, the second computing node updates the model parameter value D t of the third model on the second computing node to D s ' (that is, updates the fourth model parameter value to the first aggregation parameter value), and, while keeping the model parameter value of the third model on the second computing node unchanged (i.e., fixing D s '), trains the first model and the second model locally, and sends the model parameter value G t ' of the first model obtained by training (G t ' can be called the seventh model parameter value) and the model parameter value T t ' of the second model (T t ' can be called the eighth model parameter value) to the third computing node.
• the purpose of fixing D t ' on the second computing node and training the first model and the second model locally is also to enable the first model to extract features that are sufficient to confuse the third model, that is, to align the features of the source and target domains as much as possible.
  • a typical loss function can be expressed as the following formula (7):
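• A plausible form of this loss, assuming an inverted-label adversarial term only (an additional supervised term could be included if the second data set contains a few labels), is:

$$\mathcal{L} = -\,\mathbb{E}_{x \sim \mathcal{D}_{target}}\Big[\log\Big(1 - D\big(G_t(x)\big)\Big)\Big]$$

Minimizing this term drives the third model to predict label 0 (source domain) on target-domain features.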
  • this part is to invert the domain label, that is, 0 becomes 1, and 1 becomes 0. This is to confuse the third model so that it predicts the source domain as the target domain and the target domain as the source domain.
  • the third computing node aggregates G s ' and G t ' to obtain a second aggregation parameter value G-all.
• At this point, the third computing node has received the model parameter value G s ' and the model parameter value T s ' from the first computing node, and has received the model parameter value G t ' and the model parameter value T t ' from the second computing node. Next, the third computing node further aggregates G s ' and G t ' to obtain the second aggregation parameter value G-all.
• The first computing node will further regard G-all and T s ' as the new G s and T s respectively, and repeat the above steps 502-509 until the iteration termination condition is reached. The iteration termination condition may be reaching a preset number of training rounds, making the loss function converge, or other set training termination conditions, which are not limited here.
• the third computing node aggregates the T s obtained by the last update (which can be called T s-new ) and the T t ' obtained by the last update (which can be called T t-new ) to obtain a fourth aggregation parameter value T-all.
• In step 507, the third computing node will receive the last-updated model parameter value G s (which can be called G s-new ) and model parameter value T s (which can be called T s-new ) sent by the first computing node, and in step 508 the third computing node will also receive the last-updated model parameter value G t ' (which can be called G t-new ) and model parameter value T t ' (which can be called T t-new ) sent by the second computing node, so the third computing node will aggregate T s-new with T t-new to obtain the fourth aggregation parameter value T-all.
  • the third computing node sends the fourth aggregation parameter value T-all and the last updated G-all to the second computing node.
  • the third computing node further sends the fourth aggregation parameter value T-all and the last updated G-all to the second computing node.
  • the second computing node executes the target task using G-all and T-all obtained by the last update.
• After obtaining the last-updated G-all and T-all, the second computing node will use G-all and T-all as the final model parameter values of the first model and the second model on the second computing node respectively (this is meaningful because the features of the source domain and the target domain have been aligned), and perform the target task according to the first model and the second model on the second computing node.
• The target task can be a target detection task, a classification task, a speech recognition task, a semantic segmentation task, etc.; any task that can be performed by a neural network can be used as the target task performed by the second computing node of the present application.
• In some other embodiments of the present application, steps 511 to 513 may also be omitted.
  • an adversarial training process is also introduced on the second computing node as the target domain device, which can train models with better performance in some specific task scenarios.
• The above FIG. 4 and FIG. 5 describe the case where there is one first computing node and a new third computing node is deployed. This embodiment of the present application will continue to introduce a model training method based on federated transfer learning when there are multiple first computing nodes and a new third computing node is deployed; please refer to FIG. 6 for details.
• FIG. 6 is another schematic flowchart of the model training method based on federated transfer learning provided by the embodiment of the present application. The scenario for this embodiment is to use the local data and labels of multiple source domain devices (that is, multiple first computing nodes, each having its own first data set) to assist a target domain device (i.e., a second computing node) whose local data has no or few labels to improve the performance of the model.
  • the method may include the following steps:
• Each first computing node trains the first model and the second model locally, and sends the model parameter value G i of the first model and the model parameter value T i of the second model obtained from the training to the third computing node.
• The process of locally training the first model and the second model by each first computing node is similar to the above-mentioned step 401. For details, please refer to the above-mentioned step 401, which will not be repeated here.
  • the third computing node aggregates all G i (ie, G 1 , ..., G n ) to obtain a second aggregation parameter value G-all.
• Since the number of first computing nodes is n, the third computing node will receive G 1 , ..., G n sent by each first computing node, receive T 1 , ..., T n sent by each first computing node, and aggregate G 1 , ..., G n to obtain the second aggregation parameter value G-all.
• There are many ways to aggregate G 1 , ..., G n ; for example, it can be a simple average at the model parameter level, which can be specifically shown in the following formula (8):
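• Consistent with the legend below, formula (8) presumably takes the form of a simple parameter-level average:

$$\bar{G} = \frac{1}{n}\sum_{i=1}^{n} G_i$$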
• where Ḡ is the second aggregation parameter value G-all, G i is the model parameter value G i of the first model on the first computing node i, and both are used to characterize the model parameters of the first model.
• The manner of aggregating G 1 , ..., G n may also be a weighted average, or other more complex aggregation methods, which are not specifically limited here. Since the present application aggregates model parameter values of a model, and only transmits model parameter values or aggregated parameter values, and does not involve the transmission of original data or data features, data privacy can be protected.
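• The following minimal Python sketch illustrates such parameter-level aggregation; the function name and the optional weighting (e.g., by local dataset size) are illustrative assumptions:

```python
# Hypothetical sketch of parameter-level aggregation (simple or weighted average).
from typing import Dict, List, Optional
import torch

def aggregate(state_dicts: List[Dict[str, torch.Tensor]],
              weights: Optional[List[float]] = None) -> Dict[str, torch.Tensor]:
    if weights is None:
        # Simple average at the model parameter level, as in formula (8).
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    total = sum(weights)
    return {
        key: sum(w * sd[key] for w, sd in zip(weights, state_dicts)) / total
        for key in state_dicts[0]
    }

# Example: G_all = aggregate([G_1, ..., G_n]), where each G_i is the state_dict
# of the first model uploaded by first computing node i.
```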
  • the first model on the second computing node may be initialized with the G-all.
• Each first computing node, while keeping its respective G i and T i unchanged (which may be referred to as fixed G i and T i ), locally trains the third model to obtain the model parameter value D i of the third model, and each sends D i to the third computing node.
• For each first computing node, the process of locally training the third model is similar to the above-mentioned step 403. For details, please refer to the above-mentioned step 403, which will not be repeated here.
• the second computing node locally trains the third model under the condition that the model parameters of the first model on the second computing node keep G t unchanged (which may be referred to as fixed G t ) to obtain the model parameter value D t of the third model, and sends D t to the third computing node.
• Step 605 is similar to the above-mentioned step 404. For details, please refer to the above-mentioned step 404, which will not be repeated here.
• It should be noted that the domain labels of all the first computing nodes may be set to 0, or different domain labels may be assigned to them, so that the third model can also distinguish which first computing node the features of the input data come from.
• The execution order of step 604 and step 605 is not limited; step 604 and step 605 may also be executed simultaneously, which is not specifically limited here.
  • the third computing node aggregates all D i (ie D 1 , . . . , D n ) and D t to obtain a first aggregation parameter value D-all.
• After the third computing node respectively receives D 1 , ..., D n sent by each first computing node and D t sent by the second computing node, it aggregates all D i and D t to obtain the first aggregation parameter value D-all. In this way, when the model parameter of the third model is assigned the first aggregation parameter value D-all, the third model has the ability to identify the data features on the first data set and the data features on the second data set at the same time.
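• One plausible aggregation, assuming a simple parameter-level average over the n first computing nodes and the second computing node, is:

$$\bar{D} = \frac{1}{n+1}\Big(\sum_{i=1}^{n} D_i \;+\; D_t\Big)$$

A weighted variant is equally possible, as noted below.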
• where the first aggregation parameter value D-all is used to characterize the model parameters of the third model after aggregation, D i is the model parameter value D i of the third model on the first computing node i, and D t is the model parameter value D t of the third model on the second computing node.
  • the manner of aggregating D 1 , . . . , D n and D t may also be weighted average, or other more complex aggregation methods, which are not specifically limited here. Since the present application aggregates the model parameter values of the model, and only transmits the model parameter values or aggregated parameter values, and does not involve the transmission of original data or data features, data privacy can be protected.
• Each first computing node updates D i to D i ', and, while keeping the model parameter values of the third model on each first computing node unchanged (that is, fixing D i '), locally retrains the first model and the second model, and sends the model parameter value G i ' of the first model and the model parameter value T i ' of the second model obtained by the respective training to the third computing node.
• Each first computing node uses the respective G i ' and T i ' as the new G i and T i respectively (that is, takes G 1 ', ..., G n ' as the new G 1 , ..., G n , and T 1 ', ..., T n ' as the new T 1 , ..., T n ).
• The process of repeating the above steps 602-608 is similar to the above step 408. For details, please refer to the above step 408, which will not be repeated here.
  • steps 602-608 are the adversarial training process, and the adversarial training process is repeated continuously until the iteration termination condition is reached, and finally the features of multiple source domains and target domains are basically aligned.
• The second computing node uses G all-new and T t-new to execute the target task.
• After obtaining G all-new and T t-new , the second computing node will use G all-new and T t-new as the final model parameter values of the first model and the second model on the second computing node respectively (this is meaningful because the features of the source domain and the target domain have been aligned), and perform the target task according to the first model and the second model on the second computing node.
• The target task can be a target detection task, a classification task, a speech recognition task, a semantic segmentation task, etc.; any task that can be performed by a neural network can be used as the target task performed by the second computing node of the present application.
• In some other embodiments of the present application, steps 610 and 611 may also be omitted.
  • the embodiment of the present application realizes the collaborative training of the model and improves the performance of the model under the condition of taking into account the domain alignment and user data privacy.
• In addition, this embodiment of the present application uses the local data and labels of multiple source domain devices (that is, multiple first computing nodes, each having its own first data set) to assist a target domain device (that is, the second computing node) whose local data has no or few labels, so the model parameter values of the model can be obtained based on more types of training data, and the accuracy of the trained model is therefore higher.
• In the above embodiment corresponding to FIG. 6, the adversarial training process is only performed on the first computing nodes.
• In some other embodiments of the present application, the adversarial training process can also be performed on the second computing node. Therefore, this embodiment of the present application also provides a model training method based on federated transfer learning; please refer to FIG. 7 for details.
• FIG. 7 is another schematic flowchart of the model training method based on federated transfer learning provided by the embodiment of the present application. The difference between the embodiment corresponding to FIG. 7 and the embodiment corresponding to FIG. 6 above is that the embodiment corresponding to FIG. 7 also adds an adversarial training part on the second computing node.
  • the method may include the following steps:
• Each first computing node trains the first model and the second model locally, and sends the model parameter value G i of the first model and the model parameter value T i of the second model obtained from the training to the third computing node.
  • the third computing node aggregates all G i (ie, G 1 , ..., G n ) to obtain a second aggregation parameter value G-all.
• Each first computing node, while keeping its respective G i and T i unchanged (which may be referred to as fixed G i and T i ), locally trains a third model to obtain the model parameter value D i of the third model, and each sends D i to the third computing node.
• the second computing node locally trains the third model under the condition that the model parameters of the first model on the second computing node keep G t unchanged (which may be referred to as fixed G t ) to obtain the model parameter value D t of the third model, and sends D t to the third computing node.
  • the third computing node aggregates all D i (ie D 1 , . . . , D n ) and D t to obtain a first aggregation parameter value D-all.
• Each first computing node updates D i to D i ', and, under the condition that the model parameter values of the third model on each first computing node remain unchanged (that is, fixing D i '), locally retrains the first model and the second model, and sends the model parameter value G i ' of the first model and the model parameter value T i ' of the second model obtained by the respective training to the third computing node.
  • Steps 701-708 are similar to the above-mentioned steps 601-608. For details, please refer to the above-mentioned steps 601-608, which will not be repeated here.
• the second computing node updates D t to D t ', and, under the condition that the model parameter values of the third model on the second computing node remain unchanged (that is, fixing D t '), locally trains the first model and the second model, and sends the model parameter value G t ' of the first model and the model parameter value T t ' of the second model obtained by training to the third computing node.
  • Step 709 is similar to the above-mentioned step 508. For details, please refer to the above-mentioned step 508, which will not be repeated here.
  • the third computing node aggregates all G i ' (ie G 1 ', ..., G n ') and G t ' to obtain an updated second aggregation parameter value G-all'.
• At this point, the third computing node has received the model parameter values G i ' and T i ' from each first computing node, and the model parameter value G t ' and the model parameter value T t ' from the second computing node. Next, the third computing node will further aggregate all G i ' (i.e. G 1 ', ..., G n ') and G t ' again to obtain the updated second aggregation parameter value G-all'.
  • the third computing node sends the fourth aggregation parameter value T-all and G-all' (which may be referred to as G all-new ) obtained by the last update to the second computing node.
  • the second computing node executes the target task using G-all' (ie, G all-new ) and T-all (ie, T t-new ) obtained by the last update.
  • Steps 711-714 are similar to the above-mentioned steps 510-513. For details, please refer to the above-mentioned steps 510-513, which will not be repeated here.
• In some other embodiments of the present application, steps 712 to 714 may also be omitted.
  • an adversarial training process is also introduced on the second computing node as the target domain device, which can train models with better performance in some specific task scenarios.
• In the above embodiments, the aggregation operation of the model is all completed on the newly deployed third computing node. In some embodiments of the present application, it may also be completed by the second computing node serving as the target domain device. The following describes the case where there is one first computing node and no new third computing node is deployed; please refer to FIG. 8 for details. FIG. 8 is a schematic flowchart of a model training method based on federated transfer learning provided by this embodiment of the present application. The scenario for this embodiment is to use the local data and labels of a source domain device (that is, the first computing node) to assist a target domain device (i.e., the second computing node) whose local data has no label or fewer labels to improve the performance of the model.
  • the method may include the following steps:
  • the first computing node locally trains the first model and the second model to obtain the trained model parameter value G s of the first model and the model parameter value T s of the second model.
• Step 801 is similar to the above-mentioned step 401, except that after the model parameter value G s of the first model and the model parameter value T s of the second model are obtained by training in step 801, they are no longer uploaded to a third computing node. For the rest, please refer to the above step 401, which will not be repeated here.
  • the first model on the second computing node may be initialized with the G t .
• the first computing node trains the third model locally while keeping G s and T s unchanged (which may be referred to as fixed G s and T s ) to obtain the model parameter value D s of the third model, and sends D s to the second computing node.
• Step 803 is similar to the above step 403; the difference is that after the model parameter value D s of the trained third model is obtained in step 803, it is not uploaded to a third computing node, but sent to the second computing node. For the rest, please refer to the above step 403, which will not be repeated here.
• the second computing node locally trains the third model under the condition that the model parameters of the first model on the second computing node keep G t unchanged (which may be referred to as fixed G t ) to obtain the model parameter value D t of the third model.
• Step 804 is similar to the above-mentioned step 404; the difference is that after the model parameter value D t of the trained third model is obtained in step 804, it does not need to be uploaded to a third computing node. For the rest, please refer to the above-mentioned step 404, which will not be repeated here.
• The execution order of step 803 and step 804 is not limited; step 803 and step 804 may also be executed simultaneously, which is not specifically limited here.
  • the second computing node aggregates D s and D t to obtain a first aggregation parameter value D-all.
• Step 805 is similar to the above-mentioned step 405, except that in step 805 it is the second computing node that aggregates D s and D t to obtain the first aggregation parameter value D-all. For the rest, please refer to the above-mentioned step 405, which will not be repeated here.
• the first computing node updates D s to D s ', and retrains the first model and the second model locally while keeping the model parameter values of the third model on the first computing node unchanged (that is, fixing D s '), to obtain the trained model parameter value G s ' of the first model and the model parameter value T s ' of the second model.
• Step 807 is similar to the above-mentioned step 407, except that after the model parameter value G s ' of the first model and the model parameter value T s ' of the second model are obtained in step 807, they are no longer uploaded to a third computing node. For the rest, please refer to the above step 407, which will not be repeated here.
  • Step 808 is similar to the above-mentioned step 408. For details, please refer to the above-mentioned step 408, which will not be repeated here.
  • the second computing node uses G t-new and T t-new to execute the target task.
• Step 810 is similar to the above-mentioned step 410. For details, please refer to the above-mentioned step 410, which will not be repeated here.
• In some other embodiments of the present application, steps 809 and 810 may also be omitted.
• In this embodiment, the aggregation process of model parameter values is performed by the second computing node serving as the target domain device, which can reduce the number of computing nodes involved, reduce the time for data interaction between computing nodes, and improve the model training efficiency.
• In the above embodiment corresponding to FIG. 8, the adversarial training process is only performed on the first computing node.
• In some other embodiments of the present application, the adversarial training process can also be performed on the second computing node. Therefore, this embodiment of the present application also provides a model training method based on federated transfer learning; please refer to FIG. 9 for details.
• FIG. 9 is another schematic flowchart of the model training method based on federated transfer learning provided by the embodiment of the present application. The difference between the embodiment corresponding to FIG. 9 and the embodiment corresponding to FIG. 8 above is that the embodiment corresponding to FIG. 9 also adds an adversarial training part on the second computing node.
  • the method may include the following steps:
  • the first computing node locally trains the first model and the second model to obtain the trained model parameter value G s of the first model and the model parameter value T s of the second model.
• the first computing node locally trains the third model while keeping G s and T s unchanged, to obtain the model parameter value D s of the third model, and sends D s to the second computing node.
• the second computing node locally trains the third model under the condition that the model parameters of the first model on the second computing node keep G t unchanged (which may be referred to as fixed G t ) to obtain the model parameter value D t of the third model.
  • the second computing node aggregates D s and D t to obtain a first aggregation parameter value D-all.
• the first computing node updates D s to D s ', retrains the first model and the second model locally while keeping the model parameter values of the third model on the first computing node unchanged (that is, fixing D s '), to obtain the trained model parameter value G s ' of the first model and the model parameter value T s ' of the second model, and sends G s ' to the second computing node.
• Steps 901-907 are similar to the above-mentioned steps 801-807. For details, please refer to the above-mentioned steps 801-807, which will not be repeated here. The difference is that, compared with step 807, in step 907 the first computing node also needs to send G s ' to the second computing node.
• the second computing node updates D t to D t ', and, while keeping the model parameter values of the third model on the second computing node unchanged (that is, fixing D t '), locally trains the first model and the second model to obtain the trained model parameter value G t ' of the first model and the model parameter value T t ' of the second model.
  • the second computing node aggregates G s ' and G t ' to obtain a second aggregation parameter value G-all.
• Step 909 is similar to the above step 509, except that in step 909 it is the second computing node that aggregates G s ' and G t ' to obtain the second aggregation parameter value G-all.
• Step 910 is similar to the above-mentioned step 510. For details, please refer to the above-mentioned step 510, which will not be repeated here.
  • the first computing node sends the T s (which may be referred to as T s-new ) obtained by the last update to the second computing node.
  • the second computing node aggregates T s-new and T t ' (which may be referred to as T t-new ) obtained by the last update to obtain a fourth aggregation parameter value T-all.
• Step 912 is similar to the above-mentioned step 511, except that in step 912 it is the second computing node that aggregates the T s obtained by the last update (i.e. T s-new ) with the T t ' obtained by the last update (i.e. T t-new ) to obtain the fourth aggregation parameter value T-all.
  • the second computing node executes the target task using G-all and T-all obtained by the last update.
• Step 913 is similar to the above-mentioned step 513. For details, please refer to the above-mentioned step 513, which will not be repeated here.
• In some other embodiments of the present application, steps 912 and 913 may also be omitted.
• In this embodiment, the aggregation process of model parameter values is likewise performed by the second computing node serving as the target domain device, which can reduce the number of computing nodes involved, reduce the time for data interaction between computing nodes, and improve the model training efficiency.
  • the adversarial training process is also introduced on the second computing node as the target domain device, which can train models with better performance in some specific task scenarios.
  • FIG. 8 and FIG. 9 describe the case where there is one first computing node and no new third computing node is deployed.
• This embodiment of the present application will continue to introduce a model training method based on federated transfer learning when there are multiple first computing nodes and no new third computing node is deployed; please refer to FIG. 10 for details. FIG. 10 is another schematic flowchart of the model training method based on federated transfer learning provided by the embodiment of the present application. The scenario for this embodiment is to use the local data and labels of multiple source domain devices (that is, multiple first computing nodes, each having its own first data set) to assist a target domain device (i.e., a second computing node) whose local data has no labels or few labels to improve the model performance.
  • the number of first computing nodes is n, and n ⁇ 2.
  • the method may include the following steps:
• Each first computing node trains the first model and the second model locally to obtain the trained model parameter value G i of the first model and the model parameter value T i of the second model, respectively.
• Step 1001 is similar to the above-mentioned step 401, except that after the model parameter value G i of the trained first model and the model parameter value T i of the second model are obtained in step 1001, they are not uploaded to a third computing node. For the rest, please refer to the above step 401, which will not be repeated here.
  • Each first computing node sends the respective obtained G i to the second computing node.
• After each first computing node obtains its own model parameter value G i , it will further send G i to the second computing node. In this way, the second computing node can receive G 1 , ..., G n .
• Step 1003 is similar to the above step 602, the difference being that in step 1003 it is the second computing node that aggregates G 1 , ..., G n to obtain the second aggregation parameter value G-all, and uses G-all as the model parameter value G t of the first model on the second computing node.
• Each first computing node, while keeping its respective G i and T i unchanged (which may be referred to as fixed G i and T i ), locally trains a third model to obtain the model parameter value D i of the third model, and each sends D i to the second computing node.
• Step 1004 is similar to the above-mentioned step 604, except that in step 1004, after each first computing node obtains the model parameter value D i of the third model after training, it is not uploaded to a third computing node, but is sent to the second computing node. For the rest, please refer to the above step 604, which will not be repeated here.
• the second computing node locally trains the third model under the condition that the model parameters of the first model on the second computing node keep G t unchanged (which may be referred to as fixed G t ) to obtain the model parameter value D t of the third model.
• Step 1005 is similar to the above step 605, the difference being that in step 1005, after the second computing node obtains the model parameter value D t of the trained third model, it does not need to be uploaded to a third computing node. For the rest, please refer to the above step 605, which will not be repeated here.
  • the second computing node aggregates all D i (ie, D 1 , . . . , D n ) and D t to obtain a first aggregation parameter value D-all.
• Step 1006 is similar to the above step 606, the difference being that in step 1006 it is the second computing node that aggregates D 1 , ..., D n and D t to obtain the first aggregation parameter value D-all. For the rest, please refer to the above step 606, which will not be repeated here.
• Each first computing node updates D i to D i ', and, under the condition that the model parameter values of the third model on each first computing node are kept unchanged (that is, fixing D i '), locally retrains the first model and the second model, respectively obtains the trained model parameter value G i ' of the first model and the model parameter value T i ' of the second model, and sends the obtained G i ' and T i ' to the second computing node.
  • Each first computing node updates D i to D i ', and in the case of fixing D i ', the process of retraining the first model and the second model locally is similar to the above step 608, please refer to the above step 608 for details , which will not be repeated here. After that, each first computing node sends the respectively obtained G i ' and T i ' to the second computing node.
• the second computing node aggregates the updated G 1 ', ..., G n ' to obtain the updated second aggregation parameter value G-all', and uses G-all' as the model parameter value of the first model on the second computing node.
  • Each first computing node sends the T i obtained by the last update to the second computing node.
• After reaching the iteration termination condition, each first computing node sends the T i obtained by the last update (that is, the last-updated T 1 , ..., T n ) to the second computing node.
• The second computing node executes the target task using the G-all obtained by the last update (which can be called G all-new , i.e., G t-new ) and T t-new .
  • Step 1013 is similar to the above-mentioned step 611. For details, please refer to the above-mentioned step 611, which will not be repeated here.
• In some other embodiments of the present application, steps 1011 to 1013 may also be omitted.
  • the embodiment of the present application realizes the collaborative training of the model and improves the performance of the model under the condition of taking into account the domain alignment and user data privacy.
• In addition, this embodiment of the present application uses the local data and labels of multiple source domain devices (that is, multiple first computing nodes, each having its own first data set) to assist a target domain device (that is, the second computing node) whose local data has no or few labels, so the model parameter values of the model can be obtained based on more types of training data, and the accuracy of the trained model is therefore higher.
• Furthermore, the aggregation process of model parameter values is also performed by the second computing node serving as the target domain device, which can not only reduce the number of computing nodes involved, but also, in some application scenarios without a server, allow the target domain device acting as the second computing node to aggregate the various model parameter values; this also reduces the data interaction time between computing nodes and improves the efficiency of model training.
• In the above embodiment corresponding to FIG. 10, the adversarial training process is only performed on the first computing nodes.
• In some other embodiments of the present application, the adversarial training process can also be performed on the second computing node. Therefore, this embodiment of the present application also provides a model training method based on federated transfer learning; please refer to FIG. 11 for details. FIG. 11 is another schematic flowchart of the model training method based on federated transfer learning provided by the embodiment of the present application. The difference between the embodiment corresponding to FIG. 11 and the embodiment corresponding to FIG. 10 is that the embodiment corresponding to FIG. 11 also adds an adversarial training part on the second computing node.
  • the method may include the following steps:
• Each first computing node trains the first model and the second model locally to obtain the trained model parameter value G i of the first model and the model parameter value T i of the second model, respectively.
  • Each first computing node sends the respective obtained G i to the second computing node.
  • the second computing node aggregates all G i (that is, G 1 , ..., G n ) to obtain a second aggregation parameter value G-all.
• Each first computing node trains a third model locally while keeping its respective G i and T i unchanged (which may be referred to as fixed G i and T i ) to obtain the model parameter value D i of the third model, and each sends D i to the second computing node.
• the second computing node locally trains the third model under the condition that the model parameters of the first model on the second computing node keep G t unchanged (which may be referred to as fixed G t ) to obtain the model parameter value D t of the third model.
  • the second computing node aggregates all D i (ie D 1 , . . . , D n ) and D t to obtain a first aggregation parameter value D-all.
• Each first computing node updates D i to D i ', and, under the condition that the model parameter values of the third model on each first computing node are kept unchanged (that is, fixing D i '), locally retrains the first model and the second model, respectively obtains the trained model parameter value G i ' of the first model and the model parameter value T i ' of the second model, and respectively sends G i ' to the second computing node.
  • Steps 1101-1108 are similar to the above-mentioned steps 1001-1008. For details, please refer to the above-mentioned steps 1001-1008, which will not be repeated here.
• the second computing node updates D t to D t ', and, while keeping the model parameter value of the third model on the second computing node unchanged (that is, fixing D t '), locally trains the first model and the second model to obtain the trained model parameter value G t ' of the first model and the model parameter value T t ' of the second model.
• Step 1109 is similar to the above step 709, the difference being that after the second computing node obtains the trained model parameter value G t ' and the model parameter value T t ' in step 1109, they do not need to be uploaded to a third computing node. For the rest, please refer to the above step 709, which will not be repeated here.
  • the second computing node aggregates all G i ' (ie G 1 ', . . . , G n ') and G t ' to obtain an updated second aggregation parameter value G-all'.
• Step 1110 is similar to the above-mentioned step 710, except that in step 1110 it is the second computing node that aggregates G 1 ', ..., G n ' and G t ' to obtain the updated second aggregation parameter value G-all'. For the rest, please refer to the above step 710, which will not be repeated here.
  • Step 1111 is similar to the above-mentioned step 711. For details, please refer to the above-mentioned step 711, which will not be repeated here.
  • Each first computing node sends the T i obtained by the last update (ie, T 1 , . . . , T n obtained by the last update) to the second computing node.
  • the second computing node aggregates the T i obtained by the last update of each first computing node and the T t ' (ie, T t-new ) obtained by the last update, to obtain a fourth aggregation parameter value T-all.
• Step 1113 is similar to the above-mentioned step 712, except that in step 1113 it is the second computing node that aggregates the T i obtained from the last update of each first computing node and the T t ' obtained from the last update to obtain the fourth aggregation parameter value T-all. For the rest, please refer to the above step 712, which will not be repeated here.
  • the second computing node executes the target task using G-all' (ie, G t-new ) and T-all obtained by the last update.
  • Step 1114 is similar to the above-mentioned step 714. For details, please refer to the above-mentioned step 714, which will not be repeated here.
• In some other embodiments of the present application, steps 1112 to 1114 may also be omitted.
  • the computing nodes may be various terminal devices or edge devices.
• The computing nodes in the present application may include, but are not limited to: smart phones, laptop computers, personal computers (PC), tablet computers, ultrabooks, wearable devices (e.g., smart bracelets, smart watches, smart glasses, head mounted displays (HMD), etc.), augmented reality (AR) devices, virtual reality (VR) devices, mixed reality (MR) devices, cellular phones, personal digital assistants (PDA), digital broadcasting terminals, etc.
  • the third computing node is generally a server, and the first computing node and the second computing node are generally edge devices.
  • FIG. 12 is a schematic flowchart of a data processing method provided by an embodiment of the present application. The method may specifically include the following steps:
  • the computer device acquires input data related to the target task.
  • the computer device acquires input data to be processed, which can be image data, audio data, or text data, and is specifically related to the target task to be performed.
• For example, if the target task is an image-based classification task, the input data refers to the image data to be classified.
  • the computer device performs feature extraction on the input data through the trained first model to obtain a feature map.
• After that, the computer device performs feature extraction on the input data through the trained first model to obtain a feature map corresponding to the input data.
  • the computer device processes the feature map through the trained second model to obtain output data.
• The computer device then processes the feature map through the trained second model to obtain output data, wherein the model parameter values of the trained first model and the model parameter values of the trained second model are obtained by training according to the methods described in the above embodiments.
• For example, if the target task is a target detection task, which is generally aimed at detecting target objects in an image, the input data generally refers to the input image. The computer device first uses the trained first model to perform feature extraction on the input image, and then uses the trained second model to perform target detection on the extracted feature map to obtain the detection result, that is, the output data is the detection result.
• For another example, if the target task is a classification task performed on images, the input data refers to the input images. The computer device first uses the trained first model to perform feature extraction on the input images, and then uses the trained second model to classify the extracted feature map and output the classification result, that is, the output data is the classification result of the image.
• The classification task may be performed not only on images, but also on text or audio; in that case, the input data refers to the corresponding text data or audio data, and the output data refers to the classification result of the text or the classification result of the audio.
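• The following minimal PyTorch sketch illustrates the flow of steps 1201-1203 for an image classification target task; the model architectures and input shape are illustrative assumptions:

```python
# Hypothetical sketch of the data processing flow: trained first model (feature
# extractor) followed by trained second model (task head), here an image classifier.
import torch
import torch.nn as nn

first_model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())   # trained first model
second_model = nn.Sequential(nn.Flatten(), nn.Linear(8 * 32 * 32, 10))  # trained second model

input_data = torch.randn(1, 3, 32, 32)       # step 1201: acquire input data related to the target task
feature_map = first_model(input_data)        # step 1202: feature extraction through the trained first model
output_data = second_model(feature_map)      # step 1203: process the feature map to obtain output data
predicted_class = output_data.argmax(dim=1)  # e.g. the classification result of the image
```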
  • FIG. 13 is a schematic structural diagram of a first computing node provided by an embodiment of the present application.
  • the first computing node 1300 includes a training module 1301 and an acquisition module 1302 .
• The training module 1301 is configured to, under the condition that the first model parameter value of the first model (e.g., a feature extractor) on the first computing node and the second model parameter value of the second model (e.g., a classifier) on the first computing node remain unchanged, use the first data set on the first computing node to train a third model (e.g., a domain discriminator, also referred to as a discriminator) on the first computing node to obtain the third model parameter value of the third model on the first computing node, wherein the first model parameter value is a model parameter value obtained after the first computing node trains the first model on the first computing node, and the second model parameter value is a model parameter value obtained after the first computing node trains the second model on the first computing node.
  • the first model is used to perform feature extraction on the input data; the second model is used to perform a target task based on the features extracted by the first model, for example, the target task may be a classification task (eg, a target detection task , semantic segmentation task, speech recognition task, etc.), or a regression task, which is not limited here; the third model is used to identify the source domain of the features extracted by the first model.
  • the computing node where the input data is located can be distinguished according to the data distribution of the source domain, for example, it is determined whether the acquired feature is from the source domain device or the target domain device.
• The obtaining module 1302 is configured to receive a first aggregation parameter value, where the first aggregation parameter value is obtained based on the third model parameter value and a fourth model parameter value, and the fourth model parameter value is the model parameter value of the third model on the second computing node, which is obtained by the second computing node through training using the data set on the second computing node (which may be referred to as the second data set).
• The training module 1301 is further configured to update the original third model parameter value to the first aggregation parameter value, that is, update the model parameter value of the third model on the first computing node to the first aggregation parameter value, and, while keeping the first aggregation parameter value unchanged, use the first data set to retrain the first model on the first computing node and the second model on the first computing node to obtain the fifth model parameter value of the first model on the first computing node and the sixth model parameter value of the second model on the first computing node.
• The first computing node 1300 may further include an iteration module 1303, which is configured to use the fifth model parameter value and the sixth model parameter value as the new first model parameter value and the new second model parameter value, and trigger the training module 1301 and the obtaining module 1302 to repeat their respective steps until the iteration termination condition is reached. The iteration termination condition may be reaching a preset number of training rounds, making the loss function converge, or other set training termination conditions, which are not specifically limited here.
  • the obtaining module 1302 is specifically configured to: send the third model parameter value to the second computing node, so that the second computing node aggregates the third model parameter value and the fourth model parameter value, to obtain the first aggregation parameter value; after that, receive the first aggregation parameter value sent by the second computing node.
• In some other embodiments, the first computing node 1300 further includes a sending module 1304, and the sending module 1304 is configured to send the third model parameter value to the second computing node, so that the second computing node aggregates the third model parameter value and the fourth model parameter value to obtain the first aggregation parameter value; the obtaining module 1302 is specifically configured to receive the first aggregation parameter value from the second computing node.
  • the sending module 1304 may also be configured to send the updated first model parameter value and the updated second model parameter value to the second computing node.
  • the sending module 1304 may also be configured to send the third model parameter value to the third computing node, so that the third computing node aggregates the third model parameter value and the fourth model parameter value obtained from the second computing node to obtain the first aggregation parameter value; the obtaining module 1302 is specifically configured to receive the first aggregation parameter value sent by the third computing node.
  • the sending module 1304 may also be configured to: send the updated first model parameter value and the updated second model parameter value to the third computing node.
  • the information exchange and execution process among the modules/units in the first computing node 1300 provided in FIG. 13 are based on the same idea as the method steps performed by the first computing node in the method embodiments corresponding to FIG. 4 to FIG. 11 in this application; for the specific content, reference may be made to the descriptions in the foregoing method embodiments, which will not be repeated here.
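For illustration only, the following Python sketch shows how the cooperation of the training module 1301, the obtaining module 1302, the iteration module 1303 and the sending module 1304 on a first computing node could look in code. The use of PyTorch, the layer sizes and the `exchange_discriminator` helper are assumptions made purely for this sketch; the embodiments themselves do not prescribe any particular framework or network structure.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the three models on a first (source-domain) computing node.
feature_extractor = nn.Linear(8, 4)   # first model (G_s)
classifier = nn.Linear(4, 2)          # second model (T_s)
discriminator = nn.Linear(4, 1)       # third model / domain discriminator (D_s)

x_s = torch.randn(32, 8)              # labeled first data set: features ...
y_s = torch.randint(0, 2, (32,))      # ... and labels

def exchange_discriminator(d_state):
    # Stand-in for the sending module 1304 / obtaining module 1302: in a real
    # deployment D_s would be sent to the second (or third) computing node and
    # the aggregated value D_all would come back; here it is returned unchanged.
    return d_state

opt_d = torch.optim.SGD(discriminator.parameters(), lr=0.1)
opt_gt = torch.optim.SGD(
    list(feature_extractor.parameters()) + list(classifier.parameters()), lr=0.1)

for _ in range(3):                                        # iteration module 1303
    # Training module 1301, phase 1: with G_s and T_s fixed, train the
    # discriminator to push source-domain features toward domain label 0.
    feats = feature_extractor(x_s).detach()
    d_loss = nn.functional.binary_cross_entropy_with_logits(
        discriminator(feats).squeeze(1), torch.zeros(32))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Obtaining module 1302: load the first aggregation parameter value D_all.
    discriminator.load_state_dict(exchange_discriminator(discriminator.state_dict()))

    # Training module 1301, phase 2: with D_all fixed, retrain G_s and T_s so the
    # classification loss stays low while the extracted features confuse the
    # discriminator (one common way: push them toward the other domain label).
    feats = feature_extractor(x_s)
    cls_loss = nn.functional.cross_entropy(classifier(feats), y_s)
    adv_loss = nn.functional.binary_cross_entropy_with_logits(
        discriminator(feats).squeeze(1), torch.ones(32))
    opt_gt.zero_grad(); (cls_loss + adv_loss).backward(); opt_gt.step()
```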
  • FIG. 14 is a schematic structural diagram of a second computing node provided by an embodiment of the present application.
  • the second computing node 1400 includes: a first acquisition module 1401 and a training module 1402, wherein the first acquisition module 1401 is used to acquire a second aggregation parameter value, and the second aggregation parameter value is obtained based on the first model parameter values of the first models respectively trained on one or more first computing nodes, wherein each first computing node uses its own first data set to train its own first model, and the first data set may be a labeled data set.
  • the training module 1402 is configured to, when the model parameter of the first model on the second computing node takes the second aggregation parameter value, use the second data set on the second computing node to train the third model on the second computing node, so as to obtain a fourth model parameter value of the third model on the second computing node, wherein the first model is used to extract features from the input data, and the third model is used to identify the source domain of the features extracted by the first model. As an example, the computing node where the input data is located can be distinguished according to the data distribution of the source domain, for example, it is determined whether an acquired feature comes from the source domain device or the target domain device.
  • the second computing node 1400 may further include an iteration module 1404, and the iteration module 1404 is configured to, when the first computing node updates the first model parameter value and the second model parameter value based on the first aggregation parameter value, trigger the first acquisition module 1401 and the training module 1402 to repeat their respective steps until an iteration termination condition is reached. The iteration termination condition may be reaching a preset number of training rounds, making the loss function converge, or another set training termination condition, which is not specifically limited here.
  • the second computing node 1400 may further include a second obtaining module 1403, where the second obtaining module 1403 is configured to: obtain a first aggregation parameter value, where the first aggregation parameter value is obtained based on the third model parameter value and the fourth model parameter value, the third model parameter value is the model parameter value obtained by the first computing node training the third model on the first computing node using the first data set while the first model parameter value and the second model parameter value remain unchanged, and the second model parameter value is the model parameter value obtained by the first computing node using the first data set to train the second model on the first computing node, wherein the second model is used to perform a target task based on the features extracted by the first model; the target task may be a classification task (e.g., a target detection task, a semantic segmentation task, a speech recognition task, etc.) or a regression task, which is not limited here.
  • the training module 1402 is specifically configured to update the fourth model parameter value to the first aggregation parameter value, and, under the condition that the fourth model parameter value remains at the first aggregation parameter value, to use the second data set to train the first model and the second model on the second computing node and update the model parameter value of the first model on the second computing node and the model parameter value of the second model on the second computing node, so that a seventh model parameter value of the first model on the second computing node and an eighth model parameter value of the second model on the second computing node can be obtained.
  • the iteration module 1404 is specifically configured to trigger the first acquisition module 1401, the training module 1402 and the second obtaining module 1403 to repeatedly perform their respective steps until the iteration termination condition is reached.
  • the first obtaining module 1401 is specifically configured to: receive the updated first model parameter values sent by each of the one or more first computing nodes, and aggregate the seventh model parameter value (i.e., the updated model parameter value of the first model on the second computing node) with each updated first model parameter value to obtain the second aggregation parameter value.
  • the second computing node 1400 further includes an execution module 1405, and the execution module 1405 is configured to: update the second aggregation parameter value based on the updated first model parameter values; receive each updated second model parameter value sent by the one or more first computing nodes, and aggregate each updated second model parameter value with the last updated eighth model parameter value (that is, the updated model parameter value of the second model on the second computing node) to obtain a fourth aggregation parameter value; and perform the target task according to the first model on the second computing node and the second model on the second computing node, wherein the model parameter value of the first model on the second computing node is the second aggregation parameter value obtained by the last update, and the model parameter value of the second model on the second computing node is the fourth aggregation parameter value.
  • the first obtaining module 1401 is further configured to: send the seventh model parameter value to the third computing node, and receive the second aggregation parameter value from the third computing node, where the second aggregation parameter value is obtained by the third computing node aggregating the seventh model parameter value with each updated first model parameter value from the one or more first computing nodes.
  • the execution module 1405 may also be configured to: send the eighth model parameter value obtained by the last update to the third computing node, so that the third computing node aggregates the eighth model parameter value with each last-updated second model parameter value received from the one or more first computing nodes to obtain a fourth aggregation parameter value; receive the fourth aggregation parameter value from the third computing node; and perform the target task according to the first model and the second model on the second computing node, wherein the model parameter value of the first model on the second computing node is the second aggregation parameter value obtained by the last update, and the model parameter value of the second model on the second computing node is the fourth aggregation parameter value.
  • the information exchange and execution process among the modules/units in the second computing node 1400 provided in FIG. 14 are based on the same idea as the method steps performed by the second computing node in the method embodiments corresponding to FIG. 4 to FIG. 11 in this application; for the specific content, reference may be made to the descriptions in the foregoing method embodiments, which will not be repeated here.
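As an illustrative sketch of the aggregation performed by the first obtaining module 1401 (and, analogously, by the execution module 1405), a simple element-wise average of parameter tensors can be written as follows. The concrete tensors and the averaging rule are assumptions for illustration only, since the embodiments also allow weighted averaging or other aggregation schemes.

```python
import torch

def aggregate(param_sets):
    # One possible aggregation rule: an element-wise average of corresponding
    # parameter tensors from every participant.
    return [torch.stack(group).mean(dim=0) for group in zip(*param_sets)]

# Hypothetical example: the seventh model parameter value (updated first model
# on the second computing node) averaged with the updated first model parameter
# values received from two first computing nodes.
G_t_prime = [torch.ones(2, 2)]
G_1 = [torch.zeros(2, 2)]
G_2 = [torch.full((2, 2), 2.0)]
G_all = aggregate([G_t_prime, G_1, G_2])  # second aggregation parameter value
```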
  • FIG. 15 is a schematic structural diagram of the computer device provided by the embodiment of the present application.
  • the computer device 1500 includes an acquisition module 1501, a feature extraction module 1502 and a processing module 1503, wherein the acquisition module 1501 is used to acquire input data related to the target task; the feature extraction module 1502 is used to perform feature extraction on the input data through the trained first model to obtain a feature map; and the processing module 1503 is used to process the feature map through the trained second model to obtain output data, wherein the model parameter value of the trained first model and the model parameter value of the trained second model are obtained by training with the model training method corresponding to any one of FIG. 4 to FIG. 11 above.
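A minimal sketch of the computer device 1500 pipeline, assuming PyTorch and made-up layer sizes (the trained parameter values would in practice come from the model training method referred to above):

```python
import torch
import torch.nn as nn

# Acquisition (1501) -> feature extraction with the trained first model (1502)
# -> processing with the trained second model (1503).
first_model = nn.Sequential(nn.Linear(16, 8), nn.ReLU())  # trained feature extractor
second_model = nn.Linear(8, 3)                            # trained task model, e.g. 3 classes

input_data = torch.randn(1, 16)           # data related to the target task
feature_map = first_model(input_data)     # feature extraction module 1502
output_data = second_model(feature_map)   # processing module 1503
predicted_label = output_data.argmax(dim=1)
```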
  • FIG. 16 is a schematic structural diagram of a device provided by an embodiment of the present application. For the convenience of description, only the part related to the embodiment of the present application is shown; for specific technical details that are not disclosed, please refer to the method embodiments of the present application.
  • when the device 1600 is used as a first computing node, the modules described in the embodiment corresponding to FIG. 13 may be deployed on the device 1600 to implement the functions of the first computing node 1300 in the embodiment corresponding to FIG. 13; when the device 1600 is used as a second computing node, the modules described in the embodiment corresponding to FIG. 14 may be deployed on the device 1600 to implement the functions of the second computing node 1400 in the embodiment corresponding to FIG. 14; and when the device 1600 is used as a computer device, the modules described in the embodiment corresponding to FIG. 15 may be deployed on the device 1600 to implement the functions of the computer device 1500 in the embodiment corresponding to FIG. 15.
  • the device 1600 is implemented by one or more servers; the device 1600 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPU) 1622, a memory 1632, and one or more storage media 1630 (e.g., one or more mass storage devices) that store applications 1642 or data 1644.
  • the memory 1632 and the storage medium 1630 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1630 may include one or more modules (not shown in the figure), and each module may include a series of instructions to operate on the device 1600 .
  • the central processing unit 1622 may be configured to communicate with the storage medium 1630 to execute a series of instruction operations in the storage medium 1630 on the device 1600.
  • Device 1600 may also include one or more power supplies 1626, one or more wired or wireless network interfaces 1650, one or more input and output interfaces 1658, and/or, one or more operating systems 1641, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM and many more.
  • when the device 1600 is used as the first computing node, the central processing unit 1622 is configured to perform the steps performed by the first computing node in the embodiments corresponding to FIG. 4 to FIG. 11.
  • the central processing unit 1622 may be configured to: under the condition that a first model parameter value of a first model (e.g., a feature extractor) on a first computing node and a second model parameter value of a second model (e.g., a classifier) on the first computing node remain unchanged, use the first data set on the first computing node to train a third model (e.g., a domain discriminator, which may also be referred to simply as a discriminator) on the first computing node, so as to obtain a third model parameter value of the third model on the first computing node, where the first model parameter value is a model parameter value obtained after the first computing node has trained the first model on the first computing node, and the second model parameter value is a model parameter value obtained after the first computing node has trained the second model on the first computing node.
  • the first model is used to perform feature extraction on the input data; the second model is used to perform a target task based on the features extracted by the first model, for example, the target task may be a classification task (e.g., a target detection task, a semantic segmentation task, a speech recognition task, etc.) or a regression task, which is not limited here; the third model is used to identify the source domain of the features extracted by the first model.
  • then, a first aggregation parameter value is received, where the first aggregation parameter value is obtained based on the third model parameter value and a fourth model parameter value, the fourth model parameter value is a model parameter value of the third model on the second computing node, and the third model on the second computing node is trained by the second computing node using the second data set on the second computing node; the original third model parameter value is then updated to the first aggregation parameter value, that is, the model parameter value of the third model on the first computing node is updated to the first aggregation parameter value, and, while keeping the first aggregation parameter value unchanged, the first data set is used to retrain the first model on the first computing node and the second model on the first computing node, so as to obtain a fifth model parameter value of the first model on the first computing node and a sixth model parameter value of the second model on the first computing node.
  • the fifth model parameter value and the sixth model parameter value are then used as the new first model parameter value and the new second model parameter value, and the above steps are triggered to be repeated until an iteration termination condition is reached; the iteration termination condition may be reaching a preset number of training rounds, making the loss function converge, or another set training termination condition, which is not specifically limited here.
  • the central processing unit 1622 can also be used to execute any step performed by the first computing node in the method embodiments corresponding to FIG. 4 to FIG. 11 in this application.
  • when the device 1600 is used as the second computing node, the central processing unit 1622 is configured to perform the steps performed by the second computing node in the embodiments corresponding to FIG. 4 to FIG. 11.
  • the central processing unit 1622 may be configured to: obtain a second aggregation parameter value, where the second aggregation parameter value is obtained based on the first model parameter values of the first models trained on the one or more first computing nodes, wherein each first computing node uses its own first data set to train its own first model, and the first data set may be a labeled data set.
  • then, under the condition that the model parameter of the first model on the second computing node takes the second aggregation parameter value, the second data set on the second computing node is used to train the third model on the second computing node, so as to obtain a fourth model parameter value of the third model on the second computing node, where the first model is used to perform feature extraction on the input data, and the third model is used to identify the source domain of the features extracted by the first model.
  • the above steps may be repeated until an iteration termination condition is reached; the iteration termination condition can be reaching a preset number of training rounds, making the loss function converge, or another set training termination condition, which is not specifically limited here.
  • the central processing unit 1622 can also be used to execute any step performed by the second computing node in the method embodiments corresponding to FIG. 4 to FIG. 11 in this application.
  • when the device 1600 is used as a computer device, the central processing unit 1622 is configured to execute the steps performed by the computer device in the embodiment corresponding to FIG. 12.
  • the central processing unit 1622 can be used to: obtain input data to be processed, where the input data is related to the target task to be performed; for example, when the target task is a classification task, the input data refers to the data used for classification.
  • then, feature extraction is performed on the input data by the trained first model to obtain a feature map, and the feature map is processed by the trained second model to obtain output data, wherein the model parameter value of the trained first model and the model parameter value of the trained second model are obtained by training using the method described in any one of the embodiments corresponding to FIG. 4 to FIG. 11 above.
  • the device embodiments described above are only schematic, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be a physical unit, which can be located in one place or distributed over multiple network units; some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • the relevant storage medium may be a USB flash drive (U disk), a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like, storing instructions that enable a computer device (which can be a personal computer, training equipment, or network equipment, etc.) to execute the methods described in the various embodiments of the present application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, training device, or data center to another website, computer, training device, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave, etc.).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a training device or a data center, that integrates one or more available media.
  • the available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., high-density digital video discs (DVDs)), or semiconductor media (e.g., solid state drives (SSDs)), etc.


Abstract

Embodiments of the present application disclose a model training method based on federated transfer learning and a computing node, which can be applied in the field of artificial intelligence. The method includes: on each source domain, training the model parameters G of its own feature extractor and the model parameters T of its own sub-task model (e.g., a classifier) using local labeled data, and then sending all G to the target domain; training the model parameters D1 of the respective domain discriminators on each source domain and the model parameters D2 of the domain discriminator on the target domain; aggregating all D1 and D2 on the server side or on the target-domain side to obtain an aggregated parameter value D, and sending D to each source domain; and performing, on each source domain, multiple iterations of adversarial training with its own feature extractor and discriminator. The present application achieves domain alignment through the adversarial training process, and only model parameter values are transferred between domains, without transferring data or data features, thereby protecting data privacy; collaborative training of the model is thus achieved while taking both domain alignment and data privacy into account.

Description

一种基于联邦迁移学习的模型训练方法及计算节点
本申请要求于2021年3月31日提交中国专利局、申请号为202110350001.9、申请名称为“一种基于联邦迁移学习的模型训练方法及计算节点”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能领域,尤其涉及一种基于联邦迁移学习的模型训练方法及计算节点。
背景技术
联邦学习(federated learning,FL)又称为联邦机器学习、联合学习、联盟学习等,其能有效帮助多个计算节点在满足用户隐私保护、数据安全和政府法规的要求下,进行数据使用和机器学习建模;迁移学习(transfer learning,TL)是把为任务A开发的模型作为初始点,重新使用在为任务B开发模型的过程中,也就是把基于已有任务训练好的模型学习到的知识迁移到新的任务中来帮助该模型进行再训练。
业界目前已有几种基于联邦学习/迁移学习的模型训练方法,一种联邦学习方法称为联邦平均(federated averaging,FedAvg),FedAvg结构上一般包括一个服务器和一些客户端,适用于各个客户端的数据都有标签的场景,技术流程主要包括模型下发和模型聚合过程,在模型下发过程,客户端从服务器下载模型,在本地数据上训练,训练到一定程度后上传模型到服务器;模型聚合过程,服务器会收集各个客户端上传的模型,并进行模型融合,这两个过程会反复迭代直至模型收敛。一种迁移学习方法则称为对抗判别域适应(adversarial discriminative domain adaptation,ADDA),ADDA的特点是从源域数据提取的特征和从目标域数据提取的特征会合并到一起作为训练数据的特征,结构上一般包括特征提取器和一个鉴别器,其中,特征提取器负责提取训练数据的特征,希望提取到的特征能够迷惑鉴别器,使鉴别器无法分辨该特征属于源域还是目标域,鉴别器则需要尽量保证自己能区分来自特征提取器的特征是属于哪个域,两者相互对抗,在迭代训练过程中互相提高,从而实现对源域和目标域的对齐。
但上述方案都存在缺陷,FedAvg能够很好地保护用户隐私,但是因为没有对源域数据和目标域数据做对齐,所以当不同域之间的数据存在分布差异时,模型性能比较差;ADDA则与FedAvg相反,其考虑了域对齐,但由于将从不同域提取的数据特征合并到一起再训练,传递的特征本身还是在一定程度上泄露了数据隐私。基于此,一种既能实现域对齐,又能保护数据隐私的联邦迁移学习的模型训练方法亟待推出。
发明内容
本申请实施例提供了一种基于联邦迁移学习的模型训练方法及计算节点,用于利用第一计算节点上第一数据集辅助第二计算节点上的第二数据集对模型进行训练,实现域对齐,并且在计算节点之间传递的仅仅是模型的参数值,不传递数据或数据特征,充分保护 了用户数据隐私,因此在兼顾域对齐和用户数据隐私的情况下,本申请实施例实现了对模型的协同训练,提高了模型的性能。
基于此,本申请实施例提供以下技术方案:
第一方面,本申请实施例首先提供一种基于联邦迁移学习的模型训练方法,可用于人工智能领域中,例如计算机视觉领域,该方法包括:首先,第一计算节点在第一计算节点上第一模型(如,特征提取器)的第一模型参数值(可用G s表示)和第一计算节点上的第二模型(如,分类器)的第二模型参数值(可用T s表示)保持不变的情况下(也可称为固定G s和固定T s),采用该第一计算节点上的第一数据集对第一计算节点上的第三模型(如,域鉴别器,也可简称为鉴别器)进行训练,以得到该第一计算节点上的第三模型的第三模型参数值(可用D s表示),其中,该第一模型参数值G s为第一计算节点对第一模型训练后得到的模型参数值,第二模型参数值T s为第一计算节点对第二模型训练后得到的模型参数取值。在本申请实施例中,第一模型用于对输入数据进行特征提取;第二模型用于基于第一模型提取出的特征执行目标任务,例如,目标任务可以是分类任务(如,目标检测任务、语义分割任务、语音识别任务等),也可以是回归任务,此处不做限定;第三模型用于鉴别由第一模型提取出的特征的源域。作为一种示例,根据源域的数据分布可以区分特征对应的输入数据所位于的计算节点,例如,判断获取到的特征是来自源域设备,还是来自目标域设备。之后,第一计算节点将接收第一聚合参数值(可用D-all表示),该第一聚合参数值D-all是基于第三模型参数值D s和第四模型参数值(可用D t表示)得到,该第四模型参数值D t为第二计算节点上的第三模型的模型参数取值,该第二计算节点上的第三模型由该第二计算节点采用第二计算节点上的数据集(可称为第二数据集)训练得到。第一计算节点在得到了第一聚合参数值D-all后,会将原来的第三模型参数值D s更新为该第一聚合参数值D-all,并在保持第三模型参数值D s取第一聚合参数值D-all不变的情况下(可称为固定D s=D-all),采用第一数据集对第一计算节点上的第一模型和第一计算节点上的第二模型再进行训练,得到第一计算节点上的第一模型的第五模型参数值(可用G s’表示)和第一计算节点上的第二模型的第六模型参数值(可用T s’表示)。最后,第一计算节点将第五模型参数值G s’和第六模型参数值T s’作为新的第一模型参数值G s和新的第二模型参数值T s
需要注意的是,在本申请的一些实施方式中,第一计算节点还可以重复执行上述步骤,直至达到迭代终止条件,该迭代终止条件可以是达到预设的训练轮次,也可以是使得损失函数收敛,还可以是其他设定的训练终止条件,具体此处不做限定。
还需要注意的是,在本申请实施例中,第一计算节点上的第一数据集可以是有标签的数据集,第二计算节点上的第二数据集是无标签或少标签的数据集。此外,在本申请实施例中,第一计算节点可以是一个,也可以是多个,当第一计算节点是多个的情况时,那么每个计算节点上进行的操作是类似的,此处不予赘述。
在本申请上述实施方式中,具体阐述了第一计算节点侧训练各个模型并得到各模型的模型参数值的过程,在该训练过程中,第一计算节点与第二计算节点之间互相传递的仅是模型参数值,不传递数据或数据特征,保护了数据隐私;并且基于传递的模型参数值,使 得第一计算节点上的第一模型从第一数据集提取的特征与第二计算节点上的第一模型从第二数据集提取的特征的分布差异在迭代训练过程中逐步缩小,从而实现域对齐。因此,本申请上述实施方式在兼顾域对齐和数据隐私的情况下实现了对模型的协同训练。
在第一方面的一种可能的设计中,第一计算节点接收第一聚合参数值D-all,且第一聚合参数值D-all基于第三模型参数值D s以及第四模型参数值D t得到的过程可以是:首先,第一计算节点将第三模型参数值D s向第二计算节点发送,以使得第二计算节点将第三模型参数值D s和第四模型参数值D t进行聚合,以得到第一聚合参数值D-all;之后,第一计算节点再接收由第二计算节点发送的第一聚合参数值D-all。需要注意的是,如果第一计算节点为一个,第一聚合参数值D-all就是由一个第三模型参数值D s与第四模型参数值D t聚合得到;如果第一计算节点为多个,第一聚合参数值D-all就是由多个第三模型参数值D s(即每个第一计算节点各自有一个对应的第三模型参数值D s,可分别用D 1、……、D n表示)与第四模型参数值D t进行聚合。
在本申请上述实施方式中,第三模型参数值D s与第四模型参数值D t的聚合过程由第二计算节点执行,这种情况下无需额外部署新的计算节点,节约了成本,扩大了可应用本申请所提供的模型训练方法的场景。
在第一方面的一种可能的设计中,第二聚合参数值G-all基于第一模型参数值G s得到的过程可以基于第一计算节点是一个还是多个分为两种情况:1)在第一计算节点为一个的情况下,第一计算节点会将该第一计算节点上的该第一模型参数值G s(即一个G s)向第二计算节点发送,这种情况下,第二聚合参数值G-all实质就是该第一模型参数值G s;2)在第一计算节点为多个(假设为n个,n≥2)的情况下,那么每个第一计算节点会将各自得到的第一模型参数值G s(即多个G s,分别用G 1、……、G n表示)向第二计算节点发送,这种情况下,第二计算节点会对接收到的这些第一模型参数值G 1、……、G n进行聚合,以得到第二聚合参数值G-all。
在本申请上述实施方式中,具体阐述了在不部署新的计算节点的情况下,当第一计算节点分别为一个或多个的情况时,第二聚合参数值G-all是如何得到的,具备灵活性。
在第一方面的一种可能的设计中,该方法还包括:第一计算节点将更新得到的第一模型参数值和更新得到的第二模型参数值向该第二计算节点发送。若第一计算节点是重复执行上述步骤,直至达到迭代终止条件,那么在第一计算节点重复执行上述步骤,直至达到迭代终止条件后,该方法就还包括:第一计算节点将最后一次更新得到的第一模型参数值和最后一次更新得到的第二模型参数值向该第二计算节点发送。
需要注意的是,在本申请实施例中,由于第一计算节点可以是一个,也可以是多个,若是迭代多次的情况,那么第一计算节点将最后一次更新得到的第一模型参数值和最后一次更新得到的第二模型参数值向第二计算节点发送具体可分为两种情况:1)在第一计算节点为一个的情况下,第一计算节点会将最后一次更新得到的第二模型参数值T s向第二计算节点发送,以使得该第二计算节点根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为最后一次更新 得到的第二模型参数值T s;2)在第一计算节点为多个的情况下,每个第一计算节点各自将最后一次更新得到的第二模型参数值T s(即多个T s,分别用T 1、……、T n表示)向第二计算节点发送,以使得第二计算节点对各个最后一次更新得到的第二模型参数值T 1、……、T n进行聚合,以得到第三聚合参数值(可用Ts-all表示),并进一步使得第二计算节点根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为该第三聚合参数值Ts-all。
在本申请上述实施方式中,在不部署新的计算节点的情况下,第一计算节点完成迭代训练后,会将最后一次更新得到的第二模型参数值向第二计算节点发送,从而可使得第二计算节点基于第一模型的最新模型参数值和第二模型的最新模型参数值执行目标任务,由于第二计算节点上的第一模型和第二模型各自最终的模型参数值是经由第一计算节点和第二计算节点协同训练得到的,因此模型性能得到了提高。
在第一方面的一种可能的设计中,第一计算节点接收第一聚合参数值D-all,且第一聚合参数值D-all基于第三模型参数值D s以及第四模型参数值D t得到的过程还可以是:第一计算节点将第三模型参数值D s向第三计算节点发送,同时第二计算节点也会将第四模型参数值D t向第三计算节点发送,以使得第三计算节点将第三模型参数值D s以及来自第二计算节点的第四模型参数值D t进行聚合,以得到第一聚合参数值D-all;之后,第一计算节点接收由第三计算节点发送的第一聚合参数值D-all。
在本申请上述实施方式中,第三模型参数值D s与第四模型参数值D t的聚合过程可以由额外部署的一个第三计算节点执行,降低了第二计算节点的计算开销,提高了第二计算节点的计算速度。
在第一方面的一种可能的设计中,第二聚合参数值G-all基于第一模型参数值G s得到的过程可以根据第一计算节点是一个还是多个分为两种情况:1)在第一计算节点为一个的情况下,第一计算节点会将该第一计算节点上的该第一模型参数值G s(即一个G s)向第三计算节点发送,再由该第三计算节点将该第一模型参数值G s向第二计算节点发送,这种情况下,第二聚合参数值G-all实质就是该第一模型参数值G s;2)在第一计算节点为多个(假设为n个,n≥2)的情况下,那么每个第一计算节点会将各自得到的第一模型参数值G s(即多个G s,分别用G 1、……、G n表示)向第三计算节点发送,以使得该第三计算节点对接收到的这些第一模型参数值G 1、……、G n进行聚合,以得到第二聚合参数值G-all,并由该第三计算节点将得到的第二聚合参数值G-all向第二计算节点发送。
在本申请上述实施方式中,具体阐述了在部署新的计算节点(即第三计算节点)的情况下,当第一计算节点分别为一个或多个的情况时,第二聚合参数值G-all是如何得到的,具备灵活性。
在第一方面的一种可能的设计中,第一计算节点将更新得到的第一模型参数值和更新得到的第二模型参数值向第三计算节点发送。若第一计算节点是重复执行上述步骤,直至达到迭代终止条件,那么在第一计算节点重复执行上述步骤,达到迭代终止条件后,该方法还包括:第一计算节点将最后一次更新得到的第一模型参数值和最后一次更新得到的第 二模型参数值向第三计算节点发送。需要注意的是,在本申请实施例中,由于第一计算节点可以是一个,也可以是多个,若是迭代多次的情况,那么第一计算节点将最后一次更新得到的第一模型参数值和最后一次更新得到的第二模型参数值向第三计算节点发送具体可分为两种情况:1)在第一计算节点为一个的情况下,第一计算节点会将最后一次更新得到的第二模型参数值T s向第三计算节点发送,再由该第三计算节点将最后一次更新得到的第二模型参数值T s向第二计算节点发送,以使得该第二计算节点根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为最后一次更新得到的第二模型参数值T s;2)在第一计算节点为多个(假设为n个,n≥2)的情况下,每个第一计算节点各自将最后一次更新得到的第二模型参数值T s(即多个T s,分别用T 1、……、T n表示)向第三计算节点发送,以使得第三计算节点对各个最后一次更新得到的第二模型参数值T 1、……、T n进行聚合,以得到第三聚合参数值Ts-all,再由该第三计算节点将第三聚合参数值Ts-all向第二计算节点发送,使得第二计算节点根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为该第三聚合参数值Ts-all。
在本申请上述实施方式中,在部署有新的计算节点(即第三计算节点)的情况下,第一计算节点完成迭代训练后,会将最后一次更新得到的第二模型参数值向第三计算节点发送,由第三计算节点直接转发或聚合后转发给第二计算节点,从而使得第二计算节点基于第一模型的最新模型参数值和第二模型的最新模型参数值执行目标任务,由于第二计算节点上的第一模型和第二模型各自最终的模型参数值是经由第一计算节点和第二计算节点各自利用本地数据集协同训练得到的,因此模型性能得到了提高。
第二方面,本申请实施例还提供一种基于联邦迁移学习的模型训练方法,可用于人工智能领域中,例如计算机视觉领域,该方法包括:首先,第二计算节点获取第二聚合参数值G-all,该第二聚合参数值G-all基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值G s得到,其中,每个第一计算节点各自采用自身上的第一数据集对自身上的第一模型进行训练,第一数据集可以是有标签的数据集,第一模型用于对输入数据进行特征提取。之后,第二计算节点在第二计算节点上的第一模型的模型参数取值为第二聚合参数值G-all的情况下,采用第二计算节点上的第二数据集对第二计算节点上的第三模型进行训练,以得到第二计算节点上的第三模型的第四模型参数值D t,其中,第三模型用于鉴别由第一模型提取出的特征的源域。作为一种示例,根据源域的数据分布可以区分所述输入数据所位于的计算节点,例如,判断获取到的特征是来自源域设备,还是来自目标域设备。
需要注意的是,在本申请的一些实施方式中,第二计算节点还可以重复执行上述步骤,直至达到迭代终止条件,该迭代终止条件可以是达到预设的训练轮次,也可以是使得损失函数收敛,还可以是其他设定的训练终止条件,具体此处不做限定。
需要注意的是,在本申请实施例中,当第一计算节点为一个时,第一模型参数值可直 接用G s表示,第二模型参数值可直接用T s表示;当第一计算节点为多个(假设为n个,n≥2)时,那么每个第一计算节点会得到各自对应第一模型参数值G s(即多个G s,分别用G 1、……、G n表示)以及各自对应的第二模型参数值T s(即多个T s,分别用T 1、……、T n表示)。
在本申请上述实施方式中,具体阐述了第二计算节点侧训练各个模型并得到各模型的模型参数值的过程,在该训练过程中,第二计算节点与第一计算节点之间互相传递的仅是模型参数值,不传递数据或数据特征,保护了数据隐私;并且基于传递的模型参数值,使得第二计算节点上的第一模型从第二数据集提取的特征与第一计算节点上的第一模型从第一数据集提取的特征的分布差异在迭代训练过程中逐步缩小,从而实现域对齐。因此,本申请上述实施方式在兼顾域对齐和数据隐私的情况下实现了对模型的协同训练,可以得到在目标任务上表现优异的模型。
在第二方面的一种可能的设计中,该方法还可以包括:第二计算节点还可以进一步获取第一聚合参数值D-all,该第一聚合参数值D-all基于第三模型参数值D s(一个或多个)以及第四模型参数值D t得到,第三模型参数值D s为每个第一计算节点各自采用自身上的第一数据集对自身上的第三模型进行训练得到的模型参数取值。之后,第二计算节点将第四模型参数值D t更新为第一聚合参数值D-all,并在保持第一聚合参数值D-all不变的情况下(即固定D-all),采用第二数据集对第二计算节点上的第一模型和第二计算节点上的第二模型进行训练,以得到第二计算节点上的第一模型的第七模型参数值(可用G t’表示)和第二计算节点上的第二模型的第八模型参数值(可用T t’表示)。
需要注意的是,若是迭代多次的情况,那么上述过程是在第二计算节点重复执行上述步骤,直至达到迭代终止条件之前执行。
在本申请实施例中,在不部署新的计算节点的情况下,阐述了第二计算节点侧也可以在固定第一聚合参数值D-all的情况下对该第二计算节点上的第一模型和第二模型进行训练,也就是不仅在第一计算节点进行对抗训练,在第二计算节点上也进行对抗训练,从而使得从第一数据集上提取的特征和从第二数据集上提取的特征更快、更好地实现域对齐,提高训练速度和效果。
在第二方面的一种可能的设计中,第二计算节点获取第一聚合参数值D-all,且第一聚合参数值D-all基于第三模型参数值D s以及第四模型参数值D t得到的过程可以是:首先,第二计算节点接收由一个或多个第一计算节点各自发送的第三模型参数值D s(一个或多个);之后,第二计算节点再将第四模型参数值D t和每个第三模型参数值D s进行聚合,以得到第一聚合参数值D-all。
在本申请上述实施方式中,具体阐述了在不部署新的计算节点的情况下,从第二计算节点侧说明第二聚合参数值G-all是如何得到的,具备灵活性。
在第二方面的一种可能的设计中,第二计算节点获取第二聚合参数值G-all,且第二聚合参数值G-all基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值G s得到的过程可以是:第二计算节点接收由一个或多个第一计算节点各自发送的更新后的第一模型参数值G s(一个或多个),并将第七模型参数值G t’(也就是更新后的第二计算节 点上的第一模型的模型参数值)和每个更新的第一模型参数值G s进行聚合,以得到第二聚合参数值G-all。由于第一计算节点可以是一个,也可以是多个,因此可分为两种情况:1)若第一计算节点为一个,则第二计算节点将第七模型参数值G t’和该一个第一模型参数值G s进行聚合,以得到第二聚合参数值G-all;若第二计算节点为多个(假设为n个,n≥2),则第二计算节点将第七模型参数值G t’和每个第一模型参数值G 1、……、G n进行聚合,以得到第二聚合参数值G-all。
在本申请上述实施方式中,在不部署新的计算节点且由第二计算节点进行对抗训练的的情况下,以得到第二聚合参数值的聚合过程是由该第二计算节点执行的,具备灵活性。
在第二方面的一种可能的设计中,该方法还可以包括:第二计算节点基于更新后的第一模型参数值,更新第二聚合参数值,并接收由一个或多个第一计算节点发送的更新后的第二模型参数值,并将每个更新后的第二模型参数值和更新后的第八模型参数值T t’(即更新后的第二计算节点上的第二模型的模型参数值)进行聚合,以得到第四聚合参数值(可用T-all表示)。之后,第二计算节点根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为该第四聚合参数值T-all。
需要注意的是,若是迭代多次的情况,那么该方法还可以包括:第二计算节点基于更新后的第一模型参数值,更新第二聚合参数值,并接收由一个或多个第一计算节点发送的最后一次更新得到的第二模型参数值,并将每个最后一次更新得到的第二模型参数值和更新后的第八模型参数值T t’(即更新后的第二计算节点上的第二模型的模型参数值)进行聚合,以得到第四聚合参数值(可用T-all表示)。具体地,1)在第一计算节点为一个的情况下,第二计算节点接收第一计算节点发送的最后一次更新得到的第二模型参数值T s,并将该最后一次更新得到的第二模型参数值T s和最后一次更新得到的第八模型参数值T t’进行聚合,以得到第四聚合参数值T-all;2)在第一计算节点为多个的情况下,第二计算节点接收每个第一计算节点各自发送的最后一次更新得到的第二模型参数值T s(即多个T s,分别用T 1、……、T n表示),第二计算节点再将各个最后一次更新得到的第二模型参数值T 1、……、T n以及T t’进行聚合,以得到第四聚合参数值T-all。之后,第二计算节点根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为该第四聚合参数值T-all。
在本申请上述实施方式中,在不部署新的计算节点的情况下,具体阐述了在第二计算节点也进行了对抗训练后,第二计算节点会接收第一计算节点发送的最后一次更新得到的第二模型参数值,从而使得第二计算节点基于第一模型的最新模型参数值和第二模型的最新模型参数值执行目标任务,由于第二计算节点上的第一模型和第二模型各自最终的模型参数值是经由第一计算节点和第二计算节点协同训练得到的,因此模型性能得到了提高。
在第二方面的一种可能的设计中,第二计算节点获取第二聚合参数值,且该第二聚合参数值基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值得到的过 程可以是:接收由每个第一计算节点各自发送的第一模型参数值G s,并对接收到的每个第一模型参数值进行聚合,以得到所述第二聚合参数值G-all。同样地,由于第一计算节点可以是一个,也可以是多个,因此可分为两种情况:1)在第一计算节点为一个的情况下,第二计算节点接收由第一计算节点发送的第一模型参数值G s(即一个G s),这种情况下,第二聚合参数值G-all实质就是该第一模型参数值G s;2)在第一计算节点为多个(假设为n个,n≥2)的情况下,第二计算节点接收由每个第一计算节点各自发送的第一模型参数值G s(即多个G s,分别用G 1、……、G n表示),这种情况下,第二计算节点会对接收到的这些第一模型参数值G 1、……、G n进行聚合,以得到第二聚合参数值G-all。
在本申请上述实施方式中,具体阐述了在不部署新的计算节点的情况下,从第二计算节点侧阐述当第一计算节点分别为一个或多个的情况时,第二聚合参数值G-all可以是由第二计算节点得到,具备灵活性。
在第二方面的一种可能的设计中,该方法还包括:1)在第一计算节点为一个的情况下,第二计算节点接收由第一计算节点发送的最后一次更新得到的第二模型参数值T s,并根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为最后一次更新得到的第二模型参数值T s;2)在第一计算节点为多个的情况下,第二计算节点接收由每个第一计算节点各自发送的最后一次更新得到的第二模型参数值T s(即多个T s,分别用T 1、……、T n表示),并对每个最后一次更新得到的第二模型参数值T 1、……、T n进行聚合,以得到第三聚合参数值Ts-all,并根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为第三聚合参数值Ts-all。
需要注意的是,若是迭代多次的情况,那么上述过程是在第二计算节点重复执行上述步骤,直至达到迭代终止条件之后执行。
在本申请上述实施方式中,在不部署新的计算节点的情况下,第二计算节点完成迭代训练后,会接收由第一计算节点发送的最后一次更新得到的第二模型参数值,第二计算节点会基于第一模型的最新模型参数值和第二模型的最新模型参数值执行目标任务,由于第二计算节点上的第一模型和第二模型各自最终的模型参数值是经由第一计算节点和第二计算节点各自利用各自的本地数据集协同训练得到的,因此在保护数据隐私的同时,模型性能得到了提高。
在第二方面的一种可能的设计中,第二计算节点获取第一聚合参数值D-all,且该第一聚合参数值D-all基于第三模型参数值D s以及第四模型参数值D t得到过程还可以是:首先,第二计算节点向第三计算节点发送该第四模型参数值D t,之后,第二计算节点接收来自第三计算节点的第一聚合参数值D-all,该第一聚合参数值D-all由第三计算节点对来自一个或多个第一计算节点的每个第三模型参数值D s和来自第二计算节点的第四模型参数值D t聚合得到。
在本申请上述实施方式中,阐述第三模型参数值D s与第四模型参数值D t的聚合过程 由额外部署的一个第三计算节点执行,降低了第二计算节点的计算开销,提高了第二计算节点的计算速度。
在第二方面的一种可能的设计中,第二计算节点获取第二聚合参数值G-all,且第二聚合参数值G-all基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值得到的过程可以是:首先,第二计算节点向第三计算节点发送该第七模型参数值G t’(即更新后的第二计算节点上的第一模型的模型参数值),之后,该第二计算节点接收来自该第三计算节点的第二聚合参数值G-all,该第二聚合参数值G-all由第三计算节点对该第七模型参数值G t’以及来自一个或多个第一计算节点的每个更新的第一模型参数值G s聚合得到。由于第一计算节点可以是一个,也可以是多个,因此可分为两种情况:1)在第一计算节点为一个的情况下,第二计算节点接收由第三计算节点转发的第一模型参数值G s(由第一计算节点先向第三计算节点发送,再由第三计算节点向第二计算节点转发),这种情况下,第二聚合参数值G-all实质就是该第一模型参数值G s;2)在第一计算节点为多个(假设为n个,n≥2)的情况下,第二计算节点接收由第三计算节点转发的第二聚合参数值G-all,所述第二聚合参数值由所述第三计算节点对每个第一模型参数值聚合得到,其中,每个第一模型参数值由每个第一计算节点各自向第三计算节点发送,即每个第一计算节点会将各自得到的第一模型参数值G s(即多个G s,分别用G 1、……、G n表示)向第三计算节点发送,该第三计算节点对接收到的这些第一模型参数值G 1、……、G n进行聚合,以得到第二聚合参数值G-all。在本申请实施例中,以得到的该G-all可进一步向第二计算节点发送。
在本申请上述实施方式中,具体阐述了在部署新的计算节点(即第三计算节点)且由第二计算节点进行对抗训练的情况下,以得到第二聚合参数值的聚合过程是由该第三计算节点执行的,具备灵活性。
在第二方面的一种可能的设计中,该方法还可以包括:第二计算节点将更新得到的第八模型参数值T t’(即更新后的第二计算节点上的第二模型的模型参数值)向第三计算节点发送,同时每个第一计算节点也各自将更新后的第二模型参数值向第三计算节点发送,由该第三计算节点将每个更新后的第二模型参数值和更新得到的第八模型参数值T t’进行聚合,以得到第四聚合参数值T-all。
若是多次迭代的情况,那么该方法还可以是:第二计算节点将最后一次更新得到的第八模型参数值T t’(即更新后的第二计算节点上的第二模型的模型参数值)向第三计算节点发送,同时每个第一计算节点也各自将最后一次更新得到的第二模型参数值向第三计算节点发送,由该第三计算节点将每个最后一次更新得到的第二模型参数值和最后一次更新得到的第八模型参数值T t’进行聚合,以得到第四聚合参数值T-all。具体地,1)在第一计算节点为一个的情况下,第三计算节点接收第一计算节点发送的最后一次更新得到的第二模型参数值T s,同时第三计算节点接收第二计算节点发送的最后一次更新得到的第八模型参数值T t’,并将该最后一次更新得到的第二模型参数值T s和最后一次更新得到的第八模型参数值T t’进行聚合,以得到第四聚合参数值T-all;2)在第一计算节点为多个的情况下,第三计算节点接收每个第一计算节点各自发送的最后一次更新得到的第二模型参数值T s(即多个T s,分别用T 1、……、T n表示),同时第三计算节点接收第二计算节点发送的最 后一次更新得到的第八模型参数值T t’,第三计算节点再将各个最后一次更新得到的第二模型参数值T 1、……、T n以及T t’进行聚合,以得到第四聚合参数值T-all。之后,第二计算节点再接收由第三计算节点发送的第四聚合参数值T-all,并根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为该第四聚合参数值T-all。
在本申请上述实施方式中,在部署新的计算节点(即第三计算节点)的情况下,具体阐述了在第二计算节点也进行了对抗训练后,第二计算节点会接收第一计算节点发送的最后一次更新得到的第二模型参数值,从而使得第二计算节点基于第一模型的最新模型参数值和第二模型的最新模型参数值执行目标任务,由于第二计算节点上的第一模型和第二模型各自最终的模型参数值是经由第一计算节点和第二计算节点协同训练得到的,因此模型性能得到了提高。
在第二方面的一种可能的设计中,第二计算节点获取第二聚合参数值G-all,且该第二聚合参数值G-all基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值得到的过程可以是:第二计算节点接收来自第三计算节点的第二聚合参数值G-all,该第二聚合参数值G-all由第三计算节点对来自于一个或多个第一计算节点的每个第一模型参数值G s聚合得到。同样地,由于第一计算节点可以是一个,也可以是多个,因此可分为两种情况:1)在第一计算节点为一个的情况下,第二计算节点接收由第三计算节点转发的第一模型参数值G s(由第一计算节点先向第三计算节点发送,再由第三计算节点向第二计算节点转发),这种情况下,第二聚合参数值G-all实质就是该第一模型参数值G s;2)在第一计算节点为多个(假设为n个,n≥2)的情况下,第二计算节点接收由第三计算节点转发的第二聚合参数值G-all,所述第二聚合参数值由所述第三计算节点对每个第一模型参数值聚合得到,其中,每个第一模型参数值由每个第一计算节点各自向第三计算节点发送,即每个第一计算节点会将各自得到的第一模型参数值G s(即多个G s,分别用G 1、……、G n表示)向第三计算节点发送,该第三计算节点对接收到的这些第一模型参数值G 1、……、G n进行聚合,以得到第二聚合参数值G-all,并由该第三计算节点将得到的第二聚合参数值G-all向第二计算节点发送。
在本申请上述实施方式中,具体阐述了在不部署新的计算节点的情况下,从第二计算节点侧阐述当第一计算节点分别为一个或多个的情况时,第二聚合参数值G-all可以是由第三计算节点得到,具备灵活性。
在第二方面的一种可能的设计中,该方法还包括:1)在第一计算节点为一个的情况下,第二计算节点接收由第三计算节点发送的最后一次更新得到的第二模型参数值T s,并根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,最后一次更新得到的第二模型参数值T s由第三计算节点从第一计算节点获取到,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为最后一次更新得到的第二模型参数值T s;2)在第一计算节点为多个(假设为n个,n≥2)的情况下,第二计算节点接收由第三计算节点发送的第三聚合 参数值Ts-all,并根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,第三聚合参数值Ts-all由第三计算节点对从每个第一计算节点各自接收到的最后一次更新得到的第二模型参数值聚合T 1、……、T n聚合得到,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为该第三聚合参数值Ts-all。
需要注意的是,若是迭代多次的情况,那么上述过程是在第二计算节点重复执行上述步骤,直至达到迭代终止条件之后执行。
在本申请上述实施方式中,在部署有新的计算节点(即第三计算节点)的情况下,第二计算节点完成迭代训练后,第一计算节点会将最后一次更新得到的第二模型参数值向第三计算节点发送,由第三计算节点直接转发或聚合后转发给第二计算节点,从而使得第二计算节点基于第一模型的最新模型参数值和第二模型的最新模型参数值执行目标任务,由于第二计算节点上的第一模型和第二模型各自最终的模型参数值是经由第一计算节点和第二计算节点协同训练得到的,因此模型性能得到了提高。
第三方面,本申请实施例还提供一种基于联邦迁移学习的模型训练方法,可用于人工智能领域中,例如计算机视觉领域,该方法包括:首先,第一计算节点(可以是一个,也可以是多个)在第一计算节点上第一模型(如,特征提取器)的第一模型参数值(可用G s表示)和第一计算节点上的第二模型(如,分类器)的第二模型参数值(可用T s表示)保持不变的情况下(也可称为固定G s和固定T s),采用该第一计算节点上的第一数据集对第一计算节点上的第三模型(如,域鉴别器,也可简称为鉴别器)进行训练,以得到该第一计算节点上的第三模型的第三模型参数值(可用D s表示),其中,该第一模型参数值G s为第一计算节点对第一模型训练后得到的模型参数值,第二模型参数值T s为第一计算节点对第二模型训练后得到的模型参数取值,该第一数据集可以是有标签的数据集。在本申请实施例中,第一模型用于对输入数据进行特征提取;第二模型用于基于第一模型提取出的特征执行目标任务,例如,目标任务可以是分类任务(如,目标检测任务、语义分割任务、语音识别任务等),也可以是回归任务,此处不做限定;第三模型用于鉴别由第一模型提取出的特征的源域。作为一种示例,根据源域的数据分布可以区分输入数据所位于的计算节点,例如,判断获取到的特征是来自源域设备,还是来自目标域设备。之后,第二计算节点会获取第二聚合参数值(可用G-all表示),该第二聚合参数值G-all基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值Gs得到,并且,第二计算节点还将在第二计算节点上的第一模型的模型参数取值为第二聚合参数值G-all的情况下,采用第二计算节点上的第二数据集对第二计算节点上的第三模型进行训练,以得到第二计算节点上的第三模型的第四模型参数值Dt。之后,第一计算节点将接收第一聚合参数值(可用D-all表示),该第一聚合参数值D-all是基于第三模型参数值Ds和第四模型参数值Dt得到。第一计算节点在得到了第一聚合参数值D-all后,会将原来的第三模型参数值Ds更新为该第一聚合参数值D-all,也就是将第一计算节点上第三模型的模型参数取值更新为第一聚合参数值D-all,并在保持第一聚合参数值D-all不变的情况下(可称为固定D-all),采用第一数据集对第一计算节点上的第一模型和第一计算节点上的第二模型再进行训练,以得到第一 计算节点上的第一模型的第五模型参数值(可用Gs’表示)和第一计算节点上的第二模型的第六模型参数值(可用Ts’表示)。最后,第一计算节点将第五模型参数值Gs’和第六模型参数值Ts’作为新的第一模型参数值和新的第二模型参数值。
需要注意的是,在本申请的一些实施方式中,还可以重复执行上述步骤,直至达到迭代终止条件,该迭代终止条件可以是达到预设的训练轮次,也可以是使得损失函数收敛,还可以是其他设定的训练终止条件,具体此处不做限定。为便于阐述,在本申请第三方面以及第三方面的任一种可能的实现方式中,均以该方法重复执行上述步骤,直至达到迭代终止条件的情况为例进行说明,以下不再赘述。
需要注意的是,在本申请实施例中,当第一计算节点为一个时,第一模型参数值可直接用G s表示,第二模型参数值可直接用T s表示;当第一计算节点为多个(假设为n个,n≥2)时,那么每个第一计算节点会得到各自对应第一模型参数值G s(即多个G s,分别用G 1、……、G n表示)以及各自对应的第二模型参数值T s(即多个T s,分别用T 1、……、T n表示)。
在本申请上述实施方式中,具体阐述了第一计算节点以及第二计算节点所组成的系统训练各个模型并得到各模型的模型参数值的过程,在该训练过程中,第一计算节点与第二计算节点之间互相传递的仅是模型参数值,不传递数据或数据特征,保护了数据隐私;并且基于传递的模型参数值,使得第一计算节点上的第一模型从第一数据集提取的特征与第二计算节点上的第一模型从第二数据集提取的特征的分布差异在迭代训练过程中逐步缩小,从而实现域对齐。因此,本申请上述实施方式在兼顾域对齐和数据隐私的情况下实现了对模型的协同训练。
在第三方面的一种可能的设计中,在重复执行上述步骤,直至达到迭代终止条件之前,该方法还可以包括:第二计算节点还可以进一步获取第一聚合参数值D-all,该第一聚合参数值D-all基于第三模型参数值D s(一个或多个)以及第四模型参数值D t得到,第三模型参数值D s为每个第一计算节点各自采用自身上的第一数据集对自身上的第三模型进行训练得到的模型参数取值。之后,第二计算节点将第四模型参数值D t更新为第一聚合参数值D-all,并在保持第一聚合参数值D-all不变的情况下(即固定D-all),采用第二数据集对第二计算节点上的第一模型和第二计算节点上的第二模型进行训练,以得到第二计算节点上的第一模型的第七模型参数值(可用G t’表示)和第二计算节点上的第二模型的第八模型参数值(可用T t’表示)。
在本申请实施例中,在不部署新的计算节点的情况下,第二计算节点也可以在固定第一聚合参数值D-all的情况下对该第二计算节点上第一模型和第二模型进行训练,也就是不仅在第一计算节点进行对抗训练,在第二计算节点上也进行对抗训练,从而使得从第一数据集上提取的特征和从第二数据集上提取的特征更快实现域对齐,提高训练速度。
在第三方面的一种可能的设计中,第一计算节点接收第一聚合参数值D-all,且第一聚合参数值D-all基于第三模型参数值D s以及第四模型参数值D t得到的过程可以是:首先,第一计算节点将第三模型参数值D s(一个或多个)向第二计算节点发送,第二计算节点再将第三模型参数值D s和第四模型参数值D t进行聚合,以得到第一聚合参数值D-all。需要 注意的是,如果第一计算节点为一个,第二计算节点得到第一聚合参数值D-all的过程就是:将来自该第一计算节点的第三模型参数值D s与第四模型参数值D t聚合,以得到第一聚合参数值D-all;如果第一计算节点为多个,第二计算节点得到第一聚合参数值D-all的过程就是:将来自每个第一计算节点各自的第三模型参数值D s(即每个第一计算节点各自有一个对应的第三模型参数值D s,可分别用D 1、……、D n表示)与第四模型参数值D t聚合,以得到第一聚合参数值D-all。最后,第二计算节点将聚合的第一聚合参数值D-all向第一计算节点发送。
在本申请上述实施方式中,第三模型参数值Ds与第四模型参数值Dt的聚合过程由第二计算节点执行,这种情况下无需额外部署新的计算节点,节约了成本。
在第三方面的一种可能的设计中,第一计算节点接收第一聚合参数值D-all,且第一聚合参数值D-all基于第三模型参数值Ds以及第四模型参数值Dt得到的过程还可以是:第一计算节点将第三模型参数值Ds向第三计算节点发送,同时第二计算节点也会将第四模型参数值Dt向第三计算节点发送,第三计算节点将第三模型参数值Ds以及第四模型参数值Dt进行聚合,以得到第一聚合参数值D-all;之后,第三计算节点将第一聚合参数值D-all向第一计算节点发送。需要注意的是,如果第一计算节点为一个,第三计算节点得到第一聚合参数值D-all的过程就是:将来自该第一计算节点的第三模型参数值Ds与第四模型参数值Dt聚合,以得到第一聚合参数值D-all;如果第一计算节点为多个,第三计算节点得到第一聚合参数值D-all的过程就是:将来自每个第一计算节点各自的第三模型参数值Ds(即每个第一计算节点各自有一个对应的第三模型参数值Ds,可分别用D1、…、Dn表示)与第四模型参数值Dt聚合,以得到第一聚合参数值D-all,最后第三计算节点再将第一聚合参数值D-all向第二计算节点发送。
在本申请上述实施方式中,第三模型参数值D s与第四模型参数值D t的聚合过程可以由额外部署的一个第三计算节点执行,降低了第二计算节点的计算开销,提高了第二计算节点的计算速度。
在第三方面的一种可能的设计中,第二计算节点获取第二聚合参数值G-all,且第二聚合参数值G-all基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值G s得到的过程可以是:第二计算节点接收由一个或多个第一计算节点各自发送的更新后的第一模型参数值G s(一个或多个),并将第七模型参数值G t’和每个更新的第一模型参数值G s进行聚合,以得到第二聚合参数值G-all。由于第一计算节点可以是一个,也可以是多个,因此可分为两种情况:1)若第一计算节点为一个,则第二计算节点将第七模型参数值G t’和该一个第一模型参数值G s进行聚合,以得到第二聚合参数值G-all;若第二计算节点为多个(假设为n个,n≥2),则第二计算节点将第七模型参数值G t’和每个第一模型参数值G 1、……、G n进行聚合,以得到第二聚合参数值G-all。
在本申请上述实施方式中,在不部署新的计算节点且由第二计算节点进行对抗训练的的情况下,以得到第二聚合参数值的聚合过程是由该第二计算节点执行的,具备可实现性。
在第三方面的一种可能的设计中,在重复执行上述步骤,达到迭代终止条件后,该方法还包括:第一计算节点将最后一次更新得到的第一模型参数值和最后一次更新得到的第 二模型参数值向该第二计算节点发送。第二计算节点接收到由一个或多个第一计算节点发送的最后一次更新得到的第一模型参数值和第二模型参数值后,首先,会将每个最后一次更新得到的第二模型参数值和最后一次更新得到的第八模型参数值Tt’进行聚合,以得到第四聚合参数值(可用T-all表示)。具体地,1)在第一计算节点为一个的情况下,第二计算节点接收第一计算节点发送的最后一次更新得到的第二模型参数值Ts,并将该最后一次更新得到的第二模型参数值Ts和最后一次更新得到的第八模型参数值Tt’进行聚合,以得到第四聚合参数值T-all;2)在第一计算节点为多个的情况下,第二计算节点接收每个第一计算节点各自发送的最后一次更新得到的第二模型参数值Ts(即多个Ts,分别用T1、…、Tn表示),第二计算节点再将各个最后一次更新得到的第二模型参数值T1、…、Tn以及Tt’进行聚合,以得到第四聚合参数值T-all。最后,第二计算节点根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为该第四聚合参数值T-all。
在本申请上述实施方式中,在不部署新的计算节点的情况下,具体阐述了在第二计算节点也进行了对抗训练后,第二计算节点会接收第一计算节点发送的最后一次更新得到的第一模型参数值和第二模型参数值,从而使得第二计算节点基于第一模型的最新模型参数值和第二模型的最新模型参数值执行目标任务,由于第二计算节点上的第一模型和第二模型各自最终的模型参数值是经由第一计算节点和第二计算节点协同训练得到的,因此模型性能得到了提高。
在第三方面的一种可能的设计中,在第一计算节点为一个的情况下,第二计算节点获取第二聚合参数值,且该第二聚合参数值基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值得到的过程可以是:第二计算节点接收由第一计算节点发送的第一模型参数值G s(即一个G s),这种情况下,第二聚合参数值G-all实质就是该第一模型参数值G s;在第一计算节点为多个(假设为n个,n≥2)的情况下,第二计算节点获取第二聚合参数值,且该第二聚合参数值基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值得到的过程可以是:第二计算节点接收由每个第一计算节点各自发送的第一模型参数值G s(即多个G s,分别用G 1、……、G n表示),这种情况下,第二计算节点会对接收到的这些第一模型参数值G 1、……、G n进行聚合,以得到第二聚合参数值G-all。
在本申请上述实施方式中,具体阐述了在不部署新的计算节点的情况下,当第一计算节点分别为一个或多个的情况时,第二聚合参数值G-all可以是由第二计算节点得到,具备灵活性。
在第三方面的一种可能的设计中,在重复执行上述步骤,直至达到迭代终止条件后,该方法还包括:第一计算节点将最后一次更新得到的第一模型参数值和最后一次更新得到的第二模型参数值向该第二计算节点发送。由于第一计算节点可以是一个,也可以是多个,因此这里分为两种情况进行介绍:1)在第一计算节点为一个的情况下,第二计算节点接收到第一计算节点发送的最后一次更新得到的第二模型参数值T s后,会根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,其中,第二计算节点上的第一模 型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为最后一次更新得到的第二模型参数值T s;2)在第一计算节点为多个的情况下,第二计算节点接收到由每个第一计算节点各自发送的最后一次更新得到的第二模型参数值T s(即多个T s,分别用T 1、……、T n表示)后,会对每个最后一次更新得到的第二模型参数值T 1、……、T n进行聚合,以得到第三聚合参数值Ts-all,并根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为第三聚合参数值Ts-all。
在本申请上述实施方式中,在不部署新的计算节点的情况下,完成迭代训练后,第二计算节点会接收由第一计算节点发送的最后一次更新得到的第一模型参数值和第二模型参数值,第二计算节点会基于第一模型的最新模型参数值和第二模型的最新模型参数值执行目标任务,由于第二计算节点上的第一模型和第二模型各自最终的模型参数值是经由第一计算节点和第二计算节点协同训练得到的,因此模型性能得到了提高。
在第三方面的一种可能的设计中,第二计算节点获取第一聚合参数值D-all,且该第一聚合参数值D-all基于第三模型参数值D s以及第四模型参数值D t得到过程还可以是:首先,第二计算节点向第三计算节点发送该第四模型参数值D t,第一计算节点也将第三模型参数值D s(一个或多个)向第三计算节点发送,第三计算节点再将第三模型参数值D s和第四模型参数值D t进行聚合,以得到第一聚合参数值D-all。需要注意的是,如果第一计算节点为一个,第三计算节点得到第一聚合参数值D-all的过程就是:将来自该第一计算节点的第三模型参数值D s与第四模型参数值D t聚合,以得到第一聚合参数值D-all;如果第一计算节点为多个,第三计算节点得到第一聚合参数值D-all的过程就是:将来自每个第一计算节点各自的第三模型参数值D s(即每个第一计算节点各自有一个对应的第三模型参数值D s,可分别用D 1、……、D n表示)与第四模型参数值D t聚合,以得到第一聚合参数值D-all。最后,第三计算节点将聚合的第一聚合参数值D-all向第二计算节点发送。
在本申请上述实施方式中,阐述第三模型参数值D s与第四模型参数值D t的聚合过程由额外部署的一个第三计算节点执行,降低了第二计算节点的计算开销,提高了第二计算节点的计算速度。
在第三方面的一种可能的设计中,第二计算节点获取第二聚合参数值G-all,且第二聚合参数值G-all基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值得到的过程可以是:首先,第二计算节点向第三计算节点发送该第七模型参数值G t’,之后,该第三计算节点对该第七模型参数值G t’以及来自一个或多个第一计算节点的每个更新的第一模型参数值G s进行聚合,以得到第二聚合参数值G-all,第三计算节点再将该第二聚合参数值G-all向第二计算节点发送。由于第一计算节点可以是一个,也可以是多个,因此聚合过程可分为两种情况:1)在第一计算节点为一个的情况下,第二计算节点接收由第三计算节点转发的第一模型参数值G s(由第一计算节点先向第三计算节点发送,再由第三计算节点向第二计算节点转发),这种情况下,第二聚合参数值G-all实质就是该第一模型参数值G s;2)在第一计算节点为多个(假设为n个,n≥2)的情况下,第二计算节点接收 由第三计算节点转发的第二聚合参数值G-all,第二聚合参数值G-all由第三计算节点对每个第一模型参数值聚合得到,其中,每个第一模型参数值由每个第一计算节点各自向第三计算节点发送,即每个第一计算节点会将各自得到的第一模型参数值G s(即多个G s,分别用G 1、……、G n表示)向第三计算节点发送,该第三计算节点对接收到的这些第一模型参数值G 1、……、G n进行聚合,以得到第二聚合参数值G-all。
在本申请上述实施方式中,具体阐述了在部署新的计算节点(即第三计算节点)且由第二计算节点进行对抗训练的情况下,以得到第二聚合参数值的聚合过程是由该第三计算节点执行的,具备灵活性。
在第三方面的一种可能的设计中,在重复执行上述步骤,达到迭代终止条件后,该方法还包括:第一计算节点将最后一次更新得到的第一模型参数值和最后一次更新得到的第二模型参数值向第三计算节点发送,此外,第二计算节点也将最后一次更新得到的第八模型参数值T t’向第三计算节点发送,由该第三计算节点将每个最后一次更新得到的第二模型参数值和最后一次更新得到的第八模型参数值T t’进行聚合,以得到第四聚合参数值T-all。具体地,1)在第一计算节点为一个的情况下,第三计算节点接收第一计算节点发送的最后一次更新得到的第二模型参数值T s,同时第三计算节点接收第二计算节点发送的最后一次更新得到的第八模型参数值T t’,并将该最后一次更新得到的第二模型参数值T s和最后一次更新得到的第八模型参数值T t’进行聚合,以得到第四聚合参数值T-all;2)在第一计算节点为多个的情况下,第三计算节点接收每个第一计算节点各自发送的最后一次更新得到的第二模型参数值T s(即多个T s,分别用T 1、……、T n表示),同时第三计算节点接收第二计算节点发送的最后一次更新得到的第八模型参数值T t’,第三计算节点再将各个最后一次更新得到的第二模型参数值T 1、……、T n以及T t’进行聚合,以得到第四聚合参数值T-all。之后,第二计算节点再接收由第三计算节点发送的第四聚合参数值T-all,并根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为该第四聚合参数值T-all。
在第三方面的一种可能的设计中,第二计算节点获取第二聚合参数值G-all,且该第二聚合参数值G-all基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值得到的过程可以是:第二计算节点接收来自第三计算节点的第二聚合参数值G-all,该第二聚合参数值G-all由第三计算节点对来自于一个或多个第一计算节点的每个第一模型参数值G s聚合得到。同样地,由于第一计算节点可以是一个,也可以是多个,因此可分为两种情况:1)在第一计算节点为一个的情况下,第二计算节点接收由第三计算节点转发的第一模型参数值G s(由第一计算节点先向第三计算节点发送,再由第三计算节点向第二计算节点转发),这种情况下,第二聚合参数值G-all实质就是该第一模型参数值G s;2)在第一计算节点为多个(假设为n个,n≥2)的情况下,第二计算节点接收由第三计算节点转发的第二聚合参数值G-all,所述第二聚合参数值由所述第三计算节点对每个第一模型参数值聚合得到,其中,每个第一模型参数值由每个第一计算节点各自向第三计算节点发送,即每个第一计算节点会将各自得到的第一模型参数值G s(即多个G s,分别用G 1、……、 G n表示)向第三计算节点发送,该第三计算节点对接收到的这些第一模型参数值G 1、……、G n进行聚合,以得到第二聚合参数值G-all,并由该第三计算节点将得到的第二聚合参数值G-all向第二计算节点发送。
在本申请上述实施方式中,具体阐述了在不部署新的计算节点的情况下,从第二计算节点侧阐述当第一计算节点分别为一个或多个的情况时,第二聚合参数值G-all可以是由第三计算节点得到,具备灵活性。
在第三方面的一种可能的设计中,在重复执行上述步骤,直至达到迭代终止条件后,该方法还包括:第一计算节点将最后一次更新得到的第一模型参数值和最后一次更新得到的第二模型参数值向第三计算节点发送。由于第一计算节点可以是一个,也可以是多个,因此这里分为两种情况进行介绍:1)在第一计算节点为一个的情况下,第二计算节点接收由第三计算节点发送的最后一次更新得到的第二模型参数值T s,并根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,最后一次更新得到的第二模型参数值T s由第三计算节点从第一计算节点获取到,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为最后一次更新得到的第二模型参数值T s;2)在第一计算节点为多个(假设为n个,n≥2)的情况下,第二计算节点接收由第三计算节点发送的第三聚合参数值Ts-all,并根据第二计算节点上的第一模型和第二计算节点上的第二模型执行目标任务,第三聚合参数值Ts-all由第三计算节点对从每个第一计算节点各自接收到的最后一次更新得到的第二模型参数值聚合T 1、……、T n聚合得到,其中,第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值G-all,第二计算节点上的第二模型的模型参数取值为该第三聚合参数值Ts-all。
在本申请上述实施方式中,在部署有新的计算节点(即第三计算节点)的情况下,完成迭代训练后,第一计算节点会将最后一次更新得到的第二模型参数值向第三计算节点发送,由第三计算节点直接转发或聚合后转发给第二计算节点,第二计算节点会基于第一模型的最新模型参数值和第二模型的最新模型参数值执行目标任务,由于第二计算节点上的第一模型和第二模型各自最终的模型参数值是经由第一计算节点和第二计算节点协同训练得到的,因此模型性能得到了提高。
第四方面,本申请实施例还提供一种数据处理方法,该方法包括:首先,计算机设备获取待处理的输入数据,该输入数据与待执行的目标任务相关,例如,当目标任务是分类任务,那么输入数据就是指用于进行分类的数据。之后,计算机设备通过训练后的第一模型对该输入数据进行特征提取,以得到特征图,并通过训练后的第二模型对特征图进行处理,以得到输出数据,其中,该训练后的第一模型的模型参数值和该训练后的第二模型的模型参数值由上述第一方面或第一方面任意一种可能实现方式的方法,或,上述第二方面或第二方面任意一种可能实现方式的方法,或,上述第三方面或第三方面任意一种可能实现方式的方法训练得到。
需要注意的是,在本申请实施例中,根据目标任务的不同,输入数据的类型也不同,这里对几种典型的目标任务的应用场景进行阐述:
1)目标任务是目标检测任务
目标检测任务一般针对图像中的目标物体的检测,在这种情况下,输入数据一般是指输入的图像,计算机设备首先利用训练后的第一模型对输入的图像进行特征提取,再利用训练后的第二模型对提取的特征图进行目标检测,以得到检测结果,即输出数据是检测结果。
2)目标任务是分类任务
一种实施例中,分类任务可以是针对图像进行的,在这种情况下,输入数据是指输入的图像,计算机设备首先利用训练后的第一模型对输入的图像进行特征提取,再利用训练后的第二模型对提取的特征图进行分类,输出分类结果,即输出数据是图像的分类结果。
另一种实施例中,分类任务除了可以是针对图像进行的,还可以是针对文本或音频,在这种情况下,输入数据就是指对应的文本数据或音频数据,输出数据则是文本的分类结果或音频的分类结果。
以上仅是针对几种场景的目标任务进行说明,在不同的目标任务中,输入数据和输出数据是与该目标任务相关的,具体此处不在示例。
本申请实施例第五方面提供一种计算节点,该计算节点作为第一计算节点,具有实现上述第一方面或第一方面任意一种可能实现方式的方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块。
本申请实施例第六方面提供一种计算节点,该计算节点作为第二计算节点,具有实现上述第二方面或第二方面任意一种可能实现方式的方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块。
本申请实施例第七方面提供一种计算系统,该计算系统包括第一计算节点和第二计算节点,该计算系统具有实现上述第三方面或第三方面任意一种可能实现方式的方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块。
本申请实施例第八方面提供一种计算节点,该计算节点作为第一计算节点,可以包括存储器、处理器以及总线系统,其中,存储器用于存储程序,处理器用于调用该存储器中存储的程序以执行本申请实施例第一方面或第一方面任意一种可能实现方式的方法。
本申请实施例第九方面提供一种计算节点,该计算节点作为第二计算节点,可以包括存储器、处理器以及总线系统,其中,存储器用于存储程序,处理器用于调用该存储器中存储的程序以执行本申请实施例第二方面或第二方面任意一种可能实现方式的方法。
本申请实施例第十方面提供一种计算机设备,可以包括存储器、处理器以及总线系统,其中,存储器用于存储程序,处理器用于调用该存储器中存储的程序以执行本申请实施例第四方面或第四方面任意一种可能实现方式的方法。
本申请实施例第十一方面提供一种计算机可读存储介质,该计算机可读存储介质中存储有指令,当该指令在计算机上运行时,使得计算机可以执行上述第一方面或第一方面任 意一种可能实现方式的方法,或,使得计算机可以执行上述第二方面或第二方面任意一种可能实现方式的方法。
本申请实施例第十二方面提供了一种包括指令的计算机程序或计算机程序产品,当该计算机程序或计算机程序产品在计算机上运行时,使得计算机执行上述第一方面或第一方面任意一种可能实现方式的方法,或,使得计算机可以执行上述第二方面或第二方面任意一种可能实现方式的方法。
本申请实施例第十三方面提供了一种芯片,该芯片包括至少一个处理器和至少一个接口电路,该接口电路和该处理器耦合,至少一个接口电路用于执行收发功能,并将指令发送给至少一个处理器,至少一个处理器用于运行计算机程序或指令,其具有实现如上述第一方面或第一方面任意一种可能实现方式的方法的功能,该功能可以通过硬件实现,也可以通过软件实现,还可以通过硬件和软件组合实现,该硬件或软件包括一个或多个与上述功能相对应的模块。此外,该接口电路用于与该芯片之外的其它模块进行通信,例如,该接口电路可将芯片上训练得到的各个模型的模型参数值发送给目标设备。
附图说明
图1为本申请实施例提供的人工智能主体框架的一种结构示意图;
图2为本申请实施例提供的联邦迁移学习系统的一个示意图;
图3为本申请实施例提供的联邦迁移学习系统的另一个示意图;
图4为本申请实施例提供的基于联邦迁移学习的模型训练方法的一种流程示意图;
图5为本申请实施例提供的基于联邦迁移学习的模型训练方法的另一流程示意图;
图6为本申请实施例提供的基于联邦迁移学习的模型训练方法的另一流程示意图;
图7为本申请实施例提供的基于联邦迁移学习的模型训练方法的另一流程示意图;
图8为本申请实施例提供的基于联邦迁移学习的模型训练方法的另一流程示意图;
图9为本申请实施例提供的基于联邦迁移学习的模型训练方法的另一流程示意图;
图10为本申请实施例提供的基于联邦迁移学习的模型训练方法的另一流程示意图;
图11为本申请实施例提供的基于联邦迁移学习的模型训练方法的另一流程示意图;
图12为本申请实施例提供的数据处理方法的一个流程示意图;
图13为本申请实施例提供的第一计算节点的一个结构示意图;
图14为本申请实施例提供的第二计算节点的一个结构示意图;
图15为本申请实施例提供的计算机设备的一个结构示意图;
图16为本申请实施例提供的设备的一种结构示意图。
具体实施方式
本申请实施例提供了一种基于联邦迁移学习的模型训练方法及计算节点,用于利用第一计算节点上第一数据集辅助第二计算节点上的第二数据集对模型进行训练,实现域对齐,并且在计算节点之间传递的仅仅是模型的模型参数值,不传递数据或数据特征,充分保护了用户数据隐私,因此在兼顾域对齐和用户数据隐私的情况下,本申请实施例实现了对模 型的协同训练,提高了模型的性能。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,以便包含一系列单元的过程、方法、系统、产品或设备不必限于那些单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。
本申请实施例涉及了许多联邦学习、迁移学习、模型训练等的相关知识,为了更好地理解本申请实施例的方案,下面先对本申请实施例可能涉及的相关术语和概念进行介绍。应理解的是,相关的概念解释可能会因为本申请实施例的具体情况有所限制,但并不代表本申请仅能局限于该具体情况,在不同实施例的具体情况可能也会存在差异,具体此处不做限定。
(1)神经网络
神经网络可以是由神经单元组成的,具体可以理解为具有输入层、隐含层、输出层的神经网络,一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层,神经网络中的每一层可称为神经网络层。其中,具有很多层隐含层的神经网络则称为深度神经网络(deep neural network,DNN)。神经网络中的每一层的工作可以用数学表达式
$y=a(W\cdot x+b)$来描述,从物理层面,神经网络中的每一层的工作可以理解为通过五种对输入空间(输入向量的集合)的操作,完成输入空间到输出空间的变换(即矩阵的行空间到列空间),这五种操作包括:1、升维/降维;2、放大/缩小;3、旋转;4、平移;5、"弯曲"。其中1、2、3的操作由$W\cdot x$
完成,4的操作由“+b”完成,5的操作则由“a()”来实现。这里之所以用“空间”二字来表述是因为被分类的对象并不是单个事物,而是一类事物,空间是指这类事物所有个体的集合,其中,W是神经网络各层的权重矩阵,该矩阵中的每一个值表示该层的一个神经元的权重值。该矩阵W决定着上文所述的输入空间到输出空间的空间变换,即神经网络每一层的W控制着如何变换空间。训练神经网络的目的,也就是最终得到训练好的神经网络的所有层的权重矩阵。因此,神经网络的训练过程本质上就是学习控制空间变换的方式,更具体的就是学习权重矩阵。
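As a minimal numerical illustration of the expression y = a(W·x + b) described above (shapes and values are arbitrary, and the activation is chosen as ReLU only for this sketch):

```python
import numpy as np

# One layer computes y = a(W·x + b): W scales/rotates (operations 1-3),
# "+b" translates (operation 4), and the activation a(·) "bends" (operation 5).
W = np.random.randn(4, 3)        # weight matrix of the layer
x = np.random.randn(3)           # input vector
b = np.random.randn(4)           # bias vector
y = np.maximum(0.0, W @ x + b)   # a(·) chosen as ReLU for this sketch
```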
需要注意的是,在本申请实施例中,基于机器学习(如,联邦学习、迁移学习、联邦迁移学习等)任务所采用的学习模型(也可称为学习器、模型等)或其他类型的机器模型,本质都是神经网络。
(2)损失函数(loss function)
在训练神经网络的过程中,因为希望神经网络的输出尽可能的接近真正想要预测的值,可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重矩阵(当然,在第一次更新之前通常会有初始化的过程,即为神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重矩阵让它预测低一些,不断的调整,直到神经网络能够预测出真正想要的目标值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数或目标函数(objective  function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么神经网络的训练就变成了尽可能缩小这个loss的过程。例如,在分类任务中,损失函数用于表征预测类别与真实类别之间的差距,交叉熵损失函数(cross entropy loss)则是分类任务中常用的损失函数。
在神经网络的训练过程中,可以采用误差反向传播(back propagation,BP)算法修正初始的神经网络模型中参数的大小,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的神经网络模型中的参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的神经网络模型的参数,例如权重矩阵。
(3)特征、标签和样本
特征是指输入变量,即简单线性回归中的x变量,简单的机器学习任务可能会使用单个特征,而比较复杂的机器学习任务可能会使用数百万个特征。
标签是简单线性回归中的y变量,标签可以是小麦未来的价格、图片中显示的动/植物品种、音频剪辑的含义或任何事物。在本申请的一些实施例中,标签可以是指图片的分类类别。比如说有一张猫的图片,人们都知道它是只猫,但是计算设备不知道它是只猫,怎么办呢?那么给这张图片打上一个标签,该标签就用于向计算设备指示该图片蕴含的信息是“猫”,然后计算设备就知道它是只猫,计算设备根据这个标签对所有的猫进行学习就能通过这一只猫认识所有的猫。因此,给数据打标签,就是告诉计算设备,输入变量的多个特征描述的是什么(即y),y可以称之为label,也可以称之为target(即目标值)。
样本是指数据的特定实例,一个样本x代表的是一个对象,样本x通常用一个特征向量x=(x 1,x 2,...,x d)∈R d表示,其中,d代表样本x的维度(即特征个数),样本分为有标签样本和无标签样本,有标签样本同时包含特征和标签,无标签样本包含特征但不包含标签,机器学习的任务往往就是学习输入的d维训练样本集(可简称为训练集)中潜在的模式。
(4)模型
在本申请实施例中,基于机器学习(如,联邦学习、迁移学习、联邦迁移学习等)任务所采用的学习模型,本质都是神经网络。模型定义了特征与标签之间的关系,模型的应用一般包括训练和推理两个阶段,训练阶段用于根据训练集对模型进行训练,以得到训练后的模型的模型参数取值(与上述所述的神经网络每层的权重矩阵类似),在本申请实施例中,第一数据集、第二数据集等数据集就是作为训练集对本申请所述涉及到的各个模型进行训练;推理阶段用于将训练后的模型对真实的无标签实例进行标签预测,而预测准确率是衡量一个模型训练的好坏的重要指标之一。
(5)特征提取器、分类器和鉴别器
在深度学习领域中,由于神经网络是由神经单元组成的,一个神经网络一般包含多个神经网络层,因此,如果根据神经网络层的具体功能对神经网络进行划分,可以得到各种具有特定功能的神经网络模块,这里介绍本申请实施例涉及到的几种神经网络模块。
特征提取器:神经网络从输入层到某一中间层的部分,用于对输入数据(如,样本) 进行特征提取,具体为通过一些运算(如,卷积操作)将原始的输入数据(如,图片、文本等)提取出一些重要特征。在本申请实施例中,第一模型可以是特征提取器。
分类器:根据待执行任务的不同,在特征提取器之后的部分神经网络层可具备不同的功能,这部分神经网络层可称为子任务模型,用于对提取出的特征进行分类、回归或其他下游子任务等,例如,下游子任务可以是目标检测任务、分类任务、语音识别任务、语义分割任务等。下面以子任务模型用在分类任务中为例进行说明:当该子任务模型用在分类任务中时,就用于对特征提取器提取出的特征进行分类,以得到预测的标签。在本申请实施例中,第二模型可以是子任务模型,用于基于第一模型提取出的特征执行目标任务,只要是神经网络可执行的任务,都可作为本申请第二模型能够执行的目标任务。例如,第二模型可以是分类器。为便于阐述,后续实施例中均以第二模型为分类器为例进行示意。
鉴别器:结构上为接在特征提取器之后的部分神经网络层,用于对特征提取器提取出的特征所属的域进行鉴别,可以理解成是域分类器(一种特殊的分类器),只不过此时不是对输入数据进行分类,而是对输入数据的源域进行区分。在本申请实施例中,第三模型可以是鉴别器。
(6)对抗训练(adversarial training)
对抗训练是增强神经网络鲁棒性的重要方式,在本申请实施例中,若第一模型为特征提取器,第三模型为鉴别器,这种情况下的对抗训练指的是特征提取器与鉴别器之间的对抗训练,具体地,一方面,需训练鉴别器尽量能区分出某一提取出的特征是来自于目标域还是源域;另一方面,需训练特征提取器提取出足够迷惑鉴别器的特征,在两者相互对抗的过程中,双方都得到了有效训练。
(7)联邦学习(federated learning,FL)
联邦学习是一种用于保护用户隐私的机器学习方法。在机器学习领域的一些实际应用场景中,由于单个设备上的数据特征不充分或者样本数量较少等限制,很难单独训练出较好的机器学习模型,因此需要融合多个设备的数据在一起进行训练,从而得到一个质量较好的模型;在融合多个设备上的数据进行训练的同时还需要保证用户的数据隐私,即数据不能传送出用户的设备,只能在本地用于进行模型训练,联邦学习就是基于这一要求应运而生,其能有效帮助多个计算节点在满足用户隐私保护、数据安全和政府法规的要求下,进行数据使用和机器学习建模。
(8)迁移学习(transfer learning,TL)
迁移学习是一种机器学习方法,就是把为任务A开发的模型作为初始点,重新使用在为任务B开发模型的过程中。也就是说,把基于已有任务(如所述的任务A)训练好的模型学习到的知识迁移到新的任务(如所述的任务B)中来帮助该模型进行再训练,通过迁移学习将已经学到的知识(蕴含在模型参数中)通过某种方式来分享给新任务从而加快并优化模型的学习效率,这样模型不用从零开始学习。例如,在目标检测任务中,使用在ImageNet数据集上训练好的模型作为新任务的模型可以明显的提升训练效率。
(9)源域和目标域
在迁移学习中,源域是指知识迁出的一方,目标域是指知识迁入的一方。
(10)联邦迁移学习(federated transfer learning,FTL)
联邦迁移学习是一种结合了联邦学习和迁移学习的机器学习方法,即在不共享隐私数据的情况下,对模型(或神经网络)进行多任务协同训练。
(11)独立同分布(independently identically distribution,IID)与非独立同分布(not independently identically distribution,Non-IID)
在概率统计理论中,独立同分布是指一组随机变量中每个变量的概率分布都相同,且这些随机变量互相独立。一组随机变量独立同分布并不意味着它们的样本空间中每个事件发生概率都相同。例如,投掷非均匀骰子得到的结果序列是独立同分布的,但掷出每个面朝上的概率并不相同。
在机器学习领域中,独立同分布是指输入空间X的所有样本服从一个隐含未知的分布,训练数据所有样本都是独立地从这个分布上采样而得;而非独立同分布是指训练数据不是从同一个分布中采样的,或者训练数据之间不是独立进行采样的。
(12)域对齐
在机器学习领域的一些实际应用场景中,源域上的数据一般为带标签数据,目标域上的数据一般为无/少标签的数据。由于目标域已有数据缺少标签,想要直接完成相关的机器学习任务非常困难,常常需要源域数据的辅助来提高模型性能从而完成相关任务。由于不同域之间的数据常常是非独立同分布的,这样的分布差异使得直接迁移知识的效果较差,因此常常需要采用一定的方法对源域和目标域进行域对齐,一般来说,域对齐是对齐不同域之间的数据分布,从而提升迁移学习的迁移效果,在本申请实施例中,域对齐则是指对齐从不同域提取的数据特征的分布。
下面结合附图,对本申请的实施例进行描述。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
首先对人工智能系统总体工作流程进行描述,请参见图1,图1示出的为人工智能主体框架的一种结构示意图,下面从“智能信息链”(水平轴)和“IT价值链”(垂直轴)两个维度对上述人工智能主题框架进行阐述。其中,“智能信息链”反映从数据的获取到处理的一列过程。举例来说,可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中,数据经历了“数据—信息—知识—智慧”的凝练过程。“IT价值链”从人智能的底层基础设施、信息(提供和处理技术实现)到系统的产业生态过程,反映人工智能为信息技术产业带来的价值。
(1)基础设施
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片(CPU、NPU、GPU、ASIC、FPGA等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。
(2)数据
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、 语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。
(3)数据处理
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。
(4)通用能力
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。
(5)智能产品及行业应用
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、智能制造、智能交通、智能家居、智能医疗、智能安防、自动驾驶、智慧城市等。
本申请实施例可以应用在机器学习中各种模型的训练方法优化上,而通过本申请的基于联邦迁移学习的模型训练方法训练得到的模型具体可以应用于人工智能领域中的各个细分领域中,如,计算机视觉领域中的图像处理领域,具体的,结合图1来讲,本申请实施例中基础设施获取的数据可以是本申请实施例各个计算节点上的本地数据集,如,第一计算节点上的第一数据集、第二计算节点上的第二数据集等,各数据集中的数据具体可以是视频数据(如,可由监控系统拍摄得到)、图像数据(如,可由移动终端的摄像头拍摄得到)、文本数据(如,通过终端设备由用户输入的文本信息)等,具体此处对各数据集内的数据类型不做限定,其中,第一计算节点作为源域设备,其上的第一数据集为有标签数据集,第二计算节点作为目标域设备,其上的第二数据集则为无标签数据或具有少量标签的数据。
接下来对本申请实施例提供的基于联邦迁移学习的模型训练方法的基本原理进行介绍,具体请参阅图2,图2为本申请实施例提供的联邦迁移学习系统的一个示意图,在一种实现方式中,该系统可以包括n个第一计算节点(可记为S 1,S 2,…,S n)和一个第二计算节点(可记为T),其中,n为大于等于1的整数,即第一计算节点可以是一个,也可以是多个,此处不做限定。在本申请实施例中,第一计算节点作为源域设备,第二计算节点作为目标域设备,并且,每个计算节点上都各自具有本地数据集,每个第一计算节点上的本地数据集可称为第一数据集,第二计算节点上的本地数据集可称为第二数据集,每个第一数据集均为有标签数据集,第二数据集则为无标签或少标签的数据集。此外,每个计算节点上的数据集也都有自己的数据分布D,如图2中的各个第一数据集的数据分布为D 1,D 2,…,D n,第二数据集的数据分布为D T,从图2中的坐标图中可以看出,各个计算节点之间的数据分布存在很大差异,并且,各个本地的数据集本身不能传出本计算节点,如图 2中的
符号表示的就是指本地数据集不能出其所在的计算节点。
此外,每个计算节点上具备同样的初始模型结构,不同域之间(包括所有源域设备和目标域设备)通过对抗训练的方式实现域对齐,对抗训练使得各个计算节点上的初始模型的模型参数值可能不同,此时每个第一计算节点上的模型的模型参数值可分别记为M 1,M 2,…,M n,第二计算节点上的模型参数值可记为M T,具体可如图2所示。然后通过一个新部署的第三计算节点(如,服务器)将各个域上的对应模型(如,特征提取器、鉴别器、分类器等)的模型参数值汇聚为M(汇聚的方式有很多种,例如可以是做模型参数层面的简单平均,也可以是引入一些加权平均或者其他更复杂的汇聚方式),之后再将所述汇聚好的模型参数值M赋值给所有计算节点上的模型的模型参数,以上的整个过程称为一轮训练迭代。之后,通过多轮迭代,直到达到预设的迭代轮次,或者其他设定的训练停止条件。具体地,可以包括如下步骤:步骤1、在各个第一计算节点上通过自身有标签的第一数据集对模型(包括第一模型和第二模型)做有监督训练,其中,第一模型可以是特征提取器,第二模型可以是子任务模型(如,分类器);步骤2、在各个第一计算节点保持各自第一模型的模型参数值和第二模型的模型参数值不变的情况下,利用各自的本地数据集训练各自的第三模型,第三模型可以是鉴别器,此外,第二计算节点也保持自身第一模型的模型参数值(由各个第一计算节点的第一模型的模型参数值聚合得到)不变的情况下,利用第二计算节点上的本地数据集训练自身的第三模型;步骤3、将每个第一计算节点上训练后的第三模型的模型参数值与第二计算节点上训练后的第三模型的模型参数值进行聚合,形成第三模型的聚合参数值;步骤4、将得到的第三模型的聚合参数值赋值给各个第一计算节点各自的第三模型的模型参数,并由各个第一计算节点再次利用各自的本地数据集对各自的第一模型和第二模型进行训练;步骤5、不断迭代步骤2-4,直至达到迭代终止条件。
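For illustration only, the following Python sketch mirrors the structure of steps 1-5 above; the training helpers are stubs standing in for the local training described in the text, and `aggregate` is a placeholder for parameter averaging or any other aggregation scheme mentioned above.

```python
# Structural sketch of steps 1-5; all helpers are illustrative stubs.

def train_source_models(first_dataset):
    # Step 1: supervised training of the first and second models on one source.
    return "G_i", "T_i"

def train_discriminator(feature_params, local_dataset, domain_label):
    # Step 2: train the local third model with the feature extractor fixed.
    return {"params": feature_params, "label": domain_label}

def adversarial_retrain(G_i, T_i, D_all, first_dataset):
    # Step 4: with the aggregated D_all fixed, retrain the first and second models.
    return G_i, T_i

def aggregate(values):
    # Step 3: merge parameter values (e.g. averaging); identity placeholder here.
    return values

def federated_transfer_learning(source_datasets, target_dataset, rounds=5):
    models = [train_source_models(ds) for ds in source_datasets]        # step 1
    G_all = aggregate([G for G, _ in models])                           # target-side first model
    for _ in range(rounds):                                             # step 5: iterate steps 2-4
        D_src = [train_discriminator(G, ds, 0)                          # step 2 on each source node
                 for (G, _), ds in zip(models, source_datasets)]
        D_tgt = train_discriminator(G_all, target_dataset, 1)           # step 2 on the target node
        D_all = aggregate(D_src + [D_tgt])                              # step 3
        models = [adversarial_retrain(G, T, D_all, ds)                  # step 4
                  for (G, T), ds in zip(models, source_datasets)]
    return models

federated_transfer_learning(["src-1", "src-2"], "tgt")
```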
需要说明的是,在上述实施例中,是通过一个新部署的一个第三计算节点(如,服务器)将各个域上的对应模型(如,特征提取器、鉴别器、分类器等)的模型参数值进行汇聚,在本申请的另一些实施方式中,还可以由作为目标域设备的第二计算节点实现将各个域上的对应模型的模型参数值进行汇聚的功能,在这种情况下,联邦迁移学习系统可以不包括第三计算节点,具体如图3所示,图3为本申请实施例提供的联邦迁移学习系统的另一个示意图,在图3中,第三计算节点的对模型参数值进行汇聚的功能由作为目标域设备的第二计算节点承载。需要注意的是,在本申请实施例中,第二计算节点作为目标域设备,一般为一个,若有多个第二计算节点,则按照类似的方式依次对每个第二计算节点上的模型进行训练,此处不予赘述。
还需要说明的是,在本申请的另一些实施方式中,还可以是由作为源域设备的第一计算节点实现将各个域上的对应模型的模型参数值进行汇聚的功能,具体的汇聚过程与上述第二计算节点或第三计算节点类似,此处不予赘述。需要注意的是,在本申请实施例中,若有多个第一计算节点,则可以是任意选择一个第一计算节点作为模型参数值汇聚的执行主体,也可以是用户根据实际需求选择的符合条件的第一计算节点作为模型参数值汇聚的执行主体,选择方式可以有多种,具体此处不做限定。
在本申请实施例中,由于第一计算节点作为源域设备,可以是一个,也可以是多个,当第一计算节点的数量不同,本申请实施例提供的基于联邦迁移学习的模型训练方法也略有不同,此外,通过部署新的第三计算节点将模型参数值进行汇聚以进行模型的训练,与将第三计算节点的汇聚功能承载在第二计算节点上进行模型训练在方法流程上也略有不同,下面从第一计算节点分别为一个或多个、是否部署有新的第三计算节点的角度,对本申请实施例提供的基于联邦迁移学习的模型训练方法进行介绍。
需要说明的是,在本申请下述实施例中,第一计算节点上的本地数据集为第一数据集,第二计算节点上的本地数据集为第二数据集,各个计算节点均是采用各自的本地数据集对各个模型进行训练,以下不再赘述,此外,在本申请实施例中,s代表源域,t代表目标域,第一模型至第三模型的模型参数分别用G、T、D表示。记来自于n个第一计算节点各自第一数据集的数据和标签,以及对应的数据分布和标签分布为下述式(1)所示:
{(x^{s_i}, y^{s_i})}, x^{s_i}～D_X^{s_i}, y^{s_i}～D_Y^{s_i}, i=1,2,…,n  ……式(1)
其中，x^{s_i}为第i个第一计算节点上的数据，y^{s_i}为数据x^{s_i}对应的标签，D_X^{s_i}为第i个第一计算节点上第一数据集的数据分布，D_Y^{s_i}为第i个第一计算节点上第一数据集的标签分布。
此外,记来自于第二计算节点上第二数据集的数据以及对应的数据分布为下述式(2)所示:
{x^t}, x^t～D_X^t  ……式(2)
其中，x^t为第二计算节点上的数据，D_X^t为第二计算节点上第二数据集的数据分布。
并且,记第一模型、第二模型、第三模型分别为:g(·)、c(·)、d(·)。在此基础上,记来自于第i个第一计算节点的第一模型、第二模型、第三模型分别为:
g^{s_i}(·)、c^{s_i}(·)、d^{s_i}(·)；
记来自于第二计算节点的第一模型、第二模型、第三模型分别为:g t(·)、c t(·)、d t(·)。
一、第一计算节点为一个,且部署有新的第三计算节点
具体请参阅图4,图4为本申请实施例提供的基于联邦迁移学习的模型训练方法的一种流程示意图,该实施例针对的场景是利用单个源域设备(即单个第一计算节点)的本地数据和标签来辅助一个本地数据无标签或少标签的目标域设备(即第二计算节点)提升模型性能。具体地,该方法可以包括如下步骤:
401、第一计算节点在本地训练第一模型和第二模型,并将训练得到的第一模型的模型 参数值G s和第二模型的模型参数值T s向第三计算节点发送。
首先,第一计算节点采用自身的第一数据集对第一计算节点上的第一模型和第一计算节点上的第二模型做有监督训练,从而得到第一模型的模型参数值G s(G s可称为第一模型参数值)和第二模型的模型参数值T s(T s可称为第二模型参数值),并将得到的模型参数值G s和模型参数值T s向第三计算节点发送。
在本申请实施例中,该第一模型用于对输入数据进行特征提取,因此也可将第一模型称为特征提取器,该第二模型用于基于第一模型提取出的特征执行目标任务(如,目标检测任务、语音识别任务、语义分割任务等),因此可将第二模型称为子任务模型(如,分类任务中的分类器)。具体来说,第一计算节点先将第一数据集中的训练数据输入第一模型,由该第一模型从训练数据中提取出相应特征,然后第一模型将提取出的特征传递至第二模型以执行目标任务,例如,当第二模型为分类任务中的分类器时,提取的特征将会输入分类器进行预测,以得到预测的类别标签,再通过损失函数可刻画预测的类别标签与真实标签之间的差异。在分类任务中,一种典型的损失函数为交叉熵损失函数,可以表示为下述式(3)所示:
L_cls = -E_{(x,y)}[Σ_k q_k·logδ_k]  ……式(3)
其中，E_{(x,y)}表示对所有训练数据取平均，q_k表示标签y的编码的第k位，δ_k是分类器输出结果在softmax之后的第k个元素。此外，由于本实施例中第一计算节点为一个，因此i=1。
需要注意的是,式(3)仅为本申请实施例中一种损失函数的示意,可根据实际应用需求自行选择合适的损失函数,此处不做限定。
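作为对上述有监督训练过程的一个示意，下面给出一段基于PyTorch的极简代码草图，其中的网络结构、维度、学习率、批大小等均为随意设定的示例值，仅用于说明“先提取特征、再执行目标任务、用式(3)一类的交叉熵损失进行训练”的过程，并非本申请的实际实现。
```python
# 示意：第一计算节点上对第一模型(特征提取器g)和第二模型(分类器c)的有监督训练
import torch
import torch.nn as nn

g = nn.Sequential(nn.Linear(16, 8), nn.ReLU())      # 第一模型：特征提取器（结构仅为占位示例）
c = nn.Linear(8, 3)                                  # 第二模型：此处以3类分类器为例
opt = torch.optim.SGD(list(g.parameters()) + list(c.parameters()), lr=0.1)
loss_fn = nn.CrossEntropyLoss()                      # 对应式(3)一类的交叉熵损失

x_s = torch.randn(32, 16)                            # 模拟第一数据集中的一个批次
y_s = torch.randint(0, 3, (32,))                     # 对应的标签

for _ in range(5):                                   # 若干次本地迭代
    logits = c(g(x_s))                               # 先提取特征，再执行目标任务（分类）
    loss = loss_fn(logits, y_s)
    opt.zero_grad()
    loss.backward()
    opt.step()

G_s = g.state_dict()                                 # 即上文的模型参数值G_s
T_s = c.state_dict()                                 # 即上文的模型参数值T_s
```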
需要说明的是,在本申请实施例中,第一模型和第二模型可以属于同一个神经网络的不同部分,例如,在一种应用场景中,第一模型可以作为特征提取器用于对输入数据进行特征提取,第二模型可以作为标签分类器用于对第一模型提取的特征进行标签识别,在这种情况下,第一模型和第二模型可以一起进行训练,以得到的第一模型的模型参数值和第二模型的模型参数值就可以是一起固定、一起训练、一起上传;在另一些应用场景中,第一模型和第二模型也可以是分开进行训练,在这种情况下,以得到的第一模型的模型参数值和第二模型的模型参数值就不是必须一起固定、一起训练、一起上传。也就是说,在本申请的一些实施方式中,第一计算节点也可以只是将训练得到的第一模型的模型参数值G s向第三计算节点发送。在整个训练达到迭代终止条件的情况下,第一计算节点再将最后一次更新得到的第二模型的模型参数值T s向第三计算节点发送。
402、第三计算节点将G s向第二计算节点发送,G s=G t
第三计算节点接收由第一计算节点发送的模型参数值G s,会将该模型参数值G s向第二计算节点发送,G s=G t。在本申请实施例中,第二计算节点上的第一模型可以用该G t做初始化。
403、第一计算节点在保持G s和T s不变(可称为固定G s和T s)的情况下,在本地训练 第三模型,以得到该第三模型的模型参数值D s,并将D s向第三计算节点发送。
第一计算节点在本地训练完第一计算节点上的第一模型和第二模型后,会在保持G s和T s不变的情况下,在本地训练第一计算节点上的第三模型,从而得到第一计算节点上第三模型的模型参数值D s(D s可称为第三模型参数值),并将D s向第三计算节点发送。在本申请实施例中,第三模型用于对第一模型提取出的特征所属的域进行鉴别,可以理解成是域分类器(一种特殊的分类器),只不过此时不是对输入数据进行分类,而是对输入数据的源域进行区分。
在本申请实施例中,第三模型的目标就是尽量区分出传入的特征是来自于源域还是来自于目标域。在本申请实施例中,不妨假定源域的域标签为0,目标域的域标签为1,所以第一计算节点上的第三模型要尽量的输出预测标签0,一种典型的损失函数可以表示为下述式(4)所示:
L_D^s = -E_{x～D_X^s}[log(1-d_s(g_s(x)))]  ……式(4)
（此处以第三模型输出“特征来自目标域”的概率为例，源域的域标签为0，故希望该输出尽量接近0）
相应符号的含义与上述同理,此处不予赘述。并且同样需要注意的是,式(4)仅为本申请实施例中一种损失函数的示意,可根据实际应用需求自行选择合适的损失函数,此处不做限定。
404、第二计算节点在第二计算节点上第一模型的模型参数保持G t不变(可称为固定G t)的情况下,在本地训练第三模型,以得到该第三模型的模型参数值D t,并将D t向第三计算节点发送。
第二计算节点接收到由第三计算节点发送的模型参数值G s(即G t)之后,也会在第二计算节点上第一模型的模型参数保持G t不变的情况下,在本地训练第二计算节点上的第三模型,从而得到第二计算节点上第三模型的模型参数值D t(D t可称为第四模型参数值),并将模型参数值D t向第三计算节点发送。
类似地,在本申请实施例中,第三模型的目标是尽量区分出传入的特征是来自于源域还是来自于目标域。当假定源域的域标签为0,目标域的域标签为1,那么第二计算节点上的第三模型要尽量的输出预测标签1,一种典型的损失函数可以表示为下述式(5)所示:
L_D^t = -E_{x～D_X^t}[log d_t(g_t(x))]  ……式(5)
（即希望第三模型对目标域特征的输出尽量接近1）
相应符号的含义与上述同理,此处不予赘述。并且同样需要注意的是,式(5)仅为本申请实施例中一种损失函数的示意,可根据实际应用需求自行选择合适的损失函数,此处不做限定。
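下面用一段示意性的PyTorch代码说明“固定第一模型、只训练第三模型”的做法（对应式(4)与式(5)的思路）。这里假定鉴别器输出“特征属于目标域”的概率，源域的域标签取0、目标域的域标签取1；为简洁起见两种情形共用同一个鉴别器实例演示，实际中各计算节点各自持有并训练自己的第三模型，相关结构与超参数均为示例。
```python
# 示意：固定第一模型参数，训练第三模型(域鉴别器d)
import torch
import torch.nn as nn

g = nn.Sequential(nn.Linear(16, 8), nn.ReLU())        # 第一模型（训练鉴别器时保持不变）
d = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())       # 第三模型：输出"特征属于目标域"的概率
opt_d = torch.optim.SGD(d.parameters(), lr=0.1)        # 只优化鉴别器参数
bce = nn.BCELoss()

def train_discriminator(x, domain_label):
    """domain_label=0.0 对应源域节点(式(4))，1.0 对应目标域节点(式(5))"""
    with torch.no_grad():                               # 固定第一模型：特征不回传梯度
        feat = g(x)
    pred = d(feat).squeeze(1)
    target = torch.full_like(pred, domain_label)
    loss = bce(pred, target)
    opt_d.zero_grad()
    loss.backward()
    opt_d.step()
    return loss.item()

train_discriminator(torch.randn(32, 16), 0.0)           # 第一计算节点：希望输出尽量接近0
train_discriminator(torch.randn(32, 16), 1.0)           # 第二计算节点：希望输出尽量接近1
```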
需要说明的是,在本申请实施例中,对步骤403与步骤404的执行顺序不做限定,可以先执行步骤403再执行步骤404,也可以先执行步骤404再执行步骤403,还可以是步骤403和步骤404同时执行,具体此处不做限定。
405、第三计算节点将D s和D t进行聚合,以得到第一聚合参数值D-all。
第三计算节点分别接收到第一计算节点发送的D s和第二计算节点发送的D t后,将对D s和D t进行聚合,以得到第一聚合参数值D-all。这样当第三模型的模型参数被赋值为第一聚合参数值D-all时,该第三模型就同时具备了识别第一数据集上的数据特征和第二数据 集上的数据特征的能力。
需要说明的是,在本申请实施例中,将D s和D t进行聚合的方式有多种,例如可以是做模型参数层面的简单平均,例如,D-all=(D s+D t)/2,也可以是引入一些加权平均,例如,D-all=x*D s+y*D t,其中,x和y可根据需求自行设置,且x+y=1,或者其他更复杂的聚合方式,具体此处不做限定。由于本申请是对模型的模型参数值进行聚合,并且传递的也仅是模型参数值或聚合参数值,并没有涉及到原始数据或者数据特征的传输,所以能够保护数据隐私。
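下面给出一个对模型参数值做聚合的示意函数，演示简单平均与加权平均两种方式，可以看到聚合只涉及各模型的参数值本身，不涉及任何原始数据或数据特征；其中的模型结构与权重取值仅为示例。
```python
# 示意：将源域鉴别器参数D_s与目标域鉴别器参数D_t在参数层面聚合为D-all
import torch
import torch.nn as nn

def aggregate(state_dicts, weights=None):
    """对若干state_dict做(加权)平均；weights为None时即简单平均"""
    n = len(state_dicts)
    if weights is None:
        weights = [1.0 / n] * n
    agg = {}
    for key in state_dicts[0]:
        agg[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return agg

d_s = nn.Linear(8, 1).state_dict()                  # 模拟第一计算节点上传的D_s
d_t = nn.Linear(8, 1).state_dict()                  # 模拟第二计算节点上传的D_t
D_all = aggregate([d_s, d_t])                       # 简单平均，即 (D_s + D_t) / 2
D_all_w = aggregate([d_s, d_t], [0.7, 0.3])         # 加权平均，权重可按需设置且和为1
```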
406、第三计算节点将D-all分别向第一计算节点和第二计算节点发送,使得第一计算节点得到D s’、第二计算节点得到D t’,D-all=D s’=D t’。
第三计算节点聚合得到第一聚合参数值D-all后,会将第一聚合参数值D-all分别向第一计算节点和第二计算节点发送,使得第一计算节点得到D s’、第二计算节点得到D t’,D-all=D s’=D t’。
407、第一计算节点将D s更新为D s’,并在保持第一计算节点上第三模型的模型参数值不变(即固定D s’)的情况下,在本地再训练第一模型和第二模型,并将训练得到的第一模型的模型参数值G s’和第二模型的模型参数值T s’向第三计算节点发送。
第一计算节点接收到由第三计算节点发送的第一聚合参数值D-all(即D s’)后,将D s更新为D s’(即将第三模型参数值更新为第一聚合参数值),并在保持第一计算节点上第三模型的模型参数值不变(即固定D s’)的情况下,在本地再训练第一模型和第二模型,并将训练得到的第一模型的模型参数值G s’(G s’可称为第五模型参数值)和第二模型的模型参数值T s’(T s’可称为第六模型参数值)向第三计算节点发送。
在本申请实施例中,第一计算节点固定D s’并在本地再训练第一模型和第二模型的目的是让第一模型能够提取到足够迷惑第三模型的特征,也就是尽量对齐源域和目标域的特征,在这一步中,一种典型的损失函数可以表示为下述式(6)所示:
L_G^s = -E_{x～D_X^s}[(1-l_s)·log d_s(g_s(x)) + l_s·log(1-d_s(g_s(x)))]  ……式(6)
其中，l_s为源域的真实域标签（即0）。
相应符号的含义与上述同理,此处不予赘述。并且同样需要注意的是,式(6)仅为本申请实施例中一种损失函数的示意,可根据实际应用需求自行选择合适的损失函数,此处不做限定。
这里还需要说明的是,式(6)中有个“1-”,这个部分是把域标签反置,也就是0变成1,1变成0。这也就是为了迷惑第三模型,使其将源域的预测成目标域的,且将目标域的预测成源域的。
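下面用一段示意性的PyTorch代码说明“固定聚合后的第三模型参数、按反置后的域标签再训练第一模型和第二模型”的做法（对应式(6)的思路）。其中假定第三模型的参数已被赋值为第一聚合参数值D_s'，且把分类损失与对抗损失按1:1相加只是本示例的一种假设性组合方式，本申请对具体的损失组合并不限定。
```python
# 示意：固定鉴别器(D_s')，在第一计算节点上对抗式地再训练第一模型和第二模型
import torch
import torch.nn as nn

g = nn.Sequential(nn.Linear(16, 8), nn.ReLU())         # 第一模型
c = nn.Linear(8, 3)                                     # 第二模型
d = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())        # 第三模型，假定其参数已赋值为D_s'
for p in d.parameters():
    p.requires_grad_(False)                             # 固定鉴别器参数，训练中不更新

opt = torch.optim.SGD(list(g.parameters()) + list(c.parameters()), lr=0.1)
bce, ce = nn.BCELoss(), nn.CrossEntropyLoss()

x_s = torch.randn(32, 16)                               # 模拟第一数据集中的一个批次
y_s = torch.randint(0, 3, (32,))

feat = g(x_s)
task_loss = ce(c(feat), y_s)                            # 保持目标任务性能（式(3)一类的损失）
flipped = torch.ones(32)                                # 源域真实域标签为0，反置("1-")后按1计算
adv_loss = bce(d(feat).squeeze(1), flipped)             # 对应式(6)：希望鉴别器把源域特征判为目标域
loss = task_loss + adv_loss                             # 1:1相加仅为示例，可按需加权
opt.zero_grad()
loss.backward()
opt.step()
```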
408、将G s’和T s’分别作为新的G s和T s,重复执行上述步骤402-407,直至达到迭代终止条件。
接下来,第一计算节点会进一步将G s’和T s’分别作为新的G s和T s(即将第五模型参数值和第六模型参数值作为新的第一模型参数值和新的第二模型参数值),重复上述步骤402-407,直至达到迭代终止条件,该迭代终止条件可以是达到预设的训练轮次,也可以是使得损失函数收敛,还可以是其他设定的训练终止条件,具体此处不做限定。这里需要说明的是,在本申请实施例中,不限定判断迭代终止条件的执行主体是哪个计算节点,例如, 可以是第一计算节点,也可以是第三计算节点。作为一个示例,假设迭代终止条件为达到预设的训练轮次(如,100次),那么判断迭代终止条件的执行主体可以是第三计算节点,也可以是第一计算节点,如,当第三计算节点第100次(如,可由部署于第三计算节点上的计数器计数得到)接收到由第一计算节点上传的G s和T s,则第三计算节点确定此时达到迭代终止条件,又如,当第一计算节点第100次(类似的,可由部署于第一计算节点上的计数器计数得到)在本地完成对第一模型和第二模型的训练,则第一计算节点确定此时达到迭代终止条件。这里需要注意的是,由某个计算节点(如,第一计算节点)确定了当前训练轮次为最后一个训练轮次后,该计算节点会进一步将判断结果(即确定达到迭代终止条件)发送给其他计算节点(如,第三计算节点)。在本申请下述其他实施例中,如何判断是否达到迭代终止条件的方式与此类似,具体下述不再赘述。
本申请实施例中,步骤402-407就是对抗训练的过程,不断重复这个对抗训练的过程直到达到迭代终止条件,在最后源域和目标域的特征就基本被对齐。
409、第三计算节点将最后一次更新得到的T s(可称为T s-new)和G s(可称为G s-new)向第二计算节点发送,T s-new=T t-new,G s-new=G t-new
需要说明的是,在本申请的一些实施方式中,在达到迭代终止条件后,第三计算节点在步骤407中会接收到第一计算节点发送的最后一次更新得到的模型参数值G s(可称为G s-new)和最后一次更新得到的模型参数值T s(可称为T s-new),因此,第三计算节点会将G s-new和T s-new向第二计算节点发送,使得第二计算节点得到G t-new和T t-new,T s-new=T t-new,G s-new=G t-new
这里需要注意的是,在本申请的一些实施方式中,假设是由第一计算节点基于模型(如,第一模型或第二模型)的损失函数的取值情况来判断是否达到迭代终止条件,并假设在当前训练轮次(例如,第60次)的步骤407中,第一计算节点在本地训练第一模型和第二模型时对应损失函数的取值比上一轮次(即第59次)的取值大,说明在上一个轮次的训练过程中,模型的损失函数已实现收敛,在这种情况下,最后一次更新得到的模型参数值G s和模型参数值T s不是当前训练轮次得到的模型参数值G s和模型参数值T s,而是将上一个训练轮次得到的模型参数值G s和模型参数值T s作为最后一次更新得到的G s-new和T s-new,这种情况下,第二计算节点已经在上一个训练轮次的步骤402接收到了由第三计算节点向第二计算节点发送的最后一次更新得到的G s-new,G s-new=G t-new,由于在步骤402中,只是将最后一次更新得到的G s-new向第二计算节点发送了,因此在步骤409中,第三计算节点就只需将T s-new向第二计算节点发送即可。需要注意的是,在本申请下述其他实施例中,若是由第一计算节点基于模型(如,第一模型或第二模型)的损失函数的取值情况来判断是否达到迭代终止条件的情况都与此执行类似的操作,具体下述不再赘述。
410、第二计算节点使用G t-new和T s-new执行目标任务。
第二计算节点在得到G t-new和T t-new之后,会将该G t-new和T t-new分别作为第二计算节点上第一模型和第二模型的最终模型参数值(因为源域和目标域的特征已经对齐了,这样做才有意义),并根据第二计算节点上的该第一模型和该第二模型执行目标任务,如,目标任务可以是目标检测任务、分类任务、语音识别任务、语义分割任务等,只要是神经网络可 执行的任务,都可作为本申请第二计算节点能够执行的目标任务。
需要说明的是,在本申请的一些实施方式中,也可以不需要步骤409和步骤410。
由本申请上述实施例可知,一方面,本申请上述实施例通过步骤402-407的对抗训练过程实现了域对齐,从而缩小了从源域数据上提取的特征和从目标域数据上提取的特征的分布差异,即缩小了从第一数据集提取的数据特征与从第二数据集提取的数据之间的分布差异,相比于传统的不做域对齐的联邦学习而言,能够更好地利用第一计算节点上第一数据集辅助第二计算节点上的第二数据集对模型进行训练,训练得到的模型性能会更好;第二方面,由于本申请是对模型的模型参数值进行聚合,并且传递的也仅是模型参数值或聚合参数值,并没有涉及到原始数据或者数据特征的传输,这与传统的迁移学习以及现有的基于特征传输的联邦迁移学习有着本质的区别,因此能够起到保护隐私的作用。
综上所述,在兼顾域对齐和用户数据隐私的情况下,本申请实施例提供的方法实现了对模型的协同训练,提高了模型的性能。
需要说明的是,在本申请上述图4对应的实施例中,对抗训练过程仅是在第一计算节点上进行,实际上,在本申请的一些实施方式中,为了更好的提升模型性能,也可以在第二计算节点上进行对抗训练过程,因此,本申请实施例还提供了一种基于联邦迁移学习的模型训练方法,具体请参阅图5,图5为本申请实施例提供的基于联邦迁移学习的模型训练方法的另一流程示意图,图5对应的实施例与上述图4对应的实施例的区别在于,图5对应的实施例在第二计算节点部分也加入了对抗训练部分。具体地,该方法可以包括如下步骤:
501、第一计算节点在本地训练第一模型和第二模型,并将训练得到的第一模型的模型参数值G s和第二模型的模型参数值T s向第三计算节点发送。
502、第三计算节点将G s向第二计算节点发送,G s=G t
503、第一计算节点在保持G s和T s不变(可称为固定G s和T s)的情况下,在本地训练第三模型,以得到该第三模型的模型参数值D s,并将D s向第三计算节点发送。
504、第二计算节点在第二计算节点上第一模型的模型参数保持G t不变(可称为固定G t)的情况下,在本地训练第三模型,以得到该第三模型的模型参数值D t,并将D t向第三计算节点发送。
505、第三计算节点将D s和D t进行聚合,以得到第一聚合参数值D-all。
506、第三计算节点将D-all分别向第一计算节点和第二计算节点发送,使得第一计算节点得到D s’、第二计算节点得到D t’,D-all=D s’=D t’。
507、第一计算节点将D s更新为D s’,并在保持第一计算节点上第三模型的模型参数值不变(即固定D s’)的情况下,在本地再训练第一模型和第二模型,并将训练得到的第一模型的模型参数值G s’和第二模型的模型参数值T s’向第三计算节点发送。
步骤501-507与上述步骤401-407类似,具体请参阅上述步骤401-407,此处不予赘述。
508、第二计算节点将D t更新为D t’,并在保持第二计算节点上第三模型的模型参数值不变(即固定D t’)的情况下,在本地训练第一模型和第二模型,并将训练得到的第一模型的模型参数值G t’和第二模型的模型参数值T t’向第三计算节点发送。
第二计算节点接收到由第三计算节点发送的第一聚合参数值D-all(即D s’)后,将第二计算节点上的第三模型的模型参数值D t更新为D s’(即将第四模型参数值更新为第一聚合参数值),并在保持第二计算节点上第三模型的模型参数值不变(即固定D s’)的情况下,在本地训练第一模型和第二模型,并将训练得到的第一模型的模型参数值G t’(G t’可称为第七模型参数值)和第二模型的模型参数值T t’(T t’可称为第八模型参数值)向第三计算节点发送。
类似地,在本申请实施例中,第二计算节点固定D t’并在本地训练第一模型和第二模型的目的也是为了让第一模型能够提取到足够迷惑第三模型的特征,也就是尽量对齐源域和目标域的特征,在这一步中,一种典型的损失函数可以表示为下述式(7)所示:
L_G^t = -E_{x～D_X^t}[(1-l_t)·log d_t(g_t(x)) + l_t·log(1-d_t(g_t(x)))]  ……式(7)
其中，l_t为目标域的真实域标签（即1）。
相应符号的含义与上述同理,此处不予赘述。并且同样需要注意的是,式(7)仅为本申请实施例中一种损失函数的示意,可根据实际应用需求自行选择合适的损失函数,此处不做限定。
这里还需要说明的是,式(7)中有个“1-”,这个部分是把域标签反置,也就是0变成1,1变成0。这也就是为了迷惑第三模型,使其将源域的预测成目标域的,且将目标域的预测成源域的。
509、第三计算节点将G s’和G t’进行聚合,以得到第二聚合参数值G-all。
这时,第三计算节点从第一计算节点接收到了模型参数值G s’和模型参数值T s’,并且从第二计算节点接收到了模型参数值G t’和模型参数值T t’,接下来,第三计算节点会进一步将G s’和G t’进行聚合,以得到第二聚合参数值G-all。
510、将G-all和T s’分别作为新的G s和T s,重复执行上述步骤502-509,直至达到迭代终止条件。
之后,第一计算节点会进一步将G-all和T s’分别作为新的G s和T s,重复上述步骤502-509,直至达到迭代终止条件,该迭代终止条件可以是达到预设的训练轮次,也可以是使得损失函数收敛,还可以是其他设定的训练终止条件,具体此处不做限定。
511、第三计算节点将最后一次更新得到的T s(可称为T s-new)与最后一次更新得到的T t’(可称为T t-new)进行聚合,以得到第四聚合参数值T-all。
需要说明的是,在本申请的一些实施方式中,在达到迭代终止条件后,第三计算节点在步骤507中会接收到第一计算节点发送的最后一次更新得到的模型参数值G s(可称为G s-new)和模型参数值T s(可称为T s-new),并且第三计算节点还会在步骤508中接收到第二计算节点发送的最后一次更新得到的模型参数值G t’(可称为G t-new)和模型参数值T t’(可称为T t-new),因此第三计算节点会将T s-new与T t-new进行聚合,以得到第四聚合参数值T-all。
512、第三计算节点将第四聚合参数值T-all和最后一次更新得到的G-all向第二计算节点发送。
第三计算节点进一步将第四聚合参数值T-all和最后一次更新得到的G-all向第二计算节点发送。
513、第二计算节点使用最后一次更新得到的G-all和T-all执行目标任务。
第二计算节点在得到最后一次更新的G-all和T-all之后,会将该G-all和T-all分别作为第二计算节点上第一模型和第二模型的最终模型参数值(因为源域和目标域的特征已经对齐了,这样做才有意义),并根据第二计算节点上的该第一模型和该第二模型执行目标任务,如,目标任务可以是目标检测任务、分类任务、语音识别任务、语义分割任务等,只要是神经网络可执行的任务,都可作为本申请第二计算节点能够执行的目标任务。
需要说明的是,在本申请的一些实施方式中,也可以不需要步骤511至步骤513。
在本申请上述实施方式中,在作为目标域设备的第二计算节点上也引入了对抗训练过程,这在一些特定的任务场景下能够训练出性能更好的模型。
二、第一计算节点为多个,且部署有新的第三计算节点
上述图4、图5对应的实施例阐述的都是第一计算节点为一个,且部署有新的第三计算节点的情况,在本申请实施例中,将继续介绍第一计算节点为多个,且部署有新的第三计算节点的情况下基于联邦迁移学习的模型训练方法,具体请参阅图6,图6为本申请实施例提供的基于联邦迁移学习的模型训练方法的另一流程示意图,该实施例针对的场景是利用多个源域设备(即多个第一计算节点)的本地数据(每个第一计算节点上都有各自的第一数据集)和标签来辅助一个本地数据无标签或少标签的目标域设备(即第二计算节点)提升模型性能。在本申请实施例中,假定第一计算节点的数量为n,n≥2。具体地,该方法可以包括如下步骤:
601、每个第一计算节点各自在本地训练第一模型和第二模型,并将各自训练得到的第一模型的模型参数值G i和第二模型的模型参数值T i向第三计算节点发送,
其中，i=1,2,…,n。
每个第一计算节点在本地训练第一模型和第二模型的过程与上述步骤401类似，具体请参阅上述步骤401，此处不予赘述。
602、第三计算节点对所有G i(即G 1、……、G n)进行聚合,以得到第二聚合参数值G-all。
在本申请实施例中,由于第一计算节点的数量为n,因此第三计算节点会接收到每个第一计算节点发送的G 1、……、G n,并接收到每个第一计算节点发送的T 1、……、T n,并对G 1、……、G n进行聚合,以得到第二聚合参数值G-all。
需要说明的是,在本申请实施例中,将G 1、……、G n进行聚合的方式有多种,例如可以是做模型参数层面的简单平均,具体可以如下述式(8)所示:
θ_G = (1/n)·Σ_{i=1,…,n} θ_G^{s_i}  ……式(8)
其中，θ用于表征第一模型的模型参数，θ_G为第二聚合参数值G-all，θ_G^{s_i}为第一计算节点i上第一模型的模型参数值G_i。
此外,将G 1、……、G n进行聚合的方式也可以是进行加权平均,或者其他更复杂的聚合方式,具体此处不做限定。由于本申请是对模型的模型参数值进行聚合,并且传递的也仅是模型参数值或聚合参数值,并没有涉及到原始数据或者数据特征的传输,所以能够保护数据隐私。
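对于多个第一计算节点的情形，下面给出一个按各节点本地样本量取权的加权聚合示意。按样本量取权只是“加权平均”的一种常见做法，此处作为本示例的假设，并非本申请限定的方案；模型结构与样本量数值亦为占位示例。
```python
# 示意：按各第一计算节点的本地样本量n_i对上传的模型参数G_1,…,G_n做加权平均
import torch.nn as nn

def weighted_average(state_dicts, sample_counts):
    """权重w_i = n_i / Σn_j，再在参数层面做加权求和"""
    total = float(sum(sample_counts))
    weights = [c / total for c in sample_counts]
    return {k: sum(w * sd[k].float() for w, sd in zip(weights, state_dicts))
            for k in state_dicts[0]}

uploads = [nn.Linear(16, 8).state_dict() for _ in range(3)]   # 模拟3个第一计算节点上传的G_i
G_all = weighted_average(uploads, sample_counts=[1000, 500, 250])
```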
603、第三计算节点将G-all向第二计算节点发送,G-all=G t
第三计算节点将聚合得到的G-all向第二计算节点发送,G-all=G t。在本申请实施例中, 第二计算节点上的第一模型可以用该G-all做初始化。
604、每个第一计算节点在保持各自的G i和T i不变(可称为固定G i和T i)的情况下,在本地各自训练第三模型,以得到该第三模型的模型参数值D i,并各自将D i向第三计算节点发送。
每个第一计算节点在固定G i和T i的情况下,在本地训练第三模型的过程与上述步骤403类似,具体请参阅上述步骤403,此处不予赘述。
605、第二计算节点在第二计算节点上第一模型的模型参数保持G t不变(可称为固定G t)的情况下,在本地训练第三模型,以得到该第三模型的模型参数值D t,并将D t向第三计算节点发送。
步骤605与上述步骤404类似,具体请参阅上述步骤404,此处不予赘述。
这里需要注意的是,在本申请实施例中,由于第一计算节点为n个,因此,在各个第一计算节点对各自的第三模型进行训练的过程中,既可以将所有第一计算节点的域标签定为0,也可以对它们赋予不同的域标签,以期第三模型还能够区分出输入数据的特征来源于哪个第一计算节点。
需要说明的是,在本申请实施例中,对步骤604与步骤605的执行顺序不做限定,可以先执行步骤604再执行步骤605,也可以先执行步骤605再执行步骤604,还可以是步骤604和步骤605同时执行,具体此处不做限定。
606、第三计算节点将所有D i(即D 1、……、D n)和D t进行聚合,以得到第一聚合参数值D-all。
第三计算节点分别接收到每个第一计算节点发送的D 1、……、D n,和第二计算节点发送的D t后,将对所有D i和D t进行聚合,以得到第一聚合参数值D-all。这样当第三模型的模型参数被赋值为第一聚合参数值D-all时,该第三模型就同时具备了识别第一数据集上的数据特征和第二数据集上的数据特征的能力。
需要说明的是,在本申请实施例中,将D 1、……、D n和D t进行聚合的方式有多种,例如可以是做模型参数层面的简单平均,具体可以如下述式(9)所示:
θ_D^{all} = (1/(n+1))·(θ_D^t + Σ_{i=1,…,n} θ_D^{s_i})  ……式(9)
其中，θ_D用于表征第三模型的模型参数，θ_D^{all}为第一聚合参数值D-all，θ_D^{s_i}为第一计算节点i上第三模型的模型参数值D_i，θ_D^t为第二计算节点上第三模型的模型参数值D_t。
此外,将D 1、……、D n和D t进行聚合的方式也可以是进行加权平均,或者其他更复杂的聚合方式,具体此处不做限定。由于本申请是对模型的模型参数值进行聚合,并且传递的也仅是模型参数值或聚合参数值,并没有涉及到原始数据或者数据特征的传输,所以能够保护数据隐私。
607、第三计算节点将D-all分别向每个第一计算节点和第二计算节点发送,使得每个第一计算节点各自得到D i’、第二计算节点得到D t’,D-all=D i’=D t’。
第三计算节点聚合得到第一聚合参数值D-all后,会将第一聚合参数值D-all分别向每个第一计算节点和第二计算节点发送,使得每个第一计算节点得到D i’、第二计算节点得到D t’,D-all=D i’=D t’。
608、每个第一计算节点各自将D i更新为D i’,并在保持每个第一计算节点上第三模型的模型参数值不变(即固定D i’)的情况下,在本地再训练第一模型和第二模型,并将各自训练得到的第一模型的模型参数值G i’和第二模型的模型参数值T i’分别向第三计算节点发送。
每个第一计算节点将D i更新为D i’,并在固定D i’的情况下,在本地再训练第一模型和第二模型的过程与上述步骤407类似,具体请参阅上述步骤407,此处不予赘述。
609、将G i’和T i’分别作为新的G i和T i(即将G 1’、……、G n’作为新的G 1、……、G n,将T 1’、……、T n’作为新的T 1、……、T n),重复执行上述步骤602-608,直至达到迭代终止条件。
每个第一计算节点将各自的G i’和T i’分别作为新的G i和T i重复执行上述步骤602-608的过程与上述步骤408类似,具体请参阅上述步骤408,此处不予赘述。
同样地,本申请实施例中,步骤602-608就是对抗训练的过程,不断重复这个对抗训练的过程直到达到迭代终止条件,在最后多个源域和目标域的特征就基本被对齐。
610、第三计算节点将最后一次更新得到的T i(即T 1、……、T n)进行聚合,以得到第三聚合参数值Ts-all,且将最后一次更新得到的G i(即G 1’、……、G n’)进行聚合,以得到最后一次更新得到的G-all(可称为G all-new),并将Ts-all和G all-new向第二计算节点发送,Ts-all=T t-new
需要说明的是,在本申请的一些实施方式中,在达到迭代终止条件后,第三计算节点在步骤608中会接收到每个第一计算节点各自发送的最后一次更新得到的模型参数值G i(可称为G i-new)和模型参数值T i(可称为T i-new),因此,第三计算节点会将各个T i-new(即T 1-new、……、T n-new)进行聚合,以得到第三聚合参数值Ts-all,且会将最后一次更新得到的G i进行聚合,以得到最后一次更新得到的G-all(即G all-new),并将该Ts-all和G all-new向第二计算节点发送,使得第二计算节点得到T t-new和G all-new,Ts-all=T t-new
611、第二计算节点使用G all-new和T t-new执行目标任务。
第二计算节点在得到G all-new和T t-new之后,会将该G all-new和T t-new分别作为第二计算节点上第一模型和第二模型的最终模型参数值(因为源域和目标域的特征已经对齐了,这样做才有意义),并根据第二计算节点上的该第一模型和该第二模型执行目标任务,如,目标任务可以是目标检测任务、分类任务、语音识别任务、语义分割任务等,只要是神经网络可执行的任务,都可作为本申请第二计算节点能够执行的目标任务。
需要说明的是,在本申请的一些实施方式中,也可以不需要步骤610和步骤611。
综上所述,本申请实施例在兼顾域对齐和用户数据隐私的情况下,实现了对模型的协同训练,提高了模型的性能。此外,本申请实施例是通过利用多个源域设备(即多个第一计算节点)的本地数据(每个第一计算节点上都有各自第一数据集)和标签来辅助一个本地数据无标签或少标签的目标域设备(即第二计算节点)提升模型性能,本申请实施例中 由于存在多个源域,模型的模型参数取值可基于多种类型的训练数据得到,因此训练后的模型精度更高。
需要说明的是,在本申请上述图6对应的实施例中,对抗训练过程仅是在第一计算节点上进行,实际上,在本申请的一些实施方式中,为了更好的提升模型性能,也可以在第二计算节点上进行对抗训练过程,因此,本申请实施例还提供了一种基于联邦迁移学习的模型训练方法,具体请参阅图7,图7为本申请实施例提供的基于联邦迁移学习的模型训练方法的另一流程示意图,图7对应的实施例与上述图6对应的实施例的区别在于,图7对应的实施例在第二计算节点部分也加入了对抗训练部分。具体地,该方法可以包括如下步骤:
701、每个第一计算节点各自在本地训练第一模型和第二模型,并将各自训练得到的第一模型的模型参数值G i和第二模型的模型参数值T i向第三计算节点发送,
其中，i=1,2,…,n。
702、第三计算节点对所有G i(即G 1、……、G n)进行聚合,以得到第二聚合参数值G-all。
703、第三计算节点将G-all向第二计算节点发送,G-all=G t
704、每个第一计算节点在保持各自的G i和T i不变(可称为固定G i和T i)的情况下,在本地各自训练第三模型,以得到该第三模型的模型参数值D i,并各自将D i向第三计算节点发送。
705、第二计算节点在第二计算节点上第一模型的模型参数保持G t不变(可称为固定G t)的情况下,在本地训练第三模型,以得到该第三模型的模型参数值D t,并将D t向第三计算节点发送。
706、第三计算节点将所有D i(即D 1、……、D n)和D t进行聚合,以得到第一聚合参数值D-all。
707、第三计算节点将D-all分别向每个第一计算节点和第二计算节点发送,使得每个第一计算节点各自得到D i’、第二计算节点得到D t’,D-all=D i’=D t’。
708、每个第一计算节点各自将D i更新为D i’,并在保持每个第一计算节点上第三模型的模型参数值不变(即固定D i’)的情况下,在本地再训练第一模型和第二模型,并将各自训练得到的第一模型的模型参数值G i’和第二模型的模型参数值T i’分别向第三计算节点发送。
步骤701-708与上述步骤601-608类似,具体请参阅上述步骤601-608,此处不予赘述。
709、第二计算节点将D t更新为D t’,并在保持第二计算节点上第三模型的模型参数值不变(即固定D t’)的情况下,在本地训练第一模型和第二模型,并将训练得到的第一模型的模型参数值G t’和第二模型的模型参数值T t’向第三计算节点发送。
步骤709与上述步骤508类似,具体请参阅上述步骤508,此处不予赘述。
710、第三计算节点对所有G i’(即G 1’、……、G n’)和G t’进行聚合,以得到更新的第二聚合参数值G-all’。
第三计算节点从第一计算节点接收到了模型参数值G t’和模型参数值T t’,接下来,第三计算节点会进一步对所有G i’(即G 1’、……、G n’)和G t’再进行聚合,以得到更新的 第二聚合参数值G-all’。
711、将G-all’、G i’和T i’分别作为新的G-all、G i和T i(即将G-all’作为新的G-all,将G 1’、……、G n’作为新的G 1、……、G n,将T 1’、……、T n’作为新的T 1、……、T n),重复执行上述步骤703-710,直至达到迭代终止条件。
712、第三计算节点将每个第一计算节点最后一次更新得到的T i与最后一次更新得到的T t’进行聚合,以得到第四聚合参数值T-all,T-all=T t-new
713、第三计算节点将第四聚合参数值T-all和最后一次更新得到的G-all’(可称为G all-new)向第二计算节点发送。
714、第二计算节点使用最后一次更新得到的G-all’(即G all-new)和T-all(即T t-new)执行目标任务。
步骤711-714与上述步骤510-513类似,具体请参阅上述步骤510-513,此处不予赘述。
需要说明的是,在本申请的一些实施方式中,也可以不需要步骤712至步骤714。
在本申请上述实施方式中,在作为目标域设备的第二计算节点上也引入了对抗训练过程,这在一些特定的任务场景下能够训练出性能更好的模型。
三、第一计算节点为一个,且不部署新的第三计算节点
在本申请上述实施例中,对模型的聚合操作都是在部署的新的第三计算节点上完成,在本申请的一些实施方式中,还可以是由作为目标域的第二计算节点来完成对模型的聚合操作,具体请参阅图8,图8为本申请实施例提供的基于联邦迁移学习的模型训练方法的一种流程示意图,该实施例针对的场景是利用单个源域设备(即单个第一计算节点)的本地数据和标签来辅助一个本地数据无标签或少标签的目标域设备(即第二计算节点)提升模型性能。具体地,该方法可以包括如下步骤:
801、第一计算节点在本地训练第一模型和第二模型,以得到训练后的第一模型的模型参数值G s和第二模型的模型参数值T s
步骤801与上述步骤401类似,不同之处在于步骤801得到训练后的第一模型的模型参数值G s和第二模型的模型参数值T s之后,不再上传至第三计算节点,其余部分请参阅上述步骤401,此处不予赘述。
802、第一计算节点将G s向第二计算节点发送,G s=G t
第一计算节点得到模型参数值G s后,会进一步将G s向第二计算节点发送,G s=G t。在本申请实施例中,第二计算节点上的第一模型可以用该G t做初始化。
803、第一计算节点在保持G s和T s不变(可称为固定G s和T s)的情况下,在本地训练第三模型,以得到该第三模型的模型参数值D s,并将D s向第二计算节点发送。
步骤803与上述步骤403类似,不同之处在于步骤803得到训练后的第三模型的模型参数值D s之后,不是上传至第三计算节点,而是向第二计算节点发送,其余部分请参阅上述步骤403,此处不予赘述。
804、第二计算节点在第二计算节点上第一模型的模型参数保持G t不变(可称为固定G t)的情况下,在本地训练第三模型,以得到该第三模型的模型参数值D t
步骤804与上述步骤404类似,不同之处在于步骤804得到训练后的第三模型的模型 参数值D t之后,不用上传至第三计算节点,其余部分请参阅上述步骤404,此处不予赘述。
需要说明的是,在本申请实施例中,对步骤803与步骤804的执行顺序不做限定,可以先执行步骤803再执行步骤804,也可以先执行步骤804再执行步骤803,还可以是步骤803和步骤804同时执行,具体此处不做限定。
805、第二计算节点将D s和D t进行聚合,以得到第一聚合参数值D-all。
步骤805与上述步骤405类似,不同之处在于是由第二计算节点将D s和D t进行聚合,以得到第一聚合参数值D-all,其余部分请参阅上述步骤405,此处不予赘述。
806、第二计算节点向第一计算节点发送D-all,使得第一计算节点得到D s’,D-all=D s’。
第二计算节点聚合得到第一聚合参数值D-all后,会将第一聚合参数值D-all向第一计算节点发送,使得第一计算节点得到D s’,D-all=D s’。
807、第一计算节点将D s更新为D s’,并在保持第一计算节点上第三模型的模型参数值不变(即固定D s’)的情况下,在本地再训练第一模型和第二模型,以得到训练后的第一模型的模型参数值G s’和第二模型的模型参数值T s’。
步骤807与上述步骤407类似,不同之处在于步骤807得到训练后的第一模型的模型参数值G s和第二模型的模型参数值T s之后,不再上传至第三计算节点,其余部分请参阅上述步骤407,此处不予赘述。
808、将G s’和T s’分别作为新的G s和T s,重复执行上述步骤802-807,直至达到迭代终止条件。
步骤808与上述步骤408类似,具体请参阅上述步骤408,此处不予赘述。
809、第一计算节点将最后一次更新得到的T s(可称为T s-new)和最后一次更新的得到的G s(可称为G s-new)向第二计算节点发送,T s-new=T t-new,G s-new=G t-new
需要说明的是,在本申请的一些实施方式中,在达到迭代终止条件后,第一计算节点会将最后一次更新得到的T s(即T s-new)和最后一次更新的得到的G s(即G s-new)向第二计算节点发送,使得第二计算节点得到T t-new和G t-new,T s-new=T t-new,G s-new=G t-new
810、第二计算节点使用G t-new和T t-new执行目标任务。
步骤810与上述步骤410类似,具体请参阅上述步骤410,此处不予赘述。
需要说明的是,在本申请的一些实施方式中,也可以不需要步骤809和步骤810。
在本申请上述实施方式中,将模型参数值的聚合过程由作为目标域设备的第二计算节点执行,可减少计算节点的参与数量,同时减少了计算节点之间数据交互的时间,提高了模型训练的效率。
需要说明的是,在本申请上述图8对应的实施例中,对抗训练过程仅是在第一计算节点上进行,实际上,在本申请的一些实施方式中,为了更好的提升模型性能,也可以在第二计算节点上进行对抗训练过程,因此,本申请实施例还提供了一种基于联邦迁移学习的模型训练方法,具体请参阅图9,图9为本申请实施例提供的基于联邦迁移学习的模型训练方法的另一流程示意图,图9对应的实施例与上述图8对应的实施例的区别在于,图9对应的实施例在第二计算节点部分也加入了对抗训练部分。具体地,该方法可以包括如下步骤:
901、第一计算节点在本地训练第一模型和第二模型,以得到训练后的第一模型的模型参数值G s和第二模型的模型参数值T s
902、第一计算节点将G s向第二计算节点发送,G s=G t
903、第一计算节点在保持G s和T s不变(可称为固定G s和T s)的情况下,在本地训练第三模型,以得到该第三模型的模型参数值D s,并将D s向第二计算节点发送。
904、第二计算节点在第二计算节点上第一模型的模型参数保持G t不变(可称为固定G t)的情况下,在本地训练第三模型,以得到该第三模型的模型参数值D t
905、第二计算节点将D s和D t进行聚合,以得到第一聚合参数值D-all。
906、第二计算节点将D-all向第一计算节点发送，使得第一计算节点得到D s’，D-all=D s’。
907、第一计算节点将D s更新为D s’,并在保持第一计算节点上第三模型的模型参数值不变(即固定D s’)的情况下,在本地再训练第一模型和第二模型,以得到训练后的第一模型的模型参数值G s’和第二模型的模型参数值T s’,并将G s’向第二计算节点发送。
步骤901-907与上述步骤801-807类似，具体请参阅上述步骤801-807，此处不予赘述。不同之处在于，步骤907中第一计算节点还需将G s’向第二计算节点发送。
908、第二计算节点将D t更新为D t’(D t’=D-all),并在保持第二计算节点上第三模型的模型参数值不变(即固定D t’)的情况下,在本地训练第一模型和第二模型,以得到训练后的第一模型的模型参数值G t’和第二模型的模型参数值T t’。
步骤908与上述步骤508类似，不同之处在于步骤908得到训练后的第一模型的模型参数值G t’和第二模型的模型参数值T t’之后，不再上传至第三计算节点，且在该步骤中，D t’=D-all，其余部分请参阅上述步骤508，此处不予赘述。
909、第二计算节点将G s’和G t’进行聚合,以得到第二聚合参数值G-all。
步骤909与上述步骤509类似,不同之处在于步骤909中是由第二计算节点将G s’和G t’进行聚合,以得到第二聚合参数值G-all,其余部分请参阅上述步骤509,此处不予赘述。
910、将G-all和T s’分别作为新的G s和T s,重复执行上述步骤902-909,直至达到迭代终止条件。
步骤910与上述步骤510类似,具体请参阅上述步骤510,此处不予赘述。
911、第一计算节点将最后一次更新得到的T s(可称为T s-new)向第二计算节点发送。
912、第二计算节点将T s-new与最后一次更新得到的T t’(可称为T t-new)进行聚合,以得到第四聚合参数值T-all。
步骤912与上述步骤511类似,不同之处在于步骤912中是由第二计算节点将最后一次更新得到的T s(即T s-new)与最后一次更新得到的T t’(即T t-new)进行聚合,以得到第四聚合参数值T-all,其余部分请参阅上述步骤511,此处不予赘述。
913、第二计算节点使用最后一次更新得到的G-all和T-all执行目标任务。
步骤913与上述步骤513类似,具体请参阅上述步骤513,此处不予赘述。
需要说明的是,在本申请的一些实施方式中,也可以不需要步骤912和步骤913。
在本申请上述实施方式中,将模型参数值的聚合过程由作为目标域设备的第二计算节 点执行,可减少计算节点的参与数量,同时减少了计算节点之间数据交互的时间,提高了模型训练的效率。此外,还在作为目标域设备的第二计算节点上也引入了对抗训练过程,这在一些特定的任务场景下能够训练出性能更好的模型。
四、第一计算节点为多个,且不部署新的第三计算节点
上述图8、图9对应的实施例阐述的都是第一计算节点为一个,且不部署新的第三计算节点的情况,在本申请实施例中,将继续介绍第一计算节点为多个,且不部署新的第三计算节点的情况下基于联邦迁移学习的模型训练方法,具体请参阅图10,图10为本申请实施例提供的基于联邦迁移学习的模型训练方法的另一流程示意图,该实施例针对的场景是利用多个源域设备(即多个第一计算节点)的本地数据(每个第一计算节点上都有各自第一数据集)和标签来辅助一个本地数据无标签或少标签的目标域设备(即第二计算节点)提升模型性能。在本申请实施例中,假定第一计算节点的数量为n,n≥2。具体地,该方法可以包括如下步骤:
1001、每个第一计算节点各自在本地训练第一模型和第二模型,以得到各自训练后的第一模型的模型参数值G i和第二模型的模型参数值T i
其中，i=1,2,…,n。
每个第一计算节点在本地训练第一模型和第二模型的过程与上述步骤401类似,不同之处在于步骤1001得到训练后的第一模型的模型参数值G i和第二模型的模型参数值T i之后,不再上传至第三计算节点,其余部分请参阅上述步骤401,此处不予赘述。
1002、每个第一计算节点将各自得到的G i向第二计算节点发送。
每个第一计算节点得到各自的模型参数值G i后,会进一步将G i向第二计算节点发送。这样第二计算节点可接收到G 1、……、G n
1003、第二计算节点对所有G i(即G 1、……、G n)进行聚合,以得到第二聚合参数值G-all,G-all=G t
步骤1003与上述步骤602类似,不同之处在于是由第二计算节点将G 1、……、G n进行聚合,以得到第二聚合参数值G-all,并将G-all作为第二计算节点上第一模型的模型参数值G t。其余部分请参阅上述步骤602,此处不予赘述。
1004、每个第一计算节点在保持各自的G i和T i不变(可称为固定G i和T i)的情况下,在本地各自训练第三模型,以得到该第三模型的模型参数值D i,并各自将D i向第二计算节点发送。
步骤1004与上述步骤604类似,不同之处在于步骤1004中每个第一计算节点得到各自训练后的第三模型的模型参数值D i之后,不是上传至第三计算节点,而是向第二计算节点发送,其余部分请参阅上述步骤604,此处不予赘述。
1005、第二计算节点在第二计算节点上第一模型的模型参数保持G t不变(可称为固定G t)的情况下,在本地训练第三模型,以得到该第三模型的模型参数值D t
步骤1005与上述步骤605类似,不同之处在于步骤1005中第二计算节点得到训练后的第三模型的模型参数值D t之后,不用上传至第三计算节点,其余部分请参阅上述步骤605,此处不予赘述。
1006、第二计算节点将所有D i(即D 1、……、D n)和D t进行聚合,以得到第一聚合 参数值D-all。
步骤1006与上述步骤606类似,不同之处在于是由第二计算节点将D 1、……、D n和D t进行聚合,以得到第一聚合参数值D-all,其余部分请参阅上述步骤606,此处不予赘述。
1007、第二计算节点将D-all分别向每个第一计算节点发送,使得每个第一计算节点各自得到D i’,D-all=D i’。
第二计算节点聚合得到第一聚合参数值D-all后,会将第一聚合参数值D-all向每个第一计算节点发送,使得每个第一计算节点各自得到D i’,D-all=D i’。
1008、每个第一计算节点各自将D i更新为D i’,并在保持每个第一计算节点上第三模型的模型参数值不变(即固定D i’)的情况下,在本地再训练第一模型和第二模型,各自得到训练后的第一模型的模型参数值G i’和第二模型的模型参数值T i’,并将得到的G i’和T i’向第二计算节点发送。
每个第一计算节点将D i更新为D i’,并在固定D i’的情况下,在本地再训练第一模型和第二模型的过程与上述步骤608类似,具体请参阅上述步骤608,此处不予赘述,之后,各个第一计算节点将各自得到的G i’和T i’向第二计算节点发送。
1009、第二计算节点将所有G i’(即G 1’、……、G n’)聚合,以得到更新的第二聚合参数值G-all’,G-all’=G-all=G t
第二计算节点将更新后的G 1’、……、G n’进行聚合,以得到更新后的第二聚合参数值G-all’,并将G-all’作为第二计算节点上第一模型的模型参数值G t,即G-all’=G-all=G t
1010、将G-all’、G i’和T i’分别作为新的G t、G i和T i(即将G-all’作为新的G t,将G 1’、……、G n’作为新的G 1、……、G n,将T 1’、……、T n’作为新的T 1、……、T n),重复执行上述步骤1004-1009,直至达到迭代终止条件。
1011、每个第一计算节点将最后一次更新得到的T i向第二计算节点发送。
需要说明的是,在本申请的一些实施方式中,在达到迭代终止条件后,每个第一计算节点会将最后一次更新得到的T i(即最后一次更新的T 1、……、T n)各自向第二计算节点发送。
1012、第二计算节点将最后一次更新得到的所有T i进行聚合,以得到第三聚合参数值Ts-all,Ts-all=T t-new
第二计算节点接收到每个第一计算节点各自发送的最后一次更新的T i(即最后一次更新的T 1、……、T n)后,会对最后一次更新的T 1、……、T n进行聚合,以得到第三聚合参数值Ts-all,Ts-all=T t-new
需要注意的是,由于第二计算节点会在步骤1008接收到由每个第一计算节点各自发送的最后一次更新的G i’(即最后一次更新的G 1’、……、G n’),第二计算节点会在步骤1009中对最后一次更新的G 1’、……、G n’进行聚合,以得到最后一次更新的第二聚合参数值G-all(可称为G all-new),G all-new=G t-new,因此在步骤1011中,每个第一计算节点只需将最后一次更新得到的T i向第二计算节点发送即可。
1013、第二计算节点使用最后一次更新得到的G-all(即G t-new)和T t-new执行目标任务。
步骤1013与上述步骤611类似,具体请参阅上述步骤611,此处不予赘述。
需要说明的是,在本申请的一些实施方式中,也可以不需要步骤1011至步骤1013。
综上所述,本申请实施例在兼顾域对齐和用户数据隐私的情况下,实现了对模型的协同训练,提高了模型的性能。此外,本申请实施例是通过利用多个源域设备(即多个第一计算节点)的本地数据(每个第一计算节点上都有各自第一数据集)和标签来辅助一个本地数据无标签或少标签的目标域设备(即第二计算节点)提升模型性能,本申请实施例中由于存在多个源域,模型的模型参数取值可基于多种类型的训练数据得到,因此训练后的模型精度更高。此外,本申请实施例还将模型参数值的聚合过程由作为目标域设备的第二计算节点执行,不仅可减少计算节点的参与数量,同时在一些不具备服务器的应用场景中,可以由目标域设备作为第二计算节点对各个模型参数值进行聚合,还减少了计算节点之间数据交互的时间,提高了模型训练的效率。
需要说明的是,在本申请上述图10对应的实施例中,对抗训练过程仅是在第一计算节点上进行,实际上,在本申请的一些实施方式中,为了更好的提升模型性能,也可以在第二计算节点上进行对抗训练过程,因此,本申请实施例还提供了一种基于联邦迁移学习的模型训练方法,具体请参阅图11,图11为本申请实施例提供的基于联邦迁移学习的模型训练方法的另一流程示意图,图11对应的实施例与上述图10对应的实施例的区别在于,图11对应的实施例在第二计算节点部分也加入了对抗训练部分。具体地,该方法可以包括如下步骤:
1101、每个第一计算节点各自在本地训练第一模型和第二模型,以得到各自训练后的第一模型的模型参数值G i和第二模型的模型参数值T i
其中，i=1,2,…,n。
1102、每个第一计算节点将各自得到的G i向第二计算节点发送。
1103、第二计算节点对所有G i(即G 1、……、G n)进行聚合,以得到第二聚合参数值G-all。
1104、每个第一计算节点在保持各自的G i和T i不变(可称为固定G i和T i)的情况下,在本地各自训练第三模型,以得到该第三模型的模型参数值D i,并各自将D i向第二计算节点发送。
1105、第二计算节点在第二计算节点上第一模型的模型参数保持G t不变(可称为固定G t)的情况下,在本地训练第三模型,以得到该第三模型的模型参数值D t
1106、第二计算节点将所有D i(即D 1、……、D n)和D t进行聚合,以得到第一聚合参数值D-all。
1107、第二计算节点将D-all分别向每个第一计算节点发送,使得每个第一计算节点各自得到D i’,D-all=D t’=D i’。
1108、每个第一计算节点各自将D i更新为D i’,并在保持每个第一计算节点上第三模型的模型参数值不变(即固定D i’)的情况下,在本地再训练第一模型和第二模型,各自得到训练后的第一模型的模型参数值G i’和第二模型的模型参数值T i’,并各自将G i’向第二计算节点发送。
步骤1101-1108与上述步骤1001-1008类似,具体请参阅上述步骤1001-1008,此处不 予赘述。
1109、第二计算节点将D t更新为D t’,并在保持第二计算节点上第三模型的模型参数值不变(即固定D t’)的情况下,在本地训练第一模型和第二模型,以得到训练后的第一模型的模型参数值G t’和第二模型的模型参数值T t’。
步骤1109与上述步骤709类似,不同之处在于步骤1109中第二计算节点得到训练后的模型参数值G t’和模型参数值T t’之后,不用上传至第三计算节点,其余部分请参阅上述步骤709,此处不予赘述。
1110、第二计算节点将所有G i’(即G 1’、……、G n’)和G t’聚合,以得到更新的第二聚合参数值G-all’。
步骤1110与上述步骤710类似,不同之处在于步骤1110中是由第二计算节点将G 1’、……、G n’和G t’进行聚合,以得到更新的第二聚合参数值G-all’,其余部分请参阅上述步骤710,此处不予赘述。
1111、将G-all’、G i’和T i’分别作为新的G-all、G i和T i(即将G-all’作为新的G-all,将G 1’、……、G n’作为新的G 1、……、G n,将T 1’、……、T n’作为新的T 1、……、T n),重复执行上述步骤1104-1110,直至达到迭代终止条件。
步骤1111与上述步骤711类似,具体请参阅上述步骤711,此处不予赘述。
1112、每个第一计算节点各自将最后一次更新得到的T i(即最后一次更新得到的T 1、……、T n)向第二计算节点发送。
1113、第二计算节点将每个第一计算节点最后一次更新得到的T i与最后一次更新得到的T t’(即T t-new)进行聚合,以得到第四聚合参数值T-all。
步骤1113与上述步骤712类似,不同之处在于步骤1113中是由第二计算节点将每个第一计算节点最后一次更新得到的T i与最后一次更新得到的T t’进行聚合,以得到第四聚合参数值T-all,其余部分请参阅上述步骤712,此处不予赘述。
1114、第二计算节点使用最后一次更新得到的G-all’(即G t-new)和T-all执行目标任务。
步骤1114与上述步骤714类似,具体请参阅上述步骤714,此处不予赘述。
需要说明的是,在本申请的一些实施方式中,也可以不需要步骤1112至步骤1114。
还需要说明的是，在本申请上述各个实施例中，计算节点可以是各种终端设备或边缘设备，例如，本申请中的计算节点可以包括但不限于：智能电话（如，手机）、膝上型电脑（laptop computer）、个人电脑（personal computer，PC）、平板电脑、板式电脑、超级本、可佩戴装置（如，智能手环、智能手表、智能眼镜、头戴显示设备（head mount display，HMD）等）、增强现实（augmented reality，AR）设备、虚拟现实（virtual reality，VR）设备、混合现实（mixed reality，MR）设备、蜂窝电话（cellular phone）、个人数字助理（personal digital assistant，PDA）、数字广播终端等。当然，在以下实施例中，对第一计算节点和第二计算节点的具体形式不作任何限制。
还需要说明的是,在本申请一些实施例中,第三计算节点一般为服务器,第一计算节点和第二计算节点一般为边缘设备。
经由上述计算节点训练得到的第一模型和第二模型就可以进一步用于推理过程中,以 执行相关的目标任务。具体请参阅图12,图12为本申请实施例提供的数据处理方法的一个流程示意图,该方法具体可以包括如下步骤:
1201、计算机设备获取与目标任务相关的输入数据。
首先,计算机设备获取待处理的输入数据,该输入数据可以是图像数据,也可以是音频数据,还可以是文本数据,具体与待执行的目标任务相关,例如,当目标任务是基于图像的分类任务,那么输入数据就是指用于进行分类的图像数据。
1202、计算机设备通过训练后的第一模型对输入数据进行特征提取,以得到特征图。
之后,计算机设备通过训练后的第一模型对该输入数据进行特征提取,以得到该输入数据对应的特征图。
1203、计算机设备通过训练后的第二模型对特征图进行处理,以得到输出数据。
计算机设备通过训练后的第二模型对特征图进行处理，以得到输出数据，其中，该训练后的第一模型的模型参数值和该训练后的第二模型的模型参数值由上述实施例所述的方法训练得到。
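结合上述步骤1201至步骤1203，下面给出一个推理阶段的示意代码草图，其中的模型结构、输入维度与类别数均为占位示例，实际应与训练阶段的模型保持一致，并加载由上述训练方法得到的模型参数值。
```python
# 示意：推理阶段(步骤1201-1203)，加载训练后的第一模型与第二模型并执行目标任务
import torch
import torch.nn as nn

g = nn.Sequential(nn.Linear(16, 8), nn.ReLU())     # 训练后的第一模型(特征提取器)
c = nn.Linear(8, 3)                                 # 训练后的第二模型(此处以3类分类器为例)
# g.load_state_dict(G_final); c.load_state_dict(T_final)  # 参数值来自上述训练方法，此处从略

x = torch.randn(1, 16)                              # 步骤1201：获取与目标任务相关的输入数据(占位)
with torch.no_grad():
    feat = g(x)                                     # 步骤1202：特征提取，得到特征(图)
    out = c(feat)                                   # 步骤1203：基于特征执行目标任务
pred = out.argmax(dim=1)                            # 例如分类任务中取得分最大的类别作为输出
```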
需要注意的是,在本申请实施例中,根据目标任务的不同,输入数据的类型也不同,这里对几种典型的目标任务的应用场景进行阐述:
1)目标任务是目标检测任务
目标检测任务一般针对图像中的目标物体的检测,在这种情况下,输入数据一般是指输入的图像,计算机设备首先利用训练后的第一模型对输入的图像进行特征提取,再利用训练后的第二模型对提取的特征图进行目标检测,以得到检测结果,即输出数据是检测结果。
2)目标任务是分类任务
一种实施例中,分类任务可以是针对图像进行的,在这种情况下,输入数据是指输入的图像,计算机设备首先利用训练后的第一模型对输入的图像进行特征提取,再利用训练后的第二模型对提取的特征图进行分类,输出分类结果,即输出数据是图像的分类结果。
另一种实施例中,分类任务除了可以是针对图像进行的,还可以是针对文本或音频,在这种情况下,输入数据就是指对应的文本数据或音频数据,输出数据则是文本的分类结果或音频的分类结果。
以上仅是针对几种场景的目标任务进行说明，在不同的目标任务中，输入数据和输出数据是与该目标任务相关的，具体此处不再一一举例。
在上述对应实施例的基础上,为了更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的计算节点。具体参阅图13,图13为本申请实施例提供的第一计算节点的一种结构示意图,该第一计算节点1300包括:训练模块1301、获取模块1302。其中,训练模块1301,用于在第一计算节点上第一模型(如,特征提取器)的第一模型参数值和第一计算节点上的第二模型(如,分类器)的第二模型参数值保持不变的情况下,采用该第一计算节点上的第一数据集对第一计算节点上的第三模型(如,域鉴别器,也可简称为鉴别器)进行训练,以得到该第一计算节点上的第三模型的第三模型参数值,其中,该第一模型参数值为第一计算节点对第一计算节点上的第一模型训练后得到的模型参数值,第 二模型参数值为第一计算节点对第一计算节点上的第二模型训练后得到的模型参数取值。在本申请实施例中,第一模型用于对输入数据进行特征提取;第二模型用于基于第一模型提取出的特征执行目标任务,例如,目标任务可以是分类任务(如,目标检测任务、语义分割任务、语音识别任务等),也可以是回归任务,此处不做限定;第三模型用于鉴别由第一模型提取出的特征的源域。作为一种示例,根据源域的数据分布可以区分输入数据所位于的计算节点,例如,判断获取到的特征是来自源域设备,还是来自目标域设备。获取模块1302,用于接收第一聚合参数值,该第一聚合参数值是基于第三模型参数值和第四模型参数值得到,该第四模型参数值为第二计算节点上的第三模型的模型参数取值,该第二计算节点上的第三模型由该第二计算节点采用第二计算节点上的数据集(可称为第二数据集)训练得到。训练模块1301,还用于将原来的第三模型参数值更新为该第一聚合参数值,也就是将第一计算节点上第三模型的模型参数取值更新为第一聚合参数值,并在保持第一聚合参数值不变的情况,采用第一数据集对第一计算节点上的第一模型和第一计算节点上的第二模型再进行训练,以得到第一计算节点上的第一模型的第五模型参数值和第一计算节点上的第二模型的第六模型参数值。
在一种可能的设计中,该第一计算节点1300还可以包括迭代模块1303,该迭代模块1303,用于将第五模型参数值和第六模型参数值作为新的第一模型参数值和新的第二模型参数值,触发训练模块1301和获取模块1302重复执行各自的步骤,直至达到迭代终止条件,该迭代终止条件可以是达到预设的训练轮次,也可以是使得损失函数收敛,还可以是其他设定的训练终止条件,具体此处不做限定。
在一种可能的设计中,获取模块1302,具体用于:将第三模型参数值向第二计算节点发送,以使得第二计算节点将第三模型参数值和第四模型参数值进行聚合,以得到第一聚合参数值;之后,接收由该第二计算节点发送的该第一聚合参数值。
在一种可能的设计中,第一计算节点1300还包括发送模块1304,发送模块1304,用于将该第三模型参数值向该第二计算节点发送,以使得该第二计算节点将该第三模型参数值和该第四模型参数值进行聚合,以得到该第一聚合参数值;该获取模块1302,具体用于接收来自该第二计算节点的该第一聚合参数值。
在一种可能的设计中,发送模块1304还可以用于将更新得到的第一模型参数值和更新得到的第二模型参数值向该第二计算节点发送。
在一种可能的设计中，发送模块1304还可以用于将该第三模型参数值向第三计算节点发送，以使得该第三计算节点将该第三模型参数值以及来自该第二计算节点的该第四模型参数值进行聚合，以得到该第一聚合参数值；该获取模块1302，具体用于接收由该第三计算节点发送的该第一聚合参数值。
在一种可能的设计中,该发送模块1304,还可以用于:将更新得到的第一模型参数值和更新得到的第二模型参数值向第三计算节点发送。
需要说明的是,图13提供的第一计算节点1300中各模块/单元之间的信息交互、执行过程等内容,与本申请中图4至图11对应的方法实施例中第一计算节点执行的步骤基于同一构思,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例还提供一种计算节点,该计算节点可作为第二计算节点,具体参阅图14,图14为本申请实施例提供的第二计算节点的一种结构示意图,该第二计算节点1400包括:第一获取模块1401和训练模块1402,其中,第一获取模块1401,用于获取第二聚合参数值,该第二聚合参数值基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值得到,其中,每个第一计算节点各自采用自身上的第一数据集对自身上的第一模型进行训练,第一数据集可以是有标签的数据集。训练模块1402,用于在第二计算节点上的第一模型的模型参数取值为第二聚合参数值的情况下,采用第二计算节点上的第二数据集对第二计算节点上的第三模型进行训练,以得到第二计算节点上的第三模型的第四模型参数值,其中,第一模型用于对输入数据进行特征提取,第三模型用于鉴别由第一模型提取出的特征的源域。作为一种示例,根据源域的数据分布可以区分输入数据所位于的计算节点,例如,判断获取到的特征是来自源域设备,还是来自目标域设备。
在一种可能的设计中,该第二计算节点1400还可以包括迭代模块1404,该迭代模块1404,用于在第一计算节点基于第一聚合参数值对第一模型参数值和第二模型参数值进行了更新的情况下,触发第一获取模块1401和训练模块1402重复执行各自的步骤,直至达到迭代终止条件,该迭代终止条件可以是达到预设的训练轮次,也可以是使得损失函数收敛,还可以是其他设定的训练终止条件,具体此处不做限定。
在一种可能的设计中,该第二计算节点1400还可以包括第二获取模块1403,该第二获取模块1403用于:获取第一聚合参数值,该第一聚合参数值基于第三模型参数值以及第四模型参数值得到,第三模型参数值为第一计算节点保持第一模型参数值和第二模型参数值不变的情况下采用第一数据集对第一计算节点上的第三模型进行训练得到的模型参数取值,第二模型参数值为第一计算节点采用第一数据集对第一计算节点上的第二模型进行训练得到的模型参数取值,其中,第二模型用于基于第一模型提取出的特征执行目标任务,例如,目标任务可以是分类任务(如,目标检测任务、语义分割任务、语音识别任务等),也可以是回归任务,此处不做限定。
训练模块1402,具体用于将该第四模型参数值更新为该第一聚合参数值,并在保持该第四模型参数值为该第一聚合参数值不变的情况下,采用该第二数据集对该第二计算节点上的第一模型、第二模型进行训练,并更新该第二计算节点上的第一模型的模型参数值和该第二计算节点上的第二模型的模型参数值,即可得到该第二计算节点上的第一模型的第七模型参数值和该第二计算节点上的第二模型的第八模型参数值。
迭代模块1404,具体用于触发该第一获取模块1401、该训练模块1402和该第二获取模块1403重复执行各自的步骤,直至达到迭代终止条件。
在一种可能的设计中,第一获取模块1401,具体用于:接收由一个或多个该第一计算节点各自发送的更新的第一模型参数值,并将该第七模型参数值(即更新后的该第二计算节点上的第一模型的模型参数值)和每个更新的第一模型参数值进行聚合,以得到该第二聚合参数值。
在一种可能的设计中,该第二计算节点1400还包括执行模块1405,该执行模块1405用于:基于更新后的第一模型参数值,更新该第二聚合参数值;接收由一个或多个该第一 计算节点发送的更新得到的第二模型参数值,并将每个该更新得到的第二模型参数值和最后一次更新得到的第八模型参数值(即更新后的该第二计算节点上的第二模型的模型参数值)进行聚合,以得到第四聚合参数值;根据该第二计算节点上的第一模型和该第二计算节点上的第二模型执行目标任务,其中,该第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值,该第二计算节点上的第二模型的模型参数取值为该第四聚合参数值。
在一种可能的设计中,该第一获取模块1401,还用于:向第三计算节点发送该第七模型参数值,并接收来自该第三计算节点的该第二聚合参数值,该第二聚合参数值由该第三计算节点对该第七模型参数值以及来自一个或多个该第一计算节点的每个更新的第一模型参数值聚合得到。
在一种可能的设计中,该执行模块1405,还可以用于:将最后一次更新得到的第八模型参数值向该第三计算节点发送,以使得该第三计算节点对该第八模型参数值以及从一个或多个该第一计算节点各自接收到的每个最后一次更新得到的第二模型参数值进行聚合,以得到第四聚合参数值;接收来自该第三计算节点的该第四聚合参数值;根据该第二计算节点上的第一模型、第二模型执行目标任务,其中,该第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值,该第二计算节点上的第二模型的模型参数取值为该第四聚合参数值。
需要说明的是,图14提供的第二计算节点1400中各模块/单元之间的信息交互、执行过程等内容,与本申请中图4至图11对应的方法实施例中第一计算节点执行的步骤基于同一构思,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例还提供一种计算机设备,具体参阅图15,图15为本申请实施例提供的计算机设备的一种结构示意图,该计算机设备1500包括:获取模块1501、特征提取模块1502以及处理模块1503,其中,获取模块1501,用于获取与目标任务相关的输入数据;特征提取模块1502,用于通过训练后的第一模型对该输入数据进行特征提取,以得到特征图;处理模块1503,用于通过训练后的第二模型对该特征图进行处理,以得到输出数据,其中,该训练后的第一模型的模型参数值和该训练后的第二模型的模型参数值可由上述图4至图11对应的模型训练方法训练得到。
需要说明的是,图15提供的计算机设备1500中各模块/单元之间的信息交互、执行过程等内容,与本申请中图12对应的方法实施例中计算机设备执行的步骤基于同一构思,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例还提供了一种设备,该设备可作为第一计算节点,也可作为第二计算节点,还可作为计算机设备,具体此处不做限定。请参阅图16,图16是本申请实施例提供的设备的一种结构示意图,为便于说明,仅示出了与本申请实施例相关的部分,具体技术细节未揭示的,请参照本申请实施例方法部分。当该设备1600作为第一计算节点时,该设备1600上可以部署有图13对应实施例中所描述的模块,用于实现图13对应实施例中第一计算节点1300的功能;当该设备1600作为第二计算节点时,该设备1600上可以部署有图14对应实施例中所描述的模块,用于实现图14对应实施例中第二计算节点1400的功能; 当该设备1600作为计算机设备时,该设备1600上可以部署有图15对应实施例中所描述的模块,用于实现图15对应实施例中计算机设备1500的功能。具体的,设备1600由一个或多个服务器实现,设备1600可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)1622和存储器1632,一个或一个以上存储应用程序1642或数据1644的存储介质1630(例如一个或一个以上海量存储设备)。其中,存储器1632和存储介质1630可以是短暂存储或持久存储。存储在存储介质1630的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对设备1600中的一系列指令操作。更进一步地,中央处理器1622可以设置为与存储介质1630通信,在设备1600上执行存储介质1630中的一系列指令操作。
设备1600还可以包括一个或一个以上电源1626,一个或一个以上有线或无线网络接口1650,一个或一个以上输入输出接口1658,和/或,一个或一个以上操作系统1641,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
本申请实施例中,当设备1600作为第一计算节点时,中央处理器1622,用于执行图4至图11对应实施例中由第一计算节点执行的步骤。例如,中央处理器1622可以用于:在第一计算节点上第一模型(如,特征提取器)的第一模型参数值和第一计算节点上的第二模型(如,分类器)的第二模型参数值保持不变的情况下,采用该第一计算节点上的第一数据集对第一计算节点上的第三模型(如,域鉴别器,也可简称为鉴别器)进行训练,以得到该第一计算节点上的第三模型的第三模型参数值,其中,该第一模型参数值为第一计算节点对第一计算节点上的第一模型训练后得到的模型参数值,第二模型参数值为第一计算节点对第一计算节点上的第二模型训练后得到的模型参数取值。在本申请实施例中,第一模型用于对输入数据进行特征提取;第二模型用于基于第一模型提取出的特征执行目标任务,例如,目标任务可以是分类任务(如,目标检测任务、语义分割任务、语音识别任务等),也可以是回归任务,此处不做限定;第三模型用于鉴别由第一模型提取出的特征的源域。之后,接收第一聚合参数值,该第一聚合参数值是基于第三模型参数值和第四模型参数值得到,该第四模型参数值为第二计算节点上的第三模型的模型参数值,该第二计算节点上的第三模型由第二计算节点采用第二计算节点上的第二数据集训练得到。之后,将原来的第三模型参数值更新为该第一聚合参数值,也就是将第一计算节点上第三模型的模型参数取值更新为第一聚合参数值,并在保持第一聚合参数值不变的情况,采用第一数据集对第一计算节点上的第一模型和第一计算节点上的第二模型再进行训练,以得到第一计算节点上的第一模型的第五模型参数值和第一计算节点上的第二模型的第六模型参数值。最后,将第五模型参数值和第六模型参数值作为新的第一模型参数值和新的第二模型参数值,触发重复执行上述步骤,直至达到迭代终止条件,该迭代终止条件可以是达到预设的训练轮次,也可以是使得损失函数收敛,还可以是其他设定的训练终止条件,具体此处不做限定。
需要说明的是,中央处理器1622还可以用于执行与本申请中图4至图11对应的方法实施例中由第一计算节点执行的任意一个步骤,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例中，当设备1600作为第二计算节点时，中央处理器1622，用于执行图4至图11对应实施例中由第二计算节点执行的步骤。例如，中央处理器1622可以用于：获取第二聚合参数值，该第二聚合参数值基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值得到，其中，每个第一计算节点各自采用自身上的第一数据集对自身上的第一模型进行训练，第一数据集可以是有标签的数据集。之后，在第二计算节点上的第一模型的模型参数取值为第二聚合参数值的情况下，采用第二计算节点上的第二数据集对第二计算节点上的第三模型进行训练，以得到第二计算节点上的第三模型的第四模型参数值，其中，第一模型用于对输入数据进行特征提取，第三模型用于鉴别由第一模型提取出的特征的源域。最后触发重复执行上述步骤，直至达到迭代终止条件，该迭代终止条件可以是达到预设的训练轮次，也可以是使得损失函数收敛，还可以是其他设定的训练终止条件，具体此处不做限定。
需要说明的是,中央处理器1622还可以用于执行与本申请中图4至图11对应的方法实施例中由第二计算节点执行的任意一个步骤,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例中，当设备1600作为计算机设备时，中央处理器1622，用于执行图12对应实施例中由计算机设备执行的步骤。例如，中央处理器1622可以用于：获取待处理的输入数据，该输入数据与待执行的目标任务相关，例如，当目标任务是分类任务，那么输入数据就是指用于进行分类的数据。之后，通过训练后的第一模型对该输入数据进行特征提取，以得到特征图，并通过训练后的第二模型对特征图进行处理，以得到输出数据，其中，该训练后的第一模型的模型参数值和该训练后的第二模型的模型参数值由上述图4至图11中任一项所述的方法训练得到。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、只读存储器(read only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,训练设备,或者网络设备等)执行本申请各个实施 例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、训练设备或数据中心通过有线(例如同轴电缆、光纤、数字用户线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、训练设备或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的训练设备、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,高密度数字视频光盘(digital video disc,DVD))、或者半导体介质(例如,固态硬盘(solid state disk,SSD))等。

Claims (34)

  1. 一种基于联邦迁移学习的模型训练方法,其特征在于,包括:
    在第一计算节点上的第一模型参数值和第二模型参数值保持不变的情况下,所述第一计算节点采用所述第一计算节点上的第一数据集训练所述第一计算节点上的第三模型,以得到所述第一计算节点上的第三模型参数值,所述第一模型参数值、所述第二模型参数值、所述第三模型参数值分别为所述第一计算节点上的第一模型、第二模型、第三模型的模型参数取值,其中,所述第一模型用于对输入数据进行特征提取,所述第二模型用于基于所述第一模型提取出的特征执行目标任务,所述第三模型用于鉴别由所述第一模型提取出的特征的源域;
    所述第一计算节点接收第一聚合参数值,所述第一聚合参数值基于所述第三模型参数值以及第四模型参数值得到,所述第四模型参数值为第二计算节点上的第三模型的模型参数取值,所述第二计算节点上的第三模型由所述第二计算节点采用所述第二计算节点上的第二数据集训练得到;
    所述第一计算节点将所述第三模型参数值更新为所述第一聚合参数值,并在保持所述第三模型参数值为所述第一聚合参数值不变的情况下,采用所述第一数据集对所述第一计算节点上的第一模型和第二模型再进行训练,并更新所述第一模型参数值和所述第二模型参数值。
  2. 根据权利要求1所述的方法,其特征在于,所述第一计算节点接收第一聚合参数值,所述第一聚合参数值基于所述第三模型参数值以及第四模型参数值得到包括:
    所述第一计算节点将所述第三模型参数值向所述第二计算节点发送,以使得所述第二计算节点将所述第三模型参数值和所述第四模型参数值进行聚合,以得到所述第一聚合参数值;
    所述第一计算节点接收来自所述第二计算节点的所述第一聚合参数值。
  3. 根据权利要求1-2中任一项所述的方法,其特征在于,所述方法还包括:
    所述第一计算节点将更新得到的第一模型参数值和更新得到的第二模型参数值向所述第二计算节点发送。
  4. 根据权利要求1所述的方法,其特征在于,所述第一计算节点接收第一聚合参数值,所述第一聚合参数值基于所述第三模型参数值以及第四模型参数值得到包括:
    所述第一计算节点将所述第三模型参数值向第三计算节点发送,以使得所述第三计算节点将所述第三模型参数值以及来自所述第二计算节点的所述第四模型参数值进行聚合,以得到所述第一聚合参数值;
    所述第一计算节点接收来自所述第三计算节点的所述第一聚合参数值。
  5. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    所述第一计算节点将更新得到的第一模型参数值和更新得到的第二模型参数值向所述第三计算节点发送。
  6. 一种基于联邦迁移学习的模型训练方法,其特征在于,包括:
    第二计算节点获取第二聚合参数值,所述第二聚合参数值基于一个或多个第一计算节 点上各自训练后的第一模型的第一模型参数值得到,其中,每个第一计算节点上的第一模型由所述第一计算节点采用所述第一计算节点的第一数据集进行训练,所述第一模型用于对输入数据进行特征提取;
    在所述第二计算节点上的第一模型的模型参数取值为所述第二聚合参数值的情况下,所述第二计算节点采用所述第二计算节点上的第二数据集对所述第二计算节点上的第三模型进行训练,以得到所述第二计算节点上的第三模型的第四模型参数值,其中,所述第三模型用于鉴别由所述第一模型提取出的特征的源域。
  7. 根据权利要求6所述的方法,其特征在于,所述方法还包括:
    所述第二计算节点获取第一聚合参数值,所述第一聚合参数值基于第三模型参数值以及所述第四模型参数值得到,所述第三模型参数值为所述第一计算节点采用所述第一数据集对所述第一计算节点上的第三模型进行训练得到的模型参数取值;
    所述第二计算节点将所述第四模型参数值更新为所述第一聚合参数值,并在保持所述第四模型参数值为所述第一聚合参数值不变的情况下,采用所述第二数据集对所述第二计算节点上的第一模型、第二模型进行训练,并更新所述第二计算节点上的第一模型的模型参数值和所述第二计算节点上的第二模型的模型参数值。
  8. 根据权利要求7所述的方法,其特征在于,所述第二计算节点获取第一聚合参数值,所述第一聚合参数值基于第三模型参数值以及所述第四模型参数值得到包括:
    所述第二计算节点接收由一个或多个所述第一计算节点各自发送的第三模型参数值;
    所述第二计算节点将所述第四模型参数值和每个所述第三模型参数值进行聚合,以得到所述第一聚合参数值。
  9. 根据权利要求7或8所述的方法,其特征在于,所述第二计算节点获取第二聚合参数值,所述第二聚合参数值基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值得到包括:
    所述第二计算节点接收由一个或多个所述第一计算节点各自发送的第一模型参数值,并将每个所述第一模型参数值和所述第二计算节点上的第一模型的模型参数值进行聚合,以得到所述第二聚合参数值。
  10. 根据权利要求9所述的方法,其特征在于,所述方法还包括:
    所述第二计算节点基于来自一个或多个所述第一计算节点的最后一次更新的所述第一模型参数值,最后一次更新所述第二聚合参数值;
    所述第二计算节点接收由一个或多个所述第一计算节点发送的更新后的第二模型参数值,并将每个所述更新后的第二模型参数值和更新后的所述第二计算节点上的第二模型的模型参数值进行聚合,以得到第四聚合参数值;
    所述第二计算节点根据所述第二计算节点上的第一模型、第二模型执行目标任务,其中,所述第二计算节点上的第一模型的模型参数取值为所述最后一次更新得到的第二聚合参数值,所述第二计算节点上的第二模型的模型参数取值为所述第四聚合参数值。
  11. 根据权利要求6所述的方法,其特征在于,所述第二计算节点获取第二聚合参数值,所述第二聚合参数值基于一个或多个第一计算节点上各自训练后的第一模型的第一模 型参数值得到包括:
    所述第二计算节点接收由每个第一计算节点各自发送的第一模型参数值,并对接收到的每个所述第一模型参数值进行聚合,以得到所述第二聚合参数值。
  12. 根据权利要求11所述的方法,其特征在于,所述方法还包括:
    所述第二计算节点基于来自一个或多个所述第一计算节点的最后一次更新的所述第一模型参数值,最后一次更新所述第二聚合参数值;所述第二计算节点接收由每个第一计算节点各自发送的最后一次更新得到的第二模型参数值,对每个所述最后一次更新得到的第二模型参数值进行聚合,以得到第三聚合参数值;
    所述第二计算节点根据所述第二计算节点上的第一模型、第二模型执行目标任务,其中,所述第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值,所述第二计算节点上的第二模型的模型参数取值为所述第三聚合参数值。
  13. 根据权利要求7所述的方法,其特征在于,所述第二计算节点获取第一聚合参数值,所述第一聚合参数值基于第三模型参数值以及所述第四模型参数值得到包括:
    所述第二计算节点向第三计算节点发送所述第四模型参数值;
    所述第二计算节点接收来自所述第三计算节点的第一聚合参数值,所述第一聚合参数值由所述第三计算节点对来自一个或多个所述第一计算节点的每个第三模型参数值和所述第四模型参数值聚合得到。
  14. 根据权利要求7所述的方法,其特征在于,所述第二计算节点获取第二聚合参数值,所述第二聚合参数值基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值得到包括:
    所述第二计算节点向第三计算节点发送更新后的所述第二计算节点上的第一模型的模型参数值;
    所述第二计算节点接收来自所述第三计算节点的所述第二聚合参数值,所述第二聚合参数值由所述第三计算节点对所述更新后的所述第二计算节点上的第一模型的模型参数值以及来自一个或多个所述第一计算节点的每个更新的第一模型参数值聚合得到。
  15. 根据权利要求14所述的方法,其特征在于,所述方法还包括:
    所述第二计算节点将更新后的所述第二计算节点上的第二模型的模型参数值向所述第三计算节点发送,以使得所述第三计算节点对所述更新后的所述第二计算节点上的第二模型的模型参数值以及从一个或多个所述第一计算节点各自接收到的每个最后一次更新得到的第二模型参数值进行聚合,以得到第四聚合参数值;
    所述第二计算节点接收来自所述第三计算节点的所述第四聚合参数值;
    所述第二计算节点根据所述第二计算节点上的第一模型、第二模型执行目标任务,其中,所述第二计算节点上的第一模型的模型参数取值为来自所述第三计算节点的最后一次更新得到的第二聚合参数值,所述第二计算节点上的第二模型的模型参数取值为所述第四聚合参数值。
  16. 根据权利要求6所述的方法,其特征在于,所述第二计算节点获取第二聚合参数值,所述第二聚合参数值基于一个或多个第一计算节点上各自训练后的第一模型的第一模 型参数值得到包括:
    所述第二计算节点接收来自第三计算节点的所述第二聚合参数值,所述第二聚合参数值由所述第三计算节点对来自于一个或多个所述第一计算节点的每个第一模型参数值聚合得到。
  17. 根据权利要求16所述的方法,其特征在于,所述方法还包括:
    所述第二计算节点接收来自所述第三计算节点的第三聚合参数值,并根据所述第二计算节点上的第一模型、第二模型执行目标任务,所述第三聚合参数值由所述第三计算节点对从每个第一计算节点各自接收到的最后一次更新得到的第二模型参数值聚合得到,其中,所述第二计算节点上的第一模型的模型参数取值为最后一次更新得到的第二聚合参数值,所述第二计算节点上的第二模型的模型参数取值为所述第三聚合参数值。
  18. 一种数据处理方法,其特征在于,包括:
    获取与目标任务相关的输入数据;
    通过训练后的第一模型对所述输入数据进行特征提取,以得到特征图;
    通过训练后的第二模型对所述特征图进行处理,以得到输出数据,其中,所述训练后的第一模型的模型参数值和所述训练后的第二模型的模型参数值由权利要求1-17中任一项所述的方法训练得到。
  19. 根据权利要求18所述的方法,其特征在于,所述输入数据包括如下任意一项:
    图像数据、音频数据或文本数据。
  20. 一种计算节点,其特征在于,所述计算节点作为第一计算节点,包括:
    训练模块,用于在所述第一计算节点上的第一模型参数值和第二模型参数值保持不变的情况下,采用所述第一计算节点上的第一数据集训练所述第一计算节点上的第三模型,以得到所述第一计算节点上的第三模型参数值,所述第一模型参数值、所述第二模型参数值、所述第三模型参数值分别为所述第一计算节点上的第一模型、第二模型、第三模型的模型参数取值,其中,所述第一模型用于对输入数据进行特征提取,所述第二模型用于基于所述第一模型提取出的特征执行目标任务,所述第三模型用于鉴别由所述第一模型提取出的特征的源域;
    获取模块,用于接收第一聚合参数值,所述第一聚合参数值基于所述第三模型参数值以及第四模型参数值得到,所述第四模型参数值为第二计算节点上的第三模型的模型参数取值,所述第二计算节点上的第三模型由所述第二计算节点采用所述第二计算节点上的第二数据集训练得到;
    所述训练模块,还用于将所述第三模型参数值更新为所述第一聚合参数值,并在保持所述第三模型参数值为所述第一聚合参数值不变的情况下,采用所述第一数据集对所述第一计算节点上的第一模型和第二模型再进行训练,并更新所述第一模型参数值和所述第二模型参数值。
  21. 根据权利要求20所述的第一计算节点,其特征在于,所述第一计算节点还包括发送模块,所述发送模块,用于将所述第三模型参数值向所述第二计算节点发送,以使得所述第二计算节点将所述第三模型参数值和所述第四模型参数值进行聚合,以得到所述第一 聚合参数值;
    所述获取模块,具体用于接收来自所述第二计算节点的所述第一聚合参数值。
  22. 根据权利要求20-21中任一项所述的第一计算节点,其特征在于,所述第一计算节点还包括发送模块,所述发送模块,用于:
    将更新得到的第一模型参数值和更新得到的第二模型参数值向所述第二计算节点发送。
  23. 根据权利要求20所述的第一计算节点,其特征在于,所述第一计算节点还包括发送模块,所述发送模块,用于将所述第三模型参数值向第三计算节点发送,以使得所述第三计算节点将所述第三模型参数值以及来自所述第二计算节点的所述第四模型参数值进行聚合,以得到所述第一聚合参数值;
    所述获取模块,具体用于接收来自所述第三计算节点的所述第一聚合参数值。
  24. 根据权利要求23所述的第一计算节点,其特征在于,所述发送模块,还用于:
    将更新得到的第一模型参数值和更新得到的第二模型参数值向所述第三计算节点发送。
  25. 一种计算节点,其特征在于,所述计算节点作为第二计算节点,包括:
    第一获取模块,用于获取第二聚合参数值,所述第二聚合参数值基于一个或多个第一计算节点上各自训练后的第一模型的第一模型参数值得到,其中,每个第一计算节点上的第一模型由所述第一计算节点采用所述第一计算节点的第一数据集进行训练,所述第一模型用于对输入数据进行特征提取;
    训练模块,用于在所述第二计算节点上的第一模型的模型参数取值为所述第二聚合参数值的情况下,采用所述第二计算节点上的第二数据集对所述第二计算节点上的第三模型进行训练,以得到所述第二计算节点上的第三模型的第四模型参数值,其中,所述第三模型用于鉴别由所述第一模型提取出的特征的源域。
  26. 根据权利要求25所述的第二计算节点,其特征在于,所述第二计算节点还包括第二获取模块,所述第二获取模块,用于获取第一聚合参数值,所述第一聚合参数值基于第三模型参数值以及所述第四模型参数值得到,所述第三模型参数值为所述第一计算节点采用所述第一数据集对所述第一计算节点上的第三模型进行训练得到的模型参数取值;
    所述训练模块,还用于将所述第四模型参数值更新为所述第一聚合参数值,并在保持所述第四模型参数值为所述第一聚合参数值不变的情况下,采用所述第二数据集对所述第二计算节点上的第一模型、第二模型进行训练,并更新所述第二计算节点上的第一模型的模型参数值和所述第二计算节点上的第二模型的模型参数值。
  27. 根据权利要求26所述的第二计算节点,所述第一获取模块,具体用于:
    接收由一个或多个所述第一计算节点各自发送的第一模型参数值,并将每个所述第一模型参数值和所述第二计算节点上的第一模型的模型参数值进行聚合,以得到所述第二聚合参数值。
  28. 根据权利要求27所述的第二计算节点,其特征在于,所述第二计算节点还包括执行模块,所述执行模块,用于:
    基于来自一个或多个所述第一计算节点的最后一次更新的所述第一模型参数值,最后一次更新所述第二聚合参数值;
    接收由一个或多个所述第一计算节点发送的更新后的第二模型参数值,并将每个所述更新后的第二模型参数值和更新后的所述第二计算节点上的第二模型的模型参数值进行聚合,以得到第四聚合参数值;
    根据所述第二计算节点上的第一模型、第二模型执行目标任务,其中,所述第二计算节点上的第一模型的模型参数取值为所述最后一次更新得到的第二聚合参数值,所述第二计算节点上的第二模型的模型参数取值为所述第四聚合参数值。
  29. 根据权利要求26所述的第二计算节点,其特征在于,所述第一获取模块,具体还用于:
    向第三计算节点发送更新后的所述第二计算节点上的第一模型的模型参数值;
    接收来自所述第三计算节点的所述第二聚合参数值,所述第二聚合参数值由所述第三计算节点对所述更新后的所述第二计算节点上的第一模型的模型参数值以及来自一个或多个所述第一计算节点的每个更新的第一模型参数值聚合得到。
  30. 根据权利要求29所述的第二计算节点,其特征在于,还包括执行模块,所述执行模块,用于:
    将更新后的所述第二计算节点上的第二模型的模型参数值向所述第三计算节点发送,以使得所述第三计算节点对所述更新后的所述第二计算节点上的第二模型的模型参数值以及从一个或多个所述第一计算节点各自接收到的每个最后一次更新得到的第二模型参数值进行聚合,以得到第四聚合参数值;
    接收来自所述第三计算节点的所述第四聚合参数值;
    根据所述第二计算节点上的第一模型、第二模型执行目标任务,其中,所述第二计算节点上的第一模型的模型参数取值为来自所述第三计算节点的最后一次更新得到的第二聚合参数值,所述第二计算节点上的第二模型的模型参数取值为所述第四聚合参数值。
  31. 一种计算机设备,其特征在于,包括:
    获取模块,用于获取与目标任务相关的输入数据;
    特征提取模块,用于通过训练后的第一模型对所述输入数据进行特征提取,以得到特征图;
    处理模块,用于通过训练后的第二模型对所述特征图进行处理,以得到输出数据,其中,所述训练后的第一模型的模型参数值和所述训练后的第二模型的模型参数值由权利要求1-17中任一项所述的方法训练得到。
  32. 一种计算节点,包括处理器和存储器,所述处理器与所述存储器耦合,其特征在于,
    所述存储器,用于存储程序;
    所述处理器,用于执行所述存储器中的程序,使得所述计算节点执行如权利要求1-19中任一项所述的方法。
  33. 一种计算机可读存储介质,包括程序,当其在计算机上运行时,使得计算机执行 如权利要求1-19中任一项所述的方法。
  34. 一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行如权利要求1-19中任一项所述的方法。
PCT/CN2022/082380 2021-03-31 2022-03-23 一种基于联邦迁移学习的模型训练方法及计算节点 WO2022206498A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110350001.9 2021-03-31
CN202110350001.9A CN113159283B (zh) 2021-03-31 2021-03-31 一种基于联邦迁移学习的模型训练方法及计算节点

Publications (1)

Publication Number Publication Date
WO2022206498A1 true WO2022206498A1 (zh) 2022-10-06

Family

ID=76886083

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/082380 WO2022206498A1 (zh) 2021-03-31 2022-03-23 一种基于联邦迁移学习的模型训练方法及计算节点

Country Status (2)

Country Link
CN (1) CN113159283B (zh)
WO (1) WO2022206498A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115987694A (zh) * 2023-03-20 2023-04-18 杭州海康威视数字技术股份有限公司 基于多域联邦的设备隐私保护方法、系统和装置
CN116226784A (zh) * 2023-02-03 2023-06-06 中国人民解放军92578部队 基于统计特征融合的联邦域适应故障诊断方法
CN116340833A (zh) * 2023-05-25 2023-06-27 中国人民解放军海军工程大学 基于改进领域对抗式迁移网络的故障诊断方法
CN117011945A (zh) * 2023-10-07 2023-11-07 之江实验室 动作能力评估方法、装置、计算机设备及可读存储介质

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113159283B (zh) * 2021-03-31 2023-03-31 华为技术有限公司 一种基于联邦迁移学习的模型训练方法及计算节点
CN114118437B (zh) * 2021-09-30 2023-04-18 电子科技大学 一种面向微云中分布式机器学习的模型更新同步方法
CN113989595B (zh) * 2021-11-05 2024-05-07 西安交通大学 一种基于阴影模型的联邦多源域适应方法及系统
CN114841361A (zh) * 2022-03-26 2022-08-02 华为技术有限公司 一种模型训练方法及其相关设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020256732A1 (en) * 2019-06-21 2020-12-24 Siemens Aktiengesellschaft Domain adaptation and fusion using task-irrelevant paired data in sequential form
CN112257876A (zh) * 2020-11-15 2021-01-22 腾讯科技(深圳)有限公司 联邦学习方法、装置、计算机设备及介质
CN112288100A (zh) * 2020-12-29 2021-01-29 支付宝(杭州)信息技术有限公司 一种基于联邦学习进行模型参数更新的方法、系统及装置
CN112434462A (zh) * 2020-10-21 2021-03-02 华为技术有限公司 一种模型的获取方法及设备
CN113159283A (zh) * 2021-03-31 2021-07-23 华为技术有限公司 一种基于联邦迁移学习的模型训练方法及计算节点

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11170320B2 (en) * 2018-07-19 2021-11-09 Adobe Inc. Updating machine learning models on edge servers
CN112182595B (zh) * 2019-07-03 2024-03-26 北京百度网讯科技有限公司 基于联邦学习的模型训练方法及装置
CN110516671B (zh) * 2019-08-27 2022-06-07 腾讯科技(深圳)有限公司 神经网络模型的训练方法、图像检测方法及装置
CN111724083B (zh) * 2020-07-21 2023-10-13 腾讯科技(深圳)有限公司 金融风险识别模型的训练方法、装置、计算机设备及介质
CN112348063A (zh) * 2020-10-27 2021-02-09 广东电网有限责任公司电力调度控制中心 一种物联网中基于联邦迁移学习的模型训练方法及装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020256732A1 (en) * 2019-06-21 2020-12-24 Siemens Aktiengesellschaft Domain adaptation and fusion using task-irrelevant paired data in sequential form
CN112434462A (zh) * 2020-10-21 2021-03-02 华为技术有限公司 一种模型的获取方法及设备
CN112257876A (zh) * 2020-11-15 2021-01-22 腾讯科技(深圳)有限公司 联邦学习方法、装置、计算机设备及介质
CN112288100A (zh) * 2020-12-29 2021-01-29 支付宝(杭州)信息技术有限公司 一种基于联邦学习进行模型参数更新的方法、系统及装置
CN113159283A (zh) * 2021-03-31 2021-07-23 华为技术有限公司 一种基于联邦迁移学习的模型训练方法及计算节点

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116226784A (zh) * 2023-02-03 2023-06-06 中国人民解放军92578部队 基于统计特征融合的联邦域适应故障诊断方法
CN115987694A (zh) * 2023-03-20 2023-04-18 杭州海康威视数字技术股份有限公司 基于多域联邦的设备隐私保护方法、系统和装置
CN115987694B (zh) * 2023-03-20 2023-06-27 杭州海康威视数字技术股份有限公司 基于多域联邦的设备隐私保护方法、系统和装置
CN116340833A (zh) * 2023-05-25 2023-06-27 中国人民解放军海军工程大学 基于改进领域对抗式迁移网络的故障诊断方法
CN116340833B (zh) * 2023-05-25 2023-10-13 中国人民解放军海军工程大学 基于改进领域对抗式迁移网络的故障诊断方法
CN117011945A (zh) * 2023-10-07 2023-11-07 之江实验室 动作能力评估方法、装置、计算机设备及可读存储介质
CN117011945B (zh) * 2023-10-07 2024-03-19 之江实验室 动作能力评估方法、装置、计算机设备及可读存储介质

Also Published As

Publication number Publication date
CN113159283B (zh) 2023-03-31
CN113159283A (zh) 2021-07-23

Similar Documents

Publication Publication Date Title
WO2022206498A1 (zh) 一种基于联邦迁移学习的模型训练方法及计算节点
WO2022042002A1 (zh) 一种半监督学习模型的训练方法、图像处理方法及设备
WO2021238281A1 (zh) 一种神经网络的训练方法、图像分类系统及相关设备
Singh et al. A deeply coupled ConvNet for human activity recognition using dynamic and RGB images
WO2022012407A1 (zh) 一种用于神经网络的训练方法以及相关设备
CN111695415B (zh) 图像识别方法及相关设备
CN112651511B (zh) 一种训练模型的方法、数据处理的方法以及装置
Basly et al. CNN-SVM learning approach based human activity recognition
WO2020094060A1 (zh) 推荐方法及装置
CN117456297A (zh) 图像生成方法、神经网络的压缩方法及相关装置、设备
CN112396106B (zh) 内容识别方法、内容识别模型训练方法及存储介质
CN113807399B (zh) 一种神经网络训练方法、检测方法以及装置
WO2022111387A1 (zh) 一种数据处理方法及相关装置
WO2023231954A1 (zh) 一种数据的去噪方法以及相关设备
US10970331B2 (en) Determining contextual confidence of images using associative deep learning
CN116862012A (zh) 机器学习模型训练方法、业务数据处理方法、装置及系统
WO2023231753A1 (zh) 一种神经网络的训练方法、数据的处理方法以及设备
CN114064928A (zh) 一种知识图谱的知识推理方法、装置、设备及存储介质
WO2022162677A1 (en) Distributed machine learning with new labels using heterogeneous label distribution
WO2023143570A1 (zh) 一种连接关系预测方法及相关设备
WO2022193412A1 (zh) 基于人体骨架点云交互学习的视频暴力识别方法、系统及介质
US20240127104A1 (en) Information retrieval systems and methods with granularity-aware adaptors for solving multiple different tasks
Jehan et al. An Optimal Reinforced Deep Belief Network for Detection of Malicious Network Traffic
Ali et al. A Novel Approach to Improving Distributed Deep Neural Networks over Cloud Computing.
CN111738403B (zh) 一种神经网络的优化方法及相关设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22778681

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22778681

Country of ref document: EP

Kind code of ref document: A1