WO2022262557A1 - Model training method, related system, and storage medium - Google Patents

Model training method, related system, and storage medium

Info

Publication number
WO2022262557A1
WO2022262557A1 PCT/CN2022/095802 CN2022095802W WO2022262557A1 WO 2022262557 A1 WO2022262557 A1 WO 2022262557A1 CN 2022095802 W CN2022095802 W CN 2022095802W WO 2022262557 A1 WO2022262557 A1 WO 2022262557A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
parameters
subnet
server
client
Prior art date
Application number
PCT/CN2022/095802
Other languages
English (en)
French (fr)
Inventor
张琦
吴天诚
周培晨
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP22824037.0A (published as EP4354361A1)
Publication of WO2022262557A1
Priority to US18/540,144 (published as US20240119368A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular to a model training method, a related system, and a storage medium.
  • Horizontal federated learning, also known as feature-aligned federated learning, applies when the data features of the clients largely overlap (that is, the data features are aligned) while their users overlap little: the portions of the clients' data that share the same data features but do not belong to exactly the same users are taken out and used for joint machine learning.
  • the application scenarios of horizontal federated learning are divided into two categories: standard scenarios and disjoint scenarios.
  • the standard scenario means that the labeled data participating in model training is stored on the client, that is, standard supervised training is performed on the client.
  • the disjoint scenario means that the labeled data participating in model training is stored on the server, while a large amount of unlabeled data is stored on the client.
  • Disjoint scenarios arise mainly because many data labeling tasks must be handled by personnel with relevant professional knowledge. For example, for a mobile phone application for yoga posture correction, it is difficult for ordinary people to confirm whether their yoga postures are completely correct. Therefore, even if users were willing to label all the picture data for the service provider, the service provider could only hire professional yoga practitioners to label the relevant data.
  • Current horizontal federated learning usually assumes that the client holds a large amount of labeled data, which justifies using the horizontal federated learning training mode for model training. In practice, however, the client usually has little or no labeled data, and it is also difficult to require the client to label data, so it is hard to obtain a high-quality model with the existing horizontal federated learning training mode.
  • the application discloses a model training method, a related system, and a storage medium, which can improve the feature extraction capability of the model on unlabeled data.
  • the embodiment of the present application provides a model training system
  • the model training system includes a server and a client, the server maintains data with labels, and the client maintains data with or without labels
  • the client is used to train the first model according to the unlabeled data to obtain the parameters of the first model
  • the client is also used to send the parameters of the first subnet in the first model to the server, where the first model also includes a second subnet
  • the server is used to train the second model according to the parameters of the first subnet reported by the client and the labeled data, to update the parameters of the second model, the second model includes the first subnet and the third subnet, the third subnet corresponds to the second subnet
  • the server is also used to send the updated parameters of the first subnet and the parameters of the third subnet to the client
  • the client is also configured to obtain a target model according to the updated parameters of the first subnet and the parameters of the third subnet, where the target model includes the first subnet and the third subnet.
  • the client performs training based on unlabeled data
  • the server performs training based on the parameters of the first subnet reported by the client and the labeled data, and sends the updated parameters of the first subnet to the client and the parameters of the third subnet, and then the client obtains the target model according to the parameters of the first subnet and the parameters of the third subnet.
  • this method ensures the security of the client’s private data, and at the same time improves the feature extraction capability of the model on unlabeled data, saving labor costs.
  • This solution enables horizontal federated learning to be performed even when labeled data exists only on the server side and there is no labeled data on the client side, so as to adapt to real scenarios where labeled data is scarce.
  • the foregoing client trains the first model according to the unlabeled data to obtain parameters of the first model, which can be understood as the client training the first model according to the unlabeled data to update the parameters of the first model.
  • the above-mentioned first subnetwork may be used to perform feature extraction on data input into the subnetwork.
  • this solution can reduce the communication overhead in the training process to a certain extent due to less data to be transmitted.
  • in terms of sending parameters of the first model to the server, the client is configured to only send the parameters of the first subnet in the first model to the server.
  • the client is further configured to send parameters in the first model other than parameters of the first subnet to the server.
  • the number of clients is K, where K is an integer greater than 1, and the server is further configured to aggregate the parameters of the K first subnets from the K clients to obtain processed parameters of the first subnet; in terms of training the second model of the server according to the parameters of the first subnet reported by the client and the labeled data so as to update the parameters of the second model, the server is configured to train the second model of the server according to the processed parameters of the first subnet and the labeled data, so as to update the parameters of the second model.
  • the server performs training based on the parameters of the first subnet of multiple clients, which can effectively improve the feature extraction capability of the model on unlabeled data.
  • the third subnetwork of the second model is used to output the calculation result of the second model; the second subnetwork of the first model is used to output the calculation result of the first model, wherein the third subnetwork of the second model has a different structure from the second subnetwork of the first model.
  • the third subnet is a Classifier subnet
  • the second subnet is an MLM subnet, and so on.
  • the parameters of the second subnetwork of the first model remain unchanged before and after training.
  • the second model further includes a fourth subnetwork, and parameters of the fourth subnetwork of the second model remain unchanged before and after training.
  • the embodiment of the present application provides a model training method, which is applied to the server, and the server maintains labeled data. The method includes: training the second model according to the parameters of the first subnet reported by the client and the labeled data, so as to update the parameters of the second model, where the second model includes the first subnet and the third subnet; and sending the updated parameters of the first subnet and the parameters of the third subnet to the client.
  • the server performs training based on the parameters of the first subnet and the labeled data reported by the client, and then sends the updated parameters of the first subnet and parameters of the third subnet to the client.
  • the parameters of the first subnetwork reported by the client are obtained by the client through training based on unlabeled data.
  • this method ensures the security of the client’s private data, and at the same time improves the feature extraction capability of the model on unlabeled data, saving labor costs.
  • This solution enables horizontal federated learning to be performed even when labeled data exists only on the server side and there is no labeled data on the client side, so as to adapt to real scenarios where labeled data is scarce.
  • the number of clients is K, and K is an integer greater than 1, and the method further includes: performing aggregation processing on parameters of K first subnets from the K clients , to obtain the processed parameters of the first subnet; the second model is trained according to the parameters of the first subnet reported by the client and the labeled data, so as to update the parameters of the second model, including : Train the second model according to the processed parameters of the first subnetwork and the labeled data, so as to update the parameters of the second model.
  • the server performs training based on the parameters of the first subnet of multiple clients, which can effectively improve the feature extraction capability of the model on unlabeled data.
  • the server also maintains unlabeled data, and training the second model according to the parameters of the first subnet reported by the client and the labeled data so as to update the parameters of the second model includes: training the third model according to the parameters of the first subnet reported by the client and the unlabeled data, so as to update the parameters of the third model; and training the second model according to the parameters of the third model and the labeled data, so as to update the parameters of the second model.
  • horizontal federated learning can be implemented in the scenario where the server maintains labeled data and unlabeled data, which further improves the feature extraction capability of the model and saves labor costs.
  • the embodiment of the present application provides a model training method, which is applied to a client, and the client maintains data with or without labels.
  • the method includes: training the first model according to the unlabeled data to obtain the parameters of the first model; sending the parameters of the first subnet in the first model to the server, where the first model also includes a second subnet; and obtaining a target model according to the parameters of the first subnet and the parameters of the third subnet from the server, wherein the target model includes the first subnet and the third subnet, and the third subnet corresponds to the second subnet.
  • the client performs training based on unlabeled data, reports the parameters of the first subnet to the server, and obtains the target model according to the parameters of the first subnet and the parameters of the third subnet from the server,
  • the parameters of the first subnet and the parameters of the third subnet from the server are obtained by the server through training based on the parameters of the first subnet and the labeled data reported by the client.
  • this method ensures the security of the client’s private data, and at the same time improves the feature extraction capability of the model on unlabeled data, saving labor costs.
  • This solution enables horizontal federated learning to be performed even when labeled data exists only on the server side and there is no labeled data on the client side, so as to adapt to real scenarios where labeled data is scarce.
  • the client only sends the parameters of the first subnet in the first model to the server, and does not send to the server the parameters in the first model other than the parameters of the first subnet.
  • this solution can reduce the communication overhead in the training process to a certain extent due to less data to be transmitted.
  • the method further includes: sending parameters in the first model except parameters of the first subnet to the server.
  • the loss value used for the unsupervised training is obtained according to the unlabeled data of the client and the first data, where the first data is obtained by inputting the second data into the first model for processing, and the second data is obtained by masking the unlabeled data.
  • the unlabeled data is masked, and the loss value is calculated based on the unlabeled data of the client and the masked data.
  • the feature extraction ability of the model on unlabeled data can be improved.
  • the embodiment of the present application provides a model training method, which is applied to a client, and the client maintains unlabeled data and labeled data. The method includes: training the first model according to the unlabeled data to obtain the parameters of the first model; training the fourth model according to the parameters of the first model and the labeled data to obtain the parameters of the fourth model; sending the parameters of the first subnet and the parameters of the second subnet in the fourth model to the server; and updating the fourth model according to the parameters of the first subnet and the parameters of the second subnet from the server.
  • horizontal federated learning can be implemented in the scenario where the client maintains labeled data and unlabeled data, which further improves the feature extraction capability of the model and saves labor costs.
  • the client only sends the parameters of the first subnet and the parameters of the second subnet in the fourth model to the server, and does not send to the server the parameters in the fourth model other than the parameters of the first subnet and the parameters of the second subnet.
  • the method further includes: sending the parameters in the fourth model other than the parameters of the first subnet and the parameters of the second subnet to the server.
  • the embodiment of the present application provides a model training device, the device includes: a training module, configured to train the second model according to the parameters of the first subnet reported by the client and the labeled data, so as to update the parameters of the second model, the second model including the first subnet and the third subnet; and a sending module, configured to send the updated parameters of the first subnet and the parameters of the third subnet to the client.
  • the device further includes a processing module, configured to aggregate the parameters of the K first subnets from the K clients to obtain the processed parameters of the first subnet; the training module is also used to train the second model according to the processed parameters of the first subnet and the labeled data, so as to update the parameters of the second model.
  • the training module is further configured to: train the third model according to the parameters of the first subnet reported by the client and the unlabeled data, so as to update the parameters of the third model; and train the second model according to the parameters of the third model and the labeled data, so as to update the parameters of the second model.
  • an embodiment of the present application provides a model training device, the device comprising: a training module, configured to train a first model based on unlabeled data, so as to obtain parameters of the first model; a sending module, configured to send the parameters of the first subnet in the first model to the server, where the first model also includes a second subnet; and an update module, configured to obtain a target model according to the parameters of the first subnet and the parameters of the third subnet from the server, wherein the target model includes the first subnet and the third subnet, and the third subnet corresponds to the second subnet.
  • the sending module is further configured to: send parameters in the first model other than parameters of the first subnet to the server.
  • the sending module is configured to only send the parameters of the first subnet in the first model to the server, without sending to the server the parameters in the first model other than the parameters of the first subnet.
  • the embodiment of the present application provides a model training device, the device comprising: a training module, configured to train the first model according to the unlabeled data so as to obtain parameters of the first model, and to train the fourth model according to the parameters of the first model and the labeled data so as to obtain parameters of the fourth model; a sending module, configured to send the parameters of the first subnet and the parameters of the second subnet in the fourth model to the server; and an update module, configured to update the fourth model according to the parameters of the first subnet and the parameters of the second subnet from the server.
  • the sending module is further configured to send to the server the parameters in the fourth model other than the parameters of the first subnet and the parameters of the second subnet.
  • the sending module is configured to only send the parameters of the first subnet and the parameters of the second subnet in the fourth model to the server, and not send to the server the parameters in the fourth model other than the parameters of the first subnet and the parameters of the second subnet.
  • the embodiment of the present application provides a model training device, including a processor and a memory; wherein the memory is used to store program code, and the processor is used to call the program code to execute the method.
  • the present application provides a computer storage medium, including computer instructions.
  • when the computer instructions are run on an electronic device, the electronic device executes the method provided in any possible implementation manner of the second aspect and/or any possible implementation manner of the third aspect and/or any possible implementation manner of the fourth aspect.
  • the embodiment of the present application provides a computer program product, which, when run on a computer, enables the computer to execute the method provided in any possible implementation manner of the second aspect and/or any possible implementation manner of the third aspect and/or any possible implementation manner of the fourth aspect.
  • the model training system described in the first aspect, the model training device described in the fifth aspect, the model training device described in the sixth aspect, the model training device described in the seventh aspect, the model training device described in the eighth aspect, the computer storage medium described in the ninth aspect, and the computer program product described in the tenth aspect provided above are all used to implement the method provided in any possible implementation manner of the second aspect and/or any possible implementation manner of the third aspect.
  • Figure 1a is a schematic diagram of the framework of a model training system provided by an embodiment of the present application.
  • Fig. 1b is a schematic framework diagram of another model training system provided by the embodiment of the present application.
  • FIG. 2 is an interactive schematic diagram of a model training method provided by an embodiment of the present application
  • Fig. 3 is a schematic flow chart of a model training method provided by an embodiment of the present application.
  • Fig. 4a is a schematic flowchart of a server-side model training method provided by an embodiment of the present application.
  • Fig. 4b is a schematic diagram of a model training method provided by an embodiment of the present application.
  • Fig. 5a is a schematic flowchart of a client model training method provided by an embodiment of the present application.
  • Fig. 5b is a schematic diagram of a model training method provided by an embodiment of the present application.
  • Fig. 6 is a schematic flowchart of another model training method provided by the embodiment of the present application.
  • Fig. 7 is a schematic structural diagram of a model training device provided by an embodiment of the present application.
  • Fig. 8a is a schematic structural diagram of a model training system provided by an embodiment of the present application.
  • Fig. 8b is a schematic structural diagram of a model training device provided by an embodiment of the present application.
  • Fig. 8c is a schematic structural diagram of a model training device provided by an embodiment of the present application.
  • Fig. 8d is a schematic structural diagram of a model training device provided by an embodiment of the present application.
  • FIG. 1a is a schematic framework diagram of a model training system provided by an embodiment of the present application.
  • the system includes server and client.
  • the client maintains unlabeled data.
  • the server maintains labeled data.
  • the client trains the first model according to the unlabeled data to obtain parameters of the first model.
  • the client sends the parameters of the first subnet in the first model to the server.
  • the server trains the second model according to the parameters of the first subnet and the labeled data reported by the client, so as to update the parameters of the second model.
  • the server sends the updated parameters of the first subnet and the parameters of the third subnet to the client, and then the client obtains the target model according to the updated parameters of the first subnet and the parameters of the third subnet from the server.
  • FIG. 1b is a schematic framework diagram of another model training system provided by an embodiment of the present application.
  • the system includes server and client.
  • the server includes a federated learning server (Federated Learning Server, FL-Server) module and a federated learning worker (Federated Learning Worker, FL-Worker) module.
  • the FL-Server module includes an aggregation unit and a communication unit. FL-Server module is used for data processing.
  • the FL-Worker module includes model decomposition unit, training unit and communication unit.
  • the FL-Worker module is used for model training.
  • the client includes a model decomposition unit, a training unit, an inference unit and a communication unit. The client is used for model training and inference.
  • the model decomposing unit of the client is used to decompose the model of the client into multiple subnets.
  • the training unit of the client side trains the decomposed model based on the unlabeled data, and then the communication unit of the client side sends the parameters of the first subnetwork obtained through training to the FL-Server module.
  • the aggregation unit of the FL-Server module aggregates the received parameters of the first subnet sent by multiple clients, and then sends the processed parameters of the first subnet to the FL-Worker module.
  • the model decomposition unit of the FL-Worker module is used to decompose the server-side model into multiple subnets.
  • the training unit of the FL-Worker module trains the decomposed model based on the parameters of the processed first subnet and the labeled data to obtain the updated parameters of the first subnet and the parameters of the third subnet, and then It is sent to the FL-Server module.
  • the FL-Server module sends the updated parameters of the first subnet and the parameters of the third subnet to each client, and the client obtains the target model according to the parameters of the first subnet and the parameters of the third subnet received from the server.
  • the inference unit of the client can perform inference based on the target model.
  • FIG. 2 is a schematic flowchart of a model training method provided by an embodiment of the present application.
  • the method is applied to disjoint scenarios of horizontal federated learning.
  • the server maintains labeled data, and the server performs supervised training.
  • the client maintains unlabeled data, and the client performs unsupervised training.
  • the client-side unsupervised training starts first. Afterwards, unsupervised training on the client side and supervised training on the server side are performed alternately. The alternate process stops training after a preset condition is met.
  • the preset condition may be that the number of iterations is satisfied, or the loss value is smaller than a preset value, etc., which is not specifically limited in this solution.
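  • As an illustration only (not part of the embodiment itself), the alternation between client-side unsupervised training and server-side supervised training described above can be sketched as a simple orchestration loop; the helper names (client_unsupervised_step, server_supervised_step, aggregate, build_target_model) are hypothetical placeholders for the routines detailed in the following figures:

```python
# Hypothetical orchestration of the alternating training rounds described above.
# The helpers are placeholders for the real client/server training routines.

def run_federated_rounds(clients, server, max_rounds=10, loss_threshold=0.01):
    for round_idx in range(max_rounds):
        # 1. Each client runs unsupervised training on its unlabeled data
        #    and reports only the parameters of its first subnet (feature extractor).
        reported = [c.client_unsupervised_step() for c in clients]

        # 2. The server aggregates the reported first-subnet parameters,
        #    then runs supervised training on its labeled data.
        aggregated = server.aggregate(reported)
        first_subnet, third_subnet, loss = server.server_supervised_step(aggregated)

        # 3. The server sends the updated first-subnet parameters back;
        #    each client refreshes its local model before the next round.
        for c in clients:
            c.update_first_subnet(first_subnet)

        # 4. Stop when the preset condition is met (round count or loss value).
        if loss < loss_threshold:
            break

    # After training, each client assembles the target model for inference.
    return [c.build_target_model(first_subnet, third_subnet) for c in clients]
```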
  • FIG. 3 is a schematic flowchart of a model training method provided by an embodiment of the present application.
  • the method is applied to a model training system, and the model training system includes a server and a client. It includes steps 301-305, specifically as follows:
  • the client trains the first model according to unlabeled data, so as to obtain parameters of the first model
  • the first model may be any model, such as a neural network model, a support vector machine model, a decision tree model, and the like. Wherein, the first model may correspond to model 2 in FIG. 2 .
  • the aforementioned training of the first model may be unsupervised training.
  • the loss value used for unsupervised training is obtained according to the unlabeled data of the client and the first data, and the first data is obtained by inputting the second data into the first model, and the second data is obtained by masking the unlabeled data.
  • unsupervised training may include the following steps:
  • the mask operation is to replace some values in the original data feature, and the replaced value can be a specific value or a learnable parameter.
  • For example, the unlabeled data is the SMS character sequence [特, 选, 进, 口, 高, 科, 技, 面, 膜] ("specially selected imported high-tech facial mask"), and the mask operation is performed on the character "进" in the SMS data, so that the data after the mask operation is [特, 选, MASK, 口, 高, 科, 技, 面, 膜], and so on.
  • the masked data is input into the model to obtain the output of the model; the output is compared with the original unlabeled data by means of the loss function, and the results of the comparison are fed into the optimizer to update the parameters of the model.
  • the training is stopped until the stopping condition of the unsupervised training is reached.
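  • The masking-based unsupervised step can be sketched as follows; this is a minimal PyTorch illustration under assumed names and sizes (the toy model, the 15% mask rate, a GRU encoder standing in for the ALBERT subnet), not the embodiment's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy masked-language-model step illustrating the masking operation above.
# Vocabulary size, layer widths and the mask rate are illustrative choices.
VOCAB, MASK_ID, PAD_ID = 1000, 1, 0

class TinyFirstModel(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB, dim)          # Embedding subnet analogue
        self.encoder = nn.GRU(dim, dim, batch_first=True)  # stands in for the ALBERT subnet
        self.mlm_head = nn.Linear(dim, VOCAB)               # second (MLM) subnet analogue

    def forward(self, ids):
        h, _ = self.encoder(self.embedding(ids))
        return self.mlm_head(h)                              # per-token vocabulary logits

def mask_tokens(ids, mask_prob=0.15):
    # Replace a random subset of tokens with the MASK id; remember the originals.
    masked = ids.clone()
    positions = (torch.rand_like(ids, dtype=torch.float) < mask_prob) & (ids != PAD_ID)
    masked[positions] = MASK_ID
    labels = ids.clone()
    labels[~positions] = -100                                # ignored by cross_entropy
    return masked, labels

model = TinyFirstModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

ids = torch.randint(2, VOCAB, (8, 16))                       # a batch of unlabeled token ids
masked, labels = mask_tokens(ids)
logits = model(masked)
# Compare the model output with the original (unmasked) data via the loss function.
loss = F.cross_entropy(logits.view(-1, VOCAB), labels.view(-1), ignore_index=-100)
opt.zero_grad(); loss.backward(); opt.step()
```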
  • the aforementioned training of the first model to obtain the parameters of the first model may be understood as training the first model to update the parameters of the first model.
  • the client sends parameters of the first subnet in the first model to the server, where the first model also includes a second subnet;
  • the client may decompose the first model, decompose the parameters of each subnet of the first model, and then perform model training.
  • the first model includes multiple subnets, and the multiple subnets include a first subnet and a second subnet.
  • the above-mentioned first subnetwork may be a subnetwork used for feature extraction.
  • the above-mentioned second subnetwork may be a subnetwork for outputting calculation results of the first model.
  • the first model may include an embedding (Embedding) subnet, a lite bidirectional encoder representations from Transformers (A Lite BERT, ALBERT) subnet, a masked language model (Masked Language Model, MLM) subnet, an adaptive moment estimation (Adam) optimizer subnet, and the like.
  • the first subnet is the ALBERT subnet
  • the second subnet is the MLM subnet.
  • subnets can be understood as submodels.
  • For example, the Embedding sub-model, the ALBERT sub-model, the MLM sub-model, and the like.
  • the foregoing first model is only an example, and it may also be a model composed of other subnets, which is not specifically limited in this solution.
  • the client sends parameters in the first model to the server except the parameters of the first subnet.
  • the client not only sends the parameters of the first subnet in the first model to the server, but also sends parameters other than the parameters of the first subnet.
  • parameters of the second subnet or other subnets may also be sent, and parameters of all other subnets may also be sent to the server, which is not specifically limited in this solution.
  • the client only sends parameters of the first subnet in the first model to the server.
  • the client does not send parameters other than the parameters of the first subnet to the server.
  • this solution can reduce the communication overhead during the training process because less data is transmitted.
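  • A minimal sketch of how a client might decompose its model into named subnets and upload only the first-subnet parameters; the module names (embedding, albert, mlm) and the layer choices are illustrative assumptions, not the embodiment's actual structure:

```python
import torch
import torch.nn as nn

# Illustrative first model decomposed into named subnets; the concrete layers
# are placeholders for the Embedding / ALBERT / MLM subnets mentioned above.
class FirstModel(nn.Module):
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab, dim)   # e.g. Embedding subnet
        self.albert = nn.Sequential(                # e.g. first subnet (feature extraction)
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.mlm = nn.Linear(dim, vocab)            # e.g. second subnet (MLM output)

model = FirstModel()

# Send only the first-subnet parameters: serialise the 'albert' sub-state-dict.
first_subnet_params = {k: v.cpu() for k, v in model.albert.state_dict().items()}
payload = {"first_subnet": first_subnet_params}     # what the client would upload

# Later, parameters received from the server can be loaded back the same way.
model.albert.load_state_dict(payload["first_subnet"])
```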
  • the server trains the second model according to the parameters of the first subnet reported by the client and the labeled data, so as to update the parameters of the second model, and the second model includes the first subnet and a third subnet, the third subnet corresponding to the second subnet;
  • the second model can be any model, such as a neural network model, a support vector machine model, a decision tree model, and the like. Wherein, the second model may correspond to model 1 in FIG. 2 .
  • the above server trains the second model according to the parameters of the first subnet reported by the client and the labeled data to update the parameters of the second model.
  • the reported parameters of the first subnetwork are used to replace the parameters of the first subnetwork of the second model to update the parameters of the second model; then, the updated second model is trained according to the labeled data to update the parameters of the second model again.
  • the aforementioned training of the second model may be supervised training of the second model.
  • server-side training can refer to the following operations:
  • the labeled data is input into the second model for supervised training, and the output result of the second model is obtained.
  • the output of the second model is compared with the labeled data by means of the loss function, the results of the comparison are input into the optimizer, and then the parameters used to update the model are obtained.
  • the parameters of the optimizer can also be updated.
  • the stop condition may be that a preset number of iterations is met, or the loss value meets a preset requirement, etc., which is not specifically limited in this solution.
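  • The server-side supervised step can be sketched as follows, assuming a second model with the same named first subnet plus a classifier head; the names, shapes and simple loop are illustrative assumptions, not prescribed by the embodiment:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SecondModel(nn.Module):
    def __init__(self, vocab=1000, dim=64, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab, dim)
        self.albert = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.classifier = nn.Linear(dim, num_classes)     # third subnet analogue

    def forward(self, ids):
        h = self.albert(self.embedding(ids)).mean(dim=1)  # pool token features
        return self.classifier(h)

def server_supervised_step(model, first_subnet_params, labeled_loader, steps=100):
    # Replace the first-subnet parameters with those reported by the client(s).
    model.albert.load_state_dict(first_subnet_params)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step, (ids, labels) in enumerate(labeled_loader):
        loss = F.cross_entropy(model(ids), labels)        # compare output with labels
        opt.zero_grad(); loss.backward(); opt.step()
        if step + 1 >= steps:                             # preset iteration count
            break
    # Return the updated parameters to send back to the client(s).
    return model.albert.state_dict(), model.classifier.state_dict()
```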
  • the method further includes:
  • the server aggregates the parameters of the K first subnets from the K clients to obtain the processed parameters of the first subnet.
  • the aggregation process may be weighted summation of the parameters of the first subnet sent by each client according to a preset weight, and then the processed parameters of the first subnet are obtained.
  • the server trains the second model of the server according to the parameters of the first subnet reported by the client and the labeled data, so as to update the parameters of the second model, including :
  • the server trains the second model of the server according to the processed parameters of the first subnetwork and the labeled data, so as to update the parameters of the second model.
  • the server replaces the parameters of the first subnet of the second model of the server according to the processed parameters of the first subnet, so as to update the parameters of the second model;
  • the server trains the updated second model according to the labeled data, so as to update the parameters of the second model again.
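  • One possible form of the aggregation processing is a FedAvg-style weighted average of the K reported first-subnet parameter sets; the sketch below assumes the parameters arrive as state dicts of tensors with matching shapes:

```python
import torch

def aggregate_first_subnets(client_params, weights=None):
    """Weighted average of K clients' first-subnet state dicts (a FedAvg-style
    aggregation; uniform weights are assumed when none are given)."""
    k = len(client_params)
    if weights is None:
        weights = [1.0 / k] * k
    keys = client_params[0].keys()
    return {
        key: sum(w * params[key].float() for w, params in zip(weights, client_params))
        for key in keys
    }

# Example: three clients report first-subnet parameters with the same shapes.
clients = [{"w": torch.randn(4, 4), "b": torch.randn(4)} for _ in range(3)]
processed = aggregate_first_subnets(clients, weights=[0.5, 0.3, 0.2])
```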
  • the server may perform a decomposing operation on the second model, and decompose the parameters of each subnet of the second model respectively.
  • the second model includes multiple subnets, and the multiple subnets include the first subnet and the third subnet.
  • the above-mentioned first subnetwork may be a subnetwork used for feature extraction.
  • the above-mentioned third subnetwork may be a subnetwork for outputting calculation results of the second model.
  • the second model may include an Embedding subnet, a lite bidirectional encoder representations from Transformers (A Lite BERT, ALBERT) subnet, a Classifier subnet, an Adam optimizer subnet, and the like.
  • the third subnet is the Classifier subnet.
  • subnets can be understood as submodels. For example, Embedding sub-model, ALBERT sub-model, Classifier sub-model, etc.
  • the foregoing second model is only an example, and it may also be a model composed of other subnets, which is not specifically limited in this solution.
  • the third subnetwork of the above-mentioned second model corresponds to the second subnetwork of the first model; it can be understood that the functions of the two are the same, for example, the third subnetwork of the second model is used to output the calculation result of the second model, and the second subnetwork of the first model is used to output the calculation result of the first model.
  • the structure of the third subnetwork of the second model is different from that of the second subnetwork of the first model.
  • the third subnet is a Classifier subnet
  • the second subnet is an MLM subnet, and so on.
  • the server sends updated parameters of the first subnet and parameters of the third subnet to the client;
  • Optionally, before step 304, the method may also include:
  • if the preset condition is met, execute step 304;
  • otherwise, the server sends the updated parameters of the first subnet and the parameters of the third subnet to the client, the client updates the first model according to the parameters of the first subnet from the server, and steps 301, 302, 303, 304-1, and 304-3 are repeatedly executed until the preset condition is met.
  • the above-mentioned preset condition may be the number of repeated steps 301, 302, 303, 304-1, and 304-3, wherein the client and the server may predetermine the number of stops, and when the number of repetitions is reached, the training is stopped.
  • the above preset condition may also be that the loss value calculated by the server based on the loss function is smaller than the preset value, etc., which is not specifically limited in this solution.
  • the server only sends the updated parameters of the first subnet to the client, the client updates the first model according to the parameters of the first subnet from the server, and steps 301, 302, 303, 304-1, and 304-3 are repeatedly executed until the preset condition is met.
  • the client obtains a target model according to parameters of the first subnet and parameters of the third subnet from the server, where the target model includes the first subnet and the third subnet Three subnets.
  • the target model can be used for inference.
  • the client performs training based on unlabeled data
  • the server performs training based on the parameters of the first subnet reported by the client and the labeled data, and sends the updated parameters of the first subnet to the client and the parameters of the third subnet, and then the client obtains the target model according to the updated parameters of the first subnet and the parameters of the third subnet.
  • this method ensures the security of the client’s private data, and at the same time improves the feature extraction capability of the model on unlabeled data, saving labor costs.
  • This solution enables horizontal federated learning to be performed even when labeled data exists only on the server side and there is no labeled data on the client side, so as to adapt to real scenarios where labeled data is scarce.
  • FIG. 4a is a schematic flowchart of a model training method provided by an embodiment of the present application.
  • the method is applied to a server, and the server maintains labeled data, which includes steps 401-402, specifically as follows:
  • the server may perform a decomposing operation on the second model, and decompose the parameters of each subnet of the second model respectively.
  • the second model includes multiple subnets, and the multiple subnets include the first subnet and the third subnet.
  • the above-mentioned first subnetwork may be a subnetwork used for feature extraction.
  • the above-mentioned third subnetwork may be a subnetwork for outputting calculation results of the second model.
  • the second model may include an Embedding subnet, a lite bidirectional encoder representations from Transformers (A Lite BERT, ALBERT) subnet, a Classifier subnet, an Adam optimizer subnet, and the like.
  • the third subnet is a Classifier subnet.
  • the foregoing second model is only an example, and it may also be a model composed of other subnets, which is not specifically limited in this solution.
  • the number of clients is K, and K is an integer greater than 1, and the method further includes:
  • the parameters of the K first subnets from the K clients are aggregated to obtain the processed parameters of the first subnet.
  • the aggregation process may be weighted summation of the parameters of the first subnet sent by each client according to a preset weight, and then the processed parameters of the first subnet are obtained.
  • the server trains the second model of the server according to the parameters of the first subnet reported by the client and the labeled data, so as to update the parameters of the second model, including :
  • the server trains the second model of the server according to the processed parameters of the first subnetwork and the labeled data, so as to update the parameters of the second model.
  • the server replaces the parameters of the first subnet of the second model of the server according to the processed parameters of the first subnet, so as to update the parameters of the second model;
  • the server trains the updated second model according to the labeled data, so as to update the parameters of the second model again.
  • the server may perform supervised training on the second model.
  • the output of the second model is obtained by inputting the labeled data into the second model for supervised training.
  • the similarity function, that is, the loss function, is used to compare the similarity between the output of the model and the labeled data; the results of the comparison are input into the optimizer, and then the parameters used to update the model are obtained. Wherein, if the optimizer itself has parameters, the parameters of the optimizer can also be updated.
  • the stopping condition of the supervised training may be that the preset number of iterations is met, or the loss value meets the preset requirements, etc., which is not specifically limited in this solution.
  • the server also maintains unlabeled data.
  • the server can perform semi-supervised training.
  • the training of the second model according to the parameters of the first subnet reported by the client and the labeled data, so as to update the parameters of the second model includes:
  • the second model is trained according to the parameters of the third model and the labeled data, so as to update the parameters of the second model.
  • the third model is trained according to the parameters of the first subnet reported by the client and the unlabeled data, so as to update the parameters of the third model; and the second model is trained according to the parameters of the third model and the labeled data, so as to update the parameters of the second model.
  • the parameters of the third model are updated according to the parameters of the second model, and the updated third model is trained according to the unlabeled data, so as to update the parameters of the third model again; the above steps are repeated until the stopping condition of the semi-supervised training is reached.
  • the stop condition of the semi-supervised training may be that the preset number of iterations is met, or the loss value meets the preset requirements, etc., which is not specifically limited in this solution.
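  • The server-side semi-supervised alternation can be sketched as below; the helpers (load_first_subnet, train_unsupervised, train_supervised, copy_shared_parameters) are hypothetical placeholders for the routines described above, so this is only a structural sketch rather than a concrete implementation:

```python
def server_semi_supervised(second_model, third_model, first_subnet_params,
                           labeled_data, unlabeled_data, rounds=5):
    """Alternate between the third model (unlabeled data) and the second model
    (labeled data), as described above. The helper functions are placeholders."""
    # Start from the first-subnet parameters reported by the client(s).
    load_first_subnet(third_model, first_subnet_params)
    for _ in range(rounds):
        # Unsupervised pass: update the third model with the unlabeled data.
        train_unsupervised(third_model, unlabeled_data)
        # Supervised pass: update the second model using the third model's
        # (shared) parameters and the labeled data.
        copy_shared_parameters(src=third_model, dst=second_model)
        train_supervised(second_model, labeled_data)
        # Feed the second model's parameters back into the third model
        # before the next unsupervised pass.
        copy_shared_parameters(src=second_model, dst=third_model)
    return second_model
```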
  • the second model further includes a fourth subnetwork, where parameters of the fourth subnetwork remain unchanged before and after training.
  • the fourth subnet may be an Embedding subnet.
  • the parameters of the fourth subnetwork of the second model are delivered during initialization, and the parameters remain unchanged during the training process.
  • For the client, if it only has unlabeled data, it can perform unsupervised training; if it also has labeled data, the client can perform semi-supervised training, and so on. This solution does not specifically limit this.
  • the client can obtain the target model and perform inference.
  • Optionally, before step 402, the method also includes:
  • if the preset condition is met, execute step 402;
  • otherwise, the server sends the updated parameters of the first subnet and the parameters of the third subnet to the client, so that the client can update the first model with the parameters of the first subnet from the server, and steps 401, 402-1, and 402-3 are repeatedly executed until the preset condition is met.
  • the above-mentioned preset condition may be the number of repeated executions of steps 401, 402-1, and 402-3, wherein the client and the server may predetermine the number of stops, and when the number of repetitions is reached, the training is stopped.
  • the above preset condition may also be that the loss value calculated by the server based on the loss function is smaller than the preset value, etc., which is not specifically limited in this solution.
  • the server only sends the updated parameters of the first subnet to the client, and steps 401, 402-1, and 402-3 are repeated until the preset condition is reached.
  • the model of the server includes the Embedding subnet, the ALBERT subnet, the Classifier subnet and the optimizer subnet. This embodiment is described by taking the short message classification service as an example.
  • In the supervised training on the server side, the server first performs data preprocessing on the SMS text.
  • the data preprocessing here can be word segmentation based on the tokenizer.
  • the server inputs the output result of the tokenizer into the second model.
  • the server inputs the labeled data into the cross-entropy function, and then calculates the similarity based on the output of the second model.
  • the similarity is then fed into the optimizer, which in turn updates the parameters of the second model.
  • the server sends the updated parameters of the ALBERT subnet and the parameters of the Classifier subnet to the client, and then trains again based on the parameters of the ALBERT subnet sent by the client. When the stop condition is reached, the server sends the updated parameters of the ALBERT subnet and the parameters of the Classifier subnet to the client, so that the client can perform inference.
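  • The data preprocessing mentioned in this SMS example can be illustrated with a toy character-level tokenizer; the vocabulary construction, padding length and sample messages below are assumptions for illustration, not the embodiment's actual tokenizer:

```python
# Toy character-level "tokenizer" for the SMS example: each character is mapped
# to an integer id before being fed to the model; unknown characters share one id.
PAD, UNK, MASK = 0, 1, 2

def build_vocab(texts):
    vocab = {"<pad>": PAD, "<unk>": UNK, "<mask>": MASK}
    for text in texts:
        for ch in text:
            vocab.setdefault(ch, len(vocab))
    return vocab

def tokenize(text, vocab, max_len=16):
    ids = [vocab.get(ch, UNK) for ch in text][:max_len]
    return ids + [PAD] * (max_len - len(ids))        # pad to a fixed length

# Hypothetical SMS texts used only to exercise the tokenizer.
sms = ["特选进口高科技面膜", "您好，您的快递已到"]
vocab = build_vocab(sms)
batch = [tokenize(t, vocab) for t in sms]             # integer ids for the model
```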
  • If the server does not have a training engine, it is necessary to build a model training simulation platform for model training during training, and such a method cannot be directly deployed.
  • an embodiment of the present application further provides a server, which includes a federated learning server FL-Server module and a federated learning workstation FL-Worker module.
  • the FL-Server module of the federated learning server is used to aggregate the parameters of K first subnets sent by K clients, and then send the processed parameters to the federated learning workstation FL-Worker module.
  • the FL-Worker module is used to perform training according to the processed parameters and the labeled data of the server to obtain the updated parameters of the first subnet and the parameters of the third subnet, and then the updated parameters of the first subnet and the parameters of the third subnet are sent to the FL-Server module.
  • the training task can be carried out on the server, so that the server can directly perform model training and improve the efficiency of model training.
  • the server performs training based on the parameters of the first subnet and the labeled data reported by the client, and then sends the updated parameters of the first subnet and the parameters of the third subnet to the client.
  • the parameters of the first subnetwork reported by the client are obtained by the client through training based on unlabeled data.
  • it ensures the security of the client’s private data, and at the same time improves the feature extraction capability of the model on unlabeled data, saving labor costs.
  • This solution enables horizontal federated learning to be performed even when labeled data exists only on the server side and there is no labeled data on the client side, so as to adapt to real scenarios where labeled data is scarce.
  • FIG. 5a is a schematic flow chart of a model training method provided by an embodiment of the present application.
  • the method is applied to a client, and the client maintains unlabeled data; the method includes steps 501-503, specifically as follows:
  • the first model may be any model, such as a neural network model, a support vector machine model, a decision tree model, and the like.
  • the aforementioned training of the first model may be unsupervised training.
  • the loss value used for unsupervised training is obtained according to the unlabeled data of the client and the first data, and the first data is obtained by inputting the second data into the first model, and the second data is obtained by masking the unlabeled data.
  • unsupervised training may include the following steps:
  • the mask operation is to replace some values in the original data feature, and the replaced value can be a specific value or a learnable parameter.
  • For example, the unlabeled data is the SMS character sequence [特, 选, 进, 口, 高, 科, 技, 面, 膜] ("specially selected imported high-tech facial mask"), and the mask operation is performed on the character "进" in the SMS data, so that the data after the mask operation is [特, 选, MASK, 口, 高, 科, 技, 面, 膜].
  • the masked data is then fed into the model for unsupervised training to get the output of the model.
  • the output of the model is compared with the original unlabeled data by means of the loss function, and the results of the comparison are fed into the optimizer to update the parameters of the model.
  • The above steps are repeated until the stopping condition for unsupervised training is reached, and then training stops.
  • the foregoing acquisition of the parameters of the first model may be understood as training the first model to update the parameters of the first model.
  • the client may perform a decomposing operation on the first model, and decompose the parameters of each subnet of the first model respectively.
  • the first model includes multiple subnets, and the multiple subnets include a first subnet and a second subnet.
  • the above-mentioned first subnetwork may be a subnetwork used for feature extraction.
  • the above-mentioned second subnetwork may be a subnetwork for outputting calculation results of the first model.
  • the first model may include an embedding (Embedding) subnet, a lite bidirectional encoder representations from Transformers (A Lite BERT, ALBERT) subnet, a masked language model (Masked Language Model, MLM) subnet, an adaptive moment estimation (Adam) optimizer subnet, and the like.
  • the first subnet is the ALBERT subnet
  • the second subnet is the MLM subnet.
  • subnets can be understood as submodels.
  • For example, the Embedding sub-model, the ALBERT sub-model, the MLM sub-model, and the like.
  • the foregoing first model is only an example, and it may also be a model composed of other subnets, which is not specifically limited in this solution.
  • the client sends parameters in the first model to the server except the parameters of the first subnet.
  • the client not only sends the parameters of the first subnet in the first model to the server, but also sends parameters other than the parameters of the first subnet.
  • parameters of the second subnet or other subnets may also be sent, and parameters of all other subnets may also be sent to the server, which is not specifically limited in this solution.
  • the client only sends parameters of the first subnet in the first model to the server.
  • the client does not send parameters other than the parameters of the first subnet to the server.
  • this solution can reduce the communication overhead in the training process because less data is transmitted.
  • the target model can be used for inference.
  • Optionally, before step 503, the method may also include:
  • if the preset condition is met, execute step 503;
  • the above-mentioned preset condition may be the number of repeated executions of steps 501, 502, 503-1, and 503-3, wherein the client and the server may predetermine the number of stops, and when the number of repetitions is reached, the training is stopped.
  • the foregoing preset condition may also be that the loss value calculated by the client based on the loss function is less than a preset value, etc., which is not specifically limited in this solution.
  • the client model includes Embedding subnet, ALBERT subnet, MLM subnet and optimizer subnet. This embodiment is described by taking the short message classification service as an example.
  • In the unsupervised training task of the client, the client first performs data preprocessing on the SMS text.
  • Data preprocessing includes word segmentation and then masking. Among them, the masked result is used as the input of the model.
  • the client also inputs the word segmentation result into the cross-entropy function to calculate the similarity with the output of the model. Then, the calculated similarity is input into the optimizer sub-network to obtain the updated parameters of the first model. Then, the client sends the parameters of the ALBERT subnet to the server.
  • the first model further includes a fifth subnetwork, and parameters of the fifth subnetwork remain unchanged before and after training.
  • the fifth subnet may be an Embedding subnet.
  • the parameters of the fifth subnetwork of the first model are delivered during initialization, and the parameters remain unchanged during the subsequent training process.
  • the parameters of the second subnetwork of the first model remain unchanged before and after training.
  • the second subnet may be an MLM subnet.
  • the parameters of the second subnetwork of the first model are delivered during initialization, and the parameters remain unchanged during the subsequent training process. With this method, the training overhead is reduced.
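  • One way to keep such subnet parameters fixed during training is to exclude them from gradient updates; the PyTorch sketch below assumes the named subnets from the earlier sketches (embedding, albert, mlm) and is illustrative only, not the embodiment's mechanism:

```python
import torch

# Assume `model` has named subnets: model.embedding, model.albert, model.mlm
# (as in the earlier sketches). Freeze the subnets whose parameters should
# stay unchanged before and after training.
def freeze_fixed_subnets(model):
    for module in (model.embedding, model.mlm):
        for p in module.parameters():
            p.requires_grad_(False)

    # Only the remaining (trainable) parameters are handed to the optimizer,
    # which also reduces the training overhead.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=1e-3)
```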
  • the client performs training based on unlabeled data, reports the parameters of the first subnet to the server, and obtains the target model according to the parameters of the first subnet and the parameters of the third subnet from the server,
  • the parameters of the first subnet and the parameters of the third subnet from the server are obtained by the server through training based on the parameters of the first subnet and the labeled data reported by the client.
  • this method ensures the security of the client’s private data, and at the same time improves the feature extraction capability of the model on unlabeled data, saving labor costs.
  • This solution enables horizontal federated learning to be performed even when labeled data exists only on the server side and there is no labeled data on the client side, so as to adapt to real scenarios where labeled data is scarce.
  • the client performs unsupervised training as an example.
  • the client can perform semi-supervised training.
  • the following describes the semi-supervised training of the client.
  • the embodiment of the present application also provides a model training method, including steps 601-604, specifically as follows:
  • unsupervised training is performed on the first model, and the loss value used for the unsupervised training is obtained according to the unlabeled data of the client and the first data, where the first data is obtained by inputting the second data into the first model for processing, and the second data is obtained by masking the unlabeled data.
  • unsupervised training may include the following steps:
  • the mask operation is to replace some values in the original data features, and the replaced values can be specific values or learnable parameters.
  • the masked data is then fed into the model for unsupervised training to get the output of the model.
  • the output of the model is compared with the original unlabeled data by means of the loss function, and the results of the comparison are fed into the optimizer to update the parameters of the model.
  • The above steps are repeated until the stopping condition for unsupervised training is reached, and then training stops.
  • the client updates the parameters of the fourth model based on the parameters of the first model.
  • a fourth model is then supervised training based on the labeled data.
  • Optionally, after step 602, the method also includes:
  • if the stopping condition of the semi-supervised training is reached, execute step 603;
  • the stopping condition of the above-mentioned semi-supervised training on the client side may be the number of times steps 601, 602, and 6022 are repeated, etc., which is not specifically limited in this solution.
  • the fourth model includes a first subnet and a second subnet.
  • parameters in the fourth model other than the parameters of the first subnet and the parameters of the second subnet are sent to the server.
  • only the parameters of the first subnet and the parameters of the second subnet in the fourth model are sent to the server.
  • this solution can reduce the communication overhead in the training process to a certain extent due to less data to be transmitted.
  • After step 603, the method also includes:
  • the client performs unsupervised training on the first model based on the unlabeled data, performs supervised training on the fourth model based on the labeled data, and then sends the parameters of the first subnet and the parameters of the second subnet of the fourth model to the server; the client then updates the fourth model based on the parameters of the first subnet and the parameters of the second subnet from the server.
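  • One client round of this semi-supervised flow can be sketched as follows; the helper names, the server.upload/server.download interface and the subnet attribute names are hypothetical placeholders for the routines shown earlier, not the embodiment's API:

```python
def client_semi_supervised_round(first_model, fourth_model,
                                 unlabeled_data, labeled_data, server):
    """One client round in the semi-supervised case sketched above.
    All helper names are placeholders for the routines described earlier."""
    # Step 601: unsupervised training of the first model on unlabeled data.
    train_unsupervised(first_model, unlabeled_data)

    # Step 602: supervised training of the fourth model, initialised from the
    # first model's parameters, on the client's labeled data.
    copy_shared_parameters(src=first_model, dst=fourth_model)
    train_supervised(fourth_model, labeled_data)

    # Step 603: upload only the first-subnet and second-subnet parameters.
    server.upload(first_subnet=fourth_model.albert.state_dict(),
                  second_subnet=fourth_model.mlm.state_dict())

    # Final step (analogous to the update described above): refresh the fourth
    # model with the parameters returned by the server.
    first_sub, second_sub = server.download()
    fourth_model.albert.load_state_dict(first_sub)
    fourth_model.mlm.load_state_dict(second_sub)
```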
  • FIG. 7 is a schematic diagram of a hardware structure of a model training device provided by an embodiment of the present application.
  • the model training apparatus 700 shown in FIG. 7 (the apparatus 700 may specifically be a computer device) includes a memory 701 , a processor 702 , a communication interface 703 and a bus 704 .
  • the memory 701 , the processor 702 , and the communication interface 703 are connected to each other through a bus 704 .
  • the memory 701 may be a read-only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device or a random access memory (Random Access Memory, RAM).
  • the memory 701 may store a program.
  • the processor 702 and the communication interface 703 are used to execute each step of the model training method of the embodiment of the present application.
  • The processor 702 may be a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is used to execute related programs so as to implement the functions that need to be performed by the units in the model training apparatus of the embodiments of the present application, or to execute the model training method of the method embodiments of the present application.
  • the processor 702 may also be an integrated circuit chip, which has a signal processing capability. In the implementation process, each step of the model training method of the present application may be completed by an integrated logic circuit of hardware in the processor 702 or instructions in the form of software.
  • The above-mentioned processor 702 may also be a general-purpose processor, a digital signal processor (Digital Signal Processing, DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • The processor can implement or execute the various methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • The storage medium is located in the memory 701; the processor 702 reads the information in the memory 701 and, in combination with its hardware, completes the functions that need to be performed by the units included in the model training apparatus of the embodiments of the present application, or executes the model training method of the method embodiments of the present application.
  • the communication interface 703 implements communication between the apparatus 700 and other devices or communication networks by using a transceiver device such as but not limited to a transceiver. For example, data can be acquired through the communication interface 703 .
  • the bus 704 may include pathways for transferring information between various components of the device 700 (eg, memory 701 , processor 702 , communication interface 703 ).
  • It should be noted that although the apparatus 700 shown in FIG. 7 shows only a memory, a processor, and a communication interface, those skilled in the art should understand that, in a specific implementation process, the apparatus 700 also includes other components necessary for normal operation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatus 700 may also include hardware components for implementing other additional functions. In addition, those skilled in the art should understand that the apparatus 700 may also include only the components necessary to implement the embodiments of the present application, and does not necessarily include all the components shown in FIG. 7.
  • Consistent with the foregoing embodiments, an embodiment of the present application further provides a model training system, as shown in FIG. 8a.
  • The model training system 800 includes a server 801 and a client 802; the server 801 maintains labeled data and the client 802 maintains unlabeled data. The client 802 is configured to train a first model according to the unlabeled data, so as to obtain parameters of the first model. The client 802 is further configured to send parameters of a first subnet in the first model to the server 801, where the first model further includes a second subnet. The server 801 is configured to train a second model according to the parameters of the first subnet reported by the client 802 and the labeled data, so as to update parameters of the second model, where the second model includes the first subnet and a third subnet, and the third subnet corresponds to the second subnet. The server 801 is further configured to send the updated parameters of the first subnet and the parameters of the third subnet to the client 802. The client 802 is further configured to obtain a target model according to the parameters of the first subnet and the parameters of the third subnet from the server 801, where the target model includes the first subnet and the third subnet.
  • As an optional implementation, in terms of sending the parameters of the first model to the server 801, the client 802 is configured to send only the parameters of the first subnet in the first model to the server 801, without sending parameters in the first model other than the parameters of the first subnet to the server 801.
  • As another optional implementation, the client 802 is further configured to send parameters in the first model other than the parameters of the first subnet to the server 801.
  • As an optional implementation, the number of clients 802 is K, where K is an integer greater than 1. The server 801 is further configured to aggregate the parameters of the K first subnets from the K clients 802, so as to obtain processed parameters of the first subnet; and, in terms of training the second model of the server according to the parameters of the first subnet reported by the client and the labeled data so as to update the parameters of the second model, the server 801 is configured to train the second model of the server 801 according to the processed parameters of the first subnet and the labeled data, so as to update the parameters of the second model. One possible aggregation is sketched below.
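The sketch below shows one possible form of the aggregation, a weighted average over the K clients' first-subnet parameters with equal weights as the default; the application mentions weighted summation with preset weights as one optional aggregation, and the NumPy representation of parameters here is an assumption for illustration.

```python
import numpy as np

def aggregate_first_subnets(client_subnet_params, weights=None):
    """Server-side aggregation: combine the K clients' first-subnet parameters
    (a list of name-keyed dictionaries) into one set by a weighted average."""
    k = len(client_subnet_params)
    weights = weights if weights is not None else [1.0 / k] * k      # equal weights by default
    keys = client_subnet_params[0].keys()
    return {key: sum(w * np.asarray(p[key]) for w, p in zip(weights, client_subnet_params))
            for key in keys}
```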
  • As an optional implementation, the third subnet of the second model is used to output the calculation result of the second model, and the second subnet of the first model is used to output the calculation result of the first model, where the structure of the third subnet of the second model is different from that of the second subnet of the first model.
  • As shown in FIG. 8b, an embodiment of the present application provides a model training apparatus. The apparatus 803 includes a training module 8031 and a sending module 8032, where:
  • the training module 8031 is configured to train the second model according to the parameters of the first subnet reported by the client and the labeled data, so as to update the parameters of the second model, where the second model includes the first subnet and a third subnet;
  • a sending module 8032 configured to send the updated parameters of the first subnet and parameters of the third subnet to the client.
  • As an optional implementation, the number of clients is K, where K is an integer greater than 1, and the apparatus further includes a processing module configured to: aggregate the parameters of the K first subnets from the K clients, so as to obtain processed parameters of the first subnet; the training module is further configured to train the second model according to the processed parameters of the first subnet and the labeled data, so as to update the parameters of the second model.
  • As an optional implementation, the training module 8031 is further configured to: train a third model according to the parameters of the first subnet reported by the client and the unlabeled data, so as to update the parameters of the third model; and train the second model according to the parameters of the third model and the labeled data, so as to update the parameters of the second model. A sketch of one server-side round follows.
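The sketch below outlines one server-side round under the same dictionary-of-parameters convention as the earlier sketches; the subnet name prefixes ("albert." for the first subnet, "classifier." for the third subnet) and the supervised_step callable are assumptions for illustration only.

```python
def server_round(second_model_params, aggregated_first_subnet, labeled_batches,
                 supervised_step, first_prefix="albert.", third_prefix="classifier."):
    """One server round: replace the second model's first-subnet parameters with the
    aggregated ones, run supervised training on the labeled data, and return the
    updated first- and third-subnet parameters to send back to the clients."""
    params = {**second_model_params, **aggregated_first_subnet}      # swap in the aggregated subnet
    for features, labels in labeled_batches:
        params = supervised_step(params, features, labels)           # supervised training on labeled data
    first = {k: v for k, v in params.items() if k.startswith(first_prefix)}
    third = {k: v for k, v in params.items() if k.startswith(third_prefix)}
    return first, third
```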
  • the embodiment of the present application also provides a model training device, the device 804 includes: a training module 8041, a sending module 8042 and an acquiring module 8043, specifically as follows:
  • a training module 8041 configured to train the first model according to unlabeled data, so as to obtain parameters of the first model
  • a sending module 8042 configured to send parameters of the first subnet in the first model to the server, where the first model also includes a second subnet;
  • an obtaining module 8043 configured to obtain a target model according to the parameters of the first subnet and the parameters of the third subnet from the server, where the target model includes the first subnet and the third subnet, and the third subnet corresponds to the second subnet.
  • the sending module 8042 is further configured to: send parameters in the first model except parameters of the first subnet to the server.
  • As another optional implementation, the sending module 8042 is configured to: send only the parameters of the first subnet in the first model to the server, without sending parameters in the first model other than the parameters of the first subnet to the server.
  • the embodiment of the present application also provides a model training device, the device 805 includes:
  • a training module 8051 configured to train the first model according to the unlabeled data to obtain parameters of the first model; train a fourth model according to the parameters of the first model and the labeled data, to obtain parameters of the fourth model;
  • a sending module 8052 configured to send parameters of the first subnet and parameters of the second subnet in the fourth model to the server;
  • An updating module 8053 configured to update the fourth model according to the parameters of the first subnet and the parameters of the second subnet from the server.
  • As an optional implementation, the sending module 8052 is further configured to: send parameters in the fourth model other than the parameters of the first subnet and the parameters of the second subnet to the server.
  • As another optional implementation, the sending module 8052 is configured to: send only the parameters of the first subnet and the parameters of the second subnet in the fourth model to the server, without sending parameters in the fourth model other than the parameters of the first subnet and the parameters of the second subnet to the server.
  • An embodiment of the present application further provides a chip system applied to an electronic device. The chip system includes one or more interface circuits and one or more processors, where the interface circuit and the processor are interconnected through a line. The interface circuit is configured to receive a signal from a memory of the electronic device and send the signal to the processor, where the signal includes computer instructions stored in the memory; when the processor executes the computer instructions, the electronic device executes the model training method.
  • the embodiment of the present application also provides a model training device, including a processor and a memory; wherein the memory is used to store program codes, and the processor is used to call the program codes to execute the model training method.
  • An embodiment of the present application further provides a computer-readable storage medium storing instructions which, when run on a computer or a processor, cause the computer or the processor to perform one or more steps in any one of the above methods.
  • An embodiment of the present application further provides a computer program product including instructions. When the computer program product is run on a computer or a processor, the computer or the processor is caused to perform one or more steps in any one of the above methods.
  • In the embodiments of the present application, words such as "first" and "second" are used to distinguish between identical or similar items that have basically the same functions and effects.
  • Those skilled in the art can understand that words such as "first" and "second" do not limit the quantity or the execution order, and that "first" and "second" do not necessarily indicate a difference.
  • words such as “exemplary” or “for example” are used as examples, illustrations or illustrations. Any embodiment or design scheme described as “exemplary” or “for example” in the embodiments of the present application shall not be interpreted as being more preferred or more advantageous than other embodiments or design schemes.
  • the use of words such as “exemplary” or “such as” is intended to present related concepts in a concrete manner for easy understanding.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • The division of units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, or direct coupling, or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • a unit described as a separate component may or may not be physically separated, and a component displayed as a unit may or may not be a physical unit, that is, it may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in or transmitted over a computer-readable storage medium.
  • The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server, a data center, etc. integrated with one or more available media.
  • the usable medium can be read-only memory (read-only memory, ROM), or random access memory (random access memory, RAM), or magnetic medium, for example, floppy disk, hard disk, magnetic tape, magnetic disk, or optical medium, such as , a digital versatile disc (digital versatile disc, DVD), or a semiconductor medium, for example, a solid state disk (solid state disk, SSD) and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer And Data Communications (AREA)

Abstract

Embodiments of the present application provide a model training method, a related system, and a storage medium, applied to the field of artificial intelligence technology, for example federated learning. The system includes a client and a server: the client is configured to train a first model according to unlabeled data and is further configured to send parameters of a first subnet in the first model to the server; the server is configured to train a second model according to the parameters of the first subnet and labeled data, so as to update the parameters of the second model, and is further configured to send the updated parameters of the first subnet and parameters of a third subnet to the client; the client is further configured to obtain a target model according to the parameters of the first subnet and the parameters of the third subnet from the server. This approach protects the security of the client's private data while improving the model's feature extraction capability on unlabeled data and saving labor costs.

Description

模型训练方法及相关系统、存储介质
本申请要求于2021年6月15日提交中国专利局、申请号为202110662048.9、申请名称为“模型训练方法及相关系统、存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能技术领域,尤其涉及一种模型训练方法及相关系统、存储介质。
背景技术
随着人工智能的发展,提出了“联邦学习”的概念,使得联邦双方在不用给出己方数据的情况下,也可进行模型训练得到模型参数,并且可以避免数据隐私泄露的问题。
横向联邦学习,也称为特征对齐的联邦学习(feature-alignedfederated learning),是在各个客户端的数据特征重叠较多(即数据特征是对齐的),而用户重叠较少的情况下,取出客户端数据特征相同而用户不完全相同的那部分数据进行联合机器学习。横向联邦学习的应用场景分为两类:标准场景和不相交场景。标准场景指的是参与模型训练的有标签数据存放于客户端,即在客户端上执行标准的有监督训练。不相交场景是指参与模型训练的有标签数据被存放在服务端中,而大量无标签数据却存放在客户端中。不相交场景主要是由于许多数据的标注工作是需要具备相关专业知识的人员来进行处理的。例如,对于瑜伽姿势矫正的手机应用软件,由于普通人难以确认自己的瑜伽姿势是否完全正确,因此,即使用户愿意为服务商标注所有的图片数据,服务商也只能聘请专业的瑜伽从业人员来对相关数据进行标注。
目前的横向联邦学习对于不相交场景,通常假设客户端有大量的有标签数据,能够保证使用横向联邦学习的训练模式进行模型训练,但是实际情况通常是客户端有少量甚至是没有有标签数据,事实上也很难要求客户端对数据进行标注,因而很难使用现有的横向联邦学习训练模式获得优质的模型。
发明内容
本申请公开了一种模型训练方法及相关系统、存储介质,可以提高模型在无标签数据上的特征提取能力。
第一方面,本申请实施例提供一种模型训练系统,所述模型训练系统包括服务端和客户端,所述服务端维护有有标签数据,所述客户端维护有无标签数据,其中:所述客户端用于根据所述无标签数据对第一模型进行训练,以获得所述第一模型的参数;所述客户端还用于向所述服务端发送所述第一模型中第一子网的参数,所述第一模型还包括第二子网;所述服务端用于根据所述客户端上报的所述第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,所述第二模型包括所述第一子网和第三子网,所述第三子网与所述第二子网对应;所述服务端还用于向所述客户端发送更新后的所述第一子网的参数以及所述第三子网的参数;所述客户端还用于根据来自所述服务端的所述第一子网的参数和所述第三子网的参数得到目标模型,其中,所述目标模型包括所述第一子网和所述第三子网。
通过本方案,客户端基于无标签数据进行训练,然后服务端基于客户端上报的第一子网的参数和有标签数据进行训练,并向客户端发送更新后的所述第一子网的参数以及第三子网的参数,进而客户端根据所述第一子网的参数以及第三子网的参数得到目标模型。采用该手 段,一方面保障了客户端的隐私数据的安全性,同时提高了模型在无标签数据上的特征提取能力,节省人力成本。本方案可以实现只在服务端有有标签数据,在客户端完全没有有标签数据时,也能够进行横向联邦学习,从而适应缺乏标签数据的现实场景。
上述客户端根据所述无标签数据对第一模型进行训练,以获得所述第一模型的参数,可以理解为,客户端根据所述无标签数据对第一模型进行训练,以更新所述第一模型的参数。
作为一种可选的实现方式,上述第一子网可以用于对输入该子网的数据进行特征提取。
相较于现有技术中客户端向服务端发送训练得到的所有参数,采用本方案,由于传输的数据较少,在一定程度上可以降低训练过程中的通信开销。
作为一种可选的实现方式,在向所述服务端发送所述第一模型的参数的方面,所述客户端用于仅向所述服务端发送所述第一模型中所述第一子网的参数。
作为另一种可选的实现方式,所述客户端还用于向所述服务端发送所述第一模型中除所述第一子网的参数之外的参数。
作为一种可选的实现方式,所述客户端的数量为K个,K为大于1的整数,所述服务端还用于对来自所述K个客户端的K个第一子网的参数进行聚合处理,以得到处理后的第一子网的参数;在根据所述客户端上报的所述第一子网的参数和所述有标签数据对所述服务端的第二模型进行训练,以更新所述第二模型的参数的方面所述服务端用于根据所述处理后的第一子网的参数和所述有标签数据对所述服务端的第二模型进行训练,以更新所述第二模型的参数。
采用该手段,服务端基于多个客户端的第一子网的参数来进行训练,可以有效提高模型在无标签数据上的特征提取能力。
作为一种可选的实现方式,所述第二模型的第三子网用于输出所述第二模型的计算结果;所述第一模型的第二子网用于输出所述第一模型的计算结果,其中,所述第二模型的第三子网与所述第一模型的第二子网的结构不同。
作为一种可选的实现方式,第三子网为Classifier子网,第二子网为MLM子网等。
作为一种可选的实现方式,所述第一模型的第二子网的参数在训练前和训练后保持不变。
采用该手段,可以降低训练开销。
作为一种可选的实现方式,所述第二模型还包括第四子网,第二模型的第四子网的参数在训练前和训练后保持不变。
采用该手段,可以降低训练开销。
第二方面,本申请实施例提供一种模型训练方法,应用于服务端,所述服务端维护有有标签数据,所述方法包括:根据客户端上报的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,所述第二模型包括所述第一子网和第三子网;向所述客户端发送更新后的所述第一子网的参数以及所述第三子网的参数。
通过本申请实施例,服务端基于客户端上报的第一子网的参数和有标签数据进行训练,然后向客户端发送更新后的所述第一子网的参数以及第三子网的参数。其中,客户端上报的第一子网的参数为客户端基于无标签数据进行训练得到的。采用该手段,一方面保障了客户端的隐私数据的安全性,同时提高了模型在无标签数据上的特征提取能力,节省人力成本。本方案可以实现只在服务端有有标签数据,在客户端完全没有有标签数据时,也能够进行横向联邦学习,从而适应缺乏标签数据的现实场景。
作为一种可选的实现方式,所述客户端的数量为K个,K为大于1的整数,所述方法还包括:对来自所述K个客户端的K个第一子网的参数进行聚合处理,以得到处理后的第一子 网的参数;所述根据客户端上报的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,包括:根据所述处理后的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数。
采用该手段,服务端基于多个客户端的第一子网的参数来进行训练,可以有效提高模型在无标签数据上的特征提取能力。
作为一种可选的实现方式,所述服务端还维护有无标签数据,所述根据客户端上报的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,包括:根据客户端上报的第一子网的参数和所述无标签数据对第三模型进行训练,以更新所述第三模型的参数;根据所述第三模型的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数。
采用该手段,可以实现在服务端维护有有标签数据和无标签数据的场景下进行横向联邦学习,进一步提高了模型的特征提取能力,节省人力成本。
第三方面,本申请实施例提供一种模型训练方法,应用于客户端,所述客户端维护有无标签数据,所述方法包括:根据所述无标签数据对第一模型进行训练,以获得所述第一模型的参数;向所述服务端发送所述第一模型中第一子网的参数,所述第一模型还包括第二子网;根据来自所述服务端的所述第一子网的参数和第三子网的参数得到目标模型,其中,所述目标模型包括所述第一子网和所述第三子网,所述第三子网与所述第二子网对应。
通过本申请实施例,客户端基于无标签数据进行训练,并将第一子网的参数上报给服务端,并根据来自服务端的第一子网的参数和第三子网的参数得到目标模型,其中,来自服务端的第一子网的参数和第三子网的参数是服务端基于客户端上报的第一子网的参数和有标签数据进行训练得到的。采用该手段,一方面保障了客户端的隐私数据的安全性,同时提高了模型在无标签数据上的特征提取能力,节省人力成本。本方案可以实现只在服务端有有标签数据,在客户端完全没有有标签数据时,也能够进行横向联邦学习,从而适应缺乏标签数据的现实场景。
作为一种可选的实现方式,所述客户端仅向所述服务端发送所述第一模型中所述第一子网的参数,而不向所述服务端发送所述第一模型中除所述第一子网的参数之外的参数。
相较于现有技术中客户端向服务端发送训练得到的所有参数,采用本方案,由于传输的数据较少,在一定程度上可以降低训练过程中的通信开销。
作为另一种可选的实现方式,所述方法还包括:向所述服务端发送所述第一模型中除所述第一子网的参数之外的参数。
作为一种可选的实现方式,进行所述无监督训练所使用的损失值是根据所述客户端的无标签数据和第一数据得到的,所述第一数据是将第二数据输入至所述第一模型进行处理得到的,所述第二数据是对所述无标签数据进行掩码处理得到的。
通过本申请实施例,客户端进行无监督训练时对无标签数据进行掩码处理,基于客户端的无标签数据和掩码处理后的数据来计算损失值。采用该手段,可以提高模型在无标签数据上的特征提取能力。
第四方面,本申请实施例提供一种模型训练方法,应用于客户端,所述客户端维护有无标签数据和有标签数据,所述方法包括:根据所述无标签数据对第一模型进行训练,以获得所述第一模型的参数;根据所述第一模型的参数和所述有标签数据对第四模型进行训练,以获得所述第四模型的参数;向所述服务端发送所述第四模型中第一子网的参数和第二子网的参数;根据来自所述服务端的所述第一子网的参数和所述第二子网的参数更新所述第四模型。
采用该手段,可以实现客户端维护有有标签数据和无标签数据的场景下进行横向联邦学习,进一步提高了模型的特征提取能力,节省人力成本。
作为一种可选的实现方式,所述客户端仅向所述服务端发送所述第四模型中所述第一子网的参数和所述第二子网的参数,而不向所述服务端发送所述第四模型中除所述第一子网的参数和所述第二子网的参数之外的参数。
作为另一种可选的实现方式,所述方法还包括:向所述服务端发送所述第四模型中除所述第一子网的参数和所述第二子网的参数之外的参数。
第五方面,本申请实施例提供一种模型训练装置,所述装置包括:训练模块,用于根据客户端上报的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,所述第二模型包括所述第一子网和第三子网;发送模块,用于向所述客户端发送更新后的所述第一子网的参数以及所述第三子网的参数。
作为一种可选的实现方式,所述客户端的数量为K个,K为大于1的整数,所述装置还包括处理模块,用于:对来自所述K个客户端的K个第一子网的参数进行聚合处理,以得到处理后的第一子网的参数;所述训练模块,还用于根据所述处理后的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数。
作为一种可选的实现方式,所述训练模块,还用于:根据客户端上报的第一子网的参数和所述无标签数据对第三模型进行训练,以更新所述第三模型的参数;根据所述第三模型的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数。
第六方面,本申请实施例提供一种模型训练装置,所述装置包括:训练模块,用于根据无标签数据对第一模型进行训练,以获得所述第一模型的参数;发送模块,用于向所述服务端发送所述第一模型中第一子网的参数,所述第一模型还包括第二子网;更新模块,用于根据来自所述服务端的所述第一子网的参数和第三子网的参数得到目标模型,其中,所述目标模型包括所述第一子网和所述第三子网,所述第三子网与所述第二子网对应。
作为一种可选的实现方式,所述发送模块,还用于:向所述服务端发送所述第一模型中除所述第一子网的参数之外的参数。
作为另一种可选的实现方式,所述发送模块,用于:仅向所述服务端发送所述第一模型中所述第一子网的参数,而不向所述服务端发送所述第一模型中除所述第一子网的参数之外的参数。
第七方面,本申请实施例提供一种模型训练装置,所述装置包括:训练模块,用于根据所述无标签数据对第一模型进行训练,以获得所述第一模型的参数;根据所述第一模型的参数和所述有标签数据对第四模型进行训练,以获得所述第四模型的参数;发送模块,用于向所述服务端发送所述第四模型中第一子网的参数和第二子网的参数;更新模块,用于根据来自所述服务端的所述第一子网的参数和所述第二子网的参数更新所述第四模型。
作为一种可选的实现方式,所述发送模块,还用于:向所述服务端发送所述第四模型中除所述第一子网的参数和所述第二子网的参数之外的参数。
作为另一种可选的实现方式,所述发送模块,用于:仅向所述服务端发送所述第四模型中所述第一子网的参数和所述第二子网的参数,而不向所述服务端发送所述第四模型中除所述第一子网的参数和所述第二子网的参数之外的参数。
第八方面,本申请实施例提供一种模型训练装置,包括处理器和存储器;其中,所述存储器用于存储程序代码,所述处理器用于调用所述程序代码,以执行所述的方法。
第九方面,本申请提供了一种计算机存储介质,包括计算机指令,当所述计算机指令在 电子设备上运行时,使得所述电子设备执行如第二方面任一种可能的实施方式和/或第三方面任一种可能的实施方式和/或第四方面任一种可能的实施方式提供的方法。
第十方面,本申请实施例提供一种计算机程序产品,当计算机程序产品在计算机上运行时,使得计算机执行如第二方面任一种可能的实施方式和/或第三方面任一种可能的实施方式和/或第四方面任一种可能的实施方式提供的方法。
可以理解地,上述提供的第一方面所述的模型训练系统、第五方面所述的模型训练装置、第六方面所述的模型训练装置、第七方面所述的模型训练装置、第八方面所述的模型训练装置、第九方面所述的计算机存储介质或者第十方面所述的计算机程序产品均用于执行第二方面任一种可能的实施方式和/或第三方面任一种可能的实施方式和/或第四方面任一种可能的实施方式提供的方法。因此,其所能达到的有益效果可参考对应方法中的有益效果,此处不再赘述。
附图说明
下面对本申请实施例用到的附图进行介绍。
图1a是本申请实施例提供的一种模型训练系统的框架示意图;
图1b是本申请实施例提供的另一种模型训练系统的框架示意图;
图2是本申请实施例提供的一种模型训练方法的交互示意图;
图3是本申请实施例提供的一种模型训练方法的流程示意图;
图4a是本申请实施例提供的一种服务端的模型训练方法的流程示意图;
图4b是本申请实施例提供的一种模型训练方法的示意图;
图5a是本申请实施例提供的一种客户端的模型训练方法的流程示意图;
图5b是本申请实施例提供的一种模型训练方法的示意图;
图6是本申请实施例提供的又一种模型训练方法的流程示意图;
图7是本申请实施例提供的一种模型训练装置的结构示意图;
图8a是本申请实施例提供的一种模型训练系统的结构示意图;
图8b是本申请实施例提供的一种模型训练装置的结构示意图;
图8c是本申请实施例提供的一种模型训练装置的结构示意图;
图8d是本申请实施例提供的一种模型训练装置的结构示意图。
具体实施方式
下面结合本申请实施例中的附图对本申请实施例进行描述。本申请实施例的实施方式部分使用的术语仅用于对本申请的具体实施例进行解释,而非旨在限定本申请。
参照图1a所示,为本申请实施例提供的一种模型训练系统的框架示意图。该系统包括服务端和客户端。其中,客户端维护有无标签数据,服务端维护有有标签数据。客户端根据无标签数据对第一模型进行训练,以获得第一模型的参数。然后,客户端向服务端发送所述第一模型中第一子网的参数。服务端根据客户端上报的所述第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数。服务端向所述客户端发送更新后的所述第一子网的参数以及第三子网的参数,进而客户端根据来自所述服务端的所述第一子网的参数和所述第三子网的参数得到目标模型。
参照图1b所示,为本申请实施例提供的另一种模型训练系统的框架示意图。该系统包括服务端和客户端。其中,服务端包括联邦学习服务端(Federated LearningServer,FL-Server)模块和联邦学习工作站(Federated Learning Worker,FL-Worker)模块。FL-Server模块包括聚合单元和通信单元。FL-Server模块用于数据处理。FL-Worker模块包括模型分解单元、训练单元和通信单元。FL-Worker模块用于模型训练。客户端包括模型分解单元、训练单元、推理单元和通信单元。客户端用于模型训练和推理。
其中,客户端的模型分解单元用于将客户端的模型分解为多个子网。客户端的训练单元基于无标签数据对分解后的模型进行训练,然后客户端的通信单元将训练得到的第一子网的参数发送给FL-Server模块。
FL-Server模块的聚合单元将接收到的多个客户端发送的第一子网的参数进行聚合处理,然后将处理后的第一子网的参数发送给FL-Worker模块。FL-Worker模块的模型分解单元用于将服务端的模型分解为多个子网。FL-Worker模块的训练单元基于该处理后的第一子网的参数和有标签数据对分解后的模型进行训练,得到更新后的第一子网的参数和第三子网的参数,然后将其发送给FL-Server模块。FL-Server模块将该更新后的第一子网的参数和第三子网的参数下发给每个客户端,客户端根据来自所述服务端的所述第一子网的参数和所述第三子网的参数得到目标模型。进而,客户端的推理单元可基于该目标模型进行推理。
参照图2所示,为本申请实施例提供的一种模型训练方法的流程示意图。该方法应用于横向联邦学习的不相交场景。其中,服务端维护有有标签数据,服务端进行有监督训练。客户端维护有无标签数据,客户端进行无监督训练。
在模型参数初始化之后,客户端的无监督训练首先启动。之后,客户端的无监督训练和服务端的有监督训练交替进行。交替流程在满足预设条件后停止训练。预设条件可以是满足迭代次数,或者损失值小于预设值等,本方案对此不做具体限定。
具体地,参照图3所示,为本申请实施例提供的一种模型训练方法的流程示意图。该方法应用于模型训练系统,所述模型训练系统包括服务端和客户端。其包括步骤301-305,具体如下:
301、客户端根据无标签数据对第一模型进行训练,以获得所述第一模型的参数;
该第一模型可以是任意模型,如神经网络模型、支持向量机模型、决策树模型等。其中,该第一模型可对应图2中的模型2。
上述对第一模型进行训练,可以是进行无监督训练。
作为一种可选的实现方式,进行无监督训练所使用的损失值是根据所述客户端的无标签数据和第一数据得到的,所述第一数据是将第二数据输入至所述第一模型进行处理得到的,所述第二数据是对所述无标签数据进行掩码处理得到的。
具体地,无监督训练可包括如下步骤:
首先对无标签数据进行掩码操作。
掩码操作即为对原始的数据特征中的部分值进行替换操作,替换的值可以为特定的值,或可学习的参数。
例如,无标签数据为短信数据[特,选,进,口,高,科,技,面,膜],对该短信数据中的“进”进行掩码操作,其中,掩码操作后的数据为[特,选,MASK,口,高,科,技,面,膜]等。
然后将掩码后的数据输入到用于无监督训练的模型中,得到模型的输出结果。
通过相似度函数(即损失函数),比较模型的输出结果和上述无标签数据的相似性。
将比较的结果输入到优化器中,进而更新模型的参数。
通过重复执行上述步骤,直到达到该无监督训练的停止条件时停止训练。
上述仅为一种无监督训练的示例,其还可以是其他形式的无监督训练,本方案对此不做具体限定。
上述对第一模型进行训练,以获得所述第一模型的参数,可以理解为,对第一模型进行训练,以更新所述第一模型的参数。
302、所述客户端向所述服务端发送所述第一模型中第一子网的参数,所述第一模型还包括第二子网;
其中,在模型初始化时,客户端可对第一模型进行分解操作,将第一模型的各个子网的参数分别分解开来,进而进行模型训练。
其中,第一模型包括多个子网,该多个子网包括第一子网、第二子网。
上述第一子网可以是用于进行特征提取的子网。上述第二子网可以是用于输出该第一模型的计算结果的子网。
如第一模型可包括嵌入Embedding子网、轻量化的来自于变形器的双向编码表示ALBERT子网、掩码语言模型(Masked Language Model,MLM)子网、自适应矩估计Adam优化器子网等。相应地,第一子网为ALBERT子网,第二子网为MLM子网。
上述子网可以理解为子模型。例如,Embedding子模型、ALBERT子模型、MLM子模型等。
上述第一模型仅为一种示例,其还可以是由其他子网组成的模型,本方案对此不做具体限定。
作为一种可选的实现方式,所述客户端向所述服务端发送所述第一模型中除所述第一子网的参数之外的参数。
也就是说,所述客户端不仅向服务端发送所述第一模型中第一子网的参数,还发送除第一子网的参数之外的参数。例如,还发送第二子网的参数,或者还发送其他子网的参数,其还可以向服务端发送其他所有子网的参数等,本方案对此不做具体限定。
作为另一种可选的实现方式,所述客户端仅向所述服务端发送所述第一模型中所述第一子网的参数。
也就是说,客户端不向服务端发送除所述第一子网的参数之外的参数。相较于现有技术中客户端将训练得到的所有参数均传输给服务端,采用本方案,由于传输的数据较少,可以降低训练过程中的通信开销。
303、所述服务端根据所述客户端上报的所述第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,所述第二模型包括所述第一子网和第三子网,所述第三子网与所述第二子网对应;
该第二模型可以是任意模型,如神经网络模型、支持向量机模型、决策树模型等。其中,该第二模型可对应图2中的模型1。
上述服务端根据所述客户端上报的所述第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,可以是服务端根据所述客户端上报的所述第一子网的参数对第二模型的第一子网的参数进行替换,以更新第二模型的参数;然后,根据所述有标签数据对所述更新后的第二模型进行训练,以再次更新第二模型的参数。
上述对第二模型进行训练,可以是对第二模型进行有监督训练。
作为一种可选的实现方式,服务端的训练可以参照如下操作:
将有标签数据输入到用于有监督训练的第二模型中,得到第二模型的输出结果。
通过相似度函数(即损失函数),比较模型的输出结果和有标签数据的相似性。
将比较的结果输入到优化器中,进而得到用来更新模型的参数。
其中,若优化器自身带有参数,则优化器的参数也可进行更新。
重复执行上述步骤,直到达到该有监督训练的停止条件时停止,得到服务端更新后的第二模型的参数。
该停止条件可以是满足预设迭代次数,或者损失值满足预设要求等,本方案对此不做具体限定。
作为一种可选的实现方式,当客户端的数量为K个,K为大于1的整数,在步骤303之前,所述方法还包括:
所述服务端对来自所述K个客户端的K个第一子网的参数进行聚合处理,以得到处理后的第一子网的参数。
该聚合处理可以是按照预设权重对各个客户端发送的第一子网的参数进行加权求和,进而得到处理后的第一子网的参数。
上述方式仅为一种示例,当然,还可以是其他形式的处理,本方案对此不做具体限定。
相应地,所述服务端根据所述客户端上报的所述第一子网的参数和所述有标签数据对所述服务端的第二模型进行训练,以更新所述第二模型的参数,包括:
所述服务端根据所述处理后的第一子网的参数和所述有标签数据对所述服务端的第二模型进行训练,以更新所述第二模型的参数。
具体地,所述服务端根据所述处理后的第一子网的参数对所述服务端的第二模型的第一子网的参数进行替换,以更新所述第二模型的参数;
所述服务端根据所述有标签数据对所述更新后的所述第二模型进行训练,以再次更新所述第二模型的参数。
其中,在模型初始化时,服务端可对第二模型进行分解操作,将第二模型的各个子网的参数分别分解开来。
其中,第二模型包括多个子网,该多个子网包括所述第一子网、第三子网。
上述第一子网可以是用于进行特征提取的子网。上述第三子网可以是用于输出该第二模型的计算结果的子网。
如第二模型可包括嵌入Embedding子网、轻量化的来自于变形器的双向编码表示ALBERT子网、分类器Classifier子网、Adam优化器子网等。相应地,第三子网为Classifier子网。
上述子网可以理解为子模型。例如,Embedding子模型、ALBERT子模型、Classifier子模型等。
上述第二模型仅为一种示例,其还可以是由其他子网组成的模型,本方案对此不做具体限定。
上述第二模型的第三子网与第一模型的第二子网对应,可以理解为,两者功能是相同的,例如所述第二模型的第三子网用于输出所述第二模型的计算结果;所述第一模型的第二子网用于输出所述第一模型的计算结果。
其中,所述第二模型的第三子网与所述第一模型的第二子网的结构不同。
作为一种可选的实现方式,第三子网为Classifier子网,第二子网为MLM子网等。
304、所述服务端向所述客户端发送更新后的所述第一子网的参数以及所述第三子网的参 数;
作为一种可选的实现方式,在步骤304之前,还可包括:
304-1、确认是否达到预设条件;
304-2、若达到所述预设条件,则执行步骤304;
304-3、若未达到所述预设条件,所述服务端向所述客户端发送更新后的所述第一子网的参数以及所述第三子网的参数,所述客户端根据来自所述服务端的所述第一子网的参数更新所述第一模型,并重复执行步骤301、302、303、304-1、304-3,直到达到所述预设条件。
上述预设条件可以是重复执行步骤301、302、303、304-1、304-3的次数,其中客户端和服务端可预先确定停止次数,当达到该重复次数后,则停止训练。
上述预设条件还可以是服务端基于损失函数计算得到的损失值小于预设值等,本方案对此不做具体限定。
作为另一种可选的实现方式,若未达到所述预设条件,所述服务端仅向所述客户端发送更新后的所述第一子网的参数,所述客户端根据来自所述服务端的所述第一子网的参数更新所述第一模型,并重复执行步骤301、302、303、304-1、304-3,直到达到所述预设条件。
305、所述客户端根据来自所述服务端的所述第一子网的参数和所述第三子网的参数得到目标模型,其中,所述目标模型包括所述第一子网和所述第三子网。
其中,目标模型可用于进行推理。
通过本方案,客户端基于无标签数据进行训练,然后服务端基于客户端上报的第一子网的参数和有标签数据进行训练,并向客户端发送更新后的所述第一子网的参数以及第三子网的参数,进而客户端根据该更新后的所述第一子网的参数以及第三子网的参数得到目标模型。采用该手段,一方面保障了客户端的隐私数据的安全性,同时提高了模型在无标签数据上的特征提取能力,节省人力成本。本方案可以实现只在服务端有有标签数据,在客户端完全没有有标签数据时,也能够进行横向联邦学习,从而适应缺乏标签数据的现实场景。
参照图4a所示,为本申请实施例提供的一种模型训练方法的流程示意图。该方法应用于服务端,所述服务端维护有有标签数据,其包括步骤401-402,具体如下:
401、根据客户端上报的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,所述第二模型包括所述第一子网和第三子网;
其中,在模型初始化时,服务端可对第二模型进行分解操作,将第二模型的各个子网的参数分别分解开来。
其中,第二模型包括多个子网,该多个子网包括所述第一子网、第三子网。
上述第一子网可以是用于进行特征提取的子网。上述第三子网可以是用于输出该第二模型的计算结果的子网。
如第二模型可包括嵌入Embedding子网、轻量化的来自于变形器的双向编码表示ALBERT子网、分类器Classifier子网、Adam优化器子网等。相应地,第三子网为Classifier子网。
上述第二模型仅为一种示例,其还可以是由其他子网组成的模型,本方案对此不做具体限定。
作为一种可选的实现方式,所述客户端的数量为K个,K为大于1的整数,所述方法还包括:
对来自所述K个客户端的K个第一子网的参数进行聚合处理,以得到处理后的第一子网 的参数。
该聚合处理可以是按照预设权重对各个客户端发送的第一子网的参数进行加权求和,进而得到处理后的第一子网的参数。
上述方式仅为一种示例,当然,还可以是其他形式的处理,本方案对此不做具体限定。
相应地,所述服务端根据所述客户端上报的所述第一子网的参数和所述有标签数据对所述服务端的第二模型进行训练,以更新所述第二模型的参数,包括:
所述服务端根据所述处理后的第一子网的参数和所述有标签数据对所述服务端的第二模型进行训练,以更新所述第二模型的参数。
具体地,所述服务端根据所述处理后的第一子网的参数对所述服务端的第二模型的第一子网的参数进行替换,以更新所述第二模型的参数;
所述服务端根据所述有标签数据对所述更新后的所述第二模型进行训练,以再次更新所述第二模型的参数。
作为一种可选的实现方式,所述服务端可对第二模型进行有监督训练。
通过将有标签数据输入到用于有监督训练的第二模型中,得到第二模型的输出。通过相似度函数(即损失函数),比较模型的输出和有标签数据的相似性。将比较的结果输入到优化器中,进而得到用来更新模型的参数。其中,若优化器自身带有参数,则优化器的参数也可进行更新。
重复执行上述步骤,直到达到有监督训练的停止条件时停止,得到服务端更新后的第二模型的参数。
该有监督训练的停止条件可以是满足预设迭代次数,或者损失值满足预设要求等,本方案对此不做具体限定。
作为另一种可选的实现方式,所述服务端还维护有无标签数据。也就是说,服务端可进行半监督训练。
相应地,所述根据客户端上报的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,包括:
根据客户端上报的第一子网的参数和所述无标签数据对第三模型进行训练,以更新所述第三模型的参数;
根据所述第三模型的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数。
其中,根据客户端上报的第一子网的参数和所述无标签数据对第三模型进行训练,以更新所述第三模型的参数;根据所述第三模型的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数。当达到服务端半监督训练的停止条件,则执行步骤402。
若未达到上述半监督训练的停止条件,则根据所述第二模型的参数对所述第三模型的参数进行更新,并根据所述无标签数据对更新后的第三模型进行训练,以再次更新所述第三模型的参数;重复执行上述步骤,直到达到所述半监督训练的停止条件。
该半监督训练的停止条件可以是满足预设迭代次数,或者损失值满足预设要求等,本方案对此不做具体限定。
作为一种可选的实现方式,第二模型还包括第四子网,其中,所述第四子网的参数在训练前和训练后保持不变。
例如,该第四子网可以是Embedding子网。
也就是说,在初始化时下发了第二模型的第四子网的参数,训练的过程中该参数不变。
采用该手段,降低了训练的开销。
在此过程中,客户端若只有无标签数据,可进行无监督训练;若还有有标签数据,则客户端可进行半监督训练等。本方案对此不做具体限定。
402、向所述客户端发送更新后的所述第一子网的参数以及所述第三子网的参数。
通过步骤402,以便客户端得到目标模型,并进行推理。
作为一种可选的实现方式,在步骤402之前,还包括:
402-1、确认是否达到预设条件;
402-2、若达到所述预设条件,则执行步骤402;
402-3、若未达到所述预设条件,所述服务端向所述客户端发送更新后的所述第一子网的参数以及所述第三子网的参数,以便所述客户端根据来自所述服务端的所述第一子网的参数更新所述第一模型,并重复执行步骤401、402-1、402-3,直到达到所述预设条件。
上述预设条件可以是重复执行步骤401、402-1、402-3的次数,其中客户端和服务端可预先确定停止次数,当达到该重复次数后,则停止训练。
上述预设条件还可以是服务端基于损失函数计算得到的损失值小于预设值等,本方案对此不做具体限定。
作为另一种可选的实现方式,若未达到所述预设条件,所述服务端仅向所述客户端发送更新后的所述第一子网的参数,并重复执行步骤401、402-1、402-3,直到达到所述预设条件。
如图4b所示,为本申请实施例提供的一种模型训练方法。其中,服务端的模型包括Embedding子网、ALBERT子网、Classifier子网和优化器子网。该实施例以短信分类业务为例进行说明。
在服务端的有监督训练中,服务端首先对短信文本进行数据预处理。这里的数据预处理可以是基于分词器进行分词操作。其中,服务端将分词器的输出结果输入到第二模型中。此外,服务端将有标签数据输入到交叉熵函数中,进而根据第二模型的输出计算得到相似度。然后将相似度输入到优化器中,进而更新第二模型的参数。
若未达到停止条件,服务端向客户端发送更新后的ALBERT子网的参数和Classifier子网的参数。然后,基于客户端发送的ALBERT子网的参数进行再次训练。直到达到停止条件,服务端向所述客户端发送更新后的ALBERT子网的参数和Classifier子网的参数,以便客户端进行推理。
由于目前服务端没有训练引擎,因此在进行训练的时候,需要搭建模型训练的模拟平台来进行模型训练。然而,例如在手机和服务端交互的真实场景中,该手段无法实现部署。
基于此,本申请实施例还提供一种服务端,所述服务端包括联邦学习服务端FL-Server模块和联邦学习工作站FL-Worker模块。参照图2所示,联邦学习服务端FL-Server模块,用于对K个客户端发送的K个第一子网的参数进行聚合处理,然后将处理后的参数发送给联邦学习工作站FL-Worker模块。FL-Worker模块用于根据该处理后的参数和服务端的有标签数据进行训练,得到更新后的所述第一子网的参数以及第三子网的参数,然后将更新后的所述第一子网的参数以及所述第三子网的参数发送至所述FL-Server模块。
采用上述手段,通过在服务端中加入联邦学习工作站FL-Worker模块,使得训练任务可以在服务端进行,实现了服务端可直接进行模型训练,提高了模型训练的效率。
通过本申请实施例,服务端基于客户端上报的第一子网的参数和有标签数据进行训练,然后向客户端发送更新后的所述第一子网的参数以及所述第三子网的参数。其中,客户端上报的第一子网的参数为客户端基于无标签数据进行训练得到的。采用该手段,一方面保障了 客户端的隐私数据的安全性,同时提高了模型在无标签数据上的特征提取能力,节省人力成本。本方案可以实现只在服务端有有标签数据,在客户端完全没有有标签数据时,也能够进行横向联邦学习,从而适应缺乏标签数据的现实场景。
参照图5a所示,为本申请实施例提供的一种模型训练方法的流程示意图。该方法应用于客户端,所述客户端维护有无标签数据,其包括步骤501-503,具体如下:
501、根据无标签数据对第一模型进行训练,以获得所述第一模型的参数;
该第一模型可以是任意模型,如神经网络模型、支持向量机模型、决策树模型等。
上述对第一模型进行训练,可以是进行无监督训练。
作为一种可选的实现方式,进行无监督训练所使用的损失值是根据所述客户端的无标签数据和第一数据得到的,所述第一数据是将第二数据输入至所述第一模型进行处理得到的,所述第二数据是对所述无标签数据进行掩码处理得到的。
具体地,无监督训练可包括如下步骤:
首先对无标签数据进行掩码操作。掩码操作即为对原始的数据特征中的部分值进行替换操作,替换的值可以为特定的值,或可学习的参数。
例如,无标签数据为短信数据[特,选,进,口,高,科,技,面,膜],对该短信数据中的“进”进行掩码操作,其中,掩码操作后的数据为[特,选,MASK,口,高,科,技,面,膜]。
然后将掩码后的数据输入到用于无监督训练的模型中,得到模型的输出。
通过相似度函数(即损失函数),比较模型的输出和上述无标签数据的相似性。
将比较的结果输入到优化器中,进而更新模型的参数。
通过重复执行上述步骤,直到达到无监督训练的停止条件时停止训练。
上述仅为一种无监督训练的示例,其还可以是其他形式的无监督训练,本方案对此不做具体限定。
上述以获得所述第一模型的参数,可以理解为,对第一模型进行训练,以更新所述第一模型的参数。
502、向服务端发送所述第一模型中第一子网的参数,所述第一模型还包括第二子网;
其中,在模型初始化时,客户端可对第一模型进行分解操作,将第一模型的各个子网的参数分别分解开来。
其中,第一模型包括多个子网,该多个子网包括第一子网、第二子网。
上述第一子网可以是用于进行特征提取的子网。上述第二子网可以是用于输出该第一模型的计算结果的子网。
如第一模型可包括嵌入Embedding子网、轻量化的来自于变形器的双向编码表示ALBERT子网、掩码语言模型(Masked Language Model,MLM)子网、自适应矩估计Adam优化器子网等。相应地,第一子网为ALBERT子网,第二子网为MLM子网。
上述子网可以理解为子模型。例如,Embedding子模型、ALBERT子模型、MLM子模型等。
上述第一模型仅为一种示例,其还可以是由其他子网组成的模型,本方案对此不做具体限定。
作为一种可选的实现方式,所述客户端向所述服务端发送所述第一模型中除所述第一子网的参数之外的参数。
也就是说,所述客户端不仅向服务端发送所述第一模型中第一子网的参数,还发送除第 一子网的参数之外的参数。例如,还发送第二子网的参数,或者还发送其他子网的参数,其还可以向服务端发送其他所有子网的参数等,本方案对此不做具体限定。
作为另一种可选的实现方式,所述客户端仅向所述服务端发送所述第一模型中所述第一子网的参数。
也就是说,客户端不向服务端发送除所述第一子网的参数之外的参数。相较于现有技术中将训练得到的所有参数均传输给服务端,采用本方案,由于传输的数据较少,可以降低训练过程中的通信开销。
503、根据来自所述服务端的所述第一子网的参数和第三子网的参数得到目标模型,其中,所述目标模型包括所述第一子网和所述第三子网,所述第三子网与所述第二子网对应。
其中,目标模型可用于进行推理。
作为一种可选的实现方式,在步骤503之前,还可包括:
503-1、确认是否达到预设条件;
503-2、若达到所述预设条件,则执行步骤503;
503-3、若未达到所述预设条件,根据来自所述服务端的所述第一子网的参数更新所述第一模型,并重复执行步骤501、502、503-1、503-3,直到达到所述预设条件。
上述预设条件可以是重复执行步骤501、502、503-1、503-3的次数,其中客户端和服务端可预先确定停止次数,当达到该重复次数后,则停止训练。
上述预设条件还可以是客户端基于损失函数计算得到的损失值小于预设值等,本方案对此不做具体限定。
如图5b所示,为本申请实施例提供的一种模型训练方法。其中,客户端的模型包括Embedding子网、ALBERT子网、MLM子网和优化器子网。该实施例以短信分类业务为例进行说明。
如图5b所示,在客户端的无监督训练任务中,客户端首先对短信文本进行数据预处理。数据预处理包括进行分词处理,然后进行掩码处理。其中,掩码处理后的结果作为模型的输入。其中,客户端还将分词处理后的结果输入到交叉熵函数中,用于和模型的输出计算相似度。然后将计算得到的相似度输入到优化器子网中,进而得到更新后的第一模型的参数。然后,客户端将ALBERT子网的参数发送给服务端。
在此过程中,服务端若只有有标签数据,则进行有监督训练;若服务端还有无标签数据,则服务端进行半监督训练。本方案对此不做具体限定。
作为一种可选的实现方式,所述第一模型还包括第五子网,所述第五子网的参数在训练前和训练后保持不变。例如,该第五子网可以是Embedding子网。
也就是说,在初始化时下发了第一模型的第五子网的参数,后面训练的过程中该参数不变。
采用该手段,在训练过程中第五子网的参数保持不变,降低了训练开销。
作为又一种可选的实现方式,所述第一模型的第二子网的参数在训练前和训练后保持不变。例如,该第二子网可以是MLM子网。
也就是说,在初始化时下发了第一模型的第二子网的参数,后面训练的过程中该参数不变。采用该手段,降低了训练开销。
通过本申请实施例,客户端基于无标签数据进行训练,并将第一子网的参数上报给服务端,并根据来自服务端的第一子网的参数和第三子网的参数得到目标模型,其中,来自服务端的第一子网的参数和第三子网的参数是服务端基于客户端上报的第一子网的参数和有标签 数据进行训练得到的。采用该手段,一方面保障了客户端的隐私数据的安全性,同时提高了模型在无标签数据上的特征提取能力,节省人力成本。本方案可以实现只在服务端有有标签数据,在客户端完全没有有标签数据时,也能够进行横向联邦学习,从而适应缺乏标签数据的现实场景。
上述实施例对于客户端进行无监督训练为例进行说明,其中,客户端维护有有标签数据和无标签数据时,客户端可进行半监督训练。下面对于客户端进行半监督训练进行说明。如图6所示,本申请实施例还提供一种模型训练方法,包括步骤601-604,具体如下:
601、根据所述无标签数据对第一模型进行训练,以获得所述第一模型的参数;
作为一种可选的实现方式,对第一模型进行无监督训练,进行无监督训练所使用的损失值是根据所述客户端的无标签数据和第一数据得到的,所述第一数据是将第二数据输入至所述第一模型进行处理得到的,所述第二数据是对所述无标签数据进行掩码处理得到的。
具体地,无监督训练可包括如下步骤:
首先对无标签数据进行掩码操作。
掩码操作即为对原始的数据特征中的部分值进行替换操作,替换的值可以为特定的值或可学习的参数。
然后将掩码后的数据输入到用于无监督训练的模型中,得到模型的输出。
通过相似度函数(即损失函数),比较模型的输出和上述无标签数据的相似性。
将比较的结果输入到优化器中,进而更新模型的参数。
通过重复执行上述步骤,直到达到无监督训练的停止条件时停止训练。
602、根据所述第一模型的参数和所述有标签数据对第四模型进行训练,以获得所述第四模型的参数;
例如,客户端基于第一模型的参数对第四模型的参数进行更新。然后第四模型基于有标签数据进行有监督训练。
作为一种可选的实现方式,在步骤602之后,还包括:
6021、当达到客户端侧半监督训练的停止条件,则执行步骤603;
6022、若未达到客户端侧半监督训练的停止条件,则根据所述第四模型的参数对所述第一模型的参数进行更新,并重复执行步骤601、602、6022,直到达到所述客户端侧半监督训练的停止条件。
上述客户端侧半监督训练的停止条件,可以是重复执行步骤601、602、6022的次数等,本方案对此不做具体限定。
603、向所述服务端发送所述第四模型中第一子网的参数和第二子网的参数;
所述第四模型包括第一子网和第二子网。
作为一种可选的实现方式,向所述服务端发送所述第四模型中除所述第一子网的参数和所述第二子网的参数之外的参数。
作为另一种可选的实现方式,仅向所述服务端发送所述第四模型中所述第一子网的参数和所述第二子网的参数。
相较于现有技术中客户端向服务端发送训练得到的所有参数,采用本方案,由于传输的数据较少,在一定程度上可以降低训练过程中的通信开销。
作为一种可选的实现方式,在步骤603之前,还包括:
6031、若未达到预设条件,根据来自所述服务端的所述第一子网的参数更新所述第一模 型,根据来自所述服务端的所述第一子网的参数和所述第二子网的参数更新所述第四模型,并重复执行步骤601、602、6031,直到达到所述预设条件;
6032、若达到所述预设条件,执行步骤603。
604、根据来自所述服务端的所述第一子网的参数和所述第二子网的参数更新所述第四模型,其中,更新后的所述第四模型包括所述第一子网和所述第二子网。
通过本申请实施例,客户端基于无标签数据对第一模型进行无监督训练,并基于有标签数据对第四模型进行有监督训练,然后向服务端发送第四模型的第一子网和第二子网的参数,然后客户端基于来自服务端的所述第一子网和第二子网的参数进行更新。采用该手段,一方面保障了客户端的隐私数据的安全性,同时提高了模型在无标签数据上的特征提取能力,节省人力成本。本方案可以实现只在服务端有有标签数据,在客户端完全没有有标签数据时,也能够进行横向联邦学习,从而适应缺乏标签数据的现实场景。
如图7所示,是本申请实施例提供的一种模型训练装置的硬件结构示意图。图7所示的模型训练装置700(该装置700具体可以是一种计算机设备)包括存储器701、处理器702、通信接口703以及总线704。
其中,存储器701、处理器702、通信接口703通过总线704实现彼此之间的通信连接。
存储器701可以是只读存储器(Read Only Memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(Random Access Memory,RAM)。
存储器701可以存储程序,当存储器701中存储的程序被处理器702执行时,处理器702和通信接口703用于执行本申请实施例的模型训练方法的各个步骤。
处理器702可以采用通用的中央处理器(Central Processing Unit,CPU),微处理器,应用专用集成电路(Application Specific Integrated Circuit,ASIC),图形处理器(graphics processing unit,GPU)或者一个或多个集成电路,用于执行相关程序,以实现本申请实施例的模型训练装置中的单元所需执行的功能,或者执行本申请方法实施例的模型训练方法。
处理器702还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请的模型训练方法的各个步骤可以通过处理器702中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器702还可以是通用处理器、数字信号处理器(Digital Signal Processing,DSP)、专用集成电路(ASIC)、现成可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器701,处理器702读取存储器701中的信息,结合其硬件完成本申请实施例的模型训练装置中包括的单元所需执行的功能,或者执行本申请方法实施例的模型训练方法。
通信接口703使用例如但不限于收发器一类的收发装置,来实现装置700与其他设备或通信网络之间的通信。例如,可以通过通信接口703获取数据。
总线704可包括在装置700各个部件(例如,存储器701、处理器702、通信接口703)之间传送信息的通路。
应注意,尽管图7所示的装置700仅仅示出了存储器、处理器、通信接口,但是在具体 实现过程中,本领域的技术人员应当理解,装置700还包括实现正常运行所必须的其他器件。同时,根据具体需要,本领域的技术人员应当理解,装置700还可包括实现其他附加功能的硬件器件。此外,本领域的技术人员应当理解,装置700也可仅仅包括实现本申请实施例所必须的器件,而不必包括图7中所示的全部器件。
与上述实施例一致的,另一方面,本申请实施例还提供一种模型训练系统,如图8a所示,该模型训练系统800包括服务端801和客户端802,所述服务端801维护有有标签数据,所述客户端802维护有无标签数据,其中,所述客户端802用于根据所述无标签数据对第一模型进行训练,以获得所述第一模型的参数;所述客户端802还用于向所述服务端801发送所述第一模型中第一子网的参数,所述第一模型还包括第二子网;所述服务端801用于根据所述客户端802上报的所述第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,所述第二模型包括所述第一子网和第三子网,所述第三子网与所述第二子网对应;所述服务端801还用于向所述客户端802发送更新后的所述第一子网的参数以及所述第三子网的参数;所述客户端802还用于根据来自所述服务端801的所述第一子网的参数和所述第三子网的参数得到目标模型,其中,所述目标模型包括所述第一子网和所述第三子网。
作为一种可选的实现方式,在向所述服务端801发送所述第一模型的参数的方面,所述客户端802用于仅向所述服务端801发送所述第一模型中所述第一子网的参数,而不向所述服务端801发送所述第一模型中除所述第一子网的参数之外的参数。
作为另一种可选的实现方式,所述客户端802还用于向所述服务端801发送所述第一模型中除所述第一子网的参数之外的参数。
作为一种可选的实现方式,所述客户端802的数量为K个,K为大于1的整数,所述服务端801还用于对来自所述K个客户端802的K个第一子网的参数进行聚合处理,以得到处理后的第一子网的参数;在根据所述客户端上报的所述第一子网的参数和所述有标签数据对所述服务端的第二模型进行训练,以更新所述第二模型的参数的方面,所述服务端801还用于根据所述处理后的第一子网的参数和所述有标签数据对所述服务端801的第二模型进行训练,以更新所述第二模型的参数。
作为一种可选的实现方式,所述第二模型的第三子网用于输出所述第二模型的计算结果;所述第一模型的第二子网用于输出所述第一模型的计算结果,其中,所述第二模型的第三子网与所述第一模型的第二子网的结构不同。
如图8b所示,本申请实施例提供一种模型训练装置,所述装置803包括:训练模块8031和发送模块8032,其中:
训练模块8031,用于根据客户端上报的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,所述第二模型包括所述第一子网和第三子网;
发送模块8032,用于向所述客户端发送更新后的所述第一子网的参数以及所述第三子网的参数。
作为一种可选的实现方式,所述客户端的数量为K个,K为大于1的整数,所述装置还包括处理模块,用于:对来自所述K个客户端的K个第一子网的参数进行聚合处理,以得到处理后的第一子网的参数;所述训练模块,还用于根据所述处理后的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数。
作为一种可选的实现方式,所述训练模块8031,还用于:根据客户端上报的第一子网的参数和所述无标签数据对第三模型进行训练,以更新所述第三模型的参数;根据所述第三模型的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数。
如图8c所示,本申请实施例还提供一种模型训练装置,所述装置804包括:训练模块8041、发送模块8042和获取模块8043,具体如下:
训练模块8041,用于根据无标签数据对第一模型进行训练,以获得所述第一模型的参数;
发送模块8042,用于向所述服务端发送所述第一模型中第一子网的参数,所述第一模型还包括第二子网;
获取模块8043,用于根据来自所述服务端的所述第一子网的参数和第三子网的参数得到目标模型,其中,所述目标模型包括所述第一子网和所述第三子网,所述第三子网与所述第二子网对应。
作为一种可选的实现方式,所述发送模块8042,还用于:向所述服务端发送所述第一模型中除所述第一子网的参数之外的参数。
作为另一种可选的实现方式,所述发送模块8042,还用于:仅向所述服务端发送所述第一模型中所述第一子网的参数,而不向所述服务端发送所述第一模型中除所述第一子网的参数之外的参数。
如图8d所示,本申请实施例还提供一种模型训练装置,所述装置805包括:
训练模块8051,用于根据所述无标签数据对第一模型进行训练,以获得所述第一模型的参数;根据所述第一模型的参数和所述有标签数据对第四模型进行训练,以获得所述第四模型的参数;
发送模块8052,用于向所述服务端发送所述第四模型中第一子网的参数和第二子网的参数;
更新模块8053,用于根据来自所述服务端的所述第一子网的参数和所述第二子网的参数更新所述第四模型。
作为一种可选的实现方式,所述发送模块8052,还用于:向所述服务端发送所述第四模型中除所述第一子网的参数和所述第二子网的参数之外的参数。
作为另一种可选的实现方式,所述发送模块8052,还用于:仅向所述服务端发送所述第四模型中所述第一子网的参数和所述第二子网的参数,而不向所述服务端发送所述第四模型中除所述第一子网的参数和所述第二子网的参数之外的参数。
本申请实施例还提供一种芯片系统,所述芯片系统应用于电子设备;所述芯片系统包括一个或多个接口电路,以及一个或多个处理器;所述接口电路和所述处理器通过线路互联;所述接口电路用于从所述电子设备的存储器接收信号,并向所述处理器发送所述信号,所述信号包括所述存储器中存储的计算机指令;当所述处理器执行所述计算机指令时,所述电子设备执行所述模型训练方法。
本申请实施例还提供一种模型训练装置,包括处理器和存储器;其中,所述存储器用于存储程序代码,所述处理器用于调用所述程序代码,以执行所述模型训练方法。
本申请实施例还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有指令,当其在计算机或处理器上运行时,使得计算机或处理器执行上述任一个方法中的一个或多个 步骤。
本申请实施例还提供了一种包含指令的计算机程序产品。当该计算机程序产品在计算机或处理器上运行时,使得计算机或处理器执行上述任一个方法中的一个或多个步骤。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
应理解,在本申请的描述中,除非另有说明,“/”表示前后关联的对象是一种“或”的关系,例如,A/B可以表示A或B;其中A,B可以是单数或者复数。并且,在本申请的描述中,除非另有说明,“多个”是指两个或多于两个。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b,或c中的至少一项(个),可以表示:a,b,c,a-b,a-c,b-c,或a-b-c,其中a,b,c可以是单个,也可以是多个。另外,为了便于清楚描述本申请实施例的技术方案,在本申请的实施例中,采用了“第一”、“第二”等字样对功能和作用基本相同的相同项或相似项进行区分。本领域技术人员可以理解“第一”、“第二”等字样并不对数量和执行次序进行限定,并且“第一”、“第二”等字样也并不限定一定不同。同时,在本申请实施例中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念,便于理解。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,该单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如,多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。所显示或讨论的相互之间的耦合、或直接耦合、或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。该计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行该计算机程序指令时,全部或部分地产生按照本申请实施例的流程或功能。该计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。该计算机指令可以存储在计算机可读存储介质中,或者通过该计算机可读存储介质进行传输。该计算机指令可以从一个网站站点、计算机、服务端或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务端或数据中心进行传输。该计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务端、数据中心等数据存储设备。该可用介质可以是只读存储器(read-only memory,ROM),或随机存取存储器(random access memory,RAM),或磁性介质,例如,软盘、硬盘、磁带、磁碟、或光介质,例如,数字通用光盘(digital versatile disc,DVD)、或者半导体介质,例如,固态硬盘(solid state disk,SSD)等。
以上所述,仅为本申请实施例的具体实施方式,但本申请实施例的保护范围并不局限于此,任何在本申请实施例揭露的技术范围内的变化或替换,都应涵盖在本申请实施例的保护范围之内。因此,本申请实施例的保护范围应以所述权利要求的保护范围为准。

Claims (26)

  1. 一种模型训练系统,所述模型训练系统包括服务端和客户端,所述服务端维护有有标签数据,所述客户端维护有无标签数据,其特征在于:
    所述客户端用于根据所述无标签数据对第一模型进行训练,以获得所述第一模型的参数;
    所述客户端还用于向所述服务端发送所述第一模型中第一子网的参数,所述第一模型还包括第二子网;
    所述服务端用于根据所述客户端上报的所述第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,所述第二模型包括所述第一子网和第三子网,所述第三子网与所述第二子网对应;
    所述服务端还用于向所述客户端发送更新后的所述第一子网的参数以及所述第三子网的参数;
    所述客户端还用于根据来自所述服务端的所述第一子网的参数和所述第三子网的参数得到目标模型,其中,所述目标模型包括所述第一子网和所述第三子网。
  2. 根据权利要求1所述的系统,其特征在于,在向所述服务端发送所述第一模型的参数的方面,所述客户端用于仅向所述服务端发送所述第一模型中所述第一子网的参数。
  3. 根据权利要求1所述的系统,其特征在于,所述客户端还用于向所述服务端发送所述第一模型中除所述第一子网的参数之外的参数。
  4. 根据权利要求1至3任一项所述的系统,其特征在于,所述客户端的数量为K个,K为大于1的整数,所述服务端还用于对来自所述K个客户端的K个第一子网的参数进行聚合处理,以得到处理后的第一子网的参数;
    在根据所述客户端上报的所述第一子网的参数和所述有标签数据对所述服务端的第二模型进行训练,以更新所述第二模型的参数的方面,所述服务端用于根据所述处理后的第一子网的参数和所述有标签数据对所述服务端的第二模型进行训练,以更新所述第二模型的参数。
  5. 根据权利要求1至4任一项所述的系统,其特征在于,所述第二模型的第三子网用于输出所述第二模型的计算结果;所述第一模型的第二子网用于输出所述第一模型的计算结果,其中,所述第二模型的第三子网与所述第一模型的第二子网的结构不同。
  6. 一种模型训练方法,应用于服务端,所述服务端维护有有标签数据,其特征在于,所述方法包括:
    根据客户端上报的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,所述第二模型包括所述第一子网和第三子网;
    向所述客户端发送更新后的所述第一子网的参数以及所述第三子网的参数。
  7. 根据权利要求6所述的方法,其特征在于,所述客户端的数量为K个,K为大于1的整数,所述方法还包括:
    对来自所述K个客户端的K个第一子网的参数进行聚合处理,以得到处理后的第一子网的参数;
    所述根据客户端上报的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,包括:
    根据所述处理后的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数。
  8. 根据权利要求6或7所述的方法,其特征在于,所述服务端还维护有无标签数据,所述根据客户端上报的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,包括:
    根据客户端上报的第一子网的参数和所述无标签数据对第三模型进行训练,以更新所述第三模型的参数;
    根据所述第三模型的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数。
  9. 一种模型训练方法,应用于客户端,所述客户端维护有无标签数据,其特征在于,所述方法包括:
    根据所述无标签数据对第一模型进行训练,以获得所述第一模型的参数;
    向所述服务端发送所述第一模型中第一子网的参数,所述第一模型还包括第二子网;
    根据来自所述服务端的所述第一子网的参数和第三子网的参数得到目标模型,其中,所述目标模型包括所述第一子网和所述第三子网,所述第三子网与所述第二子网对应。
  10. 根据权利要求9所述的方法,其特征在于,所述客户端仅向所述服务端发送所述第一模型中所述第一子网的参数,而不向所述服务端发送所述第一模型中除所述第一子网的参数之外的参数。
  11. 根据权利要求9所述的方法,其特征在于,所述方法还包括:
    向所述服务端发送所述第一模型中除所述第一子网的参数之外的参数。
  12. 一种模型训练方法,应用于客户端,所述客户端维护有无标签数据和有标签数据,其特征在于,所述方法包括:
    根据所述无标签数据对第一模型进行训练,以获得所述第一模型的参数;
    根据所述第一模型的参数和所述有标签数据对第四模型进行训练,以获得所述第四模型的参数;
    向所述服务端发送所述第四模型中第一子网的参数和第二子网的参数;
    根据来自所述服务端的所述第一子网的参数和所述第二子网的参数更新所述第四模型。
  13. 根据权利要求12所述的方法,其特征在于,仅向所述服务端发送所述第四模型中所述第一子网的参数和所述第二子网的参数,而不向所述服务端发送所述第四模型中除所述第一子网的参数和所述第二子网的参数之外的参数。
  14. 根据权利要求12所述的方法,其特征在于,所述方法还包括:
    向所述服务端发送所述第四模型中除所述第一子网的参数和所述第二子网的参数之外的 参数。
  15. 一种模型训练装置,其特征在于,所述装置包括:
    训练模块,用于根据客户端上报的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数,所述第二模型包括所述第一子网和第三子网;
    发送模块,用于向所述客户端发送更新后的所述第一子网的参数以及所述第三子网的参数。
  16. 根据权利要求15所述的装置,其特征在于,所述客户端的数量为K个,K为大于1的整数,所述装置还包括处理模块,用于:
    对来自所述K个客户端的K个第一子网的参数进行聚合处理,以得到处理后的第一子网的参数;
    所述训练模块,还用于:
    根据所述处理后的第一子网的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数。
  17. 根据权利要求15或16所述的装置,其特征在于,所述训练模块,还用于:
    根据客户端上报的第一子网的参数和所述无标签数据对第三模型进行训练,以更新所述第三模型的参数;
    根据所述第三模型的参数和所述有标签数据对第二模型进行训练,以更新所述第二模型的参数。
  18. 一种模型训练装置,其特征在于,所述装置包括:
    训练模块,用于根据所述无标签数据对第一模型进行训练,以获得所述第一模型的参数;
    发送模块,用于向所述服务端发送所述第一模型中第一子网的参数,所述第一模型还包括第二子网;
    更新模块,用于根据来自所述服务端的所述第一子网的参数和第三子网的参数得到目标模型,其中,所述目标模型包括所述第一子网和所述第三子网,所述第三子网与所述第二子网对应。
  19. 根据权利要求18所述的装置,其特征在于,所述发送模块,用于:
    仅向所述服务端发送所述第一模型中所述第一子网的参数,而不向所述服务端发送所述第一模型中除所述第一子网的参数之外的参数。
  20. 根据权利要求18所述的装置,其特征在于,所述发送模块,还用于:
    向所述服务端发送所述第一模型中除所述第一子网的参数之外的参数。
  21. 一种模型训练装置,其特征在于,所述装置包括:
    训练模块,用于根据所述无标签数据对第一模型进行训练,以获得所述第一模型的参数;根据所述第一模型的参数和所述有标签数据对第四模型进行训练,以获得所述第四模型的参数;
    发送模块,用于向所述服务端发送所述第四模型中第一子网的参数和第二子网的参数;
    更新模块,用于根据来自所述服务端的所述第一子网的参数和所述第二子网的参数更新所述第四模型。
  22. 根据权利要求21所述的装置,其特征在于,所述发送模块,用于:
    仅向所述服务端发送所述第四模型中所述第一子网的参数和所述第二子网的参数,而不向所述服务端发送所述第四模型中除所述第一子网的参数和所述第二子网的参数之外的参数。
  23. 根据权利要求21所述的装置,其特征在于,所述发送模块,还用于:
    向所述服务端发送所述第四模型中除所述第一子网的参数和所述第二子网的参数之外的参数。
  24. 一种模型训练装置,其特征在于,包括处理器和存储器;其中,所述存储器用于存储程序代码,所述处理器用于调用所述程序代码,以执行如权利要求6至8任意一项所述的方法,和/或9至11任意一项所述的方法,和/或12至14任意一项所述的方法。
  25. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行以实现如权利要求6至8任意一项所述的方法,和/或9至11任意一项所述的方法,和/或12至14任意一项所述的方法。
  26. 一种计算机程序产品,其特征在于,当计算机程序产品在计算机上运行时,使得所述计算机执行如权利要求6至8任意一项所述的方法,和/或9至11任意一项所述的方法,和/或12至14任意一项所述的方法。
PCT/CN2022/095802 2021-06-15 2022-05-28 模型训练方法及相关系统、存储介质 WO2022262557A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22824037.0A EP4354361A1 (en) 2021-06-15 2022-05-28 Model training method and related system, and storage medium
US18/540,144 US20240119368A1 (en) 2021-06-15 2023-12-14 Model training method, related system, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110662048.9A CN115481746B (zh) 2021-06-15 2021-06-15 模型训练方法及相关系统、存储介质
CN202110662048.9 2021-06-15

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/540,144 Continuation US20240119368A1 (en) 2021-06-15 2023-12-14 Model training method, related system, and storage medium

Publications (1)

Publication Number Publication Date
WO2022262557A1 true WO2022262557A1 (zh) 2022-12-22

Family

ID=84419234

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/095802 WO2022262557A1 (zh) 2021-06-15 2022-05-28 模型训练方法及相关系统、存储介质

Country Status (4)

Country Link
US (1) US20240119368A1 (zh)
EP (1) EP4354361A1 (zh)
CN (2) CN115481746B (zh)
WO (1) WO2022262557A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275207A (zh) * 2020-02-10 2020-06-12 深圳前海微众银行股份有限公司 基于半监督的横向联邦学习优化方法、设备及存储介质
US20200218937A1 (en) * 2019-01-03 2020-07-09 International Business Machines Corporation Generative adversarial network employed for decentralized and confidential ai training
CN112887145A (zh) * 2021-01-27 2021-06-01 重庆邮电大学 一种基于分布式的网络切片故障检测方法

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840530A (zh) * 2017-11-24 2019-06-04 华为技术有限公司 训练多标签分类模型的方法和装置
CN111310938A (zh) * 2020-02-10 2020-06-19 深圳前海微众银行股份有限公司 基于半监督的横向联邦学习优化方法、设备及存储介质
CN112434142B (zh) * 2020-11-20 2023-04-07 海信电子科技(武汉)有限公司 一种标记训练样本的方法、服务器、计算设备及存储介质

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200218937A1 (en) * 2019-01-03 2020-07-09 International Business Machines Corporation Generative adversarial network employed for decentralized and confidential ai training
CN111275207A (zh) * 2020-02-10 2020-06-12 深圳前海微众银行股份有限公司 基于半监督的横向联邦学习优化方法、设备及存储介质
CN112887145A (zh) * 2021-01-27 2021-06-01 重庆邮电大学 一种基于分布式的网络切片故障检测方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WONYONG JEONG; JAEHONG YOON; EUNHO YANG; SUNG JU HWANG: "Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint Learning", ARXIV.ORG, 29 March 2021 (2021-03-29), XP081899292 *
YUCHEN ZHAO; HANYANG LIU; HONGLIN LI; PAYAM BARNAGHI; HAMED HADDADI: "Semi-supervised Federated Learning for Activity Recognition", ARXIV.ORG, 2 November 2020 (2020-11-02), XP081805371 *

Also Published As

Publication number Publication date
CN115481746B (zh) 2023-09-01
US20240119368A1 (en) 2024-04-11
CN117494834A (zh) 2024-02-02
CN115481746A (zh) 2022-12-16
EP4354361A1 (en) 2024-04-17

Similar Documents

Publication Publication Date Title
CN110249342B (zh) 使用机器学习模型的自适应信道编码
US11715044B2 (en) Methods and systems for horizontal federated learning using non-IID data
CN111310932A (zh) 横向联邦学习系统优化方法、装置、设备及可读存储介质
CN111030861B (zh) 一种边缘计算分布式模型训练方法、终端和网络侧设备
CN113627085B (zh) 横向联邦学习建模优化方法、设备、介质
CN110430068A (zh) 一种特征工程编排方法及装置
CN110533106A (zh) 图像分类处理方法、装置及存储介质
CN110795558B (zh) 标签获取方法和装置、存储介质及电子装置
WO2022217210A1 (en) Privacy-aware pruning in machine learning
CN114091572A (zh) 模型训练的方法、装置、数据处理系统及服务器
WO2022262557A1 (zh) 模型训练方法及相关系统、存储介质
CN113537495A (zh) 基于联邦学习的模型训练系统、方法、装置和计算机设备
CN113779422A (zh) 关系链标签的实现方法、装置、电子设备及存储介质
US20230038310A1 (en) Devices, Methods, and System for Heterogeneous Data-Adaptive Federated Learning
Wen et al. Cloud-computing-based framework for multi-camera topology inference in smart city sensing system
WO2023065640A1 (zh) 一种模型参数调整方法、装置、电子设备和存储介质
CN112738225B (zh) 基于人工智能的边缘计算方法
CN111967612A (zh) 横向联邦建模优化方法、装置、设备及可读存储介质
CN112348197A (zh) 基于联邦学习的模型生成方法及装置
CN105323142B (zh) 一种基于图像识别的信息传输方法、系统及移动终端
CN113283521B (zh) 一种条件生成对抗网络生成方法及装置
CN113946758B (zh) 一种数据识别方法、装置、设备及可读存储介质
Agrawal Machine Learning for 5G RAN
CN114048804B (zh) 一种分类模型训练方法及装置
CN113810212B (zh) 5g切片用户投诉的根因定位方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22824037

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022824037

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022824037

Country of ref document: EP

Effective date: 20231214

NENP Non-entry into the national phase

Ref country code: DE