WO2024055979A1 - Model training method, device, system, and storage medium - Google Patents
Model training method, device, system, and storage medium
- Publication number
- WO2024055979A1 (PCT/CN2023/118478)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- terminal
- model
- training
- cloud
- output
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Definitions
- Embodiments of the present disclosure relate to a model training method, a model training device, a model training system and a non-transitory computer-readable storage medium.
- Federated learning is a distributed machine learning technique. Its core idea is to carry out distributed model training among multiple data sources that each hold local data, without exchanging that local data: only model parameters or intermediate results are exchanged, and a global model based on the virtually fused data is constructed. This enables cross-institution data sharing while balancing data privacy protection with shared computation, i.e., the application mode of "data is available but not visible" and "the data does not move, the model moves".
- At least one embodiment of the present disclosure provides a model training method, which is applied to a server and used to train a machine learning model. The machine learning model includes a cloud sub-model and M terminal models, where the cloud sub-model runs on the server, the M terminal models run on at least one terminal, and M is a positive integer. The model training method includes: obtaining cloud training features; training the cloud sub-model with the cloud training features to obtain the cloud output result of the cloud sub-model; sending the cloud output result and the current parameters of the M terminal models to the at least one terminal; receiving the terminal gradients respectively output by N terminal models among the M terminal models from the at least one terminal, where N is a positive integer less than or equal to M, and the terminal gradient output by each of the N terminal models includes the parameter gradient and the cloud output gradient of that terminal model; calculating the parameter gradient of the cloud sub-model based on the terminal gradients respectively output by the N terminal models and the cloud output result; and adjusting the current parameters of the N terminal models and the current parameters of the cloud sub-model by using the parameter gradients of the N terminal models and the parameter gradient of the cloud sub-model.
- At least one embodiment of the present disclosure provides a model training method, which is applied to a first terminal and used to train a machine learning model. The machine learning model includes a cloud sub-model and a first terminal model, where the cloud sub-model runs on the server and the first terminal model runs on the first terminal. The model training method includes: obtaining at least one terminal training sample, where each terminal training sample includes terminal training features and a sample label; sending a training request to the server based on the at least one terminal training sample; receiving from the server the cloud output corresponding to the at least one terminal training sample and the current parameters of the first terminal model; training the first terminal model by using the cloud output, the current parameters of the first terminal model and the at least one terminal training sample, to obtain the terminal gradient output by the first terminal model, where the terminal gradient includes the parameter gradient of the first terminal model and the cloud output gradient; and outputting the terminal gradient to the server, so that the server calculates the parameter gradient of the cloud sub-model based on the terminal gradient and the cloud output, and adjusts the current parameters of the first terminal model and the current parameters of the cloud sub-model respectively by using the parameter gradient of the first terminal model and the parameter gradient of the cloud sub-model.
- At least one embodiment of the present disclosure also provides a model training device, including: one or more memories non-transiently storing computer-executable instructions; and one or more processors configured to run the computer-executable instructions, where the computer-executable instructions, when run by the one or more processors, implement the model training method according to any embodiment of the present disclosure.
- At least one embodiment of the present disclosure also provides a model training system for training a machine learning model, including at least one terminal and a server. The machine learning model includes a cloud sub-model and M terminal models, the cloud sub-model runs on the server, the M terminal models run on the at least one terminal, and M is a positive integer.
- The server is configured to: obtain cloud training features; train the cloud sub-model with the cloud training features to obtain the cloud output result of the cloud sub-model; send the cloud output result and the current parameters of the M terminal models to the at least one terminal; receive the terminal gradients respectively output by N terminal models among the M terminal models, where N is a positive integer less than or equal to M, and the terminal gradient output by each of the N terminal models includes the parameter gradient and the cloud output gradient of that terminal model; calculate the parameter gradient of the cloud sub-model based on the terminal gradients respectively output by the N terminal models and the cloud output result; and adjust the current parameters of the N terminal models and the current parameters of the cloud sub-model by using the parameter gradients of the N terminal models and the parameter gradient of the cloud sub-model.
- Each terminal in the at least one terminal is configured to: obtain at least one terminal training sample, where each terminal training sample includes terminal training features and a sample label, and the cloud training features include at least one sub-cloud training feature corresponding to the at least one terminal training sample; receive from the server the cloud output corresponding to the at least one terminal training sample and the current parameters of the terminal model running on the terminal, where the cloud output result includes the cloud output; train the terminal model running on the terminal by using the cloud output, the current parameters of the terminal model running on the terminal and the at least one terminal training sample, to obtain the terminal gradient output by the terminal model running on the terminal; and output the terminal gradient output by the terminal model running on the terminal to the server.
- At least one embodiment of the present disclosure also provides a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions, when executed by a processor, implement the model training method according to any embodiment of the present disclosure.
- Figure 1A is a schematic diagram of a machine learning model provided by at least one embodiment of the present disclosure
- Figure 1B is a schematic diagram of another machine learning model provided by at least one embodiment of the present disclosure.
- Figure 2 is a schematic flow chart of a model training method provided by at least one embodiment of the present disclosure
- Figure 3 is a schematic diagram of interaction between a terminal and a server provided by at least one embodiment of the present disclosure
- Figure 4 is a schematic diagram of another model training method provided by at least one embodiment of the present disclosure.
- Figure 5 is a schematic diagram of a model training system provided by at least one embodiment of the present disclosure.
- Figure 6 is a schematic diagram of the overall process of model training by a model training system provided by at least one embodiment of the present disclosure
- Figure 7 is an example diagram of a specific training process of model training by a model training system provided by at least one embodiment of the present disclosure
- Figure 8 is an example diagram of a specific training process of model training by a model training system provided by at least one embodiment of the present disclosure
- Figure 9 is a schematic block diagram of a model training device provided by at least one embodiment of the present disclosure.
- Figure 10 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure.
- Figure 11 is a schematic diagram of the hardware structure of an electronic device provided by at least one embodiment of the present disclosure.
- the term "include" and its variations are open-ended, i.e., "including but not limited to."
- the term “based on” means “based at least in part on.”
- the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; and the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
- Federated learning refers to a method of jointly performing machine learning modeling by uniting multiple participants (terminals) with data ownership.
- the participants with data do not need to expose their data to the central server (also called the parameter server), but jointly complete the model training process through parameter or gradient updates. Therefore, federated learning can protect user privacy data and complete the modeling training process.
- machine learning models are often very large and require a lot of computing power to be trained quickly.
- the traditional model training method is to save the user's data in the cloud, and then use the powerful computing power of the server to quickly train the model.
- a huge model also corresponds to a large amount of training data, which will cause greater storage pressure on the server. In order to maintain a balance between model effect and training speed, it is often necessary to use batch training.
- At least one embodiment of the present disclosure provides a model training method, which is applied to a server and used to train a machine learning model.
- the machine learning model includes a cloud sub-model and M terminal models.
- the cloud sub-model runs on the server, and the M terminal models run on at least one terminal.
- M is a positive integer.
- the model training method includes: obtaining cloud training features; training the cloud sub-model with the cloud training features to obtain the cloud output result of the cloud sub-model; sending the cloud output result and the current parameters of the M terminal models to the at least one terminal; receiving the terminal gradients respectively output by N terminal models among the M terminal models from the at least one terminal, where N is a positive integer less than or equal to M, and the terminal gradient output by each of the N terminal models includes the parameter gradient and the cloud output gradient of that terminal model; calculating the parameter gradient of the cloud sub-model based on the terminal gradients respectively output by the N terminal models and the cloud output result; and adjusting the current parameters of the N terminal models and the current parameters of the cloud sub-model by using the parameter gradients of the N terminal models and the parameter gradient of the cloud sub-model.
- the model training method splits the machine learning model into a cloud sub-model and terminal models, thereby realizing federated machine learning between the server and the terminals, protecting user privacy and data security, and solving the problem that models on terminals such as in-vehicle entertainment devices are too large to be trained. In addition, different terminal models can be used for different terminals, so the model training process is more flexible and the application scenarios are wider. The server can perform federated machine learning with multiple terminals at the same time, which greatly improves the model training speed and saves model training time while ensuring the model effect of the trained machine learning model.
- At least one embodiment of the present disclosure also provides a model training device, a model training system, and a non-transitory computer-readable storage medium.
- the model training method can be applied to the model training device provided by the embodiment of the present disclosure, and the model training device can be configured on an electronic device.
- the electronic device may be a fixed terminal, a mobile terminal, etc.
- Figure 1A is a schematic diagram of a machine learning model provided by at least one embodiment of the present disclosure.
- Figure 1B is a schematic diagram of another machine learning model provided by at least one embodiment of the present disclosure.
- Figure 2 is a schematic flow chart of a model training method provided by at least one embodiment of the present disclosure.
- the model training method provided by the embodiments of the present disclosure can be applied to the server, that is, the server implements the model training method.
- the server may be a cloud server, etc., and the server may include a central processing unit (CPU) or other devices with data processing capabilities and/or program execution capabilities.
- the model training method can be used to train a machine learning model
- the machine learning model can be a neural network model, etc.
- the machine learning model is split into two parts: the first part is a terminal model with a smaller structure, executed by the terminal;
- the second part is a cloud sub-model with a larger structure, executed by the server.
- the terminal model is relatively simple and consists of several neural network layers at the top of the original machine learning model, making it suitable for terminals with smaller computing power to avoid increasing the computing power burden on the terminal.
- Different terminal models can be used according to different terminals, that is, the terminal models on each terminal can use different structures according to needs; in addition, the input of different terminal models can also be set according to different terminals.
- the cloud sub-model contains most of the structure of the machine learning model. Therefore, the cloud sub-model is relatively complex and is mainly executed on the server, using the server's powerful computing power to complete model training.
- the cloud sub-model cooperates with each terminal model to complete the federated training process.
- the machine learning model may include a cloud sub-model and M terminal models.
- Figure 1A shows three terminal models, namely terminal model A, terminal model B and terminal model C.
- Each terminal model, together with the cloud sub-model, forms a complete model, which can be used to implement predetermined functions, such as classification, prediction and other functions.
- M terminal models run on at least one terminal
- M is a positive integer
- at least one terminal model can run on each terminal.
- one terminal model can run on each terminal.
- the M terminal models run respectively on M terminals.
- for example, the three terminal models shown in Figure 1A can run on three terminals respectively.
- alternatively, multiple terminal models can run on one terminal; for example, at least two of the three terminal models shown in Figure 1A can run on the same terminal, e.g., terminal model A and terminal model B shown in Figure 1A are executed by the same terminal.
- the cloud sub-model runs on the server, and each server can run at least one cloud sub-model.
- as shown in Figure 1B, cloud sub-model A and terminal model D together form a complete model, and cloud sub-model B and terminal model E together form a complete model.
- cloud sub-model A and cloud sub-model B can be run by the same server, and terminal model D and terminal model E can be run by the same terminal or by different terminals.
- each cloud sub-model can correspond to at least one terminal model. As shown in Figure 1A, one cloud sub-model can correspond to three terminal models, in which case the output of the cloud sub-model is transmitted to the three terminal models. As shown in Figure 1B, one cloud sub-model corresponds to one terminal model: cloud sub-model A corresponds to terminal model D, and cloud sub-model B corresponds to terminal model E, so the output of cloud sub-model A is transmitted to terminal model D and the output of cloud sub-model B is transmitted to terminal model E.
- when the cloud sub-model corresponds to a terminal model,
- the terminal model and the cloud sub-model can jointly form a complete model.
- the inputs of the M terminal models match the output of the cloud sub-model, that is, the cloud sub-model outputs feature maps of the same size to the M terminal models.
- for example, as shown in Figure 1A, sub-cloud output 1, sub-cloud output 2 and sub-cloud output 3 have the same size.
- the input of each terminal model may include a terminal input and a sub-cloud output. As shown in Figure 1A,
- the input of terminal model A may include sub-cloud output 1 and terminal input 1,
- the input of terminal model B may include sub-cloud output 2 and terminal input 2,
- and the input of terminal model C may include sub-cloud output 3 and terminal input 3.
- the terminal input may be terminal training features (described below) stored on the terminal running the terminal model.
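- As an illustration of how a terminal model's input can be assembled from the sub-cloud output and the terminal input described above, the following minimal Python sketch simply concatenates the two; the function name, shapes and example values are hypothetical and not part of the disclosure.

```python
import numpy as np

def terminal_model_input(sub_cloud_output: np.ndarray,
                         terminal_input: np.ndarray) -> np.ndarray:
    """Assemble the input of one terminal model from the sub-cloud output
    received from the server and the locally stored terminal features.
    Concatenation is just one plausible way to combine them."""
    return np.concatenate([sub_cloud_output, terminal_input], axis=-1)

# Hypothetical shapes: the cloud sub-model emits same-sized feature vectors
# (here length-8) to every terminal model, while each terminal contributes
# its own local features.
sub_cloud_output_1 = np.zeros(8)                 # sent to terminal model A
terminal_input_1 = np.array([22.5, 3.0])         # e.g. cabin temperature, passengers
x_a = terminal_model_input(sub_cloud_output_1, terminal_input_1)
```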
- M terminal models achieve the same goal, such as regulating temperature, etc.
- terminal models can run on different terminals, and the different terminals can be the same type of terminals applied in different scenarios, or different types of terminals applied in the same scenario or different scenarios.
- terminal model A can run on terminal 1
- terminal model B can run on terminal 2
- terminal model C can run on terminal 3.
- terminal 1 can be a vehicle air conditioner
- terminal 2 can be an air conditioner in the living room
- terminal 3 can be an air conditioner in the bedroom.
- each terminal and server can be set up separately and communicate through a network.
- a network may include a wireless network, a wired network, and/or any combination of wireless and wired networks.
- the network may include a local area network, the Internet, a telecommunications network, the Internet of Things (Internet of Things) based on the Internet and/or telecommunications networks, and/or any combination of the above networks.
- a wired network may use twisted pair, coaxial cable, or fiber optic transmission to communicate, and a wireless network may use a 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi communication method. This disclosure does not limit the type and function of the network.
- the terminal may be various mobile terminals, fixed terminals, etc.; for example, the terminal may be an application (App) running on a mobile terminal.
- the mobile terminal can be a tablet computer, a vehicle-mounted device, a notebook computer, smart glasses, a smart watch, a vehicle-mounted entertainment device, etc.
- Fixed terminals can be desktop computers, smart home appliances (for example, smart air conditioners, smart refrigerators, smart purifiers, smart switches, smart gateways, smart rice cookers, etc.).
- the model training method may include the following steps S100 to S105.
- step S100 cloud training features are obtained.
- step S101 the cloud sub-model is trained using the cloud training features to obtain the cloud output result of the cloud sub-model.
- step S102 the cloud output result and the current parameters of the M terminal models are sent to at least one terminal.
- step S103 the terminal gradients respectively output by N terminal models among the M terminal models are received from the at least one terminal.
- N is a positive integer and is less than or equal to M.
- the terminal gradient output by each terminal model in the N terminal models includes the parameter gradient of the terminal model and the cloud output gradient.
- step S104 the parameter gradient of the cloud sub-model is calculated based on the terminal gradients output by the N terminal models and the cloud output result.
- step S105 the current parameters of the N terminal models and the current parameters of the cloud sub-model are adjusted using the parameter gradients of the N terminal models and the parameter gradient of the cloud sub-model.
- Steps S100 to S101 represent the forward propagation process of the cloud sub-model,
- and steps S103 to S104 represent the backpropagation process of the cloud sub-model.
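- The following self-contained Python sketch walks through one round of steps S100 to S105 for the simplest case (M = N = 1), using toy linear layers in place of the cloud sub-model and the terminal model; all names, shapes and the squared-error loss are illustrative assumptions, not the disclosed implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
W_cloud = rng.normal(size=(4, 3)) * 0.1   # cloud sub-model: 4 cloud features -> 3 outputs
W_term = rng.normal(size=(3 + 2,)) * 0.1  # terminal model: [sub-cloud output, 2 terminal features] -> scalar

def terminal_step(cloud_out, term_feats, labels):
    """Stand-in for the terminal: forward pass, 0.5*MSE loss, backward pass.
    Returns the terminal-model parameter gradient and the cloud-output gradient."""
    x = np.concatenate([cloud_out, term_feats], axis=1)      # (B, 5)
    pred = x @ W_term                                        # (B,)
    err = (pred - labels) / len(labels)                      # dLoss/dpred
    param_grad = x.T @ err                                   # (5,)
    cloud_out_grad = np.outer(err, W_term[:3])               # (B, 3)
    return param_grad, cloud_out_grad

def server_training_round(cloud_feats, term_feats, labels, lr=0.1):
    """One round of steps S100-S105 for a single terminal (M = N = 1).
    cloud_feats plays the role of the cloud training features obtained in S100."""
    global W_cloud, W_term
    cloud_out = cloud_feats @ W_cloud                        # S101: cloud forward pass
    # S102: cloud_out and the current W_term would be sent to the terminal here.
    param_grad, cloud_out_grad = terminal_step(cloud_out, term_feats, labels)  # S103
    cloud_param_grad = cloud_feats.T @ cloud_out_grad        # S104: cloud backward pass
    W_term -= lr * param_grad                                # S105: adjust parameters
    W_cloud -= lr * cloud_param_grad

# Toy data: 2 samples, 4 cloud features and 2 terminal features each.
server_training_round(rng.normal(size=(2, 4)), rng.normal(size=(2, 2)), np.array([1.0, 0.0]))
```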
- the cloud training features may include at least one sub-cloud training feature corresponding to each terminal, and the sub-cloud training features may be information that the terminal has made public and/or information that the terminal has authorized to the server, etc., which does not involve the terminal user's privacy.
- the terminal can be a vehicle air conditioner.
- the sub-cloud training features corresponding to the terminal can be the ambient temperature, address, time and other information of the location of the motor vehicle to which the vehicle air conditioner belongs. The specific content of the sub-cloud training features can be determined according to actual conditions, and this disclosure does not limit this.
- At least one sub-cloud training feature can be stored in the server.
- when the server receives a training request from the terminal, it can obtain the sub-cloud training features corresponding to the terminal based on the identification information and other information in the training request.
- At least one terminal includes a first terminal
- step S100 may include: receiving a training request issued by the first terminal; and obtaining at least one first sub-cloud training feature based on the training request issued by the first terminal.
- the cloud training features include at least one first sub-cloud training feature, and the at least one first sub-cloud training feature corresponds to the first terminal.
- the training request issued by the first terminal includes the identification information and sample identification of the first terminal
- the server can obtain the at least one first sub-cloud training feature based on the identification information and sample identification of the first terminal.
- the sample identification can represent the identification information of the terminal training samples (described below). Based on the sample identification, it can be determined which terminal training samples are used for training, so that the server can obtain the sub-cloud training features corresponding to these terminal training samples for training.
- Each terminal periodically queries the server for model training at regular intervals (the interval is at the minute level, for example, one minute, two minutes, five minutes, etc.). Within such an interval, generally not many new terminal training features are added on each terminal. When tens of millions of terminals need to perform model training with the server, the number of new samples added on each terminal device at any moment is very small. If the server trained each terminal separately, it would consume a large amount of server resources and greatly reduce the training speed. Therefore, the model training method provided by the embodiments of the present disclosure can perform merged training, that is, merge the terminal training features of multiple terminals into one batch for training, thereby increasing the training speed, saving training time, optimizing or reducing the server's resource consumption, and solving the problem of insufficient samples on a single terminal through real-time sample merging.
- At least one terminal includes a first terminal and a second terminal.
- Step S100 may include: receiving a training request issued by the first terminal; obtaining at least one first sub-cloud training feature based on the training request issued by the first terminal;
- receiving a training request issued by the second terminal; obtaining at least one second sub-cloud training feature based on the training request issued by the second terminal; and merging the at least one first sub-cloud training feature and the at least one second sub-cloud training feature to obtain the cloud training features.
- the server can perform federated machine learning with multiple terminals at the same time, thereby greatly increasing the model training speed, reducing the server's resource consumption, and reducing the pressure on the server.
- the training request issued by the second terminal includes identification information of the second terminal, sample identification, etc.
- the absolute value of the time difference between the time of the training request issued by the first terminal and the time of the training request issued by the second terminal is within a preset time difference range.
- the preset time difference range can be 500 milliseconds, etc., which is set according to the actual situation.
- sub-cloud training features obtained within a specific time difference range can be merged, thereby increasing the training speed and saving training time.
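- A minimal sketch of the merging step described above, assuming a simple in-memory store of sub-cloud training features and request dictionaries with hypothetical field names; requests arriving within the preset time difference range are combined into one batch.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory store of sub-cloud training features, keyed by
# (terminal id, sample id); in practice this would be a server-side sample library.
SUB_CLOUD_FEATURES = {
    ("terminal_1", 101): [25.0, 1.0], ("terminal_1", 102): [26.0, 0.0],
    ("terminal_2", 501): [18.5, 1.0],
}

def merge_training_requests(requests, max_gap=timedelta(milliseconds=500)):
    """Merge sub-cloud training features for requests whose arrival times fall
    within the preset time difference range, so they form one training batch."""
    requests = sorted(requests, key=lambda r: r["time"])
    first = requests[0]["time"]
    batch, owners = [], []
    for req in requests:
        if req["time"] - first > max_gap:
            break  # arrived too late for this batch; handled in the next round
        for sid in req["sample_ids"]:
            batch.append(SUB_CLOUD_FEATURES[(req["terminal"], sid)])
            owners.append((req["terminal"], sid))
    return batch, owners

t0 = datetime.now()
reqs = [
    {"terminal": "terminal_1", "time": t0, "sample_ids": [101, 102]},
    {"terminal": "terminal_2", "time": t0 + timedelta(milliseconds=120), "sample_ids": [501]},
]
cloud_training_features, feature_owners = merge_training_requests(reqs)
```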
- step S101 may include: obtaining the current parameters of the cloud sub-model; and training the cloud sub-model having the current parameters with the cloud training features, to obtain the cloud output result of the cloud sub-model.
- the current parameters of the cloud sub-model represent the parameters of the cloud sub-model at the time when the cloud training features are obtained. Because the parameters of the cloud sub-model are continuously updated and optimized during the training process, when performing forward propagation of the cloud sub-model it is necessary to obtain the most recently updated parameters (i.e., the current parameters) of the cloud sub-model, and then perform the forward propagation process with the cloud sub-model having those parameters.
- the cloud sub-model processes the cloud training features to obtain the cloud output result.
- the cloud output result may include at least one sub-cloud output, and each sub-cloud output corresponds to one sub-cloud training feature (for example, the above-mentioned first sub-cloud training feature or the above-mentioned second sub-cloud training feature, etc.). As shown in Figure 1A, the cloud output result includes sub-cloud output 1, sub-cloud output 2 and sub-cloud output 3.
- the server can perform training with multiple terminals at the same time.
- the M terminal models include a first terminal model and a second terminal model. At least one terminal includes a first terminal and a second terminal. The first terminal model runs on the first terminal, and the second terminal model runs on the second terminal.
- Step S102 may include: splitting the cloud output result to obtain a first cloud output corresponding to the first terminal model and a second cloud output corresponding to the second terminal model; obtaining the current parameters of the first terminal model and the current parameters of the second terminal model; transmitting the first cloud output and the current parameters of the first terminal model to the first terminal; and transmitting the second cloud output and the current parameters of the second terminal model to the second terminal.
- different terminal models, for example, the above-mentioned first terminal model and the second terminal model, can run on different terminals.
- the first cloud output may include at least one sub-cloud output, and the second cloud output may include at least one sub-cloud output.
- the current parameters of each terminal model represent the parameters of the terminal model at the time when the cloud training features are obtained.
- an example of the first terminal model may be terminal model A,
- an example of the second terminal model may be terminal model B,
- an example of the first cloud output may be sub-cloud output 1,
- and an example of the second cloud output may be sub-cloud output 2.
- sub-cloud output 1 is transmitted to the first terminal and serves as part of the input of terminal model A,
- and sub-cloud output 2 is transmitted to the second terminal and serves as part of the input of terminal model B.
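- The splitting of the cloud output result into per-terminal cloud outputs could look like the following sketch, which simply keeps track of which batch row belongs to which terminal and sample; the data layout and identifiers are assumptions for illustration.

```python
import numpy as np

def split_cloud_output(cloud_output, owners):
    """Split the batched cloud output result back into per-terminal cloud outputs,
    using bookkeeping of which batch row belongs to which (terminal, sample)."""
    per_terminal = {}
    for row, (terminal_id, sample_id) in zip(cloud_output, owners):
        per_terminal.setdefault(terminal_id, []).append((sample_id, row))
    return per_terminal

# Hypothetical batch of 3 sub-cloud outputs produced by one cloud forward pass.
cloud_output = np.arange(9.0).reshape(3, 3)
owners = [("terminal_1", 101), ("terminal_1", 102), ("terminal_2", 501)]
outputs_by_terminal = split_cloud_output(cloud_output, owners)
# outputs_by_terminal["terminal_1"] would then be sent to terminal 1 together with
# the current parameters of the first terminal model, and similarly for terminal 2.
```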
- each terminal can participate in the training process of multiple terminal models at the same time.
- the M terminal models include a first terminal model and a third terminal model.
- the structures of the first terminal model and the third terminal model may be different.
- At least one terminal includes a first terminal.
- the first terminal model and the third terminal model both run on the first terminal.
- Step S102 may include: splitting the cloud output result to obtain the first cloud output corresponding to the first terminal model and the third cloud output corresponding to the third terminal model; obtaining the current parameters of the first terminal model and the current parameters of the third terminal model;
- and transmitting the first cloud output, the third cloud output, the current parameters of the first terminal model, and the current parameters of the third terminal model to the first terminal.
- the third cloud output may include at least one sub-cloud output.
- an example of the first terminal model may be terminal model A
- an example of the third terminal model may be terminal model C
- an example of the first cloud output may be sub-cloud output 1
- and an example of the third cloud output may be sub-cloud output 3.
- Sub-cloud output 1 and sub-cloud output 3 may both be transmitted to the first terminal, with sub-cloud output 1 serving as part of the input of terminal model A and sub-cloud output 3 serving as part of the input of terminal model C.
- the first terminal is used as an example to run multiple terminal models.
- the second terminal can also run multiple terminal models.
- the terminal gradients respectively output by the N terminal models may be received from a terminal running the N terminal models among at least one terminal.
- the gradient information transmitted from the terminal can be received within the feedback time range.
- the feedback time range can be 8 seconds, 10 seconds, 20 seconds, etc., which can be set according to the actual situation.
- the parameter gradient of the terminal model represents the gradient of the parameters of each layer in the terminal model
- the cloud output gradient of the terminal model represents the gradient of the cloud output received by the terminal model.
- the first terminal model receives the first cloud output such that the cloud output gradient of the first terminal model represents the gradient of the first cloud output.
- the parameter gradient of the cloud sub-model can be calculated based on the cloud output gradients respectively output by the N terminal models and the cloud output result.
- M is greater than 1 and N is greater than 1.
- Step S104 may include: merging the cloud output gradients of the N terminal models to obtain a combined output gradient; and calculating the parameter gradient of the cloud sub-model based on the combined output gradient and the cloud output result.
- when N is equal to 1,
- the process of merging gradients can be omitted, and the parameter gradient of the cloud sub-model is calculated directly based on the cloud output gradient output by the terminal model and the cloud output result.
- step S104 may also include: merging the parameter gradients of the N terminal models to obtain the merged parameter gradient.
- step S105 may include: adjusting the current parameters of the N terminal models using the parameter gradients of the N terminal models; and adjusting the current parameters of the cloud sub-model using the parameter gradient of the cloud sub-model.
- the adjusted parameters of the N terminal models can then be used as the current parameters of the N terminal models, the adjusted parameters of the cloud sub-model can be used as the current parameters of the cloud sub-model, and both are stored in the server.
- the parameters of a machine learning model can be adjusted through a parameter optimizer.
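- As an illustration of merging the cloud output gradients returned by the N responding terminal models before back-propagating through the cloud sub-model, the following sketch reassembles the per-sample gradient rows into one combined gradient aligned with the cloud output batch; the fixed output width of 3 and the identifiers are assumptions.

```python
import numpy as np

def merge_cloud_output_gradients(batch_size, owners, terminal_gradients):
    """Reassemble the per-row cloud-output gradients returned by the N responding
    terminal models into one combined gradient aligned with the cloud output batch.
    Rows belonging to terminals that did not answer in time stay zero."""
    combined = np.zeros((batch_size, 3))
    for terminal_id, grads in terminal_gradients.items():
        for sample_id, grad_row in grads:
            row_index = owners.index((terminal_id, sample_id))
            combined[row_index] = grad_row
    return combined

owners = [("terminal_1", 101), ("terminal_1", 102), ("terminal_2", 501)]
terminal_gradients = {  # cloud output gradients reported back by two terminals
    "terminal_1": [(101, np.ones(3)), (102, np.ones(3))],
    "terminal_2": [(501, np.full(3, 0.5))],
}
combined_grad = merge_cloud_output_gradients(3, owners, terminal_gradients)
# The combined gradient is then back-propagated through the cloud sub-model
# (e.g. dL/dW_cloud = cloud_features.T @ combined_grad for a linear layer), and a
# parameter optimizer such as SGD applies the update W -= lr * grad.
```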
- the training progress information of each terminal can be saved in the parameters of each terminal model, as a part of the parameters of the terminal model, and stored in the server.
- the training progress of multiple terminal models running on each terminal can be different.
- Each terminal model only needs to store one training progress value for each terminal, so that the training progress of the terminal model is stored through the server with almost no increase in data transmission volume or data storage volume, solving the problem of repeated training on the same data, further improving the training speed of the model and saving model training time.
- Each terminal can participate in the training of multiple terminal models, and each terminal model needs to record the trained data for each terminal.
- for example, when a terminal model is rolled back, the training progress records of all terminals of this model need to be rolled back as well.
- M terminal models correspond one-to-one to M pieces of stored training progress information
- M pieces of stored training progress information are stored in the server.
- the model training method further includes: for each terminal in the at least one terminal, receiving from the terminal the current training progress information of each terminal model running on the terminal, and adjusting the stored training progress information corresponding to each terminal model based on the current training progress information.
- for example, adjusting the stored training progress information corresponding to each terminal model based on the current training progress information may include: setting the stored training progress information corresponding to the terminal model to the current training progress information of the terminal model, so that the stored training progress information corresponding to the terminal model indicates the current training progress of the terminal model.
- Each terminal can independently maintain the training progress corresponding to the terminal.
- the training progress is a strictly increasing number (it can be a timestamp or a number accumulated on the terminal, etc.).
- a unique training progress identifier can be set for the terminal training sample.
- the training progress identifier can be the timestamp when the terminal training sample is generated.
- each terminal stores a training sample set used to train all terminal models running on the terminal.
- the training sample set includes multiple terminal training samples.
- Each terminal training sample includes terminal training features and a sample label.
- the terminal training features in the multiple terminal training samples are generated in sequence, and each terminal training sample has a corresponding training progress identifier.
- for example, the current training progress information of each terminal model represents the training progress identifier of the last-generated terminal training sample among all terminal training samples in the training sample set that have been used to train that terminal model;
- alternatively, the current training progress information of each terminal model represents the training progress identifier of the first-generated terminal training sample among all terminal training samples in the training sample set that have not been used to train that terminal model.
- all terminal models running on the terminal can share the same training sample set, or different terminal models running on the terminal can correspond to different training sample sets.
- terminal training samples may be preset based on experience, or may be generated in real time as the terminal is used.
- the model training method further includes: receiving a training progress query request sent by each terminal and corresponding to a terminal model run by the terminal; obtaining the stored training progress information corresponding to the terminal model based on the training progress query request; and outputting the stored training progress information to the terminal, so that the terminal can perform a sample screening operation based on the stored training progress information. For example, if at least one terminal training sample is obtained in response to the sample screening operation, the terminal sends a training request to the server to perform model training; if no terminal training sample is obtained in response to the sample screening operation, model training is not performed.
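- A possible shape of the training progress query and the terminal-side sample screening operation is sketched below; the storage layout, field names and the terminal sample threshold of 8 are illustrative assumptions, not a defined interface.

```python
# Hypothetical server-side store: one piece of stored training progress per terminal model.
stored_progress = {("terminal_1", "model_A"): 5}   # e.g. last trained sample's progress id

def handle_progress_query(terminal_id, model_name):
    """Server side: return the stored training progress for the queried terminal model."""
    return stored_progress.get((terminal_id, model_name), 0)

def screen_samples(training_samples, progress, terminal_sample_threshold=8):
    """Terminal side: keep only samples whose training progress identifier is greater
    than the stored progress, capped at the terminal sample threshold."""
    fresh = [s for s in training_samples if s["progress_id"] > progress]
    return fresh[:terminal_sample_threshold]

# Terminal training samples 1..8, generated in sequence (progress id == sample number here).
samples = [{"progress_id": i, "features": [i], "label": i % 2} for i in range(1, 9)]
progress = handle_progress_query("terminal_1", "model_A")   # -> 5
selected = screen_samples(samples, progress)                 # samples 6, 7 and 8
if selected:
    sample_ids = [s["progress_id"] for s in selected]  # include these ids in the training request
else:
    sample_ids = []                                    # no new samples: skip training this round
```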
- Figure 3 is a schematic diagram of interaction between a terminal and a server provided by at least one embodiment of the present disclosure.
- a terminal model running on the terminal can correspond to multiple terminal training samples.
- the multiple terminal training samples include terminal training samples 1 to 9 which are generated in sequence.
- the terminal can send a training progress query request to the server to query the current training progress of the terminal model in the terminal.
- the server can obtain the stored training progress information of the terminal model (i.e., the current training progress of the terminal model) and transmit the stored training progress information to the terminal.
- For example, if the stored training progress information of the terminal model indicates that terminal training samples 1 to 5 have been used to train the terminal model, the current training progress of the terminal model can be the training progress identifier corresponding to terminal training sample 5; the terminal then performs a sample screening operation based on the stored training progress information to screen out terminal training samples whose training progress identifiers are greater than the current training progress, for example, terminal training samples 6 to 8, which are used to train the terminal model.
- the terminal can, for example, call a push interface to return the current training progress of the terminal model and the terminal gradient output by the terminal model to the server, thereby causing the server to adjust the parameters and to adjust the stored training progress information corresponding to the terminal model based on the current training progress of the terminal model.
- After the current training process ends, the current training progress of the terminal model becomes the training progress identifier corresponding to terminal training sample 8.
- the terminal can set a terminal sample threshold for each terminal model.
- the terminal sample threshold represents the maximum number of terminal training samples that can be used to train the terminal model during each training process, that is, in each training process, the number of terminal training samples used to train the terminal model cannot exceed the terminal sample threshold.
- the server can set a cloud sample threshold for each terminal model.
- the cloud sample threshold represents the maximum number of cloud training features that can be used to train the terminal model during each training process, that is, in each training process, the number of cloud training features used to train the terminal model cannot exceed the cloud sample threshold.
- the terminal sample threshold and the cloud sample threshold may be the same or different.
- the terminal sample threshold set by the terminal for the terminal model can be 8, and the cloud sample threshold set by the server for the terminal model can be 6.
- the terminal sends a training request to the server, and the training request indicates that the terminal model is to be trained using 8 terminal training samples.
- for example, the training sample set may include terminal training sample 1 to terminal training sample 20, and the 8 terminal training samples may be terminal training samples 10 to 17.
- the server obtains the 6 cloud training features corresponding to the first 6 of the 8 terminal training samples (i.e., terminal training samples 10 to 15) for the training process.
- accordingly, the first 6 terminal training samples (i.e., terminal training samples 10 to 15) of the 8 terminal training samples are used to train the terminal model.
- after this training process, the current training progress information corresponding to the terminal model is the sample identification corresponding to terminal training sample 15.
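- The interplay of the terminal sample threshold and the cloud sample threshold in this example can be expressed as a small sketch (threshold values taken from the example above; the helper name is hypothetical):

```python
def effective_training_samples(requested_sample_ids, cloud_sample_threshold=6):
    """Server side: cap the number of samples used in one training pass at the cloud
    sample threshold; here the first 6 of the 8 requested samples would be used."""
    return requested_sample_ids[:cloud_sample_threshold]

requested = list(range(10, 18))               # terminal training samples 10..17 (8 samples)
used = effective_training_samples(requested)  # samples 10..15
new_stored_progress = used[-1]                # stored training progress becomes 15
```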
- the online time of each terminal is uncontrollable.
- when each terminal is online, it will regularly access the server to check whether a machine learning model is being trained, and then upload its training data information (which does not contain sensitive information) to the server, allowing the server to obtain cloud training features for model training.
- Figure 4 is a schematic diagram of another model training method provided by at least one embodiment of the present disclosure.
- the model training method provided by the embodiments of the present disclosure can be applied to a terminal (for example, the first terminal), that is, the model training method is implemented by the terminal.
- the machine learning model includes the cloud sub-model and the first terminal model.
- the cloud sub-model runs on the server, and the first terminal model runs on the first terminal. It should be noted that more terminal models can run on the first terminal.
- the model training method may include the following steps S200 to S204.
- step S200 at least one terminal training sample is obtained, where each terminal training sample includes terminal training features and a sample label.
- step S201 a training request is sent to the server based on the at least one terminal training sample.
- step S202 the cloud output corresponding to the at least one terminal training sample and the current parameters of the first terminal model are received from the server.
- the cloud output includes at least one sub-cloud output that corresponds one-to-one to the at least one terminal training sample.
- step S203 the first terminal model is trained using the cloud output, the current parameters of the first terminal model and the at least one terminal training sample, to obtain the terminal gradient output by the first terminal model.
- the terminal gradient includes the parameter gradient of the first terminal model and the cloud output gradient;
- the cloud output gradient may be the gradient of the cloud output.
- step S204 the terminal gradient is output to the server,
- so that the server can calculate the parameter gradient of the cloud sub-model based on the terminal gradient output by the first terminal model and the cloud output, and adjust the current parameters of the first terminal model and the current parameters of the cloud sub-model respectively by using the parameter gradient of the first terminal model and the parameter gradient of the cloud sub-model.
- Step S200 and steps S203 to S204 represent the forward propagation process and the back propagation process of the first terminal model.
- the model training method splits the machine learning model into a cloud sub-model and a terminal model, thereby realizing federated machine learning between the server and the terminal, protecting user privacy and data security, and solving
- the problem that the model on a terminal such as an in-vehicle entertainment device is too large to train.
- in addition, the structure of the terminal model running on the terminal is smaller, so it can be adapted to terminals with smaller computing power, allowing federated machine learning to be applied to terminals with smaller computing power and further expanding the application scope and application scenarios of federated machine learning. It can effectively help multiple terminals make use of their data and perform machine learning modeling while meeting the requirements of user privacy protection and data security.
- moreover, the accuracy of the trained machine learning model can be improved.
- the first terminal may store a training sample set for training the first terminal model.
- the training sample set includes a plurality of terminal training samples, and each terminal training sample includes terminal training features and sample labels.
- Terminal training samples can be preset based on experience, or can be generated in real time as the terminal is used.
- the first terminal may be a vehicle air conditioner.
- the first terminal needs to control the interior temperature of the motor vehicle to which the vehicle air conditioner belongs.
- the first terminal generates a terminal training feature
- the terminal training features may include information such as the current interior temperature of the motor vehicle and the current number of people in the vehicle.
- corresponding to the terminal training features, the server generates corresponding cloud training features.
- the machine learning model can process the terminal training features and the cloud training features to obtain a predicted temperature, and the vehicle air conditioner can then be adjusted to the predicted temperature. If the person in the car then sends a feedback message, the feedback message is the sample label corresponding to the terminal training features.
- the feedback message can be that the temperature is inappropriate (too high or too low) or that the temperature is suitable, etc.; based on the predicted temperature and the feedback information, gradients can be generated to adjust the parameters of the machine learning model.
- the terminal training features and the sample label together constitute a terminal training sample.
- the training progress identifier corresponding to the terminal training sample may be a timestamp corresponding to the moment when the terminal training features are generated, or a number accumulated on the first terminal.
- when the feedback message indicates that the temperature is suitable, the result currently predicted by the machine learning model reaches the user's expected result, and
- the sample label corresponding to the terminal training features is that the predicted temperature is suitable.
- the specific information of the terminal training characteristics can be set according to the actual situation, and this disclosure does not specifically limit this.
- step S200 may include: sending a training progress query request to the server; receiving stored training progress information corresponding to the first terminal model from the server; performing a sample screening operation based on the stored training progress information; obtaining K terminal training samples in response to the sample screening operation; and obtaining the at least one terminal training sample based on the K terminal training samples.
- K is a positive integer.
- the first terminal can perform a sample screening operation on the training sample set corresponding to the first terminal model based on the stored training progress information to obtain K terminal training samples for training the first terminal model, and then select at least one terminal training sample from the K terminal training samples.
- the number of the at least one terminal training sample is less than or equal to the terminal sample threshold. For example, when K is less than or equal to the terminal sample threshold, all K terminal training samples can be used for model training; when K is greater than the terminal sample threshold, some of the K terminal training samples can be selected for model training. When no terminal training sample is obtained in response to the sample screening operation, model training is not performed.
- the first terminal may send a training request to the server based on the at least one terminal training sample, and the training request sent by the first terminal includes the identification information of the first terminal and a sample identification list, etc.
- the sample identification list is used to indicate the sample identifications corresponding to the at least one terminal training sample.
- based on the training request, the server obtains at least one sub-cloud training feature corresponding to the at least one terminal training sample
- and uses it for training to obtain at least one sub-cloud output corresponding to the at least one terminal training sample; in addition, the server also obtains the current parameters of the first terminal model, and then
- outputs the at least one sub-cloud output and the current parameters of the first terminal model to the first terminal.
- step S203 may include: for each terminal training sample in the at least one terminal training sample: processing, with the first terminal model having the current parameters of the first terminal model, the sub-cloud output corresponding to the terminal training sample and the terminal training features in the terminal training sample, to obtain the output of the first terminal model; obtaining the loss value of the first terminal model based on the output of the first terminal model and the sample label in the terminal training sample; and obtaining the terminal gradient based on the loss value and the output of the first terminal model.
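- For a single terminal training sample, step S203 could be sketched as follows with a toy linear terminal model and a squared-error loss; all parameter values, shapes and names are illustrative assumptions rather than the disclosed implementation.

```python
import numpy as np

def terminal_train_step(w_terminal, sub_cloud_output, terminal_features, label):
    """One terminal-side training step for a single terminal training sample,
    assuming a toy linear terminal model with a 0.5 * squared-error loss."""
    x = np.concatenate([sub_cloud_output, terminal_features])  # terminal model input
    prediction = x @ w_terminal                                # forward pass
    loss = 0.5 * (prediction - label) ** 2                     # loss value
    d_pred = prediction - label                                # dloss/dprediction
    parameter_gradient = d_pred * x                            # gradient of terminal parameters
    cloud_output_gradient = d_pred * w_terminal[: len(sub_cloud_output)]  # gradient w.r.t. sub-cloud output
    return loss, parameter_gradient, cloud_output_gradient

w = np.array([0.2, -0.1, 0.05, 0.3, 0.1])      # current parameters received from the server
loss, p_grad, c_grad = terminal_train_step(
    w, sub_cloud_output=np.array([0.4, 0.6, -0.2]),
    terminal_features=np.array([23.0, 2.0]), label=1.0)
# The terminal gradient (p_grad, c_grad) is then output to the server in step S204.
```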
- the model training method further includes: determining current training progress information corresponding to the first terminal model based on the at least one terminal training sample; and sending the current training progress information to the server, so that the server updates the stored training progress information corresponding to the first terminal model to the current training progress information.
- Figure 5 is a schematic diagram of a model training system provided by at least one embodiment of the present disclosure.
- At least one embodiment of the present disclosure also provides a model training system, which is used to train a machine learning model.
- the machine learning model includes a cloud sub-model and M terminal models, where M is a positive integer.
- the model training system 1100 may include at least one terminal 1101 and a server 1102.
- the cloud sub-model runs on the server 1102, and the M terminal models run on the at least one terminal 1101.
- the server 1102 is configured to: obtain cloud training features; train the cloud sub-model with the cloud training features to obtain the cloud output result of the cloud sub-model; send the cloud output result and the current parameters of the M terminal models to the at least one terminal; receive the terminal gradients respectively output by N terminal models among the M terminal models; calculate the parameter gradient of the cloud sub-model based on the terminal gradients respectively output by the N terminal models and the cloud output result; and adjust the current parameters of the N terminal models and the current parameters of the cloud sub-model by using the parameter gradients of the N terminal models and the parameter gradient of the cloud sub-model.
- Each terminal in the at least one terminal 1101 is configured to: obtain at least one terminal training sample, where each terminal training sample includes terminal training features and a sample label, and the cloud training features include at least one sub-cloud training feature corresponding to the at least one terminal training sample;
- receive from the server 1102 at least one cloud output corresponding to the at least one terminal training sample and the current parameters of the terminal model running on the terminal, where the cloud output result includes the at least one cloud output; train the terminal model running on the terminal by using the at least one cloud output, the current parameters of the terminal model running on the terminal and the at least one terminal training sample, to obtain the terminal gradient output by the terminal model running on the terminal; and output the terminal gradient output by the terminal model running on the terminal to the server 1102.
- the server 1102 can be used to implement the model training method shown in Figure 2, and each terminal in at least one terminal 1101 can be used to implement the model training method shown in Figure 4.
- the terminal regularly checks whether it meets the training conditions, which include sample size, network environment and other information. If the training conditions are met, the terminal sends a training progress query request to the server to query whether training can be performed.
- the training progress query request includes the identification information of the terminal.
- the server receives the training progress query request from the terminal, it will search for the model being trained and find the stored training progress information of the terminal model running on the terminal based on the identification information of the terminal. Then return the name of the terminal model and storage training progress information to the terminal.
- the terminal After receiving the name of the terminal model and storing the training progress information sent by the server, the terminal searches for all trainable terminal training samples in the local training sample set. If the terminal finds a terminal training sample that can be trained, it will send the name of the terminal model and a sample identification list to the server.
- the sample identification list may include sample identifications of each terminal training sample.
- the server receives the name of the terminal model and the sample identification list of the terminal, and uses the sample identification list to search the server-side sample library for the cloud training sample corresponding to the terminal training sample indicated by the sample identification in the sample identification list.
- each cloud training sample includes a cloud training feature. The server then obtains the current parameters of the cloud sub-model through the parameter module and inputs the cloud training features into the cloud sub-model to obtain the output of the cloud sub-model (if merged training is required, the server waits here for other terminals to report their terminal training samples, merges all the cloud training features, and then inputs them to the cloud sub-model together). At the same time, the server also obtains the current parameters of the terminal model through the parameter module. Finally, the server returns the output of the cloud sub-model and the current parameters of the terminal model to the terminal, and generates and returns a piece of session identification information.
- after the terminal receives the output of the cloud sub-model and the current parameters of the terminal model, it inputs the terminal training features of the terminal training samples together with the output of the cloud sub-model into the terminal model to obtain the output of the terminal model (i.e., the prediction result), and then obtains the loss value of the output of the terminal model based on the output of the terminal model and the sample labels of the terminal training samples. The terminal computes gradients from the loss value, that is, performs back propagation to obtain the terminal gradient output by the terminal model (including the parameter gradient of the terminal model and the cloud output gradient), and finally returns the terminal gradient output by the terminal model to the server together with the previous session identification information.
- after the server receives the terminal gradient output by the terminal model, it continues back propagation through the cloud output gradient to obtain the parameter gradient of the cloud sub-model (if merged training is required, the server again waits for other terminals to report their terminal gradients before performing back propagation). In this way, the server obtains the parameter gradient of the cloud sub-model and the parameter gradient of the terminal model, that is, the gradient of the entire machine learning model. The server then submits the parameter gradient of the cloud sub-model and the parameter gradient of the terminal model to the parameter server to update the parameters of the machine learning model, and updates the stored training progress information corresponding to the terminal model (the stored training progress information is also stored in the parameter server).
- after the terminal sends the gradient to the server, this round of training can be regarded as complete, and the terminal waits for the next round of training.
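- the round trip between one terminal and the server can also be illustrated end to end. The sketch below runs in a single process and replaces the network exchange with direct function calls; the ParameterModule class, the message dictionaries, the stub forward/backward arithmetic and the fixed learning rate are all assumptions of the example rather than part of the described method.

```python
# Single-process sketch of one federated training round between a terminal and
# the server, following the message flow described above. The forward/backward
# computations are stubbed out for brevity.
import uuid

class ParameterModule:
    """Holds the current parameters of the cloud sub-model and the terminal
    models, plus the stored training progress per terminal model."""
    def __init__(self):
        self.cloud_params = {"layer1": 0.1, "layer2": 0.2}
        self.terminal_params = {"terminal_model_A": {"layer3": 0.3}}
        self.stored_progress = {"terminal_model_A": 0}

def server_handle_training_request(pm, model_name, cloud_features):
    # Stub forward pass of the cloud sub-model over the cloud training features.
    cloud_output = [sum(f) * pm.cloud_params["layer2"] for f in cloud_features]
    session_id = str(uuid.uuid4())
    return {"cloud_output": cloud_output,
            "terminal_params": pm.terminal_params[model_name],
            "session_id": session_id}

def terminal_train(reply, terminal_features, labels):
    # Stub forward + backward of the terminal model on the terminal side.
    predictions = [c + sum(f) for c, f in zip(reply["cloud_output"], terminal_features)]
    loss = sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)
    return {"param_grad": {"layer3": loss * 0.01},              # stub parameter gradient
            "cloud_output_grad": [loss * 0.01] * len(predictions),
            "session_id": reply["session_id"]}

def server_handle_gradients(pm, model_name, grads, new_progress):
    cloud_grad = sum(grads["cloud_output_grad"])                # stub cloud back propagation
    pm.cloud_params["layer2"] -= 0.01 * cloud_grad              # stub optimizer step
    pm.terminal_params[model_name]["layer3"] -= 0.01 * grads["param_grad"]["layer3"]
    pm.stored_progress[model_name] = new_progress               # update stored training progress

pm = ParameterModule()
reply = server_handle_training_request(pm, "terminal_model_A", cloud_features=[[0.5, 1.0]])
grads = terminal_train(reply, terminal_features=[[0.2, 0.3]], labels=[1.0])
server_handle_gradients(pm, "terminal_model_A", grads, new_progress=42)
```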
- Figure 6 is a schematic diagram of the overall process of model training by a model training system provided by at least one embodiment of the present disclosure.
- the model training system may be the model training system shown in Figure 5.
- the overall process of model training by the model training system includes three parts: forward propagation of the cloud sub-model, forward propagation and back propagation of the terminal model, and back propagation of the cloud sub-model.
- the machine learning model may include a cloud sub-model and a terminal model, as shown in Figure 6.
- the cloud sub-model may include the first layer layer1 and the second layer layer2, and the terminal model may include the third layer layer3.
- each of the first layer layer1, the second layer layer2 and the third layer layer3 can be a convolutional layer, a fully connected layer, a pooling layer, or the like.
- the cloud sub-model runs on the server, and the terminal model runs on the terminal.
- the cloud sub-model and the terminal model shown in Figure 6 are only schematic.
- the cloud sub-model can include more layers, and the terminal model can also include more layers.
- in the forward propagation of the cloud sub-model, the server inputs the cloud training features into the cloud sub-model, obtains the forward propagation result of the cloud sub-model (i.e., the above-mentioned cloud output result), and then sends the forward propagation result together with the current parameters of the terminal model to the terminal.
- the first layer layer1 of the cloud sub-model processes the cloud training features to obtain the output O1 of the first layer layer1.
- the second layer layer2 of the cloud sub-model processes the output O1 of the first layer layer1 to obtain the output O2 of the second layer layer2.
- the output O2 of the second layer layer2 can represent the cloud output result of the cloud sub-model; then, the server sends the cloud output result and the current parameters of the terminal model to the terminal used to run the terminal model.
- in the forward propagation and back propagation of the terminal model, the terminal receives the cloud output result of the cloud sub-model and the current parameters of the terminal model, and inputs them, together with the terminal's own terminal input In (including the terminal training features) and the sample labels, into the terminal model to perform forward propagation and back propagation, thereby obtaining the terminal gradient output by the terminal model, which is then sent to the server.
- the third layer layer3 of the terminal model processes the cloud output result (that is, the output O2 of the second layer layer2) and the terminal input In to obtain the output O3 of the third layer layer3.
- the output O3 of the third layer layer3 is the prediction result of the machine learning model.
- based on the prediction result and the sample labels, the loss value of the machine learning model is calculated using the loss function; based on the loss value and the output O3 of the third layer layer3, the parameter gradient GL3 of the terminal model (that is, the gradient of the parameters of the third layer layer3) and the cloud output gradient GO of the terminal model are calculated.
- finally, the parameter gradient GL3 of the terminal model and the cloud output gradient GO of the terminal model are transmitted to the server.
- in the back propagation of the cloud sub-model, after receiving the parameter gradient and the cloud output gradient of the terminal model, the server executes the back propagation process of the cloud sub-model, thereby obtaining the parameter gradient of the cloud sub-model.
- finally, the parameters of the machine learning model are updated through the parameter optimizer to complete a round of training.
- as shown in Figure 6, in the back propagation of the cloud sub-model, first, based on the cloud output gradient GO of the terminal model and the output O2 of the second layer layer2, the parameter gradient GL2 of the second layer layer2 is calculated; then, based on the parameter gradient GL2 of the second layer layer2 and the output O1 of the first layer layer1, the parameter gradient GL1 of the first layer layer1 is calculated.
- the parameter gradient of the cloud sub-model includes the parameter gradient GL1 of the first layer layer1 and the parameter gradient GL2 of the second layer layer2.
- the parameter optimizer updates the parameters of the terminal model (the third layer layer3) based on the parameter gradient GL3 of the terminal model, and updates the parameters of the cloud sub-model (the first layer layer1 and the second layer layer2) based on the parameter gradient GL1 of the first layer layer1 and the parameter gradient GL2 of the second layer layer2.
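- as a concrete illustration of the Figure 6 flow, the following sketch implements the split forward and back propagation with PyTorch. The framework choice, the layer sizes, the concatenation of the cloud output O2 with the terminal input In, and the plain SGD update are assumptions of this example; the disclosure itself does not prescribe them.

```python
# Minimal split-learning sketch of Figure 6: layer1/layer2 (cloud sub-model)
# on the server, layer3 (terminal model) on the terminal.
import torch
import torch.nn as nn

# --- server side: cloud sub-model (layer1 + layer2) ---
layer1, layer2 = nn.Linear(8, 16), nn.Linear(16, 4)
# --- terminal side: terminal model (layer3), consuming O2 and the terminal input In ---
layer3 = nn.Linear(4 + 2, 1)

cloud_features = torch.randn(5, 8)          # cloud training features for one batch
terminal_input = torch.randn(5, 2)          # terminal input In (terminal training features)
labels = torch.randn(5, 1)                  # sample labels

# Forward propagation of the cloud sub-model (server).
O1 = layer1(cloud_features)
O2 = layer2(O1)
O2_sent = O2.detach().requires_grad_(True)  # the cloud output result actually sent out

# Forward and back propagation of the terminal model (terminal).
O3 = layer3(torch.cat([O2_sent, terminal_input], dim=1))
loss = nn.functional.mse_loss(O3, labels)
loss.backward()
GL3 = [p.grad for p in layer3.parameters()]  # parameter gradient of the terminal model
GO = O2_sent.grad                            # cloud output gradient, returned to the server

# Back propagation of the cloud sub-model (server), driven by GO.
O2.backward(GO)                              # fills GL1 and GL2 on layer1/layer2

# Parameter optimizer: update both sub-models to finish one round of training.
with torch.no_grad():
    for p in list(layer1.parameters()) + list(layer2.parameters()) + list(layer3.parameters()):
        p -= 0.01 * p.grad
```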
- Figure 7 is an example diagram of a specific training process of model training performed by a model training system provided by some embodiments of the present disclosure.
- Figure 8 is an example diagram of a specific training process of model training performed by a model training system provided by some embodiments of the present disclosure.
- Figures 7 and 8 show the process of merged training over multiple terminals, taking three terminals as an example. The overall process of model training by the model training system is described in detail below with reference to Figures 7 and 8.
- as shown in Figure 7, the at least one terminal includes terminal Tem1, terminal Tem2 and terminal Tem3, and the M terminal models include a terminal model 10 run by terminal Tem1, a terminal model 20 run by terminal Tem2 and a terminal model 30 run by terminal Tem3.
- at time t1, the server receives the training request issued by the terminal Tem1 and then, based on that training request, obtains at least one sub-cloud training feature CTF1 corresponding to the terminal Tem1.
- Figure 8 shows two sub-cloud training features CTF1 (each rectangular box represents a sub-cloud training feature); at time t2, the server receives the training request issued by the terminal Tem2 and then, based on that training request, obtains at least one sub-cloud training feature CTF2 corresponding to the terminal Tem2.
- Figure 8 shows three sub-cloud training features CTF2; at time t3, the server receives the training request issued by the terminal Tem3 and then, based on that training request, obtains at least one sub-cloud training feature CTF3 corresponding to the terminal Tem3.
- Figure 8 shows two sub-cloud training features CTF3. The server then performs input merging, that is, the at least one sub-cloud training feature CTF1 corresponding to the terminal Tem1, the at least one sub-cloud training feature CTF2 corresponding to the terminal Tem2, and the at least one sub-cloud training feature CTF3 corresponding to the terminal Tem3 are merged as input to obtain the cloud training features.
- the absolute value of the time difference between any two of time t1, time t2 and time t3 is within the preset time difference range.
- time t1, time t2 and time t3 may be the same time.
- after the cloud training features are obtained, the current parameters of the cloud sub-model can be obtained from the parameter module; then, based on the cloud training features and the current parameters of the cloud sub-model, forward propagation of the cloud sub-model is performed to obtain the cloud output result. Next, an output splitting operation is performed on the cloud output result to obtain the cloud output FCO1 corresponding to the terminal model 10, the cloud output FCO2 corresponding to the terminal model 20, and the cloud output FCO3 corresponding to the terminal model 30.
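- the input merging and output splitting steps can be pictured as simple batch bookkeeping. In the sketch below, merging is assumed to be concatenation along the batch dimension and splitting is its inverse by per-terminal row counts; the linear layer standing in for the cloud sub-model is likewise only illustrative.

```python
# Sketch of input merging (CTF1..CTF3 -> cloud training features) and output
# splitting (cloud output result -> FCO1..FCO3), using PyTorch tensors.
import torch

def merge_inputs(per_terminal_features):
    """per_terminal_features: dict terminal_id -> tensor [n_i, feature_dim]."""
    order = list(per_terminal_features.keys())
    counts = [per_terminal_features[t].shape[0] for t in order]
    merged = torch.cat([per_terminal_features[t] for t in order], dim=0)
    return merged, order, counts

def split_outputs(cloud_output, order, counts):
    """Split the cloud output result back into one cloud output per terminal model."""
    chunks = torch.split(cloud_output, counts, dim=0)
    return dict(zip(order, chunks))

# Example with the three terminals of Figures 7 and 8 (2, 3 and 2 samples).
features = {"Tem1": torch.randn(2, 8), "Tem2": torch.randn(3, 8), "Tem3": torch.randn(2, 8)}
merged, order, counts = merge_inputs(features)
cloud_output = torch.nn.Linear(8, 4)(merged)        # stand-in for the cloud sub-model forward pass
fco = split_outputs(cloud_output, order, counts)    # FCO1, FCO2, FCO3 keyed by terminal
```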
- at the same time, the current parameters of each terminal model can be obtained from the parameter module. The cloud output FCO1 and the current parameters CP1 of the terminal model 10 are then transmitted to the terminal Tem1, so that the terminal Tem1 can perform forward propagation and back propagation of the terminal model 10 to obtain the parameter gradient GP1 of the terminal model 10 and the cloud output gradient GO1 of the terminal model 10; the cloud output FCO2 and the current parameters CP2 of the terminal model 20 are transmitted to the terminal Tem2, so that the terminal Tem2 can perform forward propagation and back propagation of the terminal model 20 to obtain the parameter gradient GP2 of the terminal model 20 and the cloud output gradient GO2 of the terminal model 20; and the cloud output FCO3 and the current parameters CP3 of the terminal model 30 are transmitted to the terminal Tem3, so that the terminal Tem3 can perform forward propagation and back propagation of the terminal model 30 to obtain the parameter gradient GP3 of the terminal model 30 and the cloud output gradient GO3 of the terminal model 30.
- the terminal Tem1 can transmit the parameter gradient GP1 of the terminal model 10 and the cloud output gradient GO1 of the terminal model 10 to the server, the terminal Tem2 can transmit the parameter gradient GP2 of the terminal model 20 and the cloud output gradient GO2 of the terminal model 20 to the server, and the terminal Tem3 can transmit the parameter gradient GP3 of the terminal model 30 and the cloud output gradient GO3 of the terminal model 30 to the server.
- after receiving the gradients transmitted by the terminals, the server can perform gradient merging.
- the server can merge the parameter gradient GP1 of the terminal model 10, the parameter gradient GP2 of the terminal model 20 and the parameter gradient GP3 of the terminal model 30 to obtain the merged parameter gradient, and merge the cloud output gradient GO1 of the terminal model 10, the cloud output gradient GO2 of the terminal model 20 and the cloud output gradient GO3 of the terminal model 30 to obtain the merged output gradient.
- after the merged output gradient is obtained, the back propagation of the cloud sub-model can be performed to obtain the parameter gradient of the cloud sub-model.
- the gradient of the machine learning model can include the parameter gradient of the cloud sub-model and the parameter gradients of the terminal models (i.e., GP1 to GP3, the merged parameter gradient).
- the parameter module may include a parameter optimizer.
- the parameter optimizer may receive the merged parameter gradient and the parameter gradient of the cloud sub-model, adjust the parameters of the cloud sub-model based on the parameter gradient of the cloud sub-model to update the parameters of the cloud sub-model, and adjust the parameters of the terminal model 10, the terminal model 20 and the terminal model 30 based on the merged parameter gradient to update the parameters of the terminal model 10, the terminal model 20 and the terminal model 30. This completes one round of model training.
- in other embodiments, the parameter gradients of the terminal models may not be merged; instead, the parameter gradient of each terminal model may be directly input to the parameter optimizer, and the parameter optimizer adjusts the parameters of each terminal model separately based on that terminal model's parameter gradient.
- for example, in that case, the parameter optimizer can adjust the parameters of the terminal model 10 based on the parameter gradient GP1 of the terminal model 10 to update the parameters of the terminal model 10, adjust the parameters of the terminal model 20 based on the parameter gradient GP2 of the terminal model 20 to update the parameters of the terminal model 20, and adjust the parameters of the terminal model 30 based on the parameter gradient GP3 of the terminal model 30 to update the parameters of the terminal model 30.
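- a small sketch of the gradient merging step follows. Here merging the cloud output gradients is read as the inverse of the output splitting (concatenating GO1 to GO3 back into one batch), and merging the parameter gradients is read as bundling GP1 to GP3 per terminal model for the parameter optimizer; both readings are assumptions of this example rather than a rule fixed by the text.

```python
# Sketch of gradient merging on the server before the cloud sub-model
# back propagation and the parameter optimizer step.
import torch

def merge_output_gradients(per_terminal_output_grads, order):
    """Concatenate GO1..GO3 in the same order used for output splitting,
    yielding the merged output gradient for the cloud sub-model back propagation."""
    return torch.cat([per_terminal_output_grads[t] for t in order], dim=0)

def merge_parameter_gradients(per_terminal_param_grads):
    """Bundle GP1..GP3 so the parameter optimizer can update each terminal
    model from the merged structure."""
    return {terminal: grads for terminal, grads in per_terminal_param_grads.items()}

order = ["Tem1", "Tem2", "Tem3"]
GO = {"Tem1": torch.randn(2, 4), "Tem2": torch.randn(3, 4), "Tem3": torch.randn(2, 4)}
GP = {"Tem1": {"layer3.weight": torch.randn(1, 6)},
      "Tem2": {"layer3.weight": torch.randn(1, 6)},
      "Tem3": {"layer3.weight": torch.randn(1, 6)}}

merged_output_grad = merge_output_gradients(GO, order)   # drives the cloud sub-model back propagation
merged_param_grad = merge_parameter_gradients(GP)        # consumed by the parameter optimizer
```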
- Figure 9 is a schematic block diagram of a model training device provided by at least one embodiment of the present disclosure.
- the model training device 1000 may include one or more memories 1001 and one or more processors 1002 . It should be noted that the components of the model training device 1000 are only exemplary and not restrictive. According to actual application requirements, the model training device 1000 may also have other components, which are not specifically limited by the embodiments of the present disclosure.
- one or more memories 1001 are configured to non-transiently store computer-executable instructions; one or more processors 1002 are configured to execute the computer-executable instructions.
- when the computer-executable instructions are executed by the one or more processors 1002, one or more steps in the model training method according to any embodiment of the present disclosure are implemented.
- the model training device 1000 can be used to perform the model training method shown in FIG. 2 and/or the model training method shown in FIG. 4.
- memory 1001 and processor 1002 may communicate with each other directly or indirectly.
- the model training device 1000 may also include a communication interface and a communication bus.
- the memory 1001, the processor 1002 and the communication interface can communicate with each other through a communication bus.
- the memory 1001, the processor 1002 and the communication interface and other components can also communicate through a network connection.
- the network can include a wireless network, a wired network, and/or any combination of a wireless network and a wired network. The present disclosure does not limit the type and function of the network here.
- the communication bus may be a Peripheral Component Interconnect Standard (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, or the like.
- the communication bus can be divided into address bus, data bus, control bus, etc.
- the communication interface is used to implement communication between the model training device 1000 and other devices.
- the communication interface may be a Universal Serial Bus (USB) interface, etc.
- the memory 1001 and the processor 1002 can be provided on the server side (or cloud).
- processor 1002 may control other components in the model training device to perform desired functions.
- the processor may be a central processing unit (CPU), a graphics processing unit (GPU), a network processor (NP), etc.; the processor may also be other forms of processing units with model training capabilities and/or program execution capabilities, for example, Digital signal processor (DSP), application specific integrated circuit (ASIC), field programmable gate array (FPGA), tensor processing unit (TPU) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
- the central processing unit (CPU) can be X86 or ARM architecture, etc.
- the memory 1001 may be a computer-readable medium and may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache). Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), portable compact disk read-only memory (CD-ROM), USB memory, flash memory, and the like.
- One or more computer-readable instructions may be stored on the computer-readable storage medium, and the processor may execute the computer-readable instructions to implement various functions of the model training device 1000 . Various applications and various data can also be stored in the storage medium.
- Figure 10 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure.
- one or more computer-executable instructions 2001 may be stored non-transitorily on the non-transitory computer-readable storage medium 2000.
- when executed by a processor, the computer-executable instructions 2001 may perform one or more steps in the model training method according to any embodiment of the present disclosure.
- the non-transitory computer-readable storage medium 2000 can be applied to the above-mentioned model training device 1000.
- the non-transitory computer-readable storage medium 2000 may include the memory 1001 in the above-mentioned model training device 1000.
- for the description of the non-transitory computer-readable storage medium 2000, reference may be made to the description of the memory 1001 in the embodiment of the model training device 1000, and the repeated parts are not described again here.
- FIG. 11 shows a schematic structural diagram of an electronic device 3000 suitable for implementing embodiments of the present disclosure.
- the electronic device 3000 may be a terminal (for example, a computer) or a processor, and may be used to execute the model training method of the above embodiment.
- electronic devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), vehicle-mounted terminals (such as vehicle navigation terminals) and wearable electronic devices, as well as fixed terminals such as digital TVs, desktop computers and smart home devices.
- the electronic device shown in FIG. 11 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
- the electronic device 3000 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 3001, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 3002 or a program loaded from a storage device 3008 into a random access memory (RAM) 3003.
- in the RAM 3003, various programs and data required for the operation of the electronic device 3000 are also stored.
- the processing device 3001, ROM 3002 and RAM 3003 are connected to each other via a bus 3004.
- An input/output (I/O) interface 3005 is also connected to bus 3004.
- the following devices can be connected to the I/O interface 3005: an input device 3006 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 3007 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 3008 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 3009.
- the communication device 3009 may allow the electronic device 3000 to communicate wirelessly or wiredly with other devices to exchange data.
- although FIG. 11 illustrates the electronic device 3000 with various means, it should be understood that it is not required to implement or provide all of the illustrated means; more or fewer means may alternatively be implemented or provided.
- embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated in the flowcharts, so as to perform one or more steps in the model training method described above.
- the computer program may be downloaded and installed from the network via the communication device 3009, or installed from the storage device 3008, or installed from the ROM 3002.
- when the computer program is executed by the processing device 3001, it can cause the processing device 3001 to perform the above-mentioned functions defined in the model training method of the embodiments of the present disclosure.
- a computer-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two.
- the computer-readable storage medium may be, for example, but is not limited to: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination thereof.
- computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
- a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
- a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein.
- Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
- a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
- Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
- the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; it may also exist independently without being assembled into the electronic device.
- computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, connected through the Internet using an Internet service provider).
- each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions.
- the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may actually be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved.
- each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by special-purpose hardware-based systems that perform the specified functions or operations, or can be implemented by a combination of special-purpose hardware and computer instructions.
- the units involved in the embodiments of the present disclosure can be implemented in software or hardware.
- the name of a unit does not constitute a limitation on the unit itself.
- exemplary types of hardware logic components that can be used include, without limitation: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
- for example, in response to receiving an active request from the user, a prompt message is sent to the user to clearly remind the user that the operation requested will require obtaining and using the user's user information. The user can therefore autonomously choose, according to the prompt information, whether to provide user information to software or hardware, such as an electronic device, application, server or storage medium, that performs the operations of the technical solution of the present disclosure.
- the method of sending prompt information to the user may be, for example, a pop-up window, and the prompt information may be presented in the form of text in the pop-up window.
- the pop-up window can also carry a selection control for the user to choose "agree” or "disagree” to provide user information to the electronic device.
- at least one embodiment of the present disclosure provides a model training method, applied to a server and used to train a machine learning model, wherein the machine learning model includes a cloud sub-model and M terminal models.
- the cloud sub-model runs on the server, the M terminal models run on at least one terminal, and M is a positive integer.
- the model training method includes: obtaining cloud training features; training the cloud sub-model using the cloud training features to obtain the cloud output result of the cloud sub-model; sending the cloud output result and the current parameters of the M terminal models to the at least one terminal; receiving the terminal gradients respectively output by N terminal models among the M terminal models and output by the at least one terminal, where N is a positive integer less than or equal to M, and the terminal gradient output by each of the N terminal models includes the parameter gradient and the cloud output gradient of that terminal model; calculating the parameter gradient of the cloud sub-model based on the terminal gradients respectively output by the N terminal models and the cloud output result; and adjusting the current parameters of the N terminal models and the current parameters of the cloud sub-model using the parameter gradients of the N terminal models and the parameter gradient of the cloud sub-model.
- the M terminal models correspond one-to-one to M pieces of stored training progress information, and the M pieces of stored training progress information are stored in the server.
- the model training method further includes, for each terminal in the at least one terminal: receiving, from the terminal, the current training progress information of each terminal model running on the terminal; and adjusting the stored training progress information corresponding to each terminal model based on the current training progress information.
- each terminal stores a training sample set for training all terminal models running on the terminal, the training sample set includes a plurality of terminal training samples, each terminal training sample includes a terminal training feature and a sample label, the terminal training features in the plurality of terminal training samples are generated in sequence, and each terminal training sample has a corresponding training progress identifier.
- the current training progress information of each terminal model indicates the training progress identifier of the last-generated terminal training sample among all terminal training samples in the training sample set that have been used to train that terminal model.
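- the stored training progress bookkeeping on the server side can be sketched as a small key-value store, advanced to whatever current training progress information each terminal reports. The class and method names below are assumptions made for the illustration.

```python
# Sketch of the server-side stored training progress per terminal model.
from typing import Dict

class TrainingProgressStore:
    """Maps (terminal id, terminal model name) to the stored training progress
    information, a strictly increasing identifier such as a timestamp."""
    def __init__(self):
        self._progress: Dict[tuple, int] = {}

    def get(self, terminal_id: str, model_name: str) -> int:
        return self._progress.get((terminal_id, model_name), 0)

    def update(self, terminal_id: str, model_name: str, current_progress: int) -> None:
        # Adjust the stored training progress information to the reported
        # current training progress information.
        self._progress[(terminal_id, model_name)] = current_progress

store = TrainingProgressStore()
store.update("Tem1", "terminal_model_10", current_progress=1694649600)  # e.g. a timestamp
assert store.get("Tem1", "terminal_model_10") == 1694649600
```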
- the at least one terminal includes a first terminal, and obtaining the cloud training features includes: receiving a training request issued by the first terminal, where the training request issued by the first terminal includes identification information of the first terminal; and obtaining, based on the training request issued by the first terminal, at least one first sub-cloud training feature, where the cloud training features include the at least one first sub-cloud training feature.
- the at least one terminal further includes a second terminal, and obtaining the cloud training features further includes: receiving a training request issued by the second terminal, where the training request issued by the second terminal includes identification information of the second terminal; obtaining, based on the training request issued by the second terminal, at least one second sub-cloud training feature; and merging the at least one first sub-cloud training feature and the at least one second sub-cloud training feature to obtain the cloud training features.
- the absolute value of the time difference between the time of the training request issued by the first terminal and the time of the training request issued by the second terminal is within a preset time difference range.
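- the time-difference condition for merging training requests can be sketched as a simple window check over pending requests. The 500 ms value echoes the example given elsewhere in the description, but the window length, the data structures and the grouping rule below are otherwise assumptions of this sketch.

```python
# Sketch of grouping training requests whose arrival times fall within the
# preset time difference range, so their sub-cloud training features can be
# merged into one batch.
import time

PRESET_TIME_DIFF_SECONDS = 0.5  # e.g. 500 ms

class RequestMerger:
    def __init__(self):
        self.pending = []   # list of (arrival_time, terminal_id, sub_cloud_features)

    def add_request(self, terminal_id, sub_cloud_features):
        self.pending.append((time.time(), terminal_id, sub_cloud_features))

    def pop_mergeable(self):
        """Return all pending requests whose arrival times differ from the
        earliest pending request by no more than the preset range."""
        if not self.pending:
            return []
        t0 = self.pending[0][0]
        mergeable = [r for r in self.pending if r[0] - t0 <= PRESET_TIME_DIFF_SECONDS]
        self.pending = [r for r in self.pending if r[0] - t0 > PRESET_TIME_DIFF_SECONDS]
        return mergeable
```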
- for example, M is greater than 1, the M terminal models include a first terminal model and a second terminal model, the at least one terminal includes a first terminal and a second terminal, the first terminal model runs on the first terminal, and the second terminal model runs on the second terminal.
- in this case, sending the cloud output result and the current parameters of the M terminal models to the at least one terminal includes: splitting the cloud output result to obtain a first cloud output corresponding to the first terminal model and a second cloud output corresponding to the second terminal model; obtaining the current parameters of the first terminal model and the current parameters of the second terminal model; transmitting the first cloud output and the current parameters of the first terminal model to the first terminal; and transmitting the second cloud output and the current parameters of the second terminal model to the second terminal.
- for example, the M terminal models include a first terminal model and a third terminal model, the at least one terminal includes a first terminal, and the first terminal model and the third terminal model both run on the first terminal.
- in this case, sending the cloud output result and the current parameters of the M terminal models to the at least one terminal includes: splitting the cloud output result to obtain the first cloud output corresponding to the first terminal model and a third cloud output corresponding to the third terminal model; obtaining the current parameters of the first terminal model and the current parameters of the third terminal model; and transmitting the first cloud output, the third cloud output, the current parameters of the first terminal model and the current parameters of the third terminal model to the first terminal.
- for example, training the cloud sub-model using the cloud training features to obtain the cloud output result of the cloud sub-model includes: obtaining the current parameters of the cloud sub-model, where the current parameters of the cloud sub-model represent the parameters of the cloud sub-model at the time the cloud training features are obtained; and training the cloud sub-model having the current parameters of the cloud sub-model using the cloud training features, to obtain the cloud output result of the cloud sub-model.
- for example, M is greater than 1 and N is greater than 1, and calculating the parameter gradient of the cloud sub-model based on the terminal gradients output by the N terminal models and the cloud output result includes: merging the cloud output gradients of the N terminal models to obtain a merged output gradient; and calculating the parameter gradient of the cloud sub-model based on the merged output gradient and the cloud output result.
- for example, the inputs of the M terminal models match the output of the cloud sub-model.
- for example, the model training method further includes: receiving a training progress query request sent by each terminal and corresponding to the terminal model run by that terminal; obtaining, based on the training progress query request, the stored training progress information corresponding to the terminal model; and outputting the stored training progress information to the terminal, so that the terminal can perform a sample screening operation based on the stored training progress information; where, in response to the sample screening operation obtaining at least one terminal training sample, the terminal sends a training request to the server for model training.
- at least one embodiment of the present disclosure also provides a model training method, applied to a first terminal and used to train a machine learning model, wherein the machine learning model includes a cloud sub-model and a first terminal model, the cloud sub-model runs on a server, and the first terminal model runs on the first terminal.
- the model training method includes: obtaining at least one terminal training sample, where each terminal training sample includes a terminal training feature and a sample label; sending a training request to the server based on the at least one terminal training sample; receiving, from the server, a cloud output corresponding to the at least one terminal training sample and the current parameters of the first terminal model; training the first terminal model using the cloud output, the current parameters of the first terminal model and the at least one terminal training sample, to obtain the terminal gradient output by the first terminal model, where the terminal gradient includes the parameter gradient and the cloud output gradient of the first terminal model; and outputting the terminal gradient to the server, so that the server calculates the parameter gradient of the cloud sub-model based on the terminal gradient and the cloud output, and adjusts the current parameters of the first terminal model and the current parameters of the cloud sub-model using the parameter gradient of the first terminal model and the parameter gradient of the cloud sub-model.
- for example, obtaining the at least one terminal training sample includes: sending a training progress query request to the server; receiving, from the server, the stored training progress information corresponding to the first terminal model; performing a sample screening operation based on the stored training progress information; and, in response to the sample screening operation obtaining K terminal training samples, obtaining the at least one terminal training sample based on the K terminal training samples, where K is a positive integer.
- for example, the model training method further includes: determining the current training progress information corresponding to the first terminal model based on the at least one terminal training sample; and sending the current training progress information to the server, so that the server updates the stored training progress information corresponding to the first terminal model to the current training progress information.
- for example, the cloud output includes at least one sub-cloud output corresponding one-to-one to the at least one terminal training sample, and training the first terminal model using the cloud output, the current parameters of the first terminal model and the at least one terminal training sample to obtain the terminal gradient output by the first terminal model includes, for each terminal training sample in the at least one terminal training sample: processing the sub-cloud output corresponding to the terminal training sample and the terminal training feature in the terminal training sample using the first terminal model having the current parameters of the first terminal model, to obtain the output of the first terminal model; obtaining the loss value of the first terminal model based on the output of the first terminal model and the sample label in the terminal training sample; and obtaining the terminal gradient based on the loss value and the output of the first terminal model.
- at least one embodiment of the present disclosure also provides a model training device, including: one or more memories storing computer-executable instructions non-transitorily; and one or more processors configured to execute the computer-executable instructions, where the computer-executable instructions, when executed by the one or more processors, implement the model training method according to any embodiment of the present disclosure.
- at least one embodiment of the present disclosure also provides a model training system, used to train a machine learning model and including at least one terminal and a server, wherein the machine learning model includes a cloud sub-model and M terminal models, the cloud sub-model runs on the server, the M terminal models run on the at least one terminal, and M is a positive integer. The server is configured to: obtain cloud training features; train the cloud sub-model using the cloud training features to obtain the cloud output result of the cloud sub-model; send the cloud output result and the current parameters of the M terminal models to the at least one terminal; receive the terminal gradients respectively output by N terminal models among the M terminal models and output by the at least one terminal, where N is a positive integer less than or equal to M, and the terminal gradient output by each of the N terminal models includes the parameter gradient and the cloud output gradient of that terminal model; calculate the parameter gradient of the cloud sub-model based on the terminal gradients respectively output by the N terminal models and the cloud output result; and adjust the current parameters of the N terminal models and the current parameters of the cloud sub-model using the parameter gradients of the N terminal models and the parameter gradient of the cloud sub-model.
- each terminal in the at least one terminal is configured to: obtain at least one terminal training sample, where each terminal training sample includes a terminal training feature and a sample label, and the cloud training features include at least one sub-cloud training feature corresponding one-to-one to the at least one terminal training sample; receive, from the server, a cloud output corresponding to the at least one terminal training sample and the current parameters of the terminal model running on the terminal, where the cloud output result includes the cloud output; train the terminal model running on the terminal using the cloud output, the current parameters of the terminal model running on the terminal and the at least one terminal training sample, to obtain the terminal gradient output by the terminal model running on the terminal; and output the terminal gradient output by the terminal model running on the terminal to the server.
- in a fifth aspect, at least one embodiment of the present disclosure also provides a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions, when executed by a processor, implement the model training method according to any embodiment of the present disclosure.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Electrically Operated Instructional Devices (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A model training method, apparatus, system and storage medium. The model training method is applied to a server and is used to train a machine learning model, where the machine learning model includes a cloud sub-model and M terminal models. The model training method includes: obtaining cloud training features; training the cloud sub-model using the cloud training features to obtain the cloud output result of the cloud sub-model; sending the cloud output result and the current parameters of the M terminal models to at least one terminal; receiving the terminal gradients respectively output by N terminal models among the M terminal models, where the terminal gradient output by each of the N terminal models includes the parameter gradient and the cloud output gradient of that terminal model; calculating the parameter gradient of the cloud sub-model based on the terminal gradients respectively output by the N terminal models and the cloud output result; and adjusting the current parameters of the N terminal models and the current parameters of the cloud sub-model using the parameter gradients of the N terminal models and the parameter gradient of the cloud sub-model.
Description
本申请要求于2022年9月14日递交的中国专利申请第202211117189.3号的优先权,在此全文引用上述中国专利申请公开的内容以作为本申请的一部分。
本公开的实施例涉及一种模型训练方法、模型训练装置、模型训练系统和非瞬时性计算机可读存储介质。
联邦学习(Federated Learning)是一种分布式机器学习技术,其核心思想是通过在多个拥有本地数据的数据源之间进行分布式模型训练,在不需要在多个数据源之间交换本地数据的前提下,仅通过交换模型参数或中间结果的方式,构建基于虚拟融合数据下的全局模型,实现跨机构的数据共享,从而实现数据隐私保护和数据共享计算的平衡,即“数据可用不可见”、“数据不动模型动”的应用模式。
发明内容
提供该内容部分以便以简要的形式介绍构思,这些构思将在后面的具体实施方式部分被详细描述。该内容部分并不旨在标识要求保护的技术方案的关键特征或必要特征,也不旨在用于限制所要求的保护的技术方案的范围。
本公开至少一个实施例提供一种模型训练方法,应用于服务器且用于对机器学习模型进行训练,其中,所述机器学习模型包括云子模型和M个端子模型,所述云子模型在所述服务器上运行,所述M个端子模型在至少一个终端上运行,M为正整数,所述模型训练方法包括:获取云端训练特征;利用所述云端训练特征对所述云子模型进行训练,以得到所述云子模型的云端输出结果;发送所述云端输出结果和所述M个端子模型的当前参数至所述至少一个终端;接收所述至少一个终端输出的所述M个端子模型中的N个端
子模型分别输出的端梯度,其中,N为正整数,且小于等于M,所述N个端子模型中的每个端子模型输出的端梯度包括所述端子模型的参数梯度和云输出梯度;基于所述N个端子模型分别输出的端梯度和所述云端输出结果计算得到所述云子模型的参数梯度;利用所述N个端子模型的参数梯度和所述云子模型的参数梯度,对所述N个端子模型的当前参数和所述云子模型的当前参数进行调整。
本公开至少一个实施例提供一种模型训练方法,应用于第一终端且用于对机器学习模型进行训练,其中,所述机器学习模型包括云子模型和第一端子模型,所述云子模型在服务器上运行,所述第一端子模型在所述第一终端上运行,其中,所述模型训练方法包括:获取至少一个终端训练样本,其中,每个终端训练样本包括终端训练特征和样本标签;基于所述至少一个终端训练样本,发送训练请求至所述服务器;从所述服务器接收与所述至少一个终端训练样本对应的云输出和所述第一端子模型的当前参数;利用所述云输出、所述第一端子模型的当前参数和所述至少一个终端训练样本对所述第一端子模型进行训练,以得到所述第一端子模型输出的端梯度,其中,所述端梯度包括所述第一端子模型的参数梯度和云输出梯度;输出所述端梯度至所述服务器,以使得所述服务器基于所述端梯度和所述云输出计算得到所述云子模型的参数梯度,并利用所述第一端子模型的参数梯度和所述云子模型的参数梯度,对所述第一端子模型的当前参数和所述云子模型的当前参数进行调整。
本公开至少一个实施例还提供一种模型训练装置,包括:一个或多个存储器,非瞬时性地存储有计算机可执行指令;一个或多个处理器,配置为运行所述计算机可执行指令,其中,所述计算机可执行指令被所述一个或多个处理器运行时实现根据本公开任一实施例所述的模型训练方法。
本公开至少一个实施例还提供一种模型训练系统,用于对机器学习模型进行训练且包括:至少一个终端和服务器,其中,所述机器学习模型包括云子模型和M个端子模型,所述云子模型在所述服务器上运行,所述M个端子模型在所述至少一个终端上运行,M为正整数,所述服务器被配置为:获取云端训练特征;利用所述云端训练特征对所述云子模型进行训练,以得到所述云子模型的云端输出结果;发送所述云端输出结果和所述M个端子模
型的当前参数至所述至少一个终端;接收所述至少一个终端输出的所述M个端子模型中的N个端子模型分别输出的端梯度,其中,N为正整数,且小于等于M,所述N个端子模型中的每个端子模型输出的端梯度包括所述端子模型的参数梯度和云输出梯度;基于所述N个端子模型分别输出的端梯度和所述云端输出结果计算得到所述云子模型的参数梯度;利用所述N个端子模型的参数梯度和所述云子模型的参数梯度,对所述N个端子模型的当前参数和所述云子模型的当前参数进行调整;所述至少一个终端中的每个终端被配置为:获取至少一个终端训练样本,其中,每个终端训练样本包括终端训练特征和样本标签,所述云端训练特征包括与所述至少一个终端训练样本一一对应的至少一个子云端训练特征;从所述服务器接收与所述至少一个终端训练样本对应的云输出和所述终端上运行的端子模型的当前参数,其中,所述云端输出结果包括所述云输出;利用所述云输出、所述终端上运行的端子模型的当前参数和所述至少一个终端训练样本对所述终端上运行的端子模型进行训练,以得到所述终端上运行的端子模型输出的端梯度;输出所述终端上运行的端子模型输出的端梯度至所述服务器。
本公开至少一个实施例还提供一种非瞬时性计算机可读存储介质,其中,所述非瞬时性计算机可读存储介质存储有计算机可执行指令,所述计算机可执行指令被处理器执行时实现根据本公开任一实施例所述的模型训练方法。
结合附图并参考以下具体实施方式,本公开各实施例的上述和其他特征、优点及方面将变得更加明显。贯穿附图中,相同或相似的附图标记表示相同或相似的元素。应当理解附图是示意性的,原件和元素不一定按照比例绘制。
图1A为本公开至少一个实施例提供的一种机器学习模型的示意图;
图1B为本公开至少一个实施例提供的另一种机器学习模型的示意图;
图2为本公开至少一个实施例提供的一种模型训练方法的示意性流程图;
图3为本公开至少一个实施例提供的一种终端和服务器进行交互的示意图;
图4为本公开至少一个实施例提供的另一种模型训练方法的示意图;
图5为本公开至少一个实施例提供的一种模型训练系统的示意图;
图6为本公开至少一个实施例提供的一种模型训练系统进行模型训练的整体流程的示意图;
图7为本公开至少一个实施例提供的一种模型训练系统进行模型训练的具体训练过程的示例图;
图8为本公开至少一个实施例提供的一种模型训练系统进行模型训练的具体训练过程的示例图;
图9为本公开至少一个实施例提供的一种模型训练装置的示意性框图;
图10为本公开至少一个实施例提供的一种非瞬时性计算机可读存储介质的示意图;以及
图11为本公开至少一个实施例提供的一种电子设备的硬件结构示意图。
下面将参照附图更详细地描述本公开的实施例。虽然附图中显示了本公开的某些实施例,然而应当理解的是,本公开可以通过各种形式来实现,而且不应该被解释为限于这里阐述的实施例,相反提供这些实施例是为了更加透彻和完整地理解本公开。应当理解的是,本公开的附图及实施例仅用于示例性作用,并非用于限制本公开的保护范围。
应当理解,本公开的方法实施方式中记载的各个步骤可以按照不同的顺序执行,和/或并行执行。此外,方法实施方式可以包括附加的步骤和/或省略执行示出的步骤。本公开的范围在此方面不受限制。
本文使用的术语“包括”及其变形是开放性包括,即“包括但不限于”。术语“基于”是“至少部分地基于”。术语“一个实施例”表示“至少一个实施例”;术语“另一实施例”表示“至少一个另外的实施例”;术语“一些实施例”表示“至少一些实施例”。其他术语的相关定义将在下文描述中给出。
需要注意,本公开中提及的“第一”、“第二”等概念仅用于对不同的装置、模块或单元进行区分,并非用于限定这些装置、模块或单元所执行的功能的顺序或者相互依存关系。
需要注意,本公开中提及的“一个”、“多个”的修饰是示意性而非限制性的,本领域技术人员应当理解,除非在上下文另有明确指出,否则应该理解为“一个或多个”。
本公开的实施方式中的多个装置之间所交互的消息或者信息的名称仅用于说明性的目的,而并不是用于对这些消息或信息的范围进行限制。
伴随着隐私保护政策和用户隐私保护意识的不断提高,尤其是各种终端隐私保护的不断加强,给基于深度网络模型的大规模在线推荐系统带来了新的挑战。用户隐私数据不再能追踪和集中存储,传统的模型训练方式需要首先将数据进行汇聚,然后基于汇聚的数据进行模型训练,从而传统的模型训练方式无法适应这种场景。基于用户隐私和数据安全保护的联邦学习技术,正逐渐受到重视。
联邦学习是指通过联合多个拥有数据所有权的参与方(终端)共同进行机器学习建模的方法。在联邦学习过程中,拥有数据的参与方不需要向中心服务器(也叫做参数服务器)暴露自己的数据,而是通过参数或梯度更新来共同完成模型训练的过程。因此联邦学习可以在保护用户隐私数据,并且可以完成建模训练过程。
在大规模在线推荐系统场景下,机器学习模型往往非常庞大,需要大量的算力才能够快速的进行训练。传统的模型训练方式都是将用户的数据保存在云端,然后通过服务器强大的算力进行快速的模型训练。庞大的模型同样对应着大量的训练数据,从而会导致服务器的存储压力较大。为了在模型效果和训练速度上保持平衡,往往需要使用批量训练的方式。
本公开至少一个实施例提供一种模型训练方法,该模型训练方法应用于服务器且用于对机器学习模型进行训练。机器学习模型包括云子模型和M个端子模型,云子模型在服务器上运行,M个端子模型在至少一个终端上运行,M为正整数,模型训练方法包括:获取云端训练特征;利用云端训练特征对云子模型进行训练,以得到云子模型的云端输出结果;发送云端输出结果和M个端子模型的当前参数至至少一个终端;接收至少一个终端输出的M个端子模型中的N个端子模型分别输出的端梯度,其中,N为正整数,且小于等于M,N个端子模型中的每个端子模型输出的端梯度包括该端子模型的参数梯度和云输出梯度;基于N个端子模型分别输出的端梯度和云端输出结果计算得到云子模型的参数梯度;利用N个端子模型的参数梯度和云子模型的参数梯度,对N个端子模型的当前参数和云子模型的当前参数进行调整。
本公开的实施例提供的模型训练方法通过将机器学习模型拆分为云子
模型和端子模型,从而实现在服务器和终端之间进行联邦机器学习,实现用户隐私和数据安全保护,解决诸如车载娱乐设备之类的终端上的模型过大而无法训练的问题;此外,还可以根据不同的终端使用不同的端子模型,从而模型训练的过程更加灵活,应用场景更加广泛;服务器可以同时与多个终端进行联邦机器学习,在保证训练得到的机器学习模型的模型效果的基础上,大大提高模型训练速度,节省模型训练时间。
本公开至少一个实施例还提供一种模型训练装置、模型训练系统和非瞬时性计算机可读存储介质。该模型训练方法可应用于本公开实施例提供的模型训练装置,该模型训练装置可被配置于电子设备上。该电子设备可以是固定终端、移动终端等。
下面结合附图对本公开的实施例进行详细说明,但是本公开并不限于这些具体的实施例。为了保持本公开实施例的以下说明清楚且简明,本公开省略了部分已知功能和已知部件的详细说明。
图1A为本公开至少一个实施例提供的一种机器学习模型的示意图,图1B为本公开至少一个实施例提供的另一种机器学习模型的示意图,图2为本公开至少一个实施例提供的一种模型训练方法的示意性流程图。
例如,在一些实施例中,本公开的实施例提供的模型训练方法可以应用于服务器,即由服务器实现模型训练方法。服务器可以为云端服务器等,服务器可以包括中央处理单元(CPU)等具有数据处理能力和/或程序执行能力的器件。
例如,模型训练方法可以用于对机器学习模型进行训练,机器学习模型可以为神经网络模型等。
本公开从切图的方案出发,将建模完成的庞大的机器学习模型通过切图的方式拆分两个部分,第一部分是由终端执行的模型结构较小的端子模型,第二部分是由服务端执行的模型结构较大的云子模型。端子模型相对简单,由原机器学习模型最上层的几层神经网络层组成,从而适用于具有较小算力的终端,避免增加终端的算力负担。可以根据不同的终端使用不同的端子模型,即各个终端上的端子模型可以根据需求使用不一样的结构;此外,也可以根据各个终端的不同,而设置不同的端子模型的输入。云子模型包含了机器学习模型的大部分结构,因此云子模型相对复杂且主要在服务器执行,利
用服务器的强大的算力完成模型训练。云子模型和各个端子模型配合完成联邦训练过程。
例如,机器学习模型可以包括云子模型和M个端子模型,图1A示出了三个端子模型,分别为端子模型A、端子模型B和端子模型C。每个端子模型和云子模型共同构成一个完整的模型,该完整的模型可以用于实现预定的功能,例如,分类、预测等功能。
例如,M个端子模型在至少一个终端上运行,M为正整数,每个终端上可以运行至少一个端子模型,例如,在一个示例中,每个终端上可以运行一个端子模型,此时,M个端子模型分别在M个终端上运行,例如,图1A示出的三个端子模型可以分别在三个终端上运行。例如,在另一些示例中,一个终端上可以运行多个端子模型,例如,图1A示出的三个端子模型中的至少两个端子模型也可以在同一个终端上运行,例如图1A所示的端子模型A和端子模型B由同一个终端执行。
例如,云子模型在服务器上运行。每个服务器上可以运行至少一个云子模型,在一个示例中,如图1B所示,云子模型A和端子模型D共同构成一个完整的模型,云子模型B和端子模型E共同构成一个完整的模型,云子模型A和云子模型B可以由同一个服务器运行,端子模型D和端子模型E可以由同一终端运行,也可以由不同的终端运行。
例如,每个云子模型可以对应至少一个端子模型,如图1A所示,一个云子模型可以对应三个端子模型,此时,该云子模型的输出可以被传输至三个端子模型;如图1B所示,一个云子模型对应一个端子模型,云子模型A对应端子模型D,云子模型B对应端子模型E,从而云子模型A的输出被传输至端子模型D,云子模型B的输出被传输至端子模型E。
需要说明的是,在本公开的实施例中,“云子模型对应端子模型”表示该端子模型和该云子模型能够共同构成一个完整的模型。
例如,M个端子模型的输入与云子模型的输出匹配,也就是说,云子模型将相同尺寸的特征图输出给M个端子模型,例如,如图1A所示,子云输出1、子云输出2和子云输出3的尺寸相同。
例如,每个端子模型的输入可以包括端输入和子云输出,如图1A所示,端子模型A的输入可以包括子云输出1和端输入1,端子模型B的输入可以
包括子云输出2和端输入2,端子模型C的输入可以包括子云输出3和端输入3。端输入可以为存储在运行该端子模型的终端上的终端训练特征(下面描述)。
例如,M个端子模型实现相同的目标,例如,调节温度等。
例如,M个端子模型可以在不同的终端上运行,且不同的终端可以为应用在不同场景的同一类终端,也可以为应用在同一场景或不同场景的不同类终端。例如,在图1A所示的示例中,端子模型A可以在终端1上运行,端子模型B可以在终端2上运行,端子模型C可以在终端3上运行,在一个示例中,终端1、终端2和终端3可以均为空调,终端1可以为车载空调,终端2可以为客厅的空调,终端3可以为卧室的空调,此时,端子模型A、端子模型B和端子模型C所实现的目标可以均为调节温度。
例如,每个终端和服务器可以分开设置且通过网络实现通信连接。网络可以包括无线网络、有线网络、和/或无线网络和有线网络的任意组合。网络可以包括局域网、互联网、电信网、基于互联网和/或电信网的物联网(Internet of Things)、和/或以上网络的任意组合等。有线网络例如可以采用双绞线、同轴电缆或光纤传输等方式进行通信,无线网络例如可以采用3G/4G/5G移动通信网络、蓝牙、Zigbee或者WiFi等通信方式。本公开对网络的类型和功能在此不作限制。
例如,终端可以为各种移动终端、固定终端等,例如,终端可以包括移动终端的应用程序(application,App)。移动终端可以为平板电脑、车载设备、笔记本电脑、智能眼镜、智能手表、车载娱乐设备等。固定终端可以为台式计算机、智能家电(例如,智能空调、智能冰箱、智能净化器、智能开关、智能网关、智能电饭煲等)等。
如图2所示,模型训练方法可以包括以下步骤S100~S105。
在步骤S100中,获取云端训练特征。
在步骤S101中,利用云端训练特征对云子模型进行训练,以得到云子模型的云端输出结果。
在步骤S102中,发送云端输出结果和M个端子模型的当前参数至至少一个终端。
在步骤S103中,接收至少一个终端输出的M个端子模型中的N个端子
模型分别输出的端梯度。例如,N为正整数,且小于等于M,N个端子模型中的每个端子模型输出的端梯度包括端子模型的参数梯度和云输出梯度。
在步骤S104中,基于N个端子模型分别输出的端梯度和云端输出结果计算得到云子模型的参数梯度。
在步骤S105中,利用N个端子模型的参数梯度和云子模型的参数梯度,对N个端子模型的当前参数和云子模型的当前参数进行调整。
步骤S100~S101表示云子模型的前向传播过程,步骤S103~S104表示云子模型的反向传播过程。
例如,在步骤S100中,云端训练特征可以包括与每个终端对应的至少一个子云端训练特征,子云端训练特征可以为终端的已经公开的信息和/或终端授权给服务器的信息等不涉及终端隐私的信息,在一些示例中,终端可以为车载空调,此时,该终端对应的子云端训练特征可以为该车载空调所属的机动车所在的位置的环境温度、地址、时间等信息。子云端训练特征的具体内容可以根据实际情况确定,本公开对此不作限制。
例如,至少一个子云端训练特征可以存储在服务器,当服务器接收到终端发出的训练请求,则可以根据该训练请求中的标识信息等信息获取与该终端对应的子云端训练特征。
在一些实施例中,至少一个终端包括第一终端,步骤S100可以包括:接收第一终端发出的训练请求;基于第一终端发出的训练请求,获取至少一个第一子云端训练特征。云端训练特征包括至少一个第一子云端训练特征,至少一个第一子云端训练特征与第一终端对应。
例如,第一终端发出的训练请求包括第一终端的标识信息和样本标识等,服务器可以根据第一终端的标识信息和样本标识获取该至少一个第一子云端训练特征。
需要说明的是,“样本标识”可以表示终端训练样本(下面将描述)的标识信息,基于该样本标识,可以确定哪些终端训练样本被用于进行训练,从而服务器可以获取与这些终端训练样本对应的子云端训练特征以进行训练。
每个终端会定期(每间隔一段时间,该一段时间为分钟级,例如,该一段时间可以为一分钟、两分钟、五分钟等)不断查询服务器以进行模型训练,在这段时间内,每个终端新增的终端训练特征一般不会很多。当有几千万个
终端需要与服务器进行模型训练时,每一时刻,各个终端设备新增的样本量很少,如果服务器对每个终端单独进行训练,那么将会非常消耗服务器的资源,大大降低训练速度。因此,本公开的实施例提供的模型训练方法可以进行合并训练,即合并多个终端的终端训练特征以组成一个批次(batch)进行训练,从而提高训练速度,节省训练时间,优化或降低服务器的资源消耗,通过实时样本合并方案解决端上样本不足的问题。
在另一些实施例中,至少一个终端包括第一终端和第二终端,步骤S100可以包括:接收第一终端发出的训练请求;基于第一终端发出的训练请求,获取至少一个第一子云端训练特征;接收第二终端发出的训练请求;基于第二终端发出的训练请求,获取至少一个第二子云端训练特征;对至少一个第一子云端训练特征、至少一个第二子云端训练特征进行合并处理,以得到云端训练特征。
在本公开的实施例中,服务器可以同时与多个终端进行联邦机器学习,从而大大提高模型训练速度,降低服务器的资源消耗,减轻服务器的压力。
例如,第二终端发出的训练请求包括第二终端的标识信息和样本标识等。
例如,第一终端发出的训练请求的时间和第二终端发出的训练请求的时间之间的时间差的绝对值处于时间差范围内。例如,预设时间差范围可以为500毫秒等,具体根据实际情况设置。在本公开的实施例中,可以对特定时间差范围内获取的子云端训练特征进行合并处理,从而提高训练速度,节省训练时间。
例如,步骤S101可以包括:获取云子模型的当前参数;利用云端训练特征对具有云子模型的当前参数的云子模型进行训练,以得到云子模型的云端输出结果。例如,云子模型的当前参数表示在获取云端训练特征时云子模型的参数,由于在训练过程中,云子模型的参数会不断被更新优化,从而在执行云子模型的前向传播时,需要获取该云子模型的最新更新的参数(即当前参数),然后基于具有最新更新的参数的云子模型执行前向传播过程。
例如,云子模型对云端训练特征进行处理可以得到云端输出结果,云端输出结果可以包括至少一个子云输出,每个子云输出对应一个子云端训练特征(例如,上述第一子云端训练特征或上述第二子云端训练特征等),如图1A所示,云端输出结果包括子云输出1、子云输出2和子云输出3。
例如,服务器可以同时与多个终端进行训练,在一些实施例中,M个端子模型包括第一端子模型和第二端子模型,至少一个终端包括第一终端和第二终端,第一端子模型在第一终端上运行,第二端子模型在第二终端上运行。步骤S102可以包括:对云端输出结果进行拆分处理,以得到与第一端子模型对应的第一云输出和与第二端子模型对应的第二云输出;获取第一端子模型的当前参数和第二端子模型的当前参数;将第一云输出和第一端子模型的当前参数传输至第一终端;将第二云输出和第二端子模型的当前参数传输至第二终端。
在本公开的实施例中,可以根据不同的终端使用不同的端子模型(例如,上述第一端子模型和第二端子模型),从而实现模型训练的过程更加灵活,应用场景更加广泛;而且,不同的端子模型可以同时训练,从而进一步节省模型训练的时间。
例如,第一云输出可以包括至少一个子云输出,第二云输出可以包括至少一个子云输出。
例如,每个端子模型的当前参数表示在获取云端训练特征时端子模型的参数。
如图1A所示,第一端子模型的一个示例可以为端子模型A,第二端子模型的一个示例可以为端子模型B,第一云输出的一个示例可以为子云输出1,第二云输出的一个示例可以为子云输出2,子云输出1被传输至第一终端,并作为端子模型A的输入的一部分,子云输出2被传输至第二终端,并作为端子模型B的输入的一部分。
例如,每个终端同一时间内可以同时参与多个端子模型的训练过程。在一些实施例中,M个端子模型包括第一端子模型和第三端子模型,第一端子模型的结构和第三端子模型的结构可以不同,至少一个终端包括第一终端,第一端子模型和第三端子模型均在第一终端上运行。步骤S102可以包括:对云端输出结果进行拆分处理,以得到与第一端子模型对应的第一云输出和与第三端子模型对应的第三云输出;获取第一端子模型的当前参数和第三端子模型的当前参数;将第一云输出、第三云输出、第一端子模型的当前参数和第三端子模型的当前参数传输至第一终端。
例如,第三云输出可以包括至少一个子云输出。
如图1A所示,第一端子模型的一个示例可以为端子模型A,第三端子模型的一个示例可以为端子模型C,第一云输出的一个示例可以为子云输出1,第三云输出的一个示例可以为子云输出3。子云输出1和子云输出3可以均被传输至第一终端,但是,子云输出1作为端子模型A的输入的一部分,子云输出3作为端子模型C的输入的一部分。
上面的实施例中,以第一终端运行多个端子模型为例,但是本公开不限于此,第二终端也可以运行多个端子模型,具体操作流程参考上面的相关描述,重复之处不再赘述。
例如,在步骤S103中,可以从至少一个终端中的运行该N个端子模型的终端接收该N个端子模型分别输出的端梯度。在一些实施例中,在步骤S103中,可以在反馈时间范围内接收从终端传输的梯度信息,当在反馈时间范围内,没有接收到某个终端反馈的梯度信息,则表示该终端掉线了(此时,N小于M),从而在当前的训练过程中,不对该终端运行的端子模型的参数进行调整。例如,反馈时间范围可以为8秒、10秒、20秒等,可以根据实际情况设置。
例如,在步骤S103中,端子模型的参数梯度表示该端子模型中的各个层的参数的梯度,端子模型的云输出梯度表示该端子模型所接收的云输出的梯度。在一个示例中,第一端子模型接收第一云输出,从而第一端子模型的云输出梯度表示该第一云输出的梯度。
例如,在步骤S104中,可以基于N个端子模型分别输出的云输出梯度和云端输出结果计算得到云子模型的参数梯度。在一些实施例中,M大于1,N大于1,步骤S104可以包括:对N个端子模型的云输出梯度进行合并处理以得到合并输出梯度;基于合并输出梯度和云端输出结果计算得到云子模型的参数梯度。
需要说明的是,当N为1时,则可以省略合并梯度的过程,而直接基于端子模型输出的云输出梯度和云端输出结果计算得到云子模型的参数梯度。
例如,在一些实施例中,步骤S104还可以包括:对N个端子模型的参数梯度进行合并处理以得到合并参数梯度。
例如,在一些实施例中,步骤S105可以包括:利用N个端子模型的参数梯度,对N个端子模型的当前参数进行调整;利用云子模型的参数梯度,
对云子模型的当前参数进行调整。
在对N个端子模型和云子模型进行参数调整之后,可以将N个端子模型的调整后的参数作为N个端子模型的当前参数和云子模型的调整后的参数作为云子模型的当前参数存储在服务器中。
例如,可以通过参数优化器调整机器学习模型的参数。
上述步骤S100~步骤S105表示一次完整的训练过程。
为了避免每个终端对每个端子模型都需要管理训练进度,并且避免服务器回滚机器学习模型到某一天时需要重制终端的训练进度的复杂性。在本公开的实施例中,可以将每个终端的训练进度信息保存在每个端子模型的参数中,当成端子模型的参数的一部分,并且被存储在服务器中。每个终端运行的多个端子模型的训练进度可以不一样。每个端子模型只需要为每个终端保存一个训练进度的数字即可,从而在几乎不增加数据传输量和数据存储量的基础上,通过服务器存储端子模型的训练进度,解决模型数据重复训练的问题,进一步提升模型的训练速度,节省模型训练的时间。
每个终端可以参与多个端子模型的训练,每个端子模型都需要为每个终端记录已训练的数据。当模型回滚时,需要回滚这个模型的所有终端的训练进度记录。
例如,M个端子模型分别一一对应M个存储训练进度信息,M个存储训练进度信息存储在服务器。
例如,在一些实施例中,模型训练方法还包括:针对至少一个终端中的每个终端:从终端接收在终端上运行的每个端子模型的当前训练进度信息;基于当前训练进度信息调整每个端子模型对应的存储训练进度信息。
例如,基于当前训练进度信息调整每个端子模型对应的存储训练进度信息可以包括:将端子模型对应的存储训练进度信息设置为端子模型的当前训练进度信息。由此,端子模型对应的存储训练进度信息指示端子模型的当前训练进度。
每个终端可以单独维护该终端对应的训练进度,训练进度是一个严格递增的数字(可以是时间戳或者在终端进行累加的数字等)。在终端,每个终端训练样本生成时,可以为该终端训练样本设置唯一的训练进度标识,例如,在一个示例中,训练进度标识可以为生成该终端训练样本时的时间戳。
例如,每个终端存储有用于训练在终端上运行的所有端子模型的训练样本集,训练样本集包括多个终端训练样本,每个终端训练样本包括终端训练特征和样本标签,多个终端训练样本中的终端训练特征按顺序依次生成,且每个终端训练样本具有对应的训练进度标识。例如,在每个端子模型的当前训练进度信息表示训练样本集中的已经用于训练每个端子模型的所有终端训练样本中的最后先生成的终端训练样本的训练进度标识。又例如,在每个端子模型的当前训练进度信息表示训练样本集中的尚未用于训练每个端子模型的所有终端训练样本中的最先生成的终端训练样本的训练进度标识。
需要说明的是,终端上运行的所有端子模型可以共享同一个训练样本集,或者,终端上运行的不同端子模型可以分别对应不同的训练样本集。
例如,终端训练样本可以是基于经验预先设置的,也可以是随着终端的使用过程而实时产生的。
例如,在一些实施例中,模型训练方法还包括:接收每个终端发送的与由终端运行的端子模型对应的训练进度查询请求;基于训练进度查询请求,获取与端子模型对应的存储训练进度信息;输出存储训练进度信息至终端,以供终端基于存储训练进度信息进行样本筛选操作。例如,响应于样本筛选操作得到至少一个终端训练样本,终端发送训练请求至服务器以进行模型训练;响应于样本筛选操作没有得到终端训练样本,则不进行模型训练。
图3为本公开至少一个实施例提供的一种终端和服务器进行交互的示意图。
在终端运行的一个端子模型可以对应多个终端训练样本,在某时刻,该多个终端训练样本包括依次生成的终端训练样本1~终端训练样本9。如图3所示,在该时刻,终端可以向服务器发送训练进度查询请求以查询终端中的端子模型的当前训练进度,服务器基于训练进度查询请求可以获取该端子模型的存储训练进度信息(即该端子模型的当前训练进度),并将该存储训练进度信息传输至终端,例如,若该端子模型的存储训练进度信息指示终端训练样本1~终端训练样本5已经用于对该端子模型进行训练,即该端子模型的当前训练进度可以为终端训练样本5对应的训练进度标识;然后,终端基于该存储训练进度信息进行样本筛选操作,以筛选出大于当前训练进度的终端训练样本,此时,筛选得到终端训练样本6~终端训练样本9;然后,终端基于
满足条件的部分终端训练样本(例如,终端训练样本6~终端训练样本8)的信息(样本标识等)向服务器发送训练请求以请求进行模型训练,还可以例如调用pull接口以从服务器获取端子模型的当前参数和与该端子模型对应的云输出。最后,终端可以例如调用push接口以将端子模型的当前训练进度和端子模型输出的端梯度返回给服务器,从而使得服务器进行参数调整,并基于该端子模型的当前训练进度调整该端子模型对应的存储训练进度信息。在当前训练过程结束之后,端子模型的当前训练进度变为终端训练样本8对应的训练进度标识。
当该终端再次向服务器发送训练进度查询请求以发起训练时,可以过滤上一轮已经完成训练的终端训练样本,即可以过滤上述终端训练样本1~终端训练样本8,从而基于终端训练样本9对端子模型进行训练。
例如,终端可以为每个端子模型设置终端样本阈值,终端样本阈值表示在每次训练过程中,能够用于训练该端子模型的终端训练样本的数量的最大值,即每次训练过程中用于训练该端子模型的终端训练样本的数量不能超过该终端样本阈值。
例如,服务器可以为每个端子模型设置云端样本阈值,云端样本阈值表示在每次训练过程中,能够用于训练该端子模型的云端训练特征的数量的最大值,即每次训练过程用于训练该端子模型的云端训练特征的数量不能超过该云端样本阈值。
需要说明的是,终端样本阈值和云端样本阈值可以相同,也可以不相同。
例如,在一个示例中,终端为端子模型设置的终端样本阈值可以为8,而服务器为该端子模型设置的云端样本阈值可以为6,此时,在一次训练过程中,终端向服务器发出训练请求,该训练请求指示利用8个终端训练样本对该端子模型进行训练,在一个示例中,训练样本集可以包括终端训练样本1~终端训练样本20,该8个终端训练样本可以为终端训练样本10~终端训练样本17。此时,由于端子模型对应的云端样本阈值为6,从而服务器获取与该8个终端训练样本中的前6个终端训练样本(即终端训练样本10~终端训练样本15)分别对应的6个云端训练特征以进行训练过程。对应地,在终端中,利用该8个终端训练样本中的前6个终端训练样本(即终端训练样本10~终端训练样本15)对端子模型进行训练,此时,该端子模型对应的当前训练
进度信息为终端训练样本15对应的样本标识。
在本公开的实施例提供的模型训练方法中,每个终端的上线时间是无法控制的,每个终端在线时会定时的访问服务器,查看是否有机器学习模型正在训练,然后将训练数据的信息(不包含敏感信息)上传到服务器,从而使得服务器获取云端训练特征以进行模型训练。
图4为本公开至少一个实施例提供的另一种模型训练方法的示意图。
例如,在一些实施例中,本公开的实施例提供的模型训练方法可以应用于终端(例如,第一终端),即由终端实现模型训练方法。关于终端的相关说明可以参考上述实施例中的描述。
例如,机器学习模型包括云子模型和第一端子模型,云子模型在服务器上运行,第一端子模型在第一终端上运行。需要说明的是,第一终端上还可以运行更多的端子模型。
如图4所示,模型训练方法可以包括以下步骤S200~S203。
在步骤S200中,获取至少一个终端训练样本。例如,每个终端训练样本包括终端训练特征和样本标签。
在步骤S201中,基于至少一个终端训练样本,发送训练请求至服务器。
在步骤S202中,从服务器接收与至少一个终端训练样本对应的云输出和第一端子模型的当前参数。例如,该云输出包括与至少一个终端训练样本一一对应的至少一个子云输出。
在步骤S203中,利用云输出、第一端子模型的当前参数和至少一个终端训练样本对第一端子模型进行训练,以得到第一端子模型输出的端梯度。例如,端梯度包括第一端子模型的参数梯度和云输出梯度,该云输出梯度可以为该云输出的梯度。
在步骤S204中,输出端梯度至服务器。当第一端子模型输出的端梯度被输出至服务器之后,可以使得服务器基于第一端子模型输出的端梯度和云输出计算得到云子模型的参数梯度,并利用第一端子模型的参数梯度和云子模型的参数梯度,分别对第一端子模型的当前参数和云子模型的当前参数进行调整。
步骤S200和步骤S203~S204表示第一端子模型的前向传播过程和反向传播过程。
本公开的实施例提供的模型训练方法通过将机器学习模型拆分为云子模型和端子模型,从而实现在服务器和终端之间进行联邦机器学习,实现用户隐私和数据安全保护,解决诸如车载娱乐设备之类的终端上的模型过大而无法训练的问题。在终端上运行的端子模型的结构较小,从而可以适应于算力较小的终端,使得联邦机器学习可以应用在具有较小算力的终端,进一步扩大联邦机器学习的应用范围和应用场景,能有效帮助多个终端在满足用户隐私保护和数据安全的要求下,进行数据使用和机器学习建模;而且,由于联合多个终端进行联邦训练,从而可以提高训练得到的机器学习模型的精度和准确度。
例如,第一终端可以存储有用于训练第一端子模型的训练样本集,训练样本集包括多个终端训练样本,每个终端训练样本包括终端训练特征和样本标签。终端训练样本可以是基于经验预先设置的,也可以是随着终端的使用过程而实时产生的。例如,在一些实施例中,第一终端可以为车载空调,在某个时刻,第一终端需要控制该车载空调所属的机动车的车内温度,此时,第一终端产生一个终端训练特征,该终端训练特征可以包括机动车的当前车内温度、当前车内人员数量等信息,对应于该终端训练特征,服务器产生对应的云端训练特征,此时,机器学习模型可以对该终端训练特征和云端训练特征进行处理,以得到一个预测温度,然后,车载空调可以调节到该预测温度。然后,当车内的人员发出一个反馈信息时,该反馈信息即为该终端训练特征对应的样本标签,反馈信息可以为温度不合适(温度较高或温度较低)或温度合适等;基于该预测温度和反馈信息,则可以生成梯度以调节机器学习模型的参数。该终端训练特征和样本标签即为一个终端训练样本,该终端训练样本对应的训练进度标识可以为产生该终端训练特征的时刻对应的时间戳或在第一终端进行累加的数字。
需要说明的是,当车内的人员没有发出反馈信息时,可以默认机器学习模型当前预测得到的结果达到用户的预期结果。例如,在上述示例中,当车内的人员没有发出反馈信息时,终端训练特征对应的样本标签为该预测温度合适。此外,终端训练特征的具体信息可以根据实际情况设置,本公开对此不作具体限定。
例如,在一些实施例中,步骤S200可以包括:发送训练进度查询请求至
服务器;从服务器接收第一端子模型对应的存储训练进度信息;基于存储训练进度信息进行样本筛选操作;响应于样本筛选操作得到K个终端训练样本,基于K个终端训练样本获取至少一个终端训练样本。例如,K为正整数。例如,第一终端可以基于存储训练进度信息对该第一端子模型对应的训练样本集进行样本筛选操作,以获得用于训练该第一端子模型的K个终端训练样本,然后,从该K个终端训练样本中选择得到至少一个终端训练样本,当第一端子模型设置有对应的终端样本阈值,则该至少一个终端训练样本的数量小于等于该终端样本阈值,例如,当K小于等于该终端样本阈值时,可以选择该K个终端训练样本以进行模型训练;当K大于该终端样本阈值时,可以选择该K个终端训练样本中的部分终端训练样本以进行模型训练。当响应于样本筛选操作没有得到终端训练样本,则不进行模型训练。
例如,在一些实施例中,在步骤S201中,当获取至少一个终端训练样本之后,第一终端可以基于至少一个终端训练样本向服务器发送训练请求,第一终端发出的训练请求包括第一终端的标识信息和样本标识列表等,样本标识列表用于指示该至少一个终端训练样本分别对应的样本标识,然后,服务器基于第一终端发出的训练请求,获取与该至少一个终端训练样本分别对应的至少一个子云端训练特征以进行训练,以得到至少一个终端训练样本对应的至少一个子云输出;此外,服务器还获取第一端子模型的当前参数,然后,服务器将至少一个终端训练样本对应的至少一个子云输出和第一端子模型的当前参数输出至第一终端。
例如,在一些实施例中,步骤S203可以包括:针对至少一个终端训练样本中的每个终端训练样本:利用具有第一端子模型的当前参数的第一端子模型对终端训练样本对应的子云输出和终端训练样本中的终端训练特征进行处理,以得到第一端子模型的输出;基于第一端子模型的输出和终端训练样本中的样本标签,得到第一端子模型的损失值;基于损失值和第一端子模型的输出,得到端梯度。
例如,在一些实施例中,模型训练方法还包括:基于至少一个终端训练样本,确定与第一端子模型对应的当前训练进度信息;发送当前训练进度信息至服务器,以使得服务器将第一端子模型对应的存储训练进度信息更新为当前训练进度信息。
需要说明的是,关于服务器执行的具体操作,可以参考上述应用于服务器的模型训练方法的实施例中的描述,重复之处不再赘述。
图5为本公开至少一个实施例提供的一种模型训练系统的示意图。
本公开至少一个实施例还提供一种模型训练系统,该模型训练系统用于对机器学习模型进行训练,机器学习模型包括云子模型和M个端子模型,M为正整数。如图5所示,该模型训练系统1100可以包括至少一个终端1101和服务器1102,云子模型在服务器1102上运行,M个端子模型在至少一个终端1101上运行。
例如,服务器1102被配置为:获取云端训练特征;利用云端训练特征对云子模型进行训练,以得到云子模型的云端输出结果;发送云端输出结果和M个端子模型的当前参数至至少一个终端;接收至少一个终端输出的M个端子模型中的N个端子模型分别输出的端梯度,其中,N为正整数,且小于等于M,N个端子模型中的每个端子模型输出的端梯度包括端子模型的参数梯度和云输出梯度;基于N个端子模型分别输出的端梯度和云端输出结果计算得到云子模型的参数梯度;利用N个端子模型的参数梯度和云子模型的参数梯度,对N个端子模型的当前参数和云子模型的当前参数进行调整。
至少一个终端1101中的每个终端被配置为:获取至少一个终端训练样本,其中,每个终端训练样本包括终端训练特征和样本标签,云端训练特征包括与至少一个终端训练样本一一对应的至少一个子云端训练特征;从服务器1102接收与至少一个终端训练样本对应的至少一个云输出和终端上运行的端子模型的当前参数,其中,云端输出结果包括至少一个云输出;利用至少一个云输出、终端上运行的端子模型的当前参数和至少一个终端训练样本对终端上运行的端子模型进行训练,以得到终端上运行的端子模型输出的端梯度;输出终端上运行的端子模型输出的端梯度至服务器1102。
服务器1102可以用于实现图2所示的模型训练方法,至少一个终端1101中的每个终端可以用于实现图4所示的模型训练方法。关于服务器1102和终端1101可以实现的具体操作请参考上述模型训练方法的实施例,重复之处不再赘述。
下面简单描述模型训练系统中的一个服务器和一个终端进行联邦训练的整体过程。
终端定期检查是否满足训练条件,训练条件包括样本量、网络环境等信息。如果满足训练条件,终端向服务器发现训练进度查询请求以查询是否可以进行训练,训练进度查询请求中包括终端的标识信息。服务器收到终端的训练进度查询请求时,会查找正在训练的模型,并根据终端的标识信息找到该终端上运行的端子模型的存储训练进度信息。然后将端子模型的名称和存储训练进度信息返回给终端。终端接收到服务器发送的端子模型的名称和存储训练进度信息后,查找本地的训练样本集的所有可以训练的终端训练样本。终端如果查找到可以训练的终端训练样本,将端子模型的名称和样本标识列表发送给服务器。样本标识列表可以包括各个终端训练样本的样本标识。
服务器收到终端的端子模型的名称和样本标识列表,通过样本标识列表在服务器端的样本库查找与样本标识列表中的样本标识指示的终端训练样本对应的云端训练样本,每个云端训练样本包括云端训练特征。然后通过参数模块获取云子模型的当前参数,将云端训练特征输入到云子模型中以得到云子模型的输出(如果需要合并训练,需要在此等待其他终端上报终端训练样本后,合并所有云端训练特征之后一并输入给云子模型)。同时,服务器还会通过参数模块获取端子模型的当前参数。最后,将云子模型的输出和端子模型的当前参数返回给终端,同时生成并返回一个会话标识信息。
终端收到云子模型的输出和端子模型的当前参数后,将终端训练样本的终端训练特征和云子模型的输出一同输入到端子模型以得到端子模型的输出(即预测结果),然后基于端子模型的输出和终端训练样本的样本标签得到端子模型的输出的损失值。通过损失值计算梯度,即反向传播得到端子模型输出的端梯度(包括端子模型的参数梯度和云输出梯度),最后将端子模型输出的端梯度返回给到服务器,同时带上之前的会话标识信息。
服务器收到端子模型输出的端梯度后,通过云输出梯度继续反向传播得到云子模型的参数梯度(如果需要合并训练,需要再次等待其他终端上报端梯度后,再进行反向传播)。这样,服务器就得到了云子模型的参数梯度和端子模型的参数梯度,也就是整个机器学习模型的梯度,然后,将云子模型的参数梯度和端子模型的参数梯度提交给参数服务器以更新机器学习模型的参数,同时更新该端子模型对应的存储训练进度信息(存储训练进度信息也一并存储在参数服务器)。
终端向服务器发送梯度后,即可认为一轮训练完成,等待下一次训练。
FIG. 6 is a schematic diagram of the overall flow of model training performed by a model training system provided by at least one embodiment of the present disclosure. For example, the model training system may be the model training system shown in FIG. 5.
In the embodiments of the present disclosure, the overall flow of model training performed by the model training system consists of three parts: forward propagation of the cloud sub-model, forward and backward propagation of the terminal sub-model, and backward propagation of the cloud sub-model.
In one example, the machine learning model may include one cloud sub-model and one terminal sub-model. As shown in FIG. 6, the cloud sub-model may include a first layer layer1 and a second layer layer2, and the terminal sub-model may include a third layer layer3; each of layer1, layer2, and layer3 may be a convolutional layer, a fully connected layer, a pooling layer, or the like. The cloud sub-model runs on the server and the terminal sub-model runs on the terminal. The cloud sub-model and the terminal sub-model shown in FIG. 6 are merely illustrative; the cloud sub-model may include more layers, and so may the terminal sub-model.
In the forward propagation of the cloud sub-model, the server feeds the cloud training features into the cloud sub-model to obtain the forward propagation result of the cloud sub-model (i.e., the above cloud output result), and then sends that forward propagation result together with the current parameters of the terminal sub-model to the terminal. As shown in FIG. 6, the first layer layer1 of the cloud sub-model processes the cloud training features to obtain the output O1 of layer1, and the second layer layer2 processes the output O1 of layer1 to obtain the output O2 of layer2; this output O2 may represent the cloud output result of the cloud sub-model. The server then sends the cloud output result and the current parameters of the terminal sub-model to the terminal on which the terminal sub-model runs.
In the forward and backward propagation of the terminal sub-model, the terminal receives the cloud output result of the cloud sub-model and the current parameters of the terminal sub-model, and feeds them into the terminal sub-model together with the terminal's own input In (including the terminal training features) and the sample labels to perform forward and backward propagation, thereby obtaining the terminal gradient output by the terminal sub-model, which is then sent to the server. As shown in FIG. 6, the third layer layer3 of the terminal sub-model processes the cloud output result (i.e., the output O2 of layer2) and the terminal input In to obtain the output O3 of layer3, which is the prediction result of the machine learning model; then, based on the prediction result and the sample labels, the loss value of the machine learning model is computed using a loss function; based on the loss value and the output O3 of layer3, the parameter gradient GL3 of the terminal sub-model (i.e., the gradient of the parameters of layer3) and the cloud-output gradient GO of the terminal sub-model are computed; finally, the parameter gradient GL3 and the cloud-output gradient GO of the terminal sub-model are transmitted to the server.
In the backward propagation of the cloud sub-model, after receiving the parameter gradient and the cloud-output gradient of the terminal sub-model, the server performs the backward propagation process of the cloud sub-model to obtain the parameter gradient of the cloud sub-model, and finally updates the parameters of the machine learning model through a parameter optimizer, completing one round of training. As shown in FIG. 6, in the backward propagation of the cloud sub-model, the parameter gradient GL2 of layer2 is first computed based on the cloud-output gradient GO of the terminal sub-model and the output O2 of layer2; then the parameter gradient GL1 of layer1 is computed based on the parameter gradient GL2 of layer2 and the output O1 of layer1. The parameter gradient of the cloud sub-model includes the parameter gradient GL1 of layer1 and the parameter gradient GL2 of layer2. Finally, the parameter optimizer updates the parameters of the terminal sub-model (layer3) based on the parameter gradient GL3, and updates the parameters of the cloud sub-model (layer1 and layer2) based on the parameter gradients GL1 and GL2.
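The split forward/backward pass of FIG. 6 can be illustrated with a short PyTorch-style Python sketch. This is an illustrative toy, not the disclosed implementation: the layer sizes, the MSE loss, and the way O2 is re-wrapped on the terminal to obtain the cloud-output gradient GO are all assumptions.

```python
import torch
import torch.nn as nn

# Cloud sub-model (runs on the server): layer1 and layer2.
layer1, layer2 = nn.Linear(8, 16), nn.Linear(16, 4)
# Terminal sub-model (runs on the terminal): layer3 takes O2 plus the terminal input In.
layer3 = nn.Linear(4 + 2, 1)

# --- Forward propagation of the cloud sub-model (server side) ---
cloud_features = torch.randn(5, 8)           # cloud training features
o1 = torch.relu(layer1(cloud_features))      # output O1 of layer1
o2 = torch.relu(layer2(o1))                  # output O2 of layer2 = cloud output result

# --- Forward and backward propagation of the terminal sub-model (terminal side) ---
o2_terminal = o2.detach().requires_grad_()   # the terminal only receives the value of O2
terminal_in = torch.randn(5, 2)              # terminal input In (terminal training features)
labels = torch.randn(5, 1)                   # sample labels
o3 = layer3(torch.cat([o2_terminal, terminal_in], dim=1))  # output O3 = prediction result
loss = nn.functional.mse_loss(o3, labels)
loss.backward()
gl3 = [p.grad for p in layer3.parameters()]  # parameter gradient GL3 of the terminal sub-model
go = o2_terminal.grad                        # cloud-output gradient GO sent back to the server

# --- Backward propagation of the cloud sub-model (server side) ---
o2.backward(go)                              # continues back-propagation through layer2 and layer1
gl2 = [p.grad for p in layer2.parameters()]  # parameter gradient GL2
gl1 = [p.grad for p in layer1.parameters()]  # parameter gradient GL1
```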
FIG. 7 is an example diagram of a specific training process of model training performed by a model training system provided by some embodiments of the present disclosure, and FIG. 8 is another example diagram of such a specific training process. FIG. 7 and FIG. 8 show a process of merged training over multiple terminals, taking three terminals as an example. The overall model training flow of the model training system is described in detail below with reference to FIG. 7 and FIG. 8.
As shown in FIG. 7, the at least one terminal includes terminal Tem1, terminal Tem2, and terminal Tem3, and the M terminal sub-models include a terminal sub-model 10 run by terminal Tem1, a terminal sub-model 20 run by terminal Tem2, and a terminal sub-model 30 run by terminal Tem3.
As shown in FIG. 7 and FIG. 8, at time t1 the server receives a training request sent by terminal Tem1 and, based on that request, obtains at least one cloud training sub-feature CTF1 corresponding to terminal Tem1 (FIG. 8 shows two cloud training sub-features CTF1, each rectangle representing one cloud training sub-feature); at time t2 the server receives a training request sent by terminal Tem2 and, based on that request, obtains at least one cloud training sub-feature CTF2 corresponding to terminal Tem2 (FIG. 8 shows three cloud training sub-features CTF2); at time t3 the server receives a training request sent by terminal Tem3 and, based on that request, obtains at least one cloud training sub-feature CTF3 corresponding to terminal Tem3 (FIG. 8 shows two cloud training sub-features CTF3). Input merging is then performed, that is, the at least one cloud training sub-feature CTF1 corresponding to terminal Tem1, the at least one cloud training sub-feature CTF2 corresponding to terminal Tem2, and the at least one cloud training sub-feature CTF3 corresponding to terminal Tem3 are merged to obtain the cloud training features.
For example, the absolute value of the time difference between any two of time t1, time t2, and time t3 is within a preset time difference range. In one example, time t1, time t2, and time t3 may be the same moment.
As shown in FIG. 7 and FIG. 8, after the cloud training features are obtained, the current parameters of the cloud sub-model may be obtained from the parameter module, and forward propagation of the cloud sub-model is then performed based on the cloud training features and the current parameters of the cloud sub-model to obtain the cloud output result. An output splitting operation is then performed on the cloud output result to obtain a cloud output FCO1 corresponding to terminal sub-model 10, a cloud output FCO2 corresponding to terminal sub-model 20, and a cloud output FCO3 corresponding to terminal sub-model 30. Meanwhile, the current parameters of each terminal sub-model may be obtained from the parameter module; the cloud output FCO1 and the current parameters CP1 of terminal sub-model 10 are transmitted to terminal Tem1 so that terminal Tem1 performs forward and backward propagation of terminal sub-model 10, yielding the parameter gradient GP1 and the cloud-output gradient GO1 of terminal sub-model 10; the cloud output FCO2 and the current parameters CP2 of terminal sub-model 20 are transmitted to terminal Tem2 so that terminal Tem2 performs forward and backward propagation of terminal sub-model 20, yielding the parameter gradient GP2 and the cloud-output gradient GO2 of terminal sub-model 20; and the cloud output FCO3 and the current parameters CP3 of terminal sub-model 30 are transmitted to terminal Tem3 so that terminal Tem3 performs forward and backward propagation of terminal sub-model 30, yielding the parameter gradient GP3 and the cloud-output gradient GO3 of terminal sub-model 30.
As shown in FIG. 7 and FIG. 8, terminal Tem1 may transmit the parameter gradient GP1 and the cloud-output gradient GO1 of terminal sub-model 10 to the server, terminal Tem2 may transmit the parameter gradient GP2 and the cloud-output gradient GO2 of terminal sub-model 20 to the server, and terminal Tem3 may transmit the parameter gradient GP3 and the cloud-output gradient GO3 of terminal sub-model 30 to the server. After receiving the gradients transmitted by the terminals, the server may perform gradient merging. For example, the server may merge the parameter gradient GP1 of terminal sub-model 10, the parameter gradient GP2 of terminal sub-model 20, and the parameter gradient GP3 of terminal sub-model 30 to obtain a merged parameter gradient, and merge the cloud-output gradient GO1 of terminal sub-model 10, the cloud-output gradient GO2 of terminal sub-model 20, and the cloud-output gradient GO3 of terminal sub-model 30 to obtain a merged output gradient.
As shown in FIG. 7 and FIG. 8, after the merged output gradient is obtained, backward propagation of the cloud sub-model may be performed to obtain the parameter gradient of the cloud sub-model. The gradient of the machine learning model may include the parameter gradient of the cloud sub-model and the parameter gradients of the terminal sub-models (i.e., GP1–GP3, the merged parameter gradient). The parameter module may include a parameter optimizer, which may receive the merged parameter gradient and the parameter gradient of the cloud sub-model, adjust the parameters of the cloud sub-model based on the parameter gradient of the cloud sub-model so as to update them, and adjust the parameters of terminal sub-model 10, terminal sub-model 20, and terminal sub-model 30 based on the merged parameter gradient so as to update them. This completes one model training process.
It should be noted that, in some other embodiments, the parameter gradients of the terminal sub-models may not be merged; instead, the parameter gradient of each terminal sub-model is fed directly into the parameter optimizer, which adjusts the parameters of each terminal sub-model based on that terminal sub-model's own parameter gradient. For example, in this case, the parameter optimizer may adjust the parameters of terminal sub-model 10 based on its parameter gradient GP1 to update them, adjust the parameters of terminal sub-model 20 based on its parameter gradient GP2 to update them, and adjust the parameters of terminal sub-model 30 based on its parameter gradient GP3 to update them.
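The input merging, output splitting, and gradient merging steps described for FIG. 7 and FIG. 8 can be sketched on the server side with a small amount of Python; the batching-by-concatenation shown here is only one assumed way the merging might be realized and is not mandated by the disclosure.

```python
import torch

def merged_round(cloud_model, per_terminal_features):
    """per_terminal_features: dict {terminal_id: tensor of cloud training sub-features}."""
    # Input merging: concatenate the sub-features reported by each terminal (CTF1, CTF2, CTF3, ...).
    order = list(per_terminal_features)
    sizes = [per_terminal_features[t].shape[0] for t in order]
    merged_input = torch.cat([per_terminal_features[t] for t in order], dim=0)

    # Cloud forward propagation, then output splitting back into FCO1, FCO2, FCO3, ...
    cloud_output = cloud_model(merged_input)
    per_terminal_output = dict(zip(order, torch.split(cloud_output, sizes, dim=0)))
    return cloud_output, per_terminal_output, order

def merge_cloud_output_gradients(per_terminal_cloud_grads, order):
    # Gradient merging: stitch GO1, GO2, GO3 back into one tensor aligned with the merged input.
    return torch.cat([per_terminal_cloud_grads[t] for t in order], dim=0)
```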
FIG. 9 is a schematic block diagram of a model training apparatus provided by at least one embodiment of the present disclosure.
At least one embodiment of the present disclosure further provides a model training apparatus. As shown in FIG. 9, the model training apparatus 1000 may include one or more memories 1001 and one or more processors 1002. It should be noted that the components of the model training apparatus 1000 are merely exemplary and not limiting; according to actual application needs, the model training apparatus 1000 may also have other components, which the embodiments of the present disclosure do not specifically limit.
For example, the one or more memories 1001 are configured to non-transitorily store computer-executable instructions, and the one or more processors 1002 are configured to run the computer-executable instructions. When run by the one or more processors 1002, the computer-executable instructions implement one or more steps of the model training method according to any embodiment of the present disclosure. For example, the model training apparatus 1000 may be used for the model training method shown in FIG. 2 and/or the model training method shown in FIG. 4.
For the specific implementation of the steps of the model training method and the related explanations, reference may be made to the foregoing embodiments of the model training method; repeated parts are not described again here.
For example, the memory 1001 and the processor 1002 may communicate with each other directly or indirectly. For example, in some embodiments, the model training apparatus 1000 may further include a communication interface and a communication bus.
The memory 1001, the processor 1002, and the communication interface may communicate with one another through the communication bus; components such as the memory 1001, the processor 1002, and the communication interface may also communicate through a network connection, where the network may include a wireless network, a wired network, and/or any combination of a wireless network and a wired network. The present disclosure does not limit the type and function of the network here.
For example, the communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on.
For example, the communication interface is used to implement communication between the model training apparatus 1000 and other devices. The communication interface may be a Universal Serial Bus (USB) interface or the like.
For example, the memory 1001 and the processor 1002 may be provided on the server side (or in the cloud).
For example, the processor 1002 may control other components in the model training apparatus to perform desired functions. The processor may be a central processing unit (CPU), a graphics processing unit (GPU), a network processor (NP), or the like; the processor may also be another form of processing unit having model training capability and/or program execution capability, such as a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a tensor processing unit (TPU), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The central processing unit (CPU) may be of an X86 or ARM architecture, etc.
For example, the memory 1001 may be a computer-readable medium and may include any combination of one or more computer program products, and a computer program product may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory; non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer-readable instructions may be stored on the computer-readable storage medium, and the processor may run the computer-readable instructions to implement various functions of the model training apparatus 1000. Various application programs, various data, and the like may also be stored on the storage medium.
For the technical effects achievable by the model training apparatus, reference may be made to the relevant descriptions in the foregoing embodiments of the model training method; repeated parts are not described again.
FIG. 10 is a schematic diagram of a non-transitory computer-readable storage medium provided by at least one embodiment of the present disclosure. For example, as shown in FIG. 10, one or more computer-executable instructions 2001 may be stored non-transitorily on the non-transitory computer-readable storage medium 2000. For example, when executed by a processor, the computer-executable instructions 2001 may perform one or more steps of the model training method according to any embodiment of the present disclosure.
For example, the non-transitory computer-readable storage medium 2000 may be applied in the above model training apparatus 1000. For example, the non-transitory computer-readable storage medium 2000 may include the memory 1001 of the above model training apparatus 1000.
For example, for a description of the non-transitory computer-readable storage medium 2000, reference may be made to the description of the memory 1001 in the embodiments of the model training apparatus 1000; repeated parts are not described again.
Referring now to FIG. 11, FIG. 11 shows a schematic structural diagram of an electronic device 3000 suitable for implementing embodiments of the present disclosure. The electronic device 3000 may be a terminal (for example, a computer), a processor, or the like, and may be used to perform the model training methods of the foregoing embodiments. Electronic devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), in-vehicle terminals (e.g., in-vehicle navigation terminals), and wearable electronic devices, as well as fixed terminals such as digital TVs, desktop computers, and smart home devices. The electronic device shown in FIG. 11 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 11, the electronic device 3000 may include a processing apparatus (e.g., a central processing unit, a graphics processing unit, etc.) 3001, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 3002 or a program loaded from a storage apparatus 3008 into a random access memory (RAM) 3003. The RAM 3003 also stores various programs and data required for the operation of the electronic device 3000. The processing apparatus 3001, the ROM 3002, and the RAM 3003 are connected to one another through a bus 3004, and an input/output (I/O) interface 3005 is also connected to the bus 3004.
Generally, the following apparatuses may be connected to the I/O interface 3005: input apparatuses 3006 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; output apparatuses 3007 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage apparatuses 3008 including, for example, a magnetic tape and a hard disk; and a communication apparatus 3009. The communication apparatus 3009 may allow the electronic device 3000 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 11 shows the electronic device 3000 with various apparatuses, it should be understood that it is not required to implement or possess all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or possessed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, where the computer program contains program code for executing the method shown in the flowcharts, so as to perform one or more steps of the model training method described above. In such embodiments, the computer program may be downloaded and installed from a network through the communication apparatus 3009, installed from the storage apparatus 3008, or installed from the ROM 3002. When executed by the processing apparatus 3001, the computer program may cause the processing apparatus 3001 to perform the above functions defined in the model training methods of the embodiments of the present disclosure.
It should be noted that, in the context of the present disclosure, a computer-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable medium may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code; such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), and the like, or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures; for example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware. For example, the name of a unit does not in some cases constitute a limitation on the unit itself.
The functions described above herein may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
It can be understood that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the types, scope of use, and usage scenarios of the personal information involved in the present disclosure, and the user's authorization should be obtained.
For example, in response to receiving an active request from the user, prompt information is sent to the user to explicitly remind the user that the operation it requests to perform will require obtaining and using the user's user information, so that the user can, according to the prompt information, autonomously choose whether to provide user information to the software or hardware, such as an electronic device, application program, server, or storage medium, that performs the operations of the technical solutions of the present disclosure.
As an optional but non-limiting implementation, in response to receiving the user's active request, the prompt information may be sent to the user in the form of, for example, a pop-up window, in which the prompt information may be presented as text. In addition, the pop-up window may also carry selection controls for the user to choose to "agree" or "disagree" to providing user information to the electronic device.
It can be understood that the above process of notifying the user and obtaining the user's authorization is merely illustrative and does not limit the implementations of the present disclosure; other manners that comply with relevant laws and regulations may also be applied to the implementations of the present disclosure.
It can be understood that the data involved in this technical solution (including but not limited to the data itself and the acquisition or use of the data) shall comply with the requirements of applicable laws, regulations, and relevant provisions.
In a first aspect, according to one or more embodiments of the present disclosure, a model training method is applied to a server and used for training a machine learning model, where the machine learning model includes a cloud sub-model and M terminal sub-models, the cloud sub-model runs on the server, the M terminal sub-models run on at least one terminal, and M is a positive integer. The model training method includes: obtaining cloud training features; training the cloud sub-model with the cloud training features to obtain a cloud output result of the cloud sub-model; sending the cloud output result and the current parameters of the M terminal sub-models to the at least one terminal; receiving terminal gradients respectively output by N terminal sub-models of the M terminal sub-models and output by the at least one terminal, where N is a positive integer less than or equal to M, and the terminal gradient output by each of the N terminal sub-models includes the parameter gradient of that terminal sub-model and a cloud-output gradient; computing the parameter gradient of the cloud sub-model based on the terminal gradients respectively output by the N terminal sub-models and the cloud output result; and adjusting the current parameters of the N terminal sub-models and the current parameters of the cloud sub-model using the parameter gradients of the N terminal sub-models and the parameter gradient of the cloud sub-model.
According to one or more embodiments of the present disclosure, the M terminal sub-models correspond one-to-one to M pieces of stored training progress information, and the M pieces of stored training progress information are stored on the server.
According to one or more embodiments of the present disclosure, the model training method further includes: for each terminal of the at least one terminal: receiving, from the terminal, the current training progress information of each terminal sub-model running on the terminal; and adjusting, based on the current training progress information, the stored training progress information corresponding to that terminal sub-model.
According to one or more embodiments of the present disclosure, each terminal stores a training sample set used for training all terminal sub-models running on the terminal; the training sample set includes a plurality of terminal training samples, each of which includes a terminal training feature and a sample label; the terminal training features in the plurality of terminal training samples are generated sequentially, and each terminal training sample has a corresponding training progress identifier; the current training progress information of each terminal sub-model indicates the training progress identifier of the last-generated terminal training sample among all the terminal training samples in the training sample set that have already been used to train that terminal sub-model.
According to one or more embodiments of the present disclosure, the at least one terminal includes a first terminal, and obtaining the cloud training features includes: receiving a training request sent by the first terminal, where the training request sent by the first terminal includes identification information of the first terminal; and obtaining at least one first cloud training sub-feature based on the training request sent by the first terminal, where the cloud training features include the at least one first cloud training sub-feature.
According to one or more embodiments of the present disclosure, the at least one terminal further includes a second terminal, and obtaining the cloud training features further includes: receiving a training request sent by the second terminal, where the training request sent by the second terminal includes identification information of the second terminal; obtaining at least one second cloud training sub-feature based on the training request sent by the second terminal; and merging the at least one first cloud training sub-feature and the at least one second cloud training sub-feature to obtain the cloud training features.
According to one or more embodiments of the present disclosure, the absolute value of the time difference between the moment at which the first terminal sends its training request and the moment at which the second terminal sends its training request is within a time difference range.
According to one or more embodiments of the present disclosure, M is greater than 1, the M terminal sub-models include a first terminal sub-model and a second terminal sub-model, the at least one terminal includes a first terminal and a second terminal, the first terminal sub-model runs on the first terminal, and the second terminal sub-model runs on the second terminal; sending the cloud output result and the current parameters of the M terminal sub-models to the at least one terminal includes: splitting the cloud output result to obtain a first cloud output corresponding to the first terminal sub-model and a second cloud output corresponding to the second terminal sub-model; obtaining the current parameters of the first terminal sub-model and the current parameters of the second terminal sub-model; transmitting the first cloud output and the current parameters of the first terminal sub-model to the first terminal; and transmitting the second cloud output and the current parameters of the second terminal sub-model to the second terminal.
According to one or more embodiments of the present disclosure, the M terminal sub-models include a first terminal sub-model and a third terminal sub-model, the at least one terminal includes a first terminal, and both the first terminal sub-model and the third terminal sub-model run on the first terminal; sending the cloud output result and the current parameters of the M terminal sub-models to the at least one terminal includes: splitting the cloud output result to obtain a first cloud output corresponding to the first terminal sub-model and a third cloud output corresponding to the third terminal sub-model; obtaining the current parameters of the first terminal sub-model and the current parameters of the third terminal sub-model; and transmitting the first cloud output, the third cloud output, the current parameters of the first terminal sub-model, and the current parameters of the third terminal sub-model to the first terminal.
According to one or more embodiments of the present disclosure, training the cloud sub-model with the cloud training features to obtain the cloud output result of the cloud sub-model includes: obtaining the current parameters of the cloud sub-model, where the current parameters of the cloud sub-model represent the parameters of the cloud sub-model at the time the cloud training features are obtained; and training the cloud sub-model having the current parameters of the cloud sub-model with the cloud training features to obtain the cloud output result of the cloud sub-model.
According to one or more embodiments of the present disclosure, M is greater than 1 and N is greater than 1, and computing the parameter gradient of the cloud sub-model based on the terminal gradients respectively output by the N terminal sub-models and the cloud output result includes: merging the cloud-output gradients of the N terminal sub-models to obtain a merged output gradient; and computing the parameter gradient of the cloud sub-model based on the merged output gradient and the cloud output result.
According to one or more embodiments of the present disclosure, the inputs of the M terminal sub-models match the output of the cloud sub-model.
According to one or more embodiments of the present disclosure, the model training method further includes: receiving, from each terminal, a training progress query request corresponding to the terminal sub-model run by the terminal; obtaining, based on the training progress query request, the stored training progress information corresponding to the terminal sub-model; and outputting the stored training progress information to the terminal, so that the terminal performs a sample screening operation based on the stored training progress information; where, in response to the sample screening operation yielding at least one terminal training sample, the terminal sends a training request to the server to perform model training.
In a second aspect, according to one or more embodiments of the present disclosure, a model training method is applied to a first terminal and used for training a machine learning model, where the machine learning model includes a cloud sub-model and a first terminal sub-model, the cloud sub-model runs on a server, and the first terminal sub-model runs on the first terminal. The model training method includes: obtaining at least one terminal training sample, where each terminal training sample includes a terminal training feature and a sample label; sending a training request to the server based on the at least one terminal training sample; receiving, from the server, a cloud output corresponding to the at least one terminal training sample and the current parameters of the first terminal sub-model; training the first terminal sub-model using the cloud output, the current parameters of the first terminal sub-model, and the at least one terminal training sample, to obtain a terminal gradient output by the first terminal sub-model, where the terminal gradient includes the parameter gradient of the first terminal sub-model and a cloud-output gradient; and outputting the terminal gradient to the server, so that the server computes the parameter gradient of the cloud sub-model based on the terminal gradient and the cloud output, and adjusts the current parameters of the first terminal sub-model and the current parameters of the cloud sub-model using the parameter gradient of the first terminal sub-model and the parameter gradient of the cloud sub-model.
According to one or more embodiments of the present disclosure, obtaining the at least one terminal training sample includes: sending a training progress query request to the server; receiving, from the server, the stored training progress information corresponding to the first terminal sub-model; performing a sample screening operation based on the stored training progress information; and, in response to the sample screening operation yielding K terminal training samples, obtaining the at least one terminal training sample based on the K terminal training samples, where K is a positive integer.
According to one or more embodiments of the present disclosure, the model training method further includes: determining, based on the at least one terminal training sample, current training progress information corresponding to the first terminal sub-model; and sending the current training progress information to the server, so that the server updates the stored training progress information corresponding to the first terminal sub-model to the current training progress information.
According to one or more embodiments of the present disclosure, the cloud output includes at least one cloud sub-output in one-to-one correspondence with the at least one terminal training sample, and training the first terminal sub-model using the cloud output, the current parameters of the first terminal sub-model, and the at least one terminal training sample to obtain the terminal gradient output by the first terminal sub-model includes: for each terminal training sample of the at least one terminal training sample: processing, with the first terminal sub-model having the current parameters of the first terminal sub-model, the cloud sub-output corresponding to the terminal training sample and the terminal training feature in the terminal training sample, to obtain an output of the first terminal sub-model; obtaining a loss value of the first terminal sub-model based on the output of the first terminal sub-model and the sample label in the terminal training sample; and obtaining the terminal gradient based on the loss value and the output of the first terminal sub-model.
In a third aspect, according to one or more embodiments of the present disclosure, a model training apparatus includes: one or more memories non-transitorily storing computer-executable instructions; and one or more processors configured to run the computer-executable instructions, where the computer-executable instructions, when run by the one or more processors, implement the model training method according to any embodiment of the present disclosure.
In a fourth aspect, according to one or more embodiments of the present disclosure, a model training system for training a machine learning model includes at least one terminal and a server, where the machine learning model includes a cloud sub-model and M terminal sub-models, the cloud sub-model runs on the server, the M terminal sub-models run on the at least one terminal, and M is a positive integer. The server is configured to: obtain cloud training features; train the cloud sub-model with the cloud training features to obtain a cloud output result of the cloud sub-model; send the cloud output result and the current parameters of the M terminal sub-models to the at least one terminal; receive terminal gradients respectively output by N terminal sub-models of the M terminal sub-models and output by the at least one terminal, where N is a positive integer less than or equal to M, and the terminal gradient output by each of the N terminal sub-models includes the parameter gradient of that terminal sub-model and a cloud-output gradient; compute the parameter gradient of the cloud sub-model based on the terminal gradients respectively output by the N terminal sub-models and the cloud output result; and adjust the current parameters of the N terminal sub-models and the current parameters of the cloud sub-model using the parameter gradients of the N terminal sub-models and the parameter gradient of the cloud sub-model. Each terminal of the at least one terminal is configured to: obtain at least one terminal training sample, where each terminal training sample includes a terminal training feature and a sample label, and the cloud training features include at least one cloud training sub-feature in one-to-one correspondence with the at least one terminal training sample; receive, from the server, a cloud output corresponding to the at least one terminal training sample and the current parameters of the terminal sub-model running on the terminal, where the cloud output result includes the cloud output; train the terminal sub-model running on the terminal using the cloud output, the current parameters of the terminal sub-model running on the terminal, and the at least one terminal training sample, to obtain the terminal gradient output by the terminal sub-model running on the terminal; and output that terminal gradient to the server.
In a fifth aspect, according to one or more embodiments of the present disclosure, a non-transitory computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the model training method according to any embodiment of the present disclosure.
The above description is merely a description of the preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In addition, although the operations are depicted in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment; conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above; rather, the specific features and acts described above are merely example forms of implementing the claims.
For the present disclosure, the following points also need to be explained:
(1) The drawings of the embodiments of the present disclosure relate only to the structures involved in the embodiments of the present disclosure; for other structures, reference may be made to common designs.
(2) Without conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with one another to obtain new embodiments.
The above are only specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto; the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (20)
- A model training method, applied to a server and used for training a machine learning model, wherein the machine learning model comprises a cloud sub-model and M terminal sub-models, the cloud sub-model runs on the server, the M terminal sub-models run on at least one terminal, and M is a positive integer, the model training method comprising: obtaining cloud training features; training the cloud sub-model with the cloud training features to obtain a cloud output result of the cloud sub-model; sending the cloud output result and current parameters of the M terminal sub-models to the at least one terminal; receiving terminal gradients respectively output by N terminal sub-models of the M terminal sub-models and output by the at least one terminal, wherein N is a positive integer less than or equal to M, and the terminal gradient output by each of the N terminal sub-models comprises a parameter gradient of the terminal sub-model and a cloud-output gradient; computing a parameter gradient of the cloud sub-model based on the terminal gradients respectively output by the N terminal sub-models and the cloud output result; and adjusting the current parameters of the N terminal sub-models and current parameters of the cloud sub-model using the parameter gradients of the N terminal sub-models and the parameter gradient of the cloud sub-model.
- The model training method according to claim 1, wherein the M terminal sub-models correspond one-to-one to M pieces of stored training progress information, and the M pieces of stored training progress information are stored on the server.
- The model training method according to claim 2, further comprising: for each terminal of the at least one terminal: receiving, from the terminal, current training progress information of each terminal sub-model running on the terminal; and adjusting, based on the current training progress information, the stored training progress information corresponding to the terminal sub-model.
- The model training method according to claim 3, wherein each terminal stores a training sample set used for training all terminal sub-models running on the terminal, the training sample set comprises a plurality of terminal training samples, each terminal training sample comprises a terminal training feature and a sample label, the terminal training features in the plurality of terminal training samples are generated sequentially, and each terminal training sample has a corresponding training progress identifier; the current training progress information of each terminal sub-model indicates the training progress identifier of the last-generated terminal training sample among all the terminal training samples in the training sample set that have already been used to train the terminal sub-model.
- The model training method according to any one of claims 1 to 4, wherein the at least one terminal comprises a first terminal, and the obtaining cloud training features comprises: receiving a training request sent by the first terminal, wherein the training request sent by the first terminal comprises identification information of the first terminal; and obtaining at least one first cloud training sub-feature based on the training request sent by the first terminal, wherein the cloud training features comprise the at least one first cloud training sub-feature.
- The model training method according to claim 5, wherein the at least one terminal further comprises a second terminal, and the obtaining cloud training features further comprises: receiving a training request sent by the second terminal, wherein the training request sent by the second terminal comprises identification information of the second terminal; obtaining at least one second cloud training sub-feature based on the training request sent by the second terminal; and merging the at least one first cloud training sub-feature and the at least one second cloud training sub-feature to obtain the cloud training features.
- The model training method according to claim 6, wherein the absolute value of the time difference between the moment at which the first terminal sends its training request and the moment at which the second terminal sends its training request is within a time difference range.
- The model training method according to any one of claims 1 to 7, wherein M is greater than 1, the M terminal sub-models comprise a first terminal sub-model and a second terminal sub-model, the at least one terminal comprises a first terminal and a second terminal, the first terminal sub-model runs on the first terminal, the second terminal sub-model runs on the second terminal, and the sending the cloud output result and the current parameters of the M terminal sub-models to the at least one terminal comprises: splitting the cloud output result to obtain a first cloud output corresponding to the first terminal sub-model and a second cloud output corresponding to the second terminal sub-model; obtaining current parameters of the first terminal sub-model and current parameters of the second terminal sub-model; transmitting the first cloud output and the current parameters of the first terminal sub-model to the first terminal; and transmitting the second cloud output and the current parameters of the second terminal sub-model to the second terminal.
- The model training method according to any one of claims 1 to 7, wherein the M terminal sub-models comprise a first terminal sub-model and a third terminal sub-model, the at least one terminal comprises a first terminal, both the first terminal sub-model and the third terminal sub-model run on the first terminal, and the sending the cloud output result and the current parameters of the M terminal sub-models to the at least one terminal comprises: splitting the cloud output result to obtain a first cloud output corresponding to the first terminal sub-model and a third cloud output corresponding to the third terminal sub-model; obtaining current parameters of the first terminal sub-model and current parameters of the third terminal sub-model; and transmitting the first cloud output, the third cloud output, the current parameters of the first terminal sub-model, and the current parameters of the third terminal sub-model to the first terminal.
- The model training method according to any one of claims 1 to 9, wherein the training the cloud sub-model with the cloud training features to obtain the cloud output result of the cloud sub-model comprises: obtaining current parameters of the cloud sub-model, wherein the current parameters of the cloud sub-model represent the parameters of the cloud sub-model at the time the cloud training features are obtained; and training the cloud sub-model having the current parameters of the cloud sub-model with the cloud training features to obtain the cloud output result of the cloud sub-model.
- The model training method according to any one of claims 1 to 10, wherein M is greater than 1, N is greater than 1, and the computing the parameter gradient of the cloud sub-model based on the terminal gradients respectively output by the N terminal sub-models and the cloud output result comprises: merging the cloud-output gradients of the N terminal sub-models to obtain a merged output gradient; and computing the parameter gradient of the cloud sub-model based on the merged output gradient and the cloud output result.
- The model training method according to any one of claims 1 to 11, wherein the inputs of the M terminal sub-models match the output of the cloud sub-model.
- The model training method according to any one of claims 1 to 12, further comprising: receiving, from each terminal, a training progress query request corresponding to the terminal sub-model run by the terminal; obtaining, based on the training progress query request, stored training progress information corresponding to the terminal sub-model; and outputting the stored training progress information to the terminal, for the terminal to perform a sample screening operation based on the stored training progress information; wherein, in response to the sample screening operation yielding at least one terminal training sample, the terminal sends a training request to the server to perform model training.
- A model training method, applied to a first terminal and used for training a machine learning model, wherein the machine learning model comprises a cloud sub-model and a first terminal sub-model, the cloud sub-model runs on a server, and the first terminal sub-model runs on the first terminal, the model training method comprising: obtaining at least one terminal training sample, wherein each terminal training sample comprises a terminal training feature and a sample label; sending a training request to the server based on the at least one terminal training sample; receiving, from the server, a cloud output corresponding to the at least one terminal training sample and current parameters of the first terminal sub-model; training the first terminal sub-model using the cloud output, the current parameters of the first terminal sub-model, and the at least one terminal training sample to obtain a terminal gradient output by the first terminal sub-model, wherein the terminal gradient comprises a parameter gradient of the first terminal sub-model and a cloud-output gradient; and outputting the terminal gradient to the server, so that the server computes a parameter gradient of the cloud sub-model based on the terminal gradient and the cloud output, and adjusts the current parameters of the first terminal sub-model and current parameters of the cloud sub-model using the parameter gradient of the first terminal sub-model and the parameter gradient of the cloud sub-model.
- The model training method according to claim 14, wherein the obtaining at least one terminal training sample comprises: sending a training progress query request to the server; receiving, from the server, stored training progress information corresponding to the first terminal sub-model; performing a sample screening operation based on the stored training progress information; and, in response to the sample screening operation yielding K terminal training samples, obtaining the at least one terminal training sample based on the K terminal training samples, wherein K is a positive integer.
- The model training method according to claim 14, further comprising: determining, based on the at least one terminal training sample, current training progress information corresponding to the first terminal sub-model; and sending the current training progress information to the server, so that the server updates the stored training progress information corresponding to the first terminal sub-model to the current training progress information.
- The model training method according to any one of claims 14 to 16, wherein the cloud output comprises at least one cloud sub-output in one-to-one correspondence with the at least one terminal training sample, and the training the first terminal sub-model using the cloud output, the current parameters of the first terminal sub-model, and the at least one terminal training sample to obtain the terminal gradient output by the first terminal sub-model comprises: for each terminal training sample of the at least one terminal training sample: processing, with the first terminal sub-model having the current parameters of the first terminal sub-model, the cloud sub-output corresponding to the terminal training sample and the terminal training feature in the terminal training sample to obtain an output of the first terminal sub-model; obtaining a loss value of the first terminal sub-model based on the output of the first terminal sub-model and the sample label in the terminal training sample; and obtaining the terminal gradient based on the loss value and the output of the first terminal sub-model.
- A model training apparatus, comprising: one or more memories non-transitorily storing computer-executable instructions; and one or more processors configured to run the computer-executable instructions, wherein the computer-executable instructions, when run by the one or more processors, implement the model training method according to any one of claims 1 to 17.
- A model training system, used for training a machine learning model and comprising: at least one terminal and a server, wherein the machine learning model comprises a cloud sub-model and M terminal sub-models, the cloud sub-model runs on the server, the M terminal sub-models run on the at least one terminal, and M is a positive integer; the server is configured to: obtain cloud training features; train the cloud sub-model with the cloud training features to obtain a cloud output result of the cloud sub-model; send the cloud output result and current parameters of the M terminal sub-models to the at least one terminal; receive terminal gradients respectively output by N terminal sub-models of the M terminal sub-models and output by the at least one terminal, wherein N is a positive integer less than or equal to M, and the terminal gradient output by each of the N terminal sub-models comprises a parameter gradient of the terminal sub-model and a cloud-output gradient; compute a parameter gradient of the cloud sub-model based on the terminal gradients respectively output by the N terminal sub-models and the cloud output result; and adjust the current parameters of the N terminal sub-models and current parameters of the cloud sub-model using the parameter gradients of the N terminal sub-models and the parameter gradient of the cloud sub-model; and each terminal of the at least one terminal is configured to: obtain at least one terminal training sample, wherein each terminal training sample comprises a terminal training feature and a sample label, and the cloud training features comprise at least one cloud training sub-feature in one-to-one correspondence with the at least one terminal training sample; receive, from the server, a cloud output corresponding to the at least one terminal training sample and the current parameters of the terminal sub-model running on the terminal, wherein the cloud output result comprises the cloud output; train the terminal sub-model running on the terminal using the cloud output, the current parameters of the terminal sub-model running on the terminal, and the at least one terminal training sample to obtain the terminal gradient output by the terminal sub-model running on the terminal; and output the terminal gradient output by the terminal sub-model running on the terminal to the server.
- A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the model training method according to any one of claims 1 to 17.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211117189.3 | 2022-09-14 | ||
CN202211117189.3A CN117744826A (zh) | 2022-09-14 | 2022-09-14 | 模型训练方法、装置以及系统和存储介质 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024055979A1 true WO2024055979A1 (zh) | 2024-03-21 |
Family
ID=90274295
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/118478 WO2024055979A1 (zh) | 2022-09-14 | 2023-09-13 | 模型训练方法、装置以及系统和存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117744826A (zh) |
WO (1) | WO2024055979A1 (zh) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111241580A (zh) * | 2020-01-09 | 2020-06-05 | 广州大学 | Federated learning method based on a trusted execution environment
CN112446544A (zh) * | 2020-12-01 | 2021-03-05 | 平安科技(深圳)有限公司 | Traffic flow prediction model training method and apparatus, electronic device, and storage medium
US20210406767A1 (en) * | 2020-06-28 | 2021-12-30 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Distributed Training Method and System, Device and Storage Medium
CN114202018A (zh) * | 2021-11-29 | 2022-03-18 | 新智我来网络科技有限公司 | Modular joint learning method and system
CN114530245A (zh) * | 2022-02-25 | 2022-05-24 | 山东浪潮科学研究院有限公司 | Cloud-edge coordinated medical system based on edge computing and federated learning
Also Published As
Publication number | Publication date |
---|---|
CN117744826A (zh) | 2024-03-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23864711 Country of ref document: EP Kind code of ref document: A1 |