CN114943274A - Model training method, device, storage medium, server, terminal and system - Google Patents

Model training method, device, storage medium, server, terminal and system

Info

Publication number
CN114943274A
Authority
CN
China
Prior art keywords
model
output layer
layer data
gradient
loss function
Prior art date
Legal status
Pending
Application number
CN202210394839.2A
Other languages
Chinese (zh)
Inventor
郑龙飞 (Zheng Longfei)
胡晓龙 (Hu Xiaolong)
王力 (Wang Li)
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202210394839.2A
Publication of CN114943274A

Classifications

    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/061: Neural networks; physical realisation using biological neurons, e.g. biological neurons connected to an integrated circuit
    • G06N 3/08: Neural networks; learning methods


Abstract

One or more embodiments of this specification disclose a model training method, device, storage medium, server, terminal and system. From the first output layer data output by the first model in each feature training terminal, the second output layer data output by the second model in the server is obtained. Based on the real loss function returned by the label training terminal for the second output layer data, the first gradient of the real loss function with respect to the second output layer data is obtained. A first simulation loss function of the second model is then constructed from the first gradient, the second model is updated based on this first simulation loss function, and the first model is updated as well.

Description

Model training method, device, storage medium, server, terminal and system
Technical Field
The present disclosure relates to the field of machine learning technologies, and in particular, to a model training method, an apparatus, a storage medium, a server, a terminal, and a system.
Background
With the development of artificial intelligence technology, deep learning has gradually been applied to fields such as risk assessment, speech recognition, face recognition and natural language processing, and Deep Neural Networks (DNNs) are the foundation of deep learning.
In the related art, to solve the data islanding problem in deep learning, a deep neural network model can be split across different devices for split learning training; however, a model training method with higher computational efficiency is needed for this split learning training process.
Disclosure of Invention
One or more embodiments of the present disclosure provide a model training method, an apparatus, a storage medium, a server, a terminal, and a system, which can improve the computation efficiency of a deep neural network model in a split learning training process.
One or more embodiments of the present specification provide a model training method, applied to a server, the method including:
acquiring first output layer data output by a first model in each feature training terminal, and performing forward propagation of a second model in the server based on each piece of first output layer data to obtain second output layer data output by the second model;
sending the second output layer data to a label training terminal, and acquiring a first gradient of a real loss function relative to the second output layer data from the label training terminal;
obtaining a first simulation loss function of the second model according to the first gradient and the second output layer data, and updating the second model based on the first simulation loss function;
and calculating second gradients of the first simulation loss function relative to the first output layer data, and sending the second gradients to the feature training terminals, so that the feature training terminals update the first models based on the second gradients.
One or more embodiments of the present specification provide a model training method applied to a feature training terminal, where the method includes:
forward propagation of a first model in the feature training terminal is carried out based on feature data in a feature data set, and first output layer data output by the first model is obtained;
sending the first output layer data to a server, and obtaining a second gradient for the first output layer data from the server;
updating the first model based on the second gradient;
the second gradient is obtained by the server performing forward propagation of a second model in the server based on the first output layer data to obtain second output layer data output by the second model, sending the second output layer data to a label training terminal, acquiring a first gradient of a real loss function with respect to the second output layer data from the label training terminal, obtaining a first simulation loss function of the second model according to the first gradient and the second output layer data, and calculating the gradient of the first simulation loss function with respect to the first output layer data.
One or more embodiments of the present specification provide a model training apparatus, applied to a server, the apparatus including:
the server propagation module is used for acquiring first output layer data output by the first model in each feature training terminal, and performing forward propagation on a second model in the server based on each first output layer data to obtain second output layer data output by the second model;
the first gradient acquisition module is used for sending the second output layer data to a label training terminal and acquiring a first gradient of a real loss function relative to the second output layer data from the label training terminal;
the second model updating module is used for solving a first simulation loss function of the second model according to the first gradient and the second output layer data and updating the second model based on the first simulation loss function;
and the second gradient sending module is used for calculating second gradients of the first simulation loss functions relative to the data of the first output layers and sending the second gradients to the feature training terminals so that the feature training terminals update the first models based on the second gradients.
One or more embodiments of the present specification provide a model training apparatus, applied to a feature training terminal, the apparatus including:
the terminal forward propagation module is used for performing forward propagation of a first model in the feature training terminal based on feature data in a feature data set to obtain first output layer data output by the first model;
a second gradient acquisition module to send the first output layer data to a server and to acquire a second gradient for the first output layer data from the server;
a first model update module to update the first model based on the second gradient;
the second gradient is obtained by the server through forward propagation of a second model in the server based on the first output layer data, the second output layer data output by the second model is obtained, the second output layer data is sent to a label training terminal, a first gradient of a real loss function relative to the second output layer data is obtained from the label training terminal, a first simulation loss function of the second model is obtained according to the first gradient and the second output layer data, and the gradient of the first simulation loss function relative to the first output layer data is calculated.
One or more embodiments of the present specification provide a computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to perform the steps of the method described above.
One or more embodiments of the present specification provide a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to carry out the steps of the method as described above.
One or more embodiments of the present specification provide a server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program being adapted to be loaded by the processor and to perform the steps of the method as described above.
One or more embodiments of the present specification provide a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program being adapted to be loaded by the processor and to perform the steps of the method as described above.
One or more embodiments of the present specification provide a model training system, which includes the server and the terminal.
The technical scheme provided by one or more embodiments of the present specification has the following beneficial effects:
One or more embodiments of the present disclosure provide a model training method, which includes obtaining first output layer data output by a first model in each feature training terminal, and performing forward propagation of a second model in a server based on the first output layer data to obtain second output layer data output by the second model; sending the second output layer data to a label training terminal, and acquiring a first gradient of a real loss function with respect to the second output layer data from the label training terminal; obtaining a first simulation loss function of the second model according to the first gradient and the second output layer data, and updating the second model based on the first simulation loss function; and calculating second gradients of the first simulation loss function with respect to the first output layer data and sending the second gradients to the feature training terminals, so that the feature training terminals update the first models based on the second gradients. After the first gradient of the real loss function with respect to the second output layer data is obtained, the first simulation loss function of the second model can be constructed, the second model can be updated based on the first simulation loss function, and the first models can likewise be updated; this reduces the amount of computation required to update the models and effectively improves the training efficiency of the models in the split learning training process.
Drawings
In order to more clearly illustrate the technical solutions in one or more embodiments of the present specification or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some examples of one or more embodiments of the present specification, and other drawings may be derived from them by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram of a model training method provided in an exemplary embodiment of the present description;
FIG. 2 is a schematic diagram of a DNN model provided in an exemplary embodiment of the present description;
FIG. 3 is a diagram illustrating a split learning training provided by an exemplary embodiment of the present description;
FIG. 4 is a schematic flow chart diagram illustrating a model training method provided in an exemplary embodiment of the present disclosure;
FIG. 5 is a schematic flow chart diagram illustrating a model training method provided in an exemplary embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating a method for training a model according to an exemplary embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a model training apparatus provided in an exemplary embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a model training apparatus provided in an exemplary embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a server according to an exemplary embodiment of the present disclosure;
FIG. 10 is a schematic structural diagram of a terminal according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to make the features and advantages of one or more embodiments of the present disclosure more apparent and understandable, the technical solutions in one or more embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in one or more embodiments of the present disclosure, and it is apparent that the described embodiments are only a part of the embodiments of the present disclosure, and not all embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in one or more embodiments of the present specification without making any creative effort fall within the protection scope of one or more embodiments of the present specification.
The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as set forth in the claims below.
Fig. 1 is an exemplary system architecture diagram of a model training method provided in an exemplary embodiment of the present specification.
As shown in fig. 1, the system architecture may include a terminal 101, a network 102, and a server 103. Network 102 is the medium used to provide communication links between the terminal 101 and the server 103. Network 102 may include various types of wired or wireless communication links: for example, wired communication links include optical fiber, twisted pair or coaxial cable, and wireless communication links include Bluetooth, Wireless Fidelity (Wi-Fi) and microwave communication links.
The terminal 101 may interact with the server 103 through the network 102 to receive messages from the server 103 or to send messages to the server 103. The terminal 101 may be hardware or software. When the terminal 101 is hardware, it can be a variety of electronic devices including, but not limited to, smart watches, smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal 101 is software, it may be installed in the electronic device listed above, and it may be implemented as multiple software or software modules (for example, to provide distributed services), or may be implemented as a single software or software module, and is not limited in this respect.
Optionally, in one or more embodiments of the present specification, the terminal 101 trains a partial model issued after the server 103 splits the original DNN model, so the terminal 101 is also referred to as a training terminal. Training terminals may be divided into feature training terminals and label training terminals according to the type of data they hold (feature data or label data), and the numbers of feature training terminals and label training terminals are not specifically limited.
The server 103 may be a server that provides various services. The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 103 is software, it may be implemented as a plurality of software or software modules (for example, to provide distributed services), or may be implemented as a single software or software module, and is not limited in particular herein.
Optionally, after the server 103 splits the original DNN model into partial models, the server 103 sends some of the partial models to the feature training terminals and the label training terminal, while a partial model also remains on the server 103. When training these partial models, the data flow among the feature training terminals, the server and the label training terminal is as follows: the data output by the feature training terminals is used as the input of the server, and the data output by the server is used as the input of the label training terminal.
It should be understood that the number of terminals, networks, and servers in fig. 1 is merely illustrative, and that any number of terminals, networks, and servers may be used, as desired for an implementation.
Fig. 2 is a schematic diagram of a DNN model provided in an exemplary embodiment of the present disclosure.
The basis of the DNN model is the perceptron model, which has multiple inputs and one output and learns a linear relationship between the inputs and the output to obtain an intermediate result. However, the perceptron can only be used for binary classification and cannot learn more complex nonlinear models, which limits its practical use in industry.
The DNN model extends the perceptron model in three main ways. First, hidden layers are added; there can be multiple hidden layers, which enhances the expressive capability of the model but also greatly increases its complexity. Second, the output layer of the DNN model may have more than one neuron, so the model can be flexibly applied to classification and regression, as well as to other machine learning tasks such as dimensionality reduction and clustering. Third, the activation function is extended; the activation function of the perceptron is simple but has limited processing capability, so other, more complex activation functions commonly used in neural networks are adopted.
Dividing the DNN model according to the positions of its layers, the neural network layers inside the DNN model can be divided into three types: an input layer, hidden layers and an output layer, each of which contains neurons, as shown in fig. 2. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected, that is, any neuron of the i-th layer is connected with every neuron of the (i+1)-th layer. Although the DNN model appears complex, locally it works the same way as the perceptron: the relationship between the output and input of neurons in adjacent layers is a linear relationship z = Σ_i w_i·x_i + b followed by an activation function σ(z).
The linear relation comprises a coefficient w and a bias b, where the coefficient w represents the linear relation between neurons in different layers and the bias b is associated with the specific neuron within a layer. In the DNN model, the model is trained through a forward propagation algorithm: starting from the input layer, a series of linear operations and activation operations are performed on the input value vector x using the weight coefficient matrices W and bias vectors b, computing layer by layer until the output layer produces the output result.
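As a minimal illustrative sketch of this forward propagation (the layer sizes, random initialization and ReLU activation below are assumptions for illustration, not taken from this specification), each layer computes z = Σ_i w_i·x_i + b and then σ(z):

```python
# Minimal sketch of layer-by-layer forward propagation for a small fully
# connected network; sizes and the activation are illustrative placeholders.
import numpy as np

def sigma(z):
    # ReLU used here as the activation; the text only requires some sigma(z)
    return np.maximum(z, 0.0)

def forward(x, weights, biases):
    """Propagate input x through layers defined by (W, b) pairs."""
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b          # linear part: z = sum_i w_i * x_i + b
        a = sigma(z)           # activation part: sigma(z)
    return a                   # output layer data

# Example: 4-dimensional input, one hidden layer of 8 neurons, 3 outputs
rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 4)), rng.normal(size=(3, 8))]
biases = [np.zeros(8), np.zeros(3)]
print(forward(rng.normal(size=4), weights, biases))
```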
In the DNN model, another problem to be solved is the following: assume there are m training samples, where x is the input vector with feature dimension n_in and y is the output vector with feature dimension n_out. A model needs to be trained using these m samples so that, when a new test sample arrives, the output vector corresponding to the test sample can be predicted by the model.
If a DNN model is used, the input layer has n_in neurons, the output layer has n_out neurons, and some hidden layers containing several neurons each are added. It is then necessary to find the appropriate linear coefficient matrices W and bias vectors b corresponding to all hidden layers and the output layer, and to update the model so that the outputs computed from all training sample inputs are equal to, or as close as possible to, the sample outputs. To find these parameters, the DNN model determines the linear coefficient matrices W and bias vectors b through a back propagation algorithm: a suitable loss function is used to measure the output loss on the training samples, and this loss function is then optimized to find its minimum; the corresponding series of linear coefficient matrices W and bias vectors b is the final result we need. After the linear coefficient matrices W and bias vectors b are obtained, the model can be updated based on them. In the DNN model, the process of optimizing the loss function to its minimum is most commonly performed step by step through iterations of gradient descent.
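For illustration only (the single-layer model, squared-error loss and learning rate below are assumptions, not part of this specification), one gradient descent update takes the form W ← W − η·∂L/∂W, b ← b − η·∂L/∂b:

```python
# A minimal sketch of a single gradient-descent update, assuming a one-layer
# linear model with a squared-error loss; data and learning rate are placeholders.
import numpy as np

def gradient_descent_step(W, b, x, y, eta=0.01):
    z = W @ x + b                    # forward pass (no activation, for brevity)
    loss_grad = 2.0 * (z - y)        # dL/dz for the squared-error loss
    dW = np.outer(loss_grad, x)      # dL/dW by the chain rule
    db = loss_grad                   # dL/db
    return W - eta * dW, b - eta * db
```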
In one or more embodiments of the present disclosure, the DNN model structure under different application scenarios is relatively fixed, and more training data is required to achieve better model performance. In the same field, for example medicine or finance, different enterprises or institutions hold different data samples; if these data are jointly trained on, the accuracy of the DNN model improves greatly, bringing considerable economic benefits to the enterprises. However, such raw training data contains a large amount of user privacy and business secrets, and once this information is leaked, the negative effects are irreparable. Therefore, protecting data privacy while solving the data islanding problem through multi-party joint training has been an important research topic in recent years.
For example, in the financial field, data between institutions is often vertically partitioned, i.e., the sample space is the same while the feature spaces differ, for example between a state-owned bank and a private payment institution, or between a social application and a payment application. For distributed tasks in this vertical scenario, the whole model can be split based on the concept of split learning: part of the model is computed by the training terminals (training members) and part by the server, and only the hidden layer outputs and their gradients are transmitted between the server and the training terminals.
Fig. 3 is a schematic diagram of split learning training provided in an exemplary embodiment of the present specification.
As shown in fig. 3, the original overall model is split into a plurality of partial models, which are then placed in a training terminal or in the server respectively. However, although each partial model in a training terminal or in the server has its own complete input layer, hidden layers and output layer, the model in the training terminal and the model in the server are split apart, so the computation graph is incomplete and back propagation of the whole model cannot be performed automatically. One feasible implementation is the following: the training terminal calculates the Jacobian matrix of its hidden layer output with respect to its model parameters and multiplies it with the gradient returned during back propagation to obtain the gradient of its model, which is then used to update the model.
However, the dimension of the training terminal's Jacobian matrix is the hidden layer dimension multiplied by the model dimension; for large-scale data and complex models, computing this Jacobian is expensive and has high time complexity. In addition, the space complexity of the Jacobian matrix is high, and the memory it occupies often exceeds the memory limit of the device, causing the training terminal to crash and seriously affecting the normal operation of model training. One or more embodiments of the present disclosure provide a model training method, an apparatus, a storage medium, a server, and a terminal, which can improve the computation efficiency of a deep neural network model in the split learning training process.
Fig. 4 is a schematic flowchart of a model training method provided in an exemplary embodiment of the present disclosure.
As shown in fig. 4, the model training method is applied to the server, and the model training method includes:
s402, first output layer data output by the first model in each characteristic training terminal is obtained, forward propagation of the second model in the server is carried out on the basis of the first output layer data, and second output layer data output by the second model is obtained.
For ease of understanding, one or more embodiments of this specification take split learning training in a vertical scenario as an example and first describe the method with the server as the executing entity. First, the training terminals and the server can be set up so that the training terminals preprocess their sample data. For example, if the number of training terminals is k, the training terminals may process the data they own based on Private Set Intersection (PSI) to identify the samples they hold in common.
The training terminals may be divided into feature training terminals and label training terminals according to the type of data they hold (feature data or label data). For example, over the shared sample space, feature training terminal i holds a feature data set Xi, where i = 1, 2, ..., k; a single label training terminal may be set up, holding the label data set corresponding to the feature data sets. The label training terminal is set up independently to prevent the privacy leakage that could occur if the feature data sets and the label data set existed on the same terminal or on the server.
Optionally, in one or more embodiments of the present description, the DNN model may be implemented using an open source framework; for example, the DNN model may be trained using the open source framework TensorFlow.
After the feature training terminals, the tag training terminals and the server are divided, the server may initialize the original DNN model, and then split the original DNN model into a plurality of partial models according to the number of the feature training terminals, the number of the servers and the number of the tag training terminals. Namely, the original DNN model is divided into first models with the same number as that of the feature training terminals, second models with the same number as that of the servers, and third models with the same number as that of the tag training terminals, each first model is issued to the feature training terminals, each second model is issued to the servers, and each third model is issued to the tag training terminals.
For convenience of description, in one or more embodiments of the present specification, the number of feature training terminals is k, and then the number of first models is k; the number of servers is 1, then the number of second models is also 1; the number of label training terminals is 1, and then the number of third models is also 1. The first model, the second model and the third model are mainly different in that the number of hidden layers in the first model, the second model and the third model is different, and the specific structure of the models can be divided according to the requirement on data privacy.
After the partial models are sent to the corresponding terminals or server, the models can be trained. As noted above, the data flow among the feature training terminals, the server and the label training terminal is that the data output by the feature training terminals is used as the input of the server, and the data output by the server is used as the input of the label training terminal. Therefore, in a given training iteration, the first model in each feature training terminal first reads a batch of the same feature data from the feature data set on that terminal, and each feature training terminal takes the feature data it reads as its input layer data. Each feature training terminal then performs forward propagation of its first model based on that input layer data: each first model performs a series of linear operations and activation operations on the input layer data using its weight coefficient matrices W and bias vectors b, computing layer by layer from the input layer to the output layer to obtain the output layer data of that first model. To associate this output layer data with the first models, it is referred to as first output layer data. After each first model produces its first output layer data, that data is sent to the server.
For example, in the j-th training iteration (j = 0, 1, ..., N), feature training terminal i reads the same batch of sample data from the feature data set Xi based on the open source framework, with sample size ni, and performs forward propagation of its first model to obtain the first output layer data Li of the first model, where i = 1, 2, ..., k. Feature training terminal i then transmits the first output layer data Li to the server.
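A hypothetical sketch of this terminal-side forward step is shown below, assuming TensorFlow and a Keras model; `first_model` and the transport helper mentioned in the comments are illustrative placeholders, not terms from this specification:

```python
# Sketch of one forward step on feature training terminal i.
import tensorflow as tf

def terminal_forward_step(first_model: tf.keras.Model,
                          feature_batch: tf.Tensor) -> tf.Tensor:
    # Forward propagation of the first model on a batch of ni feature rows
    first_output_layer_data = first_model(feature_batch, training=True)  # L_i
    return first_output_layer_data

# L_i would then be transmitted to the server, e.g. (placeholder transport):
# send_to_server(terminal_id=i, payload=first_output_layer_data.numpy())
```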
The server obtains the first output layer data output by the first models in all the feature training terminals and processes this data to serve as the input layer data of the second model in the server. Optionally, the server may fuse the features in the first output layer data by averaging, summing, or using a concat function to obtain the input layer data of the second model.
After the server obtains the input layer data of the second model, it performs forward propagation of the second model in the server based on that input layer data. The forward propagation of the second model is similar to that of the first model in a feature training terminal: the second model performs a series of linear operations and activation operations on the input layer data using its weight coefficient matrices W and bias vectors b, computing layer by layer from the input layer to the output layer to obtain the output layer data of the second model. To associate this output layer data with the second model, it is referred to as second output layer data.
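A minimal sketch of the server-side fusion and forward propagation is given below, assuming TensorFlow and that the first output layer data from the k terminals share the same shape; the function and parameter names are illustrative:

```python
# Sketch of S402: fuse the first output layer data (mean / sum / concat, as
# mentioned above) and forward-propagate the second model.
import tensorflow as tf

def server_forward(second_model: tf.keras.Model,
                   first_outputs: list,
                   fusion: str = "mean") -> tf.Tensor:
    if fusion == "mean":
        fused = tf.add_n(first_outputs) / len(first_outputs)
    elif fusion == "sum":
        fused = tf.add_n(first_outputs)
    else:  # "concat" along the feature axis
        fused = tf.concat(first_outputs, axis=1)
    # Forward propagation of the second model yields the second output layer data
    return second_model(fused, training=True)
```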
S404, sending the second output layer data to a label training terminal, and obtaining a first gradient of the real loss function relative to the second output layer data from the label training terminal.
After the second model outputs the second output layer data, the server sends this second output layer data to the label training terminal. After receiving it, the label training terminal uses the second output layer data as the input layer data of the third model in the label training terminal and performs forward propagation of the third model on this input. The forward propagation of the third model is similar to that of the first model in a feature training terminal: the third model performs a series of linear operations and activation operations on the input layer data using its weight coefficient matrices W and bias vectors b, computing layer by layer from the input layer to the output layer to obtain the output layer data of the third model. To associate this output layer data with the third model, it is referred to as third output layer data; the third output layer data is also the prediction for the feature data sets in the feature training terminals.
Because a real label data set corresponding to the feature data set in the feature training terminal exists in the label training terminal, the label training terminal calculates a real loss function based on the third output layer data and the label data set, and sends the real loss function to the server.
Further, after the third model calculates the real loss function, the third model may be subjected to back propagation based on a calculation graph corresponding to the third model, that is, the real loss function is optimized to obtain the minimum extremum, and then the corresponding series of linear coefficient matrices W and bias vectors b are the final results required by the user, and after the linear coefficient matrices W and the bias vectors b are obtained, the third model may be updated based on the linear coefficient matrices W and the bias vectors b.
In the DNN model, the most common process of solving the extreme value of the loss function optimization is generally completed step by step through a gradient descent method, so that the true loss function is optimized to obtain a minimum extreme value, specifically, a third model gradient of the true loss function with respect to a third model is obtained, and then a linear coefficient matrix W and a bias vector b corresponding to the third model can be determined based on the third model gradient, that is, the third model can be updated based on the third model gradient.
Further, after the DNN model is split into multiple partial models, back propagation of the DNN model requires, in addition to back propagation of the third model in the label training terminal, back propagation of the second model in the server and of the first models in the feature training terminals. Therefore, after the label training terminal obtains the real loss function, it also needs to compute the first gradient of the real loss function with respect to the second output layer data and send this first gradient to the server, so that the server and the feature training terminals can perform the corresponding processing based on the first gradient and then back-propagate the second model in the server and the first models in the feature training terminals.
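A hypothetical sketch of the label-terminal step is given below, assuming TensorFlow; the loss function, label shapes and helper name are placeholders rather than details from this specification:

```python
# Sketch: compute the real loss on the third model and the first gradient of
# that loss with respect to the received second output layer data.
import tensorflow as tf

def label_terminal_step(third_model: tf.keras.Model,
                        second_output: tf.Tensor,
                        labels: tf.Tensor,
                        loss_fn=tf.keras.losses.BinaryCrossentropy()):
    second_output = tf.convert_to_tensor(second_output)
    with tf.GradientTape() as tape:
        tape.watch(second_output)          # track the non-variable input L_out
        third_output = third_model(second_output, training=True)
        real_loss = loss_fn(labels, third_output)
    grads = tape.gradient(
        real_loss, [second_output] + third_model.trainable_variables)
    first_gradient = grads[0]              # sent back to the server
    third_model_grads = grads[1:]          # used to update the third model
    return real_loss, first_gradient, third_model_grads
```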
S406, a first simulation loss function of the second model is obtained according to the first gradient and the second output layer data, and the second model is updated based on the first simulation loss function.
After the server obtains the first gradient of the real loss function with respect to the second output layer data from the label training terminal, the first simulation loss function of the second model can be obtained from the first gradient and the second output layer data during back propagation of the second model in the server. Similar to the third model back-propagating based on the real loss function, the second model back-propagates based on the first simulation loss function: the first simulation loss function is optimized to its minimum, the corresponding series of linear coefficient matrices W and bias vectors b is the final result we need, and once the linear coefficient matrices W and bias vectors b are obtained, the second model can be updated based on them.
In the DNN model, the most common process of solving the extreme value of the loss function optimization is generally completed step by step through a gradient descent method, so that the first simulated loss function is optimized to solve the minimum extreme value, specifically, the second model gradient of the first simulated loss function with respect to the second model is obtained, and then the linear coefficient matrix W and the bias vector b corresponding to the second model can be determined based on the second model gradient, that is, the second model can be updated based on the second model gradient.
S408, calculating second gradients of the first simulation loss function relative to the first output layer data, and sending the second gradients to the feature training terminals, so that the feature training terminals update the first models based on the second gradients.
To enable back propagation of the first model in each feature training terminal, the server may further calculate a second gradient of the first simulation loss function with respect to each piece of first output layer data and send each second gradient to the corresponding feature training terminal. Similar to the back propagation of the second model in the server, during back propagation of the first model in each feature training terminal, the terminal can obtain a second simulation loss function for its first model from the second gradient and the first output layer data output by that first model. Similar to the third model back-propagating based on the real loss function, each first model back-propagates based on its second simulation loss function, i.e., each second simulation loss function is optimized to its minimum; the corresponding series of linear coefficient matrices W and bias vectors b is the final result we need, and after the linear coefficient matrices W and bias vectors b are obtained, each first model can be updated based on them.
In the DNN model, the process of optimizing a loss function to its extremum is most commonly completed step by step through iterations of gradient descent. Each second simulation loss function is therefore optimized to its minimum; specifically, the gradient of each second simulation loss function with respect to its first model is obtained, the linear coefficient matrix W and bias vector b corresponding to each first model can then be determined from that gradient, and each first model can be updated based on its gradient.
Optionally, the training process above only describes the j-th training iteration (j = 0, 1, ..., N); repeating the above steps completes a preset number of training iterations until the DNN model converges, which completes the DNN model training.
Because the simulation loss function is constructed in the back propagation process of the second model in the server and the first model in the feature training terminal, when the second model in the server and the first model in the feature training terminal are updated based on the simulation loss function, the calculation amount of the model during updating can be reduced, and the training efficiency of the model during the splitting learning training process is effectively improved.
In addition, in one or more embodiments of the present disclosure, instead of constructing the simulation loss function, a simulation layer may be constructed whose forward propagation output equals its input and whose back propagation gradient equals the gradient transmitted back by the server or the feature training member; the updating of the second model in the server and of the first models in the feature training terminals can then likewise be implemented based on that gradient.
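One way such a simulation layer might look in TensorFlow is sketched below (an assumption for illustration; the helper names are not from this specification): the forward pass is the identity, and the backward pass substitutes the externally supplied gradient.

```python
# Sketch of a "simulation layer": identity in the forward direction, injected
# gradient in the backward direction.
import tensorflow as tf

def make_simulation_layer(returned_gradient: tf.Tensor):
    @tf.custom_gradient
    def simulation_layer(x):
        def grad(upstream):
            # Ignore the locally computed upstream gradient and inject the
            # gradient transmitted back by the server / feature member.
            # returned_gradient must have the same shape as x.
            return returned_gradient
        return tf.identity(x), grad        # forward output equals the input
    return simulation_layer

# Typical use: y = make_simulation_layer(G)(model(x)); then
# tape.gradient(tf.reduce_sum(y), model.trainable_variables) back-propagates G
# through the local partial model.
```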
One or more embodiments of the present disclosure provide a model training method, which includes obtaining first output layer data output by a first model in each feature training terminal, and performing forward propagation of a second model in a server based on the first output layer data to obtain second output layer data output by the second model; sending the second output layer data to a label training terminal, and acquiring a first gradient of a real loss function with respect to the second output layer data from the label training terminal; obtaining a first simulation loss function of the second model according to the first gradient and the second output layer data, and updating the second model based on the first simulation loss function; and calculating second gradients of the first simulation loss function with respect to the first output layer data and sending the second gradients to the feature training terminals, so that the feature training terminals update the first models based on the second gradients. After the first gradient of the real loss function with respect to the second output layer data is obtained, the first simulation loss function of the second model can be constructed, the second model can be updated based on the first simulation loss function, and the first models can likewise be updated; this reduces the amount of computation required to update the models and effectively improves the training efficiency of the models in the split learning training process.
Fig. 5 is a schematic flowchart of a model training method provided in an exemplary embodiment of the present disclosure.
As shown in fig. 5, the model training method is applied to the server, and the model training method includes:
s502, first output layer data output by a first model in each characteristic training terminal is obtained, input layer data input by a second model in a server is obtained based on the first output layer data, and forward propagation of the second model is carried out based on the input layer data input by the second model.
The server obtains the first output layer data output by the first models in all the feature training terminals and processes this data to serve as the input layer data of the second model in the server. Optionally, the server may fuse the features in the first output layer data by averaging, summing, or using a concat function to obtain the input layer data of the second model.
After the server obtains the input layer data of the second model, it performs forward propagation of the second model in the server based on that input layer data. The forward propagation of the second model is similar to that of the first model in a feature training terminal: the second model performs a series of linear operations and activation operations on the input layer data using its weight coefficient matrices W and bias vectors b, computing layer by layer from the input layer to the output layer to obtain the output layer data of the second model. To associate this output layer data with the second model, it is referred to as second output layer data.
S504, the second output layer data is sent to the label training terminal, and a first gradient of the real loss function relative to the second output layer data is obtained from the label training terminal.
Please refer to the description in step S404 for step S504.
S506, splitting the second output layer data and the first gradient based on the preset feature data quantity corresponding to each first output layer to respectively obtain a second output layer data set and a first gradient set.
After the server obtains the first gradient of the real loss function obtained by the label training terminal with respect to the second output layer data, in the process of back propagation of the second model in the server, the first simulation loss function of the second model can be obtained according to the first gradient and the second output layer data.
Specifically, in the process of obtaining the first simulation loss function, the preset feature data quantity corresponding to each first output layer, that is, the sample size read when each first model propagates forward, may be obtained first. The second output layer data is then split based on the preset feature data quantity corresponding to each first output layer: specifically, it is split by rows, and all of the resulting second output layer sub-data are stored as a set to obtain a second output layer data set. Similarly, the first gradient may be split based on the preset feature data quantity corresponding to each first output layer: specifically, it is split by rows to obtain the corresponding first gradient sub-data, and all of the first gradient sub-data are stored as a set to obtain a first gradient set.
For example, suppose the server obtains from the label training terminal the first gradient G_k of the real loss function with respect to the second output layer data L_out. According to the preset feature data quantity ni corresponding to each first output layer, the second output layer data L_out is split by rows to obtain {l_0, l_1, ..., l_ni}; correspondingly, the first gradient G_k is split by rows to obtain {g_0, g_1, ..., g_ni}.
And S508, calculating a first simulation loss function of the second model based on the second output layer data set and the first gradient set.
After the second output layer data set containing the second output layer sub-data and the first gradient set containing the first sub-gradients are obtained, the second output layer sub-data in the second output layer data set and the first sub-gradients in the first gradient set can be taken, the transpose of each first sub-gradient computed, the product of each piece of second output layer sub-data with the transpose of its first sub-gradient calculated, and finally the first simulation loss function of the second model obtained as the sum of these products.
For example, when the second output layer data L_out is split by rows into {l_0, l_1, ..., l_ni} and the first gradient G_k is split by rows into {g_0, g_1, ..., g_ni}, the first simulation loss function Ls of the second model is Ls = Σ_t l_t·g_t^T, where t = 0, 1, ..., ni.
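Since the sum of the row-wise inner products l_t·g_t^T equals the element-wise product of L_out and G_k summed over all entries, a minimal TensorFlow sketch of this first simulation loss might look as follows (an assumption for illustration; the helper name is not from this specification):

```python
# Sketch of the first simulation loss Ls = sum_t l_t * g_t^T, with the returned
# gradient treated as a constant.
import tensorflow as tf

def simulated_loss(output_layer_data: tf.Tensor,
                   returned_gradient: tf.Tensor) -> tf.Tensor:
    # stop_gradient keeps g_t constant, so d(Ls)/d(output) == returned_gradient
    g = tf.stop_gradient(returned_gradient)
    return tf.reduce_sum(output_layer_data * g)
```

A useful property of this construction is that the gradient of Ls with respect to the output layer data is exactly the gradient returned by the label training terminal, which is what allows ordinary back propagation of Ls to stand in for back propagation of the real loss.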
And S510, solving the gradient of the second model based on the first simulation loss function to obtain the gradient of the second model.
After the second model calculates the first simulation loss function of the second model, the second model may perform back propagation based on the first simulation loss function, that is, optimize the first simulation loss function to obtain a minimum extremum, and then the corresponding series of linear coefficient matrix W and bias vector b are the final results, and after the linear coefficient matrix W and the bias vector b are obtained, the second model may be updated based on the linear coefficient matrix W and the bias vector b.
In the DNN model, the most common process of solving the loss function optimization extremum is generally completed step by iteration through a gradient descent method, and then the first simulated loss function is optimized to solve the minimized extremum, specifically, a second model gradient of the first simulated loss function with respect to the second model is solved.
The method for obtaining the gradient of the second model is not limited here. One feasible implementation is that, when the open source framework is TensorFlow, the gradient of the second model may be obtained from the first simulation loss function using the gradient tape (GradientTape) mechanism of TensorFlow.
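A hypothetical sketch of how this might look with tf.GradientTape is given below, combining the second model gradient of S510 with the input-side second gradients needed in S514; the mean fusion, optimizer and recomputation of the forward pass inside the tape are assumptions made for the sketch, and `simulated_loss` is the helper sketched above:

```python
# Sketch of S510-S514: build the simulation loss inside a GradientTape, take
# its gradient w.r.t. the second model variables and w.r.t. each first output.
import tensorflow as tf

def server_backward(second_model, first_outputs, first_gradient, optimizer):
    first_outputs = [tf.convert_to_tensor(x) for x in first_outputs]
    with tf.GradientTape() as tape:
        for x in first_outputs:
            tape.watch(x)                    # needed for the second gradients
        fused = tf.add_n(first_outputs) / len(first_outputs)   # mean fusion
        second_output = second_model(fused, training=True)
        ls = simulated_loss(second_output, first_gradient)
    grads = tape.gradient(
        ls, second_model.trainable_variables + first_outputs)
    n_vars = len(second_model.trainable_variables)
    model_grads, second_gradients = grads[:n_vars], grads[n_vars:]
    # Update the second model (S512) and return the per-terminal second gradients
    optimizer.apply_gradients(zip(model_grads, second_model.trainable_variables))
    return second_gradients
```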
And S512, updating the second model based on the second model gradient.
Since the linear coefficient matrix W and the bias vector b corresponding to the second model can be determined based on the gradient of the second model, the second model can be updated based on the gradient of the second model.
And S514, calculating a second gradient of the first simulated loss function relative to each first output layer data according to the gradient of the first simulated loss function relative to the input layer data input by the second model.
The server may calculate gradients of the first simulated loss function with respect to the input layer data at the input of the second model, with each first model corresponding to a first output layer data, and then the server may further calculate a second gradient of the first simulated loss function with respect to each first output layer data.
And S516, sending the second gradients to the feature training terminals, so that the feature training terminals update the first models based on the second gradients.
The server may send each second gradient to the corresponding feature training terminal. Similar to the back propagation of the second model in the server, during back propagation of the first model in each feature training terminal, the terminal can obtain a second simulation loss function for its first model from the second gradient and the first output layer data output by that first model. Similar to the third model back-propagating based on the real loss function, each first model back-propagates based on its second simulation loss function, i.e., each second simulation loss function is optimized to its minimum; the corresponding series of linear coefficient matrices W and bias vectors b is the final result we need, and after the linear coefficient matrices W and bias vectors b are obtained, each first model can be updated based on them.
In the DNN model, the most common process of solving the loss function optimization extremum is generally completed step by iteration through a gradient descent method, and then each second simulation loss function is optimized to obtain the minimized extremum, specifically, each first model gradient of each second simulation loss function with respect to each first model is obtained, then the linear coefficient matrix W and the bias vector b corresponding to each first model can be determined based on each first model gradient, and each first model can be updated based on each first model gradient.
In one or more embodiments of the present disclosure, a simulation loss function is constructed in a back propagation process of a second model in a server and a first model in a feature training terminal, so that when the second model in the server and the first model in the feature training terminal are updated based on the simulation loss function, a calculation amount for updating the models can be reduced, and training efficiency of the models in a split learning training process is effectively improved.
Fig. 6 is a flowchart illustrating a model training method according to an exemplary embodiment of the present disclosure.
As shown in fig. 6, the model training method is applied to any one of the feature training terminals in the foregoing embodiments, and the model training method includes:
s602, forward propagation of the first model in the feature training terminal is carried out based on feature data in the feature data set, and first output layer data output by the first model is obtained.
It can be understood that, for the splitting of the DNN model and the issuing of the split model, please refer to the description in the above embodiments, which is not described herein again.
After the partial models are sent to the corresponding terminals or server, the models can be trained. As before, the data flow is that the data output by the feature training terminals is used as the input of the server, and the data output by the server is used as the input of the label training terminal. Therefore, in a given training iteration, the first model in the feature training terminal reads a batch of the same feature data from the feature data set on that terminal, and the feature training terminal takes the read feature data as its input layer data. The feature training terminal then performs forward propagation of its first model based on that input layer data: the first model performs a series of linear operations and activation operations on the input layer data using its weight coefficient matrices W and bias vectors b, computing layer by layer from the input layer to the output layer to obtain the output layer data of the first model. To associate this output layer data with the first model, it is referred to as first output layer data.
For example, in the j-th training iteration (j = 0, 1, ..., n), feature training terminal i (i = 1, 2, ..., k) reads a batch of the same sample data from the feature data set X_i based on the open-source framework, with sample size n_i, and performs forward propagation of the first model, so that the first output layer data of the first model is obtained.
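As a purely illustrative aid (not part of the patented method's required implementation), the following sketch shows what such a forward-propagation step could look like with TensorFlow/Keras; the data set contents, layer sizes and names such as feature_dataset, first_model and n_i are assumptions made for this example.

```python
import tensorflow as tf

n_i = 32  # sample size read in one training iteration (batch size)

# Placeholder standing in for the feature data set X_i held by feature training terminal i.
feature_dataset = tf.data.Dataset.from_tensor_slices(
    tf.random.normal([1024, 20])).batch(n_i)

# Assumed stand-in for the first model split out to this terminal.
first_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(16),
])

for features in feature_dataset.take(1):                    # one training iteration j
    # Forward propagation: linear operations and activations, layer by layer.
    first_output = first_model(features, training=True)     # first output layer data L_i
    # first_output is what would be sent to the server in step S604.
```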
S604, sending the first output layer data to a server, and obtaining a second gradient related to the first output layer data from the server.
After the first output layer data is output by the first model, the feature training terminal can send the first output layer data it outputs to the server, so that the server can obtain the first output layer data output by the first model in each feature training terminal and perform forward propagation of the second model in the server based on each first output layer data to obtain the second output layer data output by the second model; the server then sends the second output layer data to the label training terminal, and obtains from the label training terminal a first gradient of the real loss function with respect to the second output layer data; the server obtains a first simulation loss function of the second model according to the first gradient and the second output layer data, and updates the second model based on the first simulation loss function; finally, the server calculates second gradients of the first simulation loss function with respect to each first output layer data and sends each second gradient to the corresponding feature training terminal.
Accordingly, any one of the feature training terminals may also obtain the second gradient with respect to its corresponding first output layer data from the server.
And S606, updating the first model based on the second gradient.
After the second gradient with respect to the first output layer data is obtained, in the process of performing back propagation on the first model in the feature training terminal, similar to the process of performing back propagation on the second model in the server, a second simulated loss function of the first model may be obtained according to the second gradient and the first output layer data.
Specifically, similar to obtaining the first simulated loss function of the second model according to the first gradient and the second output layer data, the preset feature data quantity corresponding to the first output layer, that is, the sample size read when the first model propagates forward, may be obtained first. The first output layer data is then split based on this preset feature data quantity, specifically by splitting the first output layer data by rows, and all of the resulting first output layer sub-data are stored in the form of a set to obtain a first output layer data set. Similarly, the second gradient may be split based on the preset feature data quantity corresponding to the first output layer, specifically by splitting the second gradient by rows to obtain corresponding second sub-gradients, and all of the second sub-gradients are stored in the form of a set to obtain a second gradient set.
For example, after the feature training terminal obtains the second gradient G_i with respect to the first output layer data L_i from the server, the first output layer data L_i is split by rows according to the preset feature data number n_i corresponding to the first output layer to obtain {l_{i,0}, l_{i,1}, ..., l_{i,n_i}}; correspondingly, the second gradient G_i is split by rows according to the preset feature data number n_i to obtain {g_{i,0}, g_{i,1}, ..., g_{i,n_i}}.
Finally, a second simulated loss function of the first model is calculated based on the first output layer data set and the second gradient set. Specifically, the first output layer sub-data in the first output layer data set and the second sub-gradients in the second gradient set are obtained; the transpose of each second sub-gradient is then calculated, and the product of each piece of first output layer sub-data and the transpose of the corresponding second sub-gradient is calculated; finally, the second simulated loss function of the first model is obtained as the sum of these products.
For example, when the first output layer data L_i is split by rows into {l_{i,0}, l_{i,1}, ..., l_{i,n_i}} and the second gradient G_i is split by rows into {g_{i,0}, g_{i,1}, ..., g_{i,n_i}}, the second simulated loss function of the first model is: Σ_t l_{i,t} · g_{i,t}^T, where t = 0, 1, ..., n_i.
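For readers who want to check this construction numerically, the sketch below evaluates the sum of row-wise products under assumed shapes; the function and variable names are illustrative, and the example relies on the fact that the row-wise sum equals the element-wise inner product of the first output layer data and the second gradient.

```python
import numpy as np

def second_simulated_loss(first_output, second_gradient):
    """Sum over rows t of l_{i,t} · g_{i,t}^T, per the formula above."""
    rows_l = np.split(first_output, first_output.shape[0], axis=0)       # split by rows
    rows_g = np.split(second_gradient, second_gradient.shape[0], axis=0)
    # Each l @ g.T is a 1x1 matrix (a scalar product); the loss is their sum.
    return sum((l @ g.T).item() for l, g in zip(rows_l, rows_g))

l_i = np.random.randn(4, 8)   # first output layer data L_i, a batch of 4 rows
g_i = np.random.randn(4, 8)   # second gradient G_i received from the server
# The row-wise sum coincides with the element-wise inner product of L_i and G_i.
assert np.isclose(second_simulated_loss(l_i, g_i), np.sum(l_i * g_i))
```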
After the second simulated loss function of the first model is obtained, the first model can perform back propagation based on the second simulated loss function, that is, the second simulated loss function is optimized to find its minimized extremum; the corresponding series of linear coefficient matrices W and bias vectors b are the final results required, and after the linear coefficient matrices W and the bias vectors b are obtained, the first model can be updated based on them.
In a DNN model, solving for the optimal extremum of a loss function is most commonly done step by step by iteration using a gradient descent method. Accordingly, optimizing the second simulated loss function to find its minimized extremum specifically involves calculating the first model gradient of the second simulated loss function with respect to the first model.
The method for calculating the gradient of the first model is not limited here. In one feasible implementation, when the open-source framework is TensorFlow, the first model gradient may be obtained from the second simulated loss function using the gradient tape (tf.GradientTape) mechanism of TensorFlow. Since the linear coefficient matrix W and the bias vector b corresponding to the first model can be determined based on the first model gradient, the first model can be updated based on the first model gradient.
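A minimal sketch of this TensorFlow route is shown below. tf.GradientTape is the standard TensorFlow gradient-tape API, but the Keras first model, the SGD optimizer and the helper name update_first_model are assumptions of the example rather than details fixed by the method.

```python
import tensorflow as tf

first_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(16),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

def update_first_model(features, second_gradient):
    """One back-propagation step of the first model from the second simulated loss."""
    with tf.GradientTape() as tape:
        first_output = first_model(features, training=True)          # forward propagation
        # Second simulated loss: sum_t l_{i,t} · g_{i,t}^T, with the received
        # second gradient treated as a constant.
        sim_loss = tf.reduce_sum(first_output * tf.stop_gradient(second_gradient))
    # First model gradient of the simulated loss w.r.t. the weight matrices W and biases b.
    grads = tape.gradient(sim_loss, first_model.trainable_variables)
    optimizer.apply_gradients(zip(grads, first_model.trainable_variables))
    return sim_loss
```

Because the received second gradient enters the simulated loss only as a constant factor, the gradient of this loss with respect to the first model's parameters is exactly the second gradient back-propagated through the first model, which is what keeps the update inexpensive.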
One or more embodiments of the present specification provide a model training method, in which forward propagation of a first model in a feature training terminal is performed based on feature data in a feature data set to obtain first output layer data output by the first model; the first output layer data is sent to a server, and a second gradient with respect to the first output layer data is obtained from the server; and the first model is updated based on the second gradient. In this way, after the first gradient of the real loss function with respect to the second output layer data is obtained, the first simulated loss function of the second model can be obtained, the second model can be updated based on the first simulated loss function, and the first model can be updated accordingly; the amount of calculation required to update the models is thus reduced, and the training efficiency of the models in the split learning training process is effectively improved.
Fig. 7 is a schematic structural diagram of a model training apparatus according to an exemplary embodiment of the present disclosure.
As shown in fig. 7, the model training apparatus 700 is applied to a server, and the model training apparatus 700 includes:
the server propagation module 710 is configured to obtain first output layer data output by the first model in each feature training terminal, and perform forward propagation of the second model in the server based on each first output layer data to obtain second output layer data output by the second model.
A first gradient obtaining module 720, configured to send the second output layer data to the tag training terminal, and obtain a first gradient of the real loss function with respect to the second output layer data from the tag training terminal;
and a second model updating module 730, configured to obtain a first simulation loss function of the second model according to the first gradient and the second output layer data, and update the second model based on the first simulation loss function.
The second gradient sending module 740 is configured to calculate second gradients of the first simulation loss function with respect to each first output layer data, and send each second gradient to each feature training terminal, so that each feature training terminal updates each first model based on each second gradient.
Optionally, the second model updating module 730 is further configured to send each second gradient to each feature training terminal, so that each feature training terminal obtains a second simulation loss function of each first model according to each second gradient and the first output layer data output by each first model, and updates each first model based on each second simulation loss function.
Optionally, the second model updating module 730 is further configured to split the second output layer data and the first gradient based on the number of preset feature data corresponding to each first output layer, so as to obtain a second output layer data set and a first gradient set, respectively; a first simulated loss function for the second model is calculated based on the second set of output layer data and the first set of gradients.
Optionally, the second model updating module 730 is further configured to obtain second output layer sub-data in the second output layer data set and a first sub-gradient in the first gradient set; calculating transpositions of the first sub-gradients respectively, and calculating products of the second output layer sub-data and the transpositions of the first sub-gradients respectively; and obtaining a first simulation loss function of the second model according to the sum of the products.
Optionally, the second model updating module 730 is further configured to obtain a second model gradient of the second model by performing gradient calculation on the second model based on the first simulation loss function; the second model is updated based on the second model gradient.
Optionally, the server propagation module 710 is further configured to obtain input layer data input by the second model in the server based on each first output layer data, and perform forward propagation of the second model based on the input layer data input by the second model.
Optionally, the second gradient sending module 740 is further configured to calculate a second gradient of the first simulated loss function with respect to each first output layer data according to the gradient of the first simulated loss function with respect to the input layer data of the second model input.
Optionally, the first gradient obtaining module 720 is further configured to send the second output layer data to the tag training terminal, so that the tag training terminal performs forward propagation of a third model in the tag training terminal based on the second output layer data, and obtains third output layer data output by the third model, and so that the tag training terminal calculates a true loss function based on the third output layer data and the tag data set, calculates a first gradient of the true loss function with respect to the second output layer data, and sends the first gradient to the server.
One or more embodiments of the present specification provide a model training apparatus including: the server propagation module is used for acquiring first output layer data output by the first model in each feature training terminal and performing forward propagation of the second model in the server based on each first output layer data to obtain second output layer data output by the second model; the first gradient acquisition module is used for sending the second output layer data to the label training terminal and acquiring a first gradient of a real loss function relative to the second output layer data from the label training terminal; the second model updating module is used for obtaining a first simulation loss function of the second model according to the first gradient and the second output layer data and updating the second model based on the first simulation loss function; and the second gradient sending module is used for calculating second gradients of the first simulation loss functions relative to the data of the first output layers and sending the second gradients to the feature training terminals so that the feature training terminals update the first models based on the second gradients.
After the first gradient of the real loss function with respect to the second output layer data is obtained, the first simulation loss function of the second model can be obtained, the second model can be updated based on the first simulation loss function, and each first model can be updated accordingly; the amount of calculation required to update the models is thus reduced, and the training efficiency of the models in the split learning training process is effectively improved.
Fig. 8 is a schematic structural diagram of a model training apparatus according to an exemplary embodiment of the present disclosure.
As shown in fig. 8, the model training apparatus 800 is applied to a feature training terminal, and the model training apparatus 800 includes:
the terminal forward propagation module 810 is configured to perform forward propagation of a first model in the feature training terminal based on feature data in the feature data set, so as to obtain first output layer data output by the first model.
The second gradient acquisition module 820 is configured to send the first output layer data to the server and obtain a second gradient with respect to the first output layer data from the server.
The first model updating module 830 is configured to update the first model based on the second gradient.
The second gradient is obtained by the server performing forward propagation of the second model in the server based on the first output layer data to obtain the second output layer data output by the second model, sending the second output layer data to the label training terminal, obtaining the first gradient of the real loss function with respect to the second output layer data from the label training terminal, obtaining the first simulation loss function of the second model according to the first gradient and the second output layer data, and calculating the gradient of the first simulation loss function with respect to the first output layer data.
Optionally, the first model updating module 830 is further configured to obtain a second simulation loss function of the first model according to the second gradient and the first output layer data; the first model is updated based on the second simulated loss function.
Optionally, the first model updating module 830 is further configured to split the first output layer data and the second gradient based on a preset feature data amount corresponding to the first output layer data, so as to obtain a first output layer data set and a second gradient set, respectively; a second simulated loss function for the first model is calculated based on the first set of output layer data and the second set of gradients.
Optionally, the first model updating module 830 is further configured to obtain first output layer sub-data in the first output layer data set and a second sub-gradient in the second gradient set; respectively calculating the transpositions of the second sub-gradients, and respectively calculating the products of the first output layer sub-data and the transpositions of the second sub-gradients; and obtaining a second simulation loss function of the first model according to the sum of the products.
Optionally, the first model updating module 830 is further configured to calculate a gradient of the first model based on the second simulation loss function, so as to obtain a first model gradient of the first model; the first model is updated based on the first model gradient.
One or more embodiments of the present specification provide a model training apparatus including: a terminal forward propagation module, configured to perform forward propagation of a first model in the feature training terminal based on feature data in a feature data set to obtain first output layer data output by the first model; a second gradient acquisition module, configured to send the first output layer data to a server and obtain a second gradient with respect to the first output layer data from the server; and a first model updating module, configured to update the first model based on the second gradient. After the first gradient of the real loss function with respect to the second output layer data is obtained, the first simulation loss function of the second model can be obtained, the second model can be updated based on the first simulation loss function, and the first model can be updated accordingly; the amount of calculation required to update the models is thus reduced, and the training efficiency of the models in the split learning training process is effectively improved.
One or more embodiments of the present specification also provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method of any of the above-described embodiments.
Further, please refer to fig. 9, where fig. 9 is a schematic structural diagram of a server according to an exemplary embodiment of the present disclosure. As shown in fig. 9, the server 900 may include: at least one processor 901, at least one network interface 904, a user interface 903, memory 905, at least one communication bus 902.
Wherein a communication bus 902 is used to enable connective communication between these components.
The user interface 903 may also include a standard wired interface or a wireless interface.
The network interface 904 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Processor 901 may include one or more processing cores, among other things. The processor 901 connects various portions within the overall server 900 using various interfaces and lines, and performs various functions of the server 900 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 905, and calling data stored in the memory 905. Optionally, the processor 901 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 901 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is used for rendering and drawing the content to be displayed by the display screen; and the modem is used to handle wireless communications. It is understood that the modem may not be integrated into the processor 901, but may be implemented by a single chip.
The memory 905 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 905 includes a non-transitory computer-readable medium. The memory 905 may be used to store instructions, programs, code, sets of codes, or instruction sets. The memory 905 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described method embodiments, and the like; the data storage area may store the data and the like referred to in the above respective method embodiments. The memory 905 may optionally be at least one memory device located remotely from the processor 901. As shown in fig. 9, the memory 905, which is a type of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a model training program.
In the server 900 shown in fig. 9, the user interface 903 is mainly used for providing an input interface for a user to obtain data input by the user; and the processor 901 may be configured to invoke the model training program stored in the memory 905, and specifically perform the following operations:
acquiring first output layer data output by a first model in each characteristic training terminal, and carrying out forward propagation on a second model in a server based on the first output layer data to obtain second output layer data output by the second model; sending the second output layer data to a label training terminal, and acquiring a first gradient of a real loss function relative to the second output layer data from the label training terminal; obtaining a first simulation loss function of the second model according to the first gradient and the second output layer data, and updating the second model based on the first simulation loss function; and calculating second gradients of the first simulation loss functions relative to the data of the first output layers, and sending the second gradients to the feature training terminals, so that the feature training terminals update the first models based on the second gradients.
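For orientation only, the single-process sketch below walks through these server-side operations with Keras stand-ins; the network transport between the parties is elided, the stacking of the first output layer data by rows into the second model's input is an assumption, and every name (second_model, server_training_step, get_first_gradient) is illustrative rather than the patent's actual interface.

```python
import tensorflow as tf

second_model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

def server_training_step(first_outputs, get_first_gradient):
    """first_outputs: list of first output layer data L_i, one per feature training terminal.
    get_first_gradient: stand-in for the label training terminal; given the second
    output layer data it returns the first gradient of the real loss function."""
    sizes = [int(l.shape[0]) for l in first_outputs]            # preset feature data numbers n_i
    server_input = tf.concat(first_outputs, axis=0)             # input layer data of the second model
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(server_input)
        second_output = second_model(server_input, training=True)   # forward propagation
        first_gradient = get_first_gradient(second_output)          # from the label training terminal
        # First simulated loss: sum of row-wise products of the second output
        # layer data and the first gradient (the first gradient is held constant).
        sim_loss = tf.reduce_sum(second_output * tf.stop_gradient(first_gradient))
    # Update the second model from the second model gradient of the simulated loss.
    grads = tape.gradient(sim_loss, second_model.trainable_variables)
    optimizer.apply_gradients(zip(grads, second_model.trainable_variables))
    # Second gradients w.r.t. each first output layer data, read off from the
    # gradient w.r.t. the second model's input layer data.
    input_grad = tape.gradient(sim_loss, server_input)
    del tape
    return tf.split(input_grad, sizes, axis=0)                  # one second gradient per terminal
```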
Optionally, sending each second gradient to each feature training terminal, so that each feature training terminal updates each first model based on each second gradient, including: and sending each second gradient to each feature training terminal so that each feature training terminal obtains a second simulation loss function of each first model according to each second gradient and the first output layer data output by each first model, and updating each first model based on each second simulation loss function.
Optionally, deriving a first simulated loss function for the second model from the first gradient and the second output layer data comprises: splitting the second output layer data and the first gradient based on the number of preset feature data corresponding to each first output layer to respectively obtain a second output layer data set and a first gradient set; a first simulated loss function for the second model is calculated based on the second set of output layer data and the first set of gradients.
Optionally, calculating a first simulated loss function of the second model based on the second set of output layer data and the first set of gradients comprises: acquiring second output layer sub-data in a second output layer data set and a first sub-gradient in a first gradient set; calculating transpositions of the first sub-gradients respectively, and calculating products of the second output layer sub-data and the transpositions of the first sub-gradients respectively; and obtaining a first simulation loss function of the second model according to the sum of the products.
Optionally, updating the second model based on the first simulated loss function includes: solving the gradient of the second model based on the first simulation loss function to obtain a second model gradient of the second model; the second model is updated based on the second model gradient.
Optionally, the forward propagation of the second model in the server based on each first output layer data includes: and obtaining input layer data input by a second model in the server based on the first output layer data, and carrying out forward propagation of the second model based on the input layer data input by the second model.
Optionally, calculating a second gradient of the first simulated loss function with respect to each of the first output layer data comprises: a second gradient of the first simulated loss function with respect to each of the first output layer data is calculated based on the gradients of the first simulated loss function with respect to the input layer data of the second model input.
Optionally, sending the second output layer data to the tag training terminal includes: and sending the second output layer data to a label training terminal, so that the label training terminal forwards propagates a third model in the label training terminal based on the second output layer data, obtains third output layer data output by the third model, calculates a real loss function based on the third output layer data and a label data set, calculates a first gradient of the real loss function relative to the second output layer data, and sends the first gradient to a server.
Further, please refer to fig. 10, where fig. 10 is a schematic structural diagram of a terminal according to an exemplary embodiment of the present disclosure. As shown in fig. 10, terminal 1000 can include: at least one processor 1001, at least one network interface 1004, a user interface 1003, memory 1005, at least one communication bus 1002.
Wherein a communication bus 1002 is used to enable connective communication between these components.
The user interface 1003 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Processor 1001 may include one or more processing cores, among other things. The processor 1001 connects various parts throughout the terminal 1000 using various interfaces and lines, and performs various functions of the terminal 1000 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 1005, and calling data stored in the memory 1005. Optionally, the processor 1001 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 1001 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is used for rendering and drawing the content to be displayed by the display screen; and the modem is used to handle wireless communications. It is understood that the modem may not be integrated into the processor 1001, but may be implemented by a single chip.
The memory 1005 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 1005 includes a non-transitory computer-readable medium. The memory 1005 may be used to store instructions, programs, code, sets of codes, or instruction sets. The memory 1005 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like; the data storage area may store the data and the like referred to in the above respective method embodiments. The memory 1005 may optionally be at least one memory device located remotely from the processor 1001. As shown in fig. 10, the memory 1005, which is one type of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a model training program.
In the terminal 1000 shown in fig. 10, the user interface 1003 is mainly used as an interface for providing input for a user, and acquiring data input by the user; and processor 1001 may be configured to invoke a model training program stored in memory 1005 and perform the following operations:
forward propagation of a first model in the feature training terminal is carried out based on feature data in the feature data set, and first output layer data output by the first model is obtained; the first output layer data is sent to a server, and a second gradient with respect to the first output layer data is obtained from the server; and the first model is updated based on the second gradient; wherein the second gradient is obtained by the server performing forward propagation of a second model in the server based on the first output layer data to obtain second output layer data output by the second model, sending the second output layer data to the label training terminal, obtaining a first gradient of a real loss function with respect to the second output layer data from the label training terminal, obtaining a first simulation loss function of the second model according to the first gradient and the second output layer data, and calculating the gradient of the first simulation loss function with respect to the first output layer data.
Optionally, updating the first model based on the second gradient comprises: obtaining a second simulation loss function of the first model according to the second gradient and the first output layer data; the first model is updated based on the second simulated loss function.
Optionally, deriving a second simulated loss function for the first model from the second gradient and the first output layer data comprises: splitting the first output layer data and the second gradient based on the preset characteristic data quantity corresponding to the first output layer data to respectively obtain a first output layer data set and a second gradient set; a second simulated loss function for the first model is calculated based on the first set of output layer data and the second set of gradients.
Optionally, calculating a second simulated loss function of the first model based on the first set of output layer data and the second set of gradients comprises: acquiring first output layer sub-data in a first output layer data set and a second sub-gradient in a second gradient set; respectively calculating the transpositions of the second sub-gradients, and respectively calculating the products of the first output layer sub-data and the transpositions of the second sub-gradients; and obtaining a second simulation loss function of the first model according to the sum of the products.
Optionally, updating the first model based on a second simulated loss function includes: solving the gradient of the first model based on a second simulation loss function to obtain a first model gradient of the first model; the first model is updated based on the first model gradient.
One or more embodiments of the present specification further provide a model training system, which includes any one or more of the servers and any one or more of the terminals.
In one or more embodiments of the present disclosure, it should be understood that the disclosed apparatus and methods may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of modules is merely a division of logical functions, and an actual implementation may have another division, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In the above embodiments, all or part of the implementation may be realized by software, hardware, firmware, or any combination thereof. When implemented in software, the implementation may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The processes or functions described above in accordance with the embodiments of this specification are all or partially performed when the computer program instructions described above are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more of the available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a Digital Versatile Disk (DVD)), a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
It should be noted that, for simplicity and convenience of description, the foregoing method embodiments are described as a series of acts, but it should be understood by those skilled in the art that one or more of the embodiments of the present disclosure are not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the embodiment or embodiments of the present disclosure. Further, those of skill in the art will recognize that the embodiments described in this specification are presently preferred embodiments and that acts or modules are not necessarily required in any particular embodiment or embodiments of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the above description of the model training method, apparatus, storage medium, server, terminal and system provided in one or more embodiments of the present disclosure, for those skilled in the art, according to the idea of one or more embodiments of the present disclosure, there may be changes in the specific implementation and application scope, and in summary, the content of the present disclosure should not be construed as a limitation to one or more embodiments of the present disclosure.

Claims (20)

1. A model training method is applied to a server and comprises the following steps:
acquiring first output layer data output by a first model in each characteristic training terminal, and carrying out forward propagation on a second model in the server based on each first output layer data to obtain second output layer data output by the second model;
sending the second output layer data to a label training terminal, and acquiring a first gradient of a real loss function relative to the second output layer data from the label training terminal;
obtaining a first simulation loss function of the second model according to the first gradient and the second output layer data, and updating the second model based on the first simulation loss function;
and calculating second gradients of the first simulation loss function relative to the first output layer data, and sending the second gradients to the feature training terminals, so that the feature training terminals update the first models based on the second gradients.
2. The method according to claim 1, wherein the sending each second gradient to each feature training terminal so that each feature training terminal updates each first model based on each second gradient comprises:
and sending each second gradient to each feature training terminal so that each feature training terminal obtains a second simulation loss function of each first model according to each second gradient and the first output layer data output by each first model, and updating each first model based on each second simulation loss function.
3. The method of claim 1, the obtaining a first simulated loss function of the second model according to the first gradient and the second output layer data comprising:
splitting the second output layer data and the first gradient based on the number of preset feature data corresponding to each first output layer to respectively obtain a second output layer data set and a first gradient set;
a first simulated loss function for the second model is calculated based on the second set of output layer data and the first set of gradients.
4. The method of claim 3, the calculating a first simulated loss function for the second model based on the second set of output layer data and the first set of gradients comprising:
acquiring second output layer sub-data in the second output layer data set and a first sub-gradient in the first gradient set;
calculating transpositions of the first sub-gradients respectively, and calculating products of the second output layer sub-data and the transpositions of the first sub-gradients respectively;
and obtaining a first simulation loss function of the second model according to the sum of the products.
5. The method of any of claims 1 to 4, the updating the second model based on the first simulated loss function, comprising:
obtaining a second model gradient of the second model by solving the gradient of the second model based on the first simulation loss function;
updating the second model based on the second model gradient.
6. The method of claim 1, the propagating forward of the second model in the server based on the respective first output layer data, comprising:
and obtaining input layer data input by a second model in the server based on the first output layer data, and carrying out forward propagation on the second model based on the input layer data input by the second model.
7. The method of claim 6, the calculating a second gradient of the first simulated loss function with respect to each first output layer data, comprising:
and calculating a second gradient of the first simulated loss function relative to each first output layer data according to the gradient of the first simulated loss function relative to the input layer data input by the second model.
8. The method of claim 1, wherein sending the second output layer data to a label training terminal comprises:
and sending the second output layer data to a label training terminal, so that the label training terminal forwards propagates a third model in the label training terminal based on the second output layer data, obtains third output layer data output by the third model, calculates a real loss function based on the third output layer data and a label data set, calculates a first gradient of the real loss function relative to the second output layer data, and sends the first gradient to the server.
9. A model training method is applied to a feature training terminal and comprises the following steps:
forward propagation of a first model in the feature training terminal is carried out based on feature data in a feature data set, and first output layer data output by the first model is obtained;
sending the first output layer data to a server, and obtaining a second gradient with respect to the first output layer data from the server;
updating the first model based on the second gradient;
the second gradient is obtained by the server through forward propagation of a second model in the server based on the first output layer data, the second output layer data output by the second model is obtained, the second output layer data is sent to a label training terminal, a first gradient of a real loss function relative to the second output layer data is obtained from the label training terminal, a first simulation loss function of the second model is obtained according to the first gradient and the second output layer data, and the gradient of the first simulation loss function relative to the first output layer data is calculated.
10. The method of claim 9, the updating the first model based on the second gradient, comprising:
obtaining a second simulation loss function of the first model according to the second gradient and the first output layer data;
updating the first model based on the second simulated loss function.
11. The method of claim 10, the obtaining a second simulated loss function of the first model according to the second gradient and the first output layer data, comprising:
splitting the first output layer data and the second gradient based on the preset characteristic data quantity corresponding to the first output layer data to respectively obtain a first output layer data set and a second gradient set;
a second simulated loss function for the first model is calculated based on the first set of output layer data and the second set of gradients.
12. The method of claim 11, the calculating a second simulated loss function for the first model based on the first set of output layer data and the second set of gradients, comprising:
acquiring first output layer sub-data in the first output layer data set and second sub-gradients in the second gradient set;
calculating transpositions of the second sub-gradients respectively, and calculating products of the first output layer sub-data and the transpositions of the second sub-gradients respectively;
and obtaining a second simulation loss function of the first model according to the sum of the products.
13. The method of any of claims 9 to 12, the updating the first model based on the second simulated loss function, comprising:
calculating a gradient of the first model based on the second simulation loss function to obtain a first model gradient of the first model;
updating the first model based on the first model gradient.
14. A model training device applied to a server comprises:
the server propagation module is used for acquiring first output layer data output by a first model in each feature training terminal, and performing forward propagation of a second model in the server based on each first output layer data to obtain second output layer data output by the second model;
the first gradient acquisition module is used for sending the second output layer data to a label training terminal and acquiring a first gradient of a real loss function relative to the second output layer data from the label training terminal;
the second model updating module is used for solving a first simulation loss function of the second model according to the first gradient and the second output layer data and updating the second model based on the first simulation loss function;
and the second gradient sending module is used for calculating second gradients of the first simulation loss functions relative to the data of the first output layers and sending the second gradients to the feature training terminals so that the feature training terminals update the first models based on the second gradients.
15. A model training device is applied to a feature training terminal and comprises:
the terminal forward propagation module is used for carrying out forward propagation on a first model in the characteristic training terminal based on characteristic data in a characteristic data set to obtain first output layer data output by the first model;
a second gradient acquisition module to send the first output layer data to a server and to acquire a second gradient for the first output layer data from the server;
a first model update module to update the first model based on the second gradient;
the second gradient is obtained by the server performing forward propagation of a second model in the server based on the first output layer data to obtain second output layer data output by the second model, sending the second output layer data to a label training terminal, obtaining a first gradient of a real loss function relative to the second output layer data from the label training terminal, obtaining a first simulation loss function of the second model according to the first gradient and the second output layer data, and calculating the gradient of the first simulation loss function relative to the first output layer data.
16. A computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to carry out the method steps according to any one of claims 1 to 8 or 9 to 13.
17. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method according to any of claims 1 to 8 or 9 to 13.
18. A server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program implementing the steps of the method according to any of claims 1 to 8.
19. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program implementing the steps of the method according to any of claims 9 to 13.
20. A model training system comprises a server and a terminal, wherein: the server is the server of claim 18, and the terminal is the terminal of claim 19.
CN202210394839.2A 2022-04-15 2022-04-15 Model training method, device, storage medium, server, terminal and system Pending CN114943274A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210394839.2A CN114943274A (en) 2022-04-15 2022-04-15 Model training method, device, storage medium, server, terminal and system

Publications (1)

Publication Number Publication Date
CN114943274A true CN114943274A (en) 2022-08-26

Family

ID=82906817

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210394839.2A Pending CN114943274A (en) 2022-04-15 2022-04-15 Model training method, device, storage medium, server, terminal and system

Country Status (1)

Country Link
CN (1) CN114943274A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200342322A1 (en) * 2017-12-29 2020-10-29 Zte Corporation Method and device for training data, storage medium, and electronic device
CN111027714A (en) * 2019-12-11 2020-04-17 腾讯科技(深圳)有限公司 Artificial intelligence-based object recommendation model training method, recommendation method and device
US20220029971A1 (en) * 2019-12-13 2022-01-27 TripleBlind, Inc. Systems and Methods for Providing a Modified Loss Function in Federated-Split Learning
CN111125760A (en) * 2019-12-20 2020-05-08 支付宝(杭州)信息技术有限公司 Model training and predicting method and system for protecting data privacy
US20220067181A1 (en) * 2020-09-01 2022-03-03 Argo AI, LLC Methods and systems for secure data analysis and machine learning
CN114186256A (en) * 2021-12-10 2022-03-15 北京百度网讯科技有限公司 Neural network model training method, device, equipment and storage medium
CN114330673A (en) * 2022-03-15 2022-04-12 支付宝(杭州)信息技术有限公司 Method and device for performing multi-party joint training on business prediction model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Yong; HE Yangming; CHEN Huixi; LI Chun: "RHS-CNN: A CNN Text Classification Model Based on Regularized Hierarchical Softmax", Journal of Chongqing University of Technology (Natural Science), no. 05, 15 May 2020 (2020-05-15) *

Similar Documents

Publication Publication Date Title
KR102342604B1 (en) Method and apparatus for generating neural network
TWI788529B (en) Credit risk prediction method and device based on LSTM model
WO2022089256A1 (en) Method, apparatus and device for training federated neural network model, and computer program product and computer-readable storage medium
CN112732911B (en) Semantic recognition-based speaking recommendation method, device, equipment and storage medium
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
JP2023500222A (en) Sequence mining model training method, sequence data processing method, sequence mining model training device, sequence data processing device, computer equipment, and computer program
US11651198B2 (en) Data processing method and apparatus for neural network
CN108182472A (en) For generating the method and apparatus of information
CN112861662B (en) Target object behavior prediction method based on face and interactive text and related equipment
CN112036954A (en) Item recommendation method and device, computer-readable storage medium and electronic device
EP4024283A1 (en) Method and apparatus for processing data, and related product
CN114693934A (en) Training method of semantic segmentation model, video semantic segmentation method and device
CN116684330A (en) Traffic prediction method, device, equipment and storage medium based on artificial intelligence
CN114925320B (en) Data processing method and related device
JP2017059193A (en) Time series image compensation device, time series image generation method, and program for time series image compensation device
Ma et al. Temporal pyramid recurrent neural network
CN111709784B (en) Method, apparatus, device and medium for generating user retention time
CN114943274A (en) Model training method, device, storage medium, server, terminal and system
CN116186295A (en) Attention-based knowledge graph link prediction method, attention-based knowledge graph link prediction device, attention-based knowledge graph link prediction equipment and attention-based knowledge graph link prediction medium
CN113392889A (en) Data processing method and device and electronic equipment
CN114723012A (en) Computing method and device based on distributed training system
CN114676832A (en) Neural network model operation method, medium, and electronic device
CN111784787B (en) Image generation method and device
CN111709583B (en) User retention time generation method, device, electronic equipment and medium
CN116703498B (en) Commodity recommendation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination