WO2022193432A1 - Model parameter updating method, apparatus and device, storage medium, and program product

Publication number
WO2022193432A1
Authority: WO - WIPO (PCT)
Prior art keywords: model, parameter, local, loss, gradient value
Application number: PCT/CN2021/094936
Other languages: French (fr), Chinese (zh)
Inventors: Liang Xinle (梁新乐), Liu Yang (刘洋), Chen Tianjian (陈天健)
Original Assignee: Shenzhen Qianhai WeBank Co., Ltd. (深圳前海微众银行股份有限公司)
Application filed by Shenzhen Qianhai WeBank Co., Ltd.
Publication of WO2022193432A1

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 20/00 Machine learning
            • G06N 20/20 Ensemble learning
          • G06N 3/00 Computing arrangements based on biological models
            • G06N 3/02 Neural networks
              • G06N 3/04 Architecture, e.g. interconnection topology
              • G06N 3/08 Learning methods
                • G06N 3/084 Backpropagation, e.g. using gradient descent

Definitions

  • the present application relates to the technical field of machine learning, and in particular, to a method, apparatus, device, storage medium and program product for updating model parameters.
  • Vertical federated learning applies when the participants' data features overlap little but their users overlap substantially: the participants take the part of the data corresponding to the same users but different user data characteristics and use it to jointly train a machine learning model.
  • The party holding the label data needs to communicate with the other parties multiple times to transmit the intermediate results the other parties require to update their parameters, such as the model output or the gradient value corresponding to the model output. The parties need to perform multiple rounds of joint parameter updates, that is, multiple communications, so the communication cost is relatively high.
  • A scheme has been proposed in which a participant uses an intermediate result sent by the other participants to perform multiple rounds of local iterations. By increasing the number of local iterations, the number of joint parameter updates is reduced, thereby reducing the communication cost.
  • The main purpose of this application is to provide a model parameter updating method, apparatus, device, storage medium and program product, addressing the problem that communication cost and model performance are difficult to balance in current vertical federated learning schemes.
  • the present application provides a method for updating model parameters.
  • the method is applied to a first device participating in vertical federated learning, and the first device is communicatively connected to a second device participating in vertical federated learning.
  • the method includes the following steps:
  • calculating a proximal optimization loss, where the proximal optimization loss represents the amount of change of the parameter values of the parameters of the first model in the first device in the current round of local iterations relative to their parameter values in a preset historical round of local iterations;
  • the parameters are updated using the gradient values to complete the current round of local iterations.
  • the present application provides a user risk prediction method. The method is applied to a first device participating in vertical federated learning, the first device is communicatively connected to a second device participating in vertical federated learning, and the method includes the following steps:
  • the local risk prediction model is obtained by performing vertical federated learning jointly with the second device based on the proximal optimization loss, where the proximal optimization loss represents the amount of change of the parameter values of the parameters of the local model to be trained in the current local iteration relative to their parameter values in a preset historical round of local iterations;
  • the risk value of the user to be predicted is obtained by prediction using the local risk prediction model.
  • the present application provides a model parameter updating apparatus. The apparatus is deployed in a first device participating in vertical federated learning, the first device is communicatively connected with a second device participating in vertical federated learning, and the apparatus includes:
  • a first calculation module configured to calculate a proximal optimization loss, where the proximal optimization loss represents the amount of change of the parameter values of the parameters of the first model in the first device in the current round of local iterations relative to their parameter values in a preset historical round of local iterations;
  • a second calculation module configured to calculate the gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the first model in the current round of local iterations, and the vertical federated intermediate result received from the second device;
  • An update module configured to update the parameter by using the gradient value to complete the current round of local iteration.
  • the present application provides a user risk prediction apparatus. The apparatus is deployed in a first device participating in vertical federated learning, the first device is communicatively connected with a second device participating in vertical federated learning, and the apparatus includes:
  • a federated learning module configured to perform vertical federated learning jointly with the second device based on the proximal optimization loss to obtain a local risk prediction model, where the proximal optimization loss represents the amount of change of the parameter values of the parameters of the local model to be trained in the current local iteration relative to their parameter values in a preset historical round of local iterations;
  • a prediction module configured to use the local risk prediction model to predict and obtain the risk value of the user to be predicted.
  • the present application also provides a model parameter update device. The model parameter update device includes a memory, a processor, and a model parameter update program stored on the memory and runnable on the processor; when executed by the processor, the model parameter update program implements the steps of the model parameter update method described above.
  • the present application also provides a user risk prediction device. The user risk prediction device includes a memory, a processor, and a user risk prediction program stored on the memory and runnable on the processor; when executed by the processor, the user risk prediction program implements the steps of the user risk prediction method described above.
  • the present application also proposes a computer-readable storage medium on which a model parameter update program is stored; when the model parameter update program is executed by a processor, it implements the steps of the model parameter update method described above.
  • the present application also proposes a computer-readable storage medium on which a user risk prediction program is stored; when the user risk prediction program is executed by a processor, it implements the steps of the user risk prediction method described above.
  • the present application also proposes a computer program product, including a computer program, which implements the steps of the above-mentioned model parameter updating method when the computer program is executed by a processor.
  • the present application also proposes a computer program product, including a computer program, which implements the steps of the above-mentioned user risk prediction method when the computer program is executed by a processor.
  • The vertical federated intermediate result is used to calculate the gradient values corresponding to the parameters of the first model, and the parameters are updated according to the gradient values. That is, the proximal optimization loss is added to constrain the variation of the first model's parameters across local iterations, preventing parameter values from changing so much during local iteration that they become distorted. The communication cost can therefore be reduced by increasing the number of local iterations while preserving the model's prediction accuracy.
  • FIG. 1 is a schematic structural diagram of a hardware operating environment involved in a solution according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of the first embodiment of the model parameter updating method of the present application
  • FIG. 3 is a schematic diagram of updating joint parameters by a participant involved in an embodiment of the present application.
  • FIG. 4 is a hardware architecture diagram of vertical federated learning performed by a first device and a second device involved in an embodiment of the application;
  • FIG. 5 is a schematic diagram of an interaction flow of multiple rounds of joint parameter update between a first device and a second device according to an embodiment of the present application
  • FIG. 6 is a schematic diagram of functional modules of a preferred embodiment of a model parameter updating device of the present application.
  • FIG. 1 is a schematic diagram of a device structure of a hardware operating environment involved in the solution of the embodiment of the present application.
  • the device for updating model parameters in the embodiment of the present application may be devices such as smart phones, personal computers, and servers, which are not specifically limited here, and the device for updating model parameters may be the first device participating in vertical federated learning.
  • the model parameter updating device may include: a processor 1001 , such as a CPU, a network interface 1004 , a user interface 1003 , a memory 1005 , and a communication bus 1002 .
  • the communication bus 1002 is used to realize the connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be high-speed RAM memory, or may be non-volatile memory, such as disk memory.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001 .
  • the device structure shown in FIG. 1 does not constitute a limitation on the model parameter updating device, which may include more or fewer components than shown, combine some components, or use a different arrangement of components.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module and a model parameter updating program.
  • the operating system is a program that manages and controls the hardware and software resources of the device, and supports the operation of the model parameter update program and other software or programs.
  • the user interface 1003 is mainly used for data communication with the client;
  • the network interface 1004 is mainly used to establish a communication connection with the second device participating in the vertical federated learning;
  • the processor 1001 can be used to call the model parameter update program stored in the memory 1005 and perform the following operations:
  • the proximal optimization loss represents the amount of change of the parameter values of the parameters of the first model in the first device in the current round of local iterations relative to their parameter values in a preset historical round of local iterations;
  • the parameters are updated using the gradient values to complete the current round of local iterations.
  • the step of calculating the proximal optimization loss, where the proximal optimization loss represents the amount of change of the parameter values of the parameters of the first model in the first device in the current round of local iterations relative to their parameter values in a preset historical round of local iterations, includes:
  • the parameter vector of the parameters of the first model in the first device in the current round of local iteration and the parameter vector in the preset historical round of local iteration are subtracted element-wise to obtain a difference vector;
  • the sum of squares of the elements in the difference vector is calculated, and the proximal optimization loss is obtained based on the sum of squares.
  • the vertical federation intermediate result is the output of the model in the second device
  • the step of calculating the gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the first model in the current round of local iterations, and the vertical federated intermediate result received from the second device includes:
  • a total loss is obtained by adding the prediction loss and the near-end optimization loss, and a gradient value corresponding to the parameter is calculated based on the total loss.
  • the vertical federated intermediate result is the gradient value, calculated by the second device and sent during the current round of joint parameter update, of the prediction loss relative to the model output of the first device.
  • the step of calculating the gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the first model in the current round of local iterations, and the vertical federated intermediate result received from the second device includes:
  • a second sub-gradient value of the proximal optimization loss relative to the parameter is calculated, and the first sub-gradient value and the second sub-gradient value are added to obtain a gradient value corresponding to the parameter.
  • the step of adding the first sub-gradient value and the second sub-gradient value to obtain the gradient value corresponding to the parameter includes:
  • the gradient value corresponding to the parameter is obtained by multiplying the second sub-gradient value by a preset adjustment coefficient and then adding the first sub-gradient value.
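The sub-gradient combination described above can be sketched as follows; the function name and the coefficient name `mu` are illustrative (the text calls it a "preset adjustment coefficient" without fixing a value), so this is an assumption-laden sketch rather than the patent's implementation:

```python
import numpy as np

def combined_gradient(pred_grad, prox_grad, mu=0.1):
    """Scale the second sub-gradient (proximal optimization loss) by the
    preset adjustment coefficient mu, then add the first sub-gradient
    (prediction loss) to obtain the parameter's gradient value."""
    return np.asarray(pred_grad, dtype=float) + mu * np.asarray(prox_grad, dtype=float)

# Two sub-gradients for a two-parameter model, combined with mu = 0.5
g = combined_gradient([0.2, -0.1], [1.0, 2.0], mu=0.5)
```

The coefficient lets the participant tune how strongly the proximal term constrains parameter drift relative to the prediction objective.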
  • The embodiment of the present application also proposes a user risk prediction device. The user risk prediction device is a first device participating in vertical federated learning and establishes a communication connection with a second device participating in vertical federated learning. The user risk prediction device includes a memory, a processor, and a user risk prediction program stored on the memory and executable on the processor; when executed by the processor, the user risk prediction program implements the following steps:
  • the local risk prediction model is obtained by performing vertical federated learning jointly with the second device based on the proximal optimization loss, where the proximal optimization loss represents the amount of change of the parameter values of the parameters of the local model to be trained in the current local iteration relative to their parameter values in a preset historical round of local iterations;
  • the risk value of the user to be predicted is obtained by prediction using the local risk prediction model.
  • the step of jointly performing vertical federated learning with the second device based on the near-end optimization loss to obtain the local-end risk prediction model includes:
  • the local to-be-trained model after updating the parameters is used as the local-end risk prediction model
  • the step of locally and iteratively updating, for a preset number of rounds, the parameters of the local model to be trained based on the proximal optimization loss and the vertical federated intermediate result includes:
  • FIG. 2 is a schematic flowchart of a first embodiment of a method for updating model parameters of the present application. It should be noted that although a logical order is shown in the flowcharts, in some cases, the steps shown or described may be performed in an order different from that herein.
  • the model parameter update method of the present application is applied to a first device participating in vertical federated learning, the first device is connected to the second device participating in vertical federated learning, and the first device and the second device can be devices such as smart phones, personal computers, and servers.
  • the model parameter updating method includes:
  • Step S10: calculate a proximal optimization loss, where the proximal optimization loss represents the amount of change of the parameter values of the parameters of the first model in the first device in the current round of local iterations relative to their parameter values in a preset historical round of local iterations;
  • The participants in vertical federated learning are divided into two categories: data application participants, which hold labeled data, and data providing participants, which do not. There is one data application participant and one or more data providing participants.
  • Each participant deploys a data set and a machine learning model based on their respective data features, and the machine learning models of each participant are combined to form a complete model, which is used to complete model tasks such as prediction or classification.
  • the sample dimensions of the data sets of each participant are aligned, that is, the sample IDs of each data set are the same, but the data characteristics of each participant may be different.
  • Each participant may use the encrypted sample alignment method in advance to construct a sample dimension-aligned data set, which will not be described in detail here.
  • the machine learning models deployed by the participants can be ordinary machine learning models, such as linear regression models, neural network models, etc., or models used in automatic machine learning, such as search networks.
  • The search network refers to a network used in neural architecture search (NAS). The search network includes multiple units, each unit corresponding to a network layer, and connection operations are defined between some units. Taking two units as an example, N candidate connection operations can be preset between the two units, and a corresponding weight is defined for each connection operation. These weights are the structural parameters of the search network, and the network-layer parameters within the units are the model parameters of the search network.
  • For a search network, parameter updating needs to optimize both the structural parameters and the model parameters, and the final network structure, that is, which connection operation or operations to retain, can be determined based on the finally updated structural parameters. Since the network structure is determined through the network search, each participant does not need to design the model's network structure by hand as in a traditional vertical federated learning model, which reduces the difficulty of designing the model.
  • the first device may be a data application participant with tag data
  • the second device may be a data provider without tag data
  • the model in the first device is referred to as the first model
  • the model in the second device is referred to as the second model.
  • the parameters of the model in each participant are initialized and set in advance, and each participant performs multiple rounds of joint parameter update to continuously update the parameters in their respective models and improve the performance of the entire model, such as the prediction accuracy.
  • the parameters updated in each round of joint parameter update process are model parameters, such as weight parameters in a neural network.
  • the parameters updated in each round of joint parameter update may be structural parameters and/or model parameters.
  • the update sequence of the structural parameters and the model parameters is not limited.
  • the structural parameters can be updated in the first few rounds of joint parameter updating, and the model parameters can be updated in the subsequent rounds of joint parameter updating.
  • structural parameters and model parameters may be updated together in each round of joint parameter update.
  • In each round of joint parameter update, the participants first exchange the intermediate results used to update the parameters of their respective models (hereinafter also referred to as vertical federated intermediate results); each participant then performs multiple rounds of local iteration, and after the local iterations, the next round of joint parameter update is performed. That is, during one round of joint parameter update, a participant receives an intermediate result from the other participants only once, and the subsequent rounds of local iteration reuse that intermediate result in their calculations.
  • the intermediate result can be the gradient or the output of the model.
  • When the participant is a data providing participant, the intermediate result sent to the other party can be the output of the participant's model; when the participant is a data application participant, the intermediate result sent to the other party can be a gradient value that it calculates.
  • Party K is the data application participant;
  • Party 1 to Party K-1 are the data providing participants;
  • Net K is the model deployed in the data application participant;
  • Net j is the model deployed in data providing participant j;
  • Net c is the model deployed in the data application participant to calculate the prediction result (Y out) based on the model outputs of all parties;
  • N j is the output of model Net j.
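The Party/Net/N notation above can be illustrated with a minimal numeric sketch. The feature widths, the use of plain linear maps as stand-ins for each Net j, and a single aligned sample are all illustrative assumptions, not the patent's actual models:

```python
import numpy as np

rng = np.random.default_rng(0)

# K parties hold vertical slices of the same aligned sample;
# party K is the data application participant.
K = 3
feature_dims = [4, 3, 5]   # each party's local feature width (assumed)
out_dim = 2                # width of each local model output N_j (assumed)

# Net_j: each party's local model, here a stand-in linear map
Ws = [rng.normal(size=(d, out_dim)) for d in feature_dims]
xs = [rng.normal(size=(1, d)) for d in feature_dims]  # one aligned sample
Ns = [x @ W for x, W in zip(xs, Ws)]                  # N_j = Net_j(x_j)

# Net_c in the data application participant combines the model outputs
# of all parties into the prediction Y_out
Wc = rng.normal(size=(K * out_dim, 1))
Y_out = np.concatenate(Ns, axis=1) @ Wc
```

Only the outputs N j cross party boundaries; raw features x j stay local, which is the point of the vertical federated setup.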
  • the first device may calculate the proximal optimization loss while performing one round of local iterations.
  • the near-end optimization loss can represent the amount of change between the parameter values of the first model in the current round of local iterations and the parameter values in the preset historical rounds of local iterations.
  • By minimizing the proximal optimization loss, the variation range of the first model's parameter values in the current round of local iterations relative to their historical values can be constrained; that is, the parameter values of the first model change less during the current round of local iterations, avoiding distortion of the parameter values after many rounds of local iteration.
  • the calculation method of the near-end optimization loss is not limited, and the method for minimizing the near-end optimization loss is also not limited.
  • the proximal optimization loss can be treated as a loss function and minimized by standard loss-minimization methods; for example, a gradient descent algorithm can compute the gradient values of the proximal optimization loss with respect to the parameters of the first model and optimize the parameters along those gradients, thereby minimizing the proximal optimization loss.
  • Other methods can also minimize the proximal optimization loss; for example, the parameter values of the first model can be perturbed randomly while checking whether the proximal loss decreases, so that random experimentation yields parameter values that minimize the proximal optimization loss.
  • the preset historical round may be a round preset in the first device that is earlier than the current round of local iteration; if the current round of local iteration is the t-th round, the preset historical round is less than t.
  • The preset historical round may be fixed; that is, every round of local iteration within this round of joint parameter update computes the proximal optimization loss against the parameter values of the same historical round of local iteration. For example, the preset historical round may be fixed at 1, so that the parameter values in subsequent rounds of local iteration are constrained to change little relative to the parameter values in the first round of local iteration.
  • The preset historical round need not be fixed; that is, different preset historical rounds can be set for different rounds of local iteration within this round of joint parameter update. If the preset historical round for a given round of local iteration is set to the immediately preceding local iteration, the gradient value of the parameter computed from the proximal optimization loss may be 0; that is, the proximal optimization loss would not act as a constraint. Therefore, in a preferred embodiment, for all or some rounds of local iteration, the difference between the current round number and the corresponding preset historical round should be greater than 1.
  • The first device need not calculate the proximal optimization loss in every round of local iteration; for example, the first round of local iteration has no historical round of local iteration, so no proximal optimization loss needs to be calculated.
  • step S10 includes:
  • Step S101: the parameter vector of the parameters of the first model in the first device in the current round of local iteration and the parameter vector in the preset historical round of local iteration are subtracted element-wise to obtain a difference vector;
  • Step S102 Calculate the sum of squares of each element in the difference vector, and obtain the near-end optimization loss based on the sum of squares.
  • The first device subtracts, element by element, the parameter vector of the first model's parameters in the current round of local iteration and the parameter vector of the preset historical round of local iteration to obtain a difference vector formed by the differences, and then computes the sum of squares of the elements in the difference vector.
  • The first device may directly use the sum of squares as the proximal optimization loss, or may take the square root of the sum of squares as the proximal optimization loss. It should be noted that when calculating the proximal optimization loss, the first device treats each element of the parameter vector in the current round of local iterations as an unknown variable in the calculation.
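The proximal optimization loss computation of steps S101 and S102 can be sketched as follows; the function and argument names are illustrative, and the optional square root corresponds to the alternative mentioned above:

```python
import numpy as np

def proximal_loss(params_current, params_historical, use_sqrt=False):
    """Proximal optimization loss: the change of the current round's
    parameter vector relative to the preset historical round's vector.
    Steps: element-wise subtraction, then the sum of squares of the
    difference vector (optionally its square root)."""
    diff = np.asarray(params_current, dtype=float) - np.asarray(params_historical, dtype=float)
    ss = float(np.sum(diff ** 2))
    return float(np.sqrt(ss)) if use_sqrt else ss

# Current-round parameters vs. the preset historical round's parameters;
# here (0.1)^2 + (0.2)^2 + (-0.2)^2, i.e. about 0.09
loss = proximal_loss([0.5, 1.2, -0.3], [0.4, 1.0, -0.1])
```

In training, the current-round parameters are the optimization variables, so this loss is differentiable with respect to them and its gradient is simply twice the difference vector (for the sum-of-squares form).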
  • the first device may also use other calculation methods capable of calculating the variation between vectors to calculate the near-end optimization loss.
  • Step S20: calculate the gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the first model in the current round of local iteration, and the vertical federated intermediate result received from the second device;
  • the first device inputs the training data in the data set into the first model, and the model output is obtained after processing by the first model.
  • the vertical federation intermediate result received from the second device is the intermediate result sent by the second device during the current round of joint parameter update.
  • the first device is a data application participant with label data
  • the vertical federated intermediate result is the output of the second model sent by the second device, and the first device can calculate the prediction loss according to the vertical federated intermediate result and the model output in the current round of local iteration. In one embodiment, the first device can add the proximal optimization loss and the prediction loss to obtain a total loss, and then calculate the gradient values of the total loss relative to the parameters in the first model; in another embodiment, the first device separately calculates the gradient values of the proximal optimization loss and the prediction loss relative to the parameters in the first model, and adds the two gradient values to obtain the final gradient value.
  • When the vertical federated intermediate result is the gradient value, calculated by the second device, of the prediction loss relative to the output of the first model, the first device can calculate the gradient values of the prediction loss relative to the parameters in the first model according to the vertical federated intermediate result and the model output in the current round of local iteration, then calculate the gradient values of the proximal optimization loss relative to the parameters in the first model, and add the two gradient values to obtain the final gradient value.
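For differentiable losses the two embodiments above coincide: the gradient of the total loss equals the sum of the two sub-gradients. A one-parameter sketch with illustrative quadratic losses (the loss forms and values are assumptions for illustration only):

```python
# Illustrative one-parameter example: prediction loss L_pred(w) = (w - y)^2
# and proximal loss L_prox(w) = (w - w_hist)^2, as in the earlier steps.
w, y, w_hist = 1.5, 2.0, 1.0

# Embodiment 1: form the total loss, then take one gradient
grad_total = 2 * (w - y) + 2 * (w - w_hist)

# Embodiment 2: compute the two sub-gradients separately, then add
grad_pred = 2 * (w - y)       # gradient of the prediction loss
grad_prox = 2 * (w - w_hist)  # gradient of the proximal optimization loss
grad_sum = grad_pred + grad_prox
```

Here the prediction term pulls w toward y while the proximal term pulls it back toward w_hist; at w = 1.5 the two pulls cancel exactly, illustrating how the proximal term constrains the update.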
  • the method for calculating the gradient value according to the loss may refer to the existing gradient calculation method, and details are not repeated.
  • Step S30 using the gradient value to update the parameter to complete the current round of local iteration.
  • The first device uses each gradient value to update each parameter; that is, each parameter corresponds to a gradient value, and the first device uses the gradient value corresponding to a parameter to update that parameter. Specifically, the first device may combine the parameter value obtained after the previous round of local iteration with the parameter's corresponding gradient value scaled by the learning rate to obtain the parameter value after the current round of local iteration. After every parameter has been updated, the current round of local iteration is complete.
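One local-iteration update step can be sketched as follows; the standard gradient-descent sign convention (subtracting the scaled gradient) is assumed here, and the names are illustrative:

```python
import numpy as np

def local_iteration_update(params_prev, grads, lr=0.01):
    """Update each parameter with its corresponding gradient value scaled
    by the learning rate (standard gradient-descent convention assumed),
    yielding the parameter values after this round of local iteration."""
    return np.asarray(params_prev, dtype=float) - lr * np.asarray(grads, dtype=float)

# Parameters after the previous local iteration, updated with lr = 0.1
new_params = local_iteration_update([1.0, -2.0], [10.0, -10.0], lr=0.1)
```

Because the gradient already includes the proximal term (Step S20), this single update simultaneously reduces the prediction loss and limits drift from the historical parameter values.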
  • In this way, the parameters change in the direction that minimizes the proximal optimization loss, which constrains the amount of parameter change and prevents parameter values from changing excessively and becoming distorted.
  • After completing the current round of local iteration, if the first device detects that the number of local iteration rounds for the current round of joint parameter update has been reached, it can proceed to the next round of joint parameter update; if it detects that the number has not been reached, it can perform the next round of local iteration.
  • a maximum number of rounds for jointly updating parameters can be set, and when the number of rounds is reached, the first device stops updating the model parameters.
  • the first device may detect whether the prediction loss has converged after one round of joint parameter updating or after one round of local iteration has ended, and if so, stop updating the parameters. After the parameter update is stopped, the first device takes the current parameter value as the final parameter value of the first model, and after determining the parameter value of the first model, the first model can be used to complete the prediction task.
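The overall control flow described above (joint update rounds, a preset number of local iterations per round, stopping on a maximum round count or on prediction-loss convergence) can be sketched with stand-in callbacks; every name here is illustrative:

```python
def run_vertical_federated_training(max_joint_rounds, local_rounds,
                                    do_local_iteration, pred_loss_converged):
    """Control-flow sketch: each joint parameter update round runs a preset
    number of local iterations; training stops once the maximum number of
    joint rounds is reached or the prediction loss has converged.
    The two callbacks are illustrative stand-ins for the real steps."""
    for joint_round in range(max_joint_rounds):
        # One intermediate result is received per joint round; the local
        # iterations below reuse it (exchange omitted in this sketch).
        for t in range(local_rounds):
            do_local_iteration(joint_round, t)
        if pred_loss_converged():
            break
    return joint_round + 1  # number of joint rounds actually performed

# Usage: stop as soon as the convergence check fires
calls = []
rounds = run_vertical_federated_training(
    max_joint_rounds=10, local_rounds=3,
    do_local_iteration=lambda r, t: calls.append((r, t)),
    pred_loss_converged=lambda: len(calls) >= 6)
```

Raising `local_rounds` trades extra local computation for fewer joint rounds, which is exactly the communication-cost lever the proximal term is meant to make safe.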
  • Figure 4 shows the hardware architecture diagram of the first device and the second device participating in the vertical federated learning in one embodiment.
  • the first device and the second device exchange intermediate results; based on the intermediate result sent by the other party, each of them performs multiple rounds of local iteration, and in each round of local iteration the calculation of the near-end optimization loss is added to constrain the variation of the parameters, so as to avoid distortion caused by the parameter values changing too much.
  • when the first device participating in the vertical federated learning performs local iterations, it adds the calculation of a near-end optimization loss that characterizes the amount of change of the parameter values of the first model in this round of local iteration compared with the parameter values in a preset historical round of local iteration. Based on the near-end optimization loss, the model output of the first model in this round of local iteration, and the vertical federated intermediate result received from the second device, the first device calculates the gradient values corresponding to the parameters in the first model and updates the parameters according to the gradient values. That is, the near-end optimization loss is added to constrain the changes of the parameters of the first model during local iteration, thereby avoiding distortion caused by excessive parameter value changes; in this way, the communication cost can be reduced by increasing the number of local iterations while the prediction accuracy of the model is ensured.
  • the intermediate result of the vertical federation is:
  • the output of the model in the second device includes:
  • Step S201 input the training data of the first device into the first model in the first device for processing, and obtain the model output of the first model in the current round of local iteration;
  • the vertical federated intermediate result is the model output obtained by the second device by inputting its training data into the second model during this round of joint parameter update.
  • the first device may input its training data into the first model for processing, and obtain the model output of the first model in this round of local iteration.
  • Step S202 calculating a prediction result according to the model output and the vertical federation intermediate result, and calculating a prediction loss based on the prediction result and the label data corresponding to the training data;
  • the first device calculates and obtains the prediction result according to the model output and the longitudinal federation intermediate result, and calculates the prediction loss based on the prediction result and the label data corresponding to the training data.
  • the calculation method of the prediction result differs depending on the machine learning model used in the vertical federated learning; for example, when the model is a linear regression model, the first device adds the model output and the vertical federated intermediate result to obtain the prediction result
  • when the machine learning model of the longitudinal federated learning is a neural network model, the first model in the first device includes two parts, Net K and Net c as shown in Figure 3. The first device inputs the training data into the Net K part of the first model for processing to obtain the model output N K , and then inputs N K and the vertical federated intermediate result N j into the Net c part for processing to obtain the prediction result Y out .
  • the first device calculates and obtains the prediction loss according to the prediction result and the label data corresponding to the training data.
  • the prediction loss can be calculated by using a common loss function calculation method, such as a cross entropy loss function, and different loss functions can be used according to different machine learning models being trained.
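  • For instance, with binary labels the prediction loss could be the binary cross-entropy between the prediction results and the label data. The sketch below uses hypothetical names; other loss functions may be chosen depending on the model being trained:

```python
import math

def binary_cross_entropy(predictions, labels):
    """Prediction loss between prediction results and label data,
    using binary cross-entropy as one common choice of loss function."""
    eps = 1e-12  # guard against log(0)
    n = len(labels)
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(predictions, labels)) / n

loss = binary_cross_entropy([0.9, 0.2], [1, 0])
```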
  • Step S203 adding the prediction loss and the near-end optimization loss to obtain a total loss, and calculating a gradient value corresponding to the parameter based on the total loss.
  • the first device adds the predicted loss and the proximal optimization loss to obtain a total loss.
  • the first device may directly add the two losses, or may weight and sum the two losses, and the weights of the two losses may be set as required.
  • the first device may add the prediction loss to the product of the near-end optimization loss and an adjustment coefficient to obtain the total loss, wherein the adjustment coefficient may be preset and flexibly adjusted in each round of local iteration; for example, in one round of joint parameter update, the adjustment coefficient can be initialized to 0.1 and then increased as the local iteration round increases, so that the larger the local iteration round, the stronger the constraint on parameter change.
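  • A sketch of this weighted combination (the 0.1 initialization comes from the text; the linear growth rate and all names are assumptions for illustration):

```python
def total_loss(prediction_loss, proximal_loss, local_round,
               mu0=0.1, growth=0.05):
    """Total loss = prediction loss + adjustment coefficient * near-end
    optimization loss.  The coefficient starts at mu0 and grows with the
    local iteration round, so later local iterations constrain parameter
    change more strongly."""
    mu = mu0 + growth * (local_round - 1)
    return prediction_loss + mu * proximal_loss

loss_round_1 = total_loss(2.0, 1.0, local_round=1)  # coefficient 0.1
loss_round_5 = total_loss(2.0, 1.0, local_round=5)  # coefficient 0.3
```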
  • the first device calculates the gradient value corresponding to the parameter based on the total loss, and the specific calculation process is not described in detail here.
  • the first device may calculate the gradient value of the prediction loss relative to the parameters of the first model, then calculate the gradient value of the near-end optimization loss relative to the parameters of the first model, and then add the two gradient values, or take their weighted sum, to obtain the gradient value corresponding to the parameter.
  • FIG. 5 it is a schematic diagram of an interaction flow of a first device and a second device jointly performing multiple rounds of joint parameter update in an embodiment.
  • the first device calculates the gradient values corresponding to the parameters of the first model by using the prediction loss and the near-end optimization loss, and then updates the parameters according to the gradient values, so that the parameters are updated in the direction that minimizes both the prediction loss and the near-end optimization loss. This not only improves the prediction accuracy of the model but also constrains the variation of the parameters to avoid distortion caused by excessive parameter changes, ensuring that the communication cost is reduced by increasing the number of local iterations while the prediction accuracy of the model is maintained.
  • the second device is a participant with label data
  • the longitudinal federated intermediate result is the gradient value of the prediction loss in the second device relative to the output of the first model sent by the first device during the current round of joint parameter update.
  • the step S20 includes:
  • Step S204 input the training data of the first device into the first model of the first device for processing, and obtain the model output of the first model in the current round of local iteration;
  • when the first device is a data-providing participant without label data and the second device is a data-application participant with label data, the first device inputs its training data into the first model during this round of joint parameter update to obtain an output, and sends the output to the second device as an intermediate result.
  • the second device calculates the gradient value of the prediction loss relative to that output and sends it to the first device as an intermediate result; this intermediate result is the vertical federated intermediate result.
  • the first device may input its training data into the first model for processing, and obtain the model output of the first model in this round of local iteration.
  • the first device inputs the training data into Net j for processing, and obtains the model output N j .
  • Step S205 calculating and obtaining the first sub-gradient value of the predicted loss relative to the parameter according to the model output and the longitudinal federation intermediate result;
  • the first device calculates the gradient value of the prediction loss relative to the parameters in the first model according to the model output and the longitudinal federated intermediate result (hereinafter referred to as the first sub-gradient value for distinction).
  • the first device calculates the first sub-gradient value corresponding to each parameter in the first model according to the back-propagation method according to the longitudinal federation intermediate result and the model output.
  • the first sub-gradient value can be calculated according to the following formula:
  • ∂L/∂w = G(N j )·(∂N b /∂w)
  • where w is a parameter in the first model; N j is the intermediate result sent to the second device during the current round of joint parameter update (that is, the model output of the first model before the local iterations); G(N j ) is the gradient value of the prediction loss relative to N j returned by the second device, that is, the vertical federated intermediate result; and N b is the model output of the first model in this round of local iterations.
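  • Under the simplifying assumption that the first model is a single linear unit N b = Σ w k ·x k (a stand-in for the real model, which may be a deep network trained by backpropagation; all names are hypothetical), the chain rule behind this computation reduces to multiplying the received gradient G(N j ) by each input feature:

```python
def first_sub_gradient(received_grad, features):
    """First sub-gradient of the prediction loss w.r.t. each weight of a
    linear unit N_b = sum_k w_k * x_k.  received_grad is G(N_j), the
    gradient of the prediction loss relative to the output, as returned
    by the second device; since dN_b/dw_k = x_k, the chain rule gives
    dL/dw_k = G(N_j) * x_k."""
    return [received_grad * x for x in features]

grad = first_sub_gradient(received_grad=0.5, features=[1.0, 2.0, -4.0])
# grad -> [0.5, 1.0, -2.0]
```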
  • Step S206 Calculate a second sub-gradient value of the near-end optimization loss relative to the parameter, and add the first sub-gradient value and the second sub-gradient value to obtain a gradient value corresponding to the parameter.
  • the first device calculates the gradient value of the proximal optimization loss relative to the parameters in the first model (hereinafter referred to as the second sub-gradient value for distinction).
  • the first device adds the first sub-gradient value and the second sub-gradient value of the parameter to obtain the gradient value corresponding to the parameter.
  • each parameter has a corresponding first sub-gradient value and a second sub-gradient value, and the respective first sub-gradient value and second sub-gradient value of each parameter are added to obtain The corresponding gradient values for each parameter.
  • the first device calculates the gradient values corresponding to the parameters of the first model by using the prediction loss and the near-end optimization loss, and then updates the parameters according to the gradient values, so that the parameters are updated in the direction that minimizes both the prediction loss and the near-end optimization loss. This not only improves the prediction accuracy of the model but also constrains the variation of the parameters to avoid distortion caused by excessive parameter changes, ensuring that the communication cost is reduced by increasing the number of local iterations while the prediction accuracy of the model is maintained.
  • step S206 includes:
  • Step S2061 Multiply the second sub-gradient value by a preset adjustment coefficient and then add the first sub-gradient value to obtain a gradient value corresponding to the parameter.
  • an adjustment coefficient may be set in the first device to adjust the degree of constraint on the parameter variation during each round of local iteration.
  • the first device may multiply the second sub-gradient value by the adjustment coefficient and then add the first sub-gradient value to obtain the gradient value corresponding to the parameter.
  • the first device can adjust the adjustment coefficient according to the local iteration round; for example, in one round of joint parameter update, the adjustment coefficient can be initialized to 0.1 and then increased as the local iteration round increases, so that the larger the local iteration round, the stronger the constraint on the variation of the parameters.
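  • Step S2061 might be sketched as follows (the 0.1 initialization is from the text; the growth rate and names are assumptions):

```python
def combined_gradient(first_sub, second_sub, local_round,
                      mu0=0.1, growth=0.05):
    """Gradient value per parameter: the first sub-gradient (from the
    prediction loss) plus the second sub-gradient (from the near-end
    optimization loss) multiplied by a round-dependent adjustment
    coefficient, so later local iterations are constrained more."""
    mu = mu0 + growth * (local_round - 1)
    return [f + mu * s for f, s in zip(first_sub, second_sub)]

g_early = combined_gradient([1.0, -0.5], [0.2, 0.4], local_round=1)
g_late = combined_gradient([1.0, -0.5], [0.2, 0.4], local_round=3)
```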
  • a fourth embodiment of the user risk prediction method of the present application is proposed.
  • the method is applied to the first device participating in vertical federated learning
  • the first device is connected in communication with the second device participating in the longitudinal federated learning
  • the first device and the second device may be devices such as a smart phone, a personal computer, and a server.
  • the user risk prediction method includes the following steps:
  • Step A10: jointly perform vertical federated learning with the second device based on the near-end optimization loss to obtain a local-end risk prediction model, wherein the near-end optimization loss represents the amount of change of the parameter values of the local-end model to be trained in the current round of local iteration compared with the parameter values in a preset historical round of local iteration;
  • the first device may be a data application participant or a data providing participant.
  • the first device is deployed with a first data set constructed from each user's data under the first data feature and a first model (hereinafter also referred to as the local-end model to be trained); the second device is deployed with a second data set constructed from each user's data under the second data feature and a second model (hereinafter also referred to as the other-end model to be trained). The user dimensions of the two data sets are the same; the first data feature and the second data feature are both related to predicting user risk, and the two are different; the first model and the second model are two parts of one machine learning model.
  • a commonly used machine learning model can be selected as required, such as a linear regression model or a neural network model; the prediction result of the model is in a data form that can characterize the user's degree of risk, such as a risk value. The first device and the second device jointly use the first data set and the second data set to train the first model and the second model; after training is completed, the two models can be used jointly to predict a user's risk.
  • the risk may be the user's pre-loan credit risk, the user's in-loan repayment default risk, and the like.
  • the first device is a device deployed in a bank
  • the first data feature is a feature related to banking services, such as the number of historical loans of the user, the number of historical defaults of the user, etc.
  • the second device is a device deployed in an e-commerce enterprise
  • the second data feature is the feature related to e-commerce business, such as the user's historical purchase times, amount, etc.
  • the first device and the second device use their own data sets to perform vertical federated learning and train the pre-loan credit risk prediction model.
  • the first device and the second device jointly perform longitudinal federated learning based on the near-end optimization loss to obtain a local-end risk prediction model.
  • the first device may perform each round of local iteration in each round of joint parameter update according to the model parameter updating method in the above first, second, or third embodiment, so as to update the parameters in the first model; this will not be described in detail here. After performing multiple rounds of joint parameter update, the first device uses the first model with the finally updated parameters as the local-end risk prediction model.
  • Step A20 using the local end risk prediction model to predict and obtain the risk value of the user to be predicted.
  • the first device may use the local-end risk prediction model to predict the risk value of the user to be predicted.
  • the second device also performs each round of local iteration in each round of joint parameter updating according to the model parameter updating method in the above-mentioned embodiment to update the parameters in the second model.
  • the second device uses the second model after the parameters are finally updated as the other-end risk prediction model (wherein, the other end refers to the second device); the first device can use the local-end risk prediction model, combined with the other-end risk prediction model in the second device, to predict the risk value of the user to be predicted.
  • the risk value may be a value representing the user's risk level.
  • the second device may send the other-end risk prediction model to the first device; the first device obtains a model output by inputting the user data of the user to be predicted under the first data feature into the local-end risk prediction model, then inputs the user data of the user to be predicted under the second data feature into the other-end risk prediction model to obtain a model output, and obtains the risk value of the user to be predicted according to the two model outputs, for example by directly adding them.
  • the first device inputs the user data of the user to be predicted under the first data feature into the local risk prediction model to obtain a model output;
  • the second device inputs the user data of the user to be predicted under the second data feature into the risk prediction model of the other end to obtain a model output, and sends it to the first device;
  • the first device calculates the risk value of the user to be predicted according to the two model outputs. For example, when the local-end risk prediction model of the first device includes Net K and Net c as shown in FIG. 3, the first device inputs the two model outputs into the Net c part for processing to obtain the risk value of the user to be predicted.
  • the first device inputs the user data of the user to be predicted under the first data feature into the local-end risk prediction model to obtain a model output, and sends the model output to the second device; the second device inputs the user data of the user to be predicted under the second data feature into the other-end risk prediction model to obtain a model output, calculates the risk value of the user to be predicted according to the two model outputs, and returns the risk value to the first device.
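  • The linear-model variant of this joint prediction, where the two parties' outputs are simply added, can be sketched as follows (all names and numbers are hypothetical):

```python
def model_output(weights, features):
    """A party's linear model output on its own feature slice: w . x."""
    return sum(w * x for w, x in zip(weights, features))

def predict_risk(first_weights, first_features,
                 second_weights, second_features):
    """Each device computes its model output on the user data under its
    own data features; the two outputs are added to give the risk value
    (the linear-model case described above)."""
    return (model_output(first_weights, first_features)
            + model_output(second_weights, second_features))

# e.g. bank-side features on the first device, e-commerce-side on the second
risk = predict_risk([0.3, 0.2], [2.0, 1.0], [0.5], [4.0])
# risk -> 2.8
```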
  • the first device adds a near-end optimization loss that characterizes the amount of change of the parameter values of the first model in this round of local iteration compared with the parameter values in a preset historical round of local iteration, so as to constrain the change of the parameters of the first model during local iteration and avoid distortion caused by excessive parameter value changes. Therefore, the communication cost in user risk prediction can be reduced while the accuracy of user risk prediction is ensured.
  • the step A10 includes:
  • Step A101 receiving the vertical federation intermediate result of the current round of joint parameter update sent by the second device;
  • when the first device performs a round of joint parameter update, it receives the vertical federation intermediate result of the current round of joint parameter update sent by the second device. Specifically, if the first device is a data-application participant with label data, the second device inputs its training data into the second model for processing in this round of joint parameter update to obtain the model output, and sends the output to the first device as an intermediate result; this intermediate result is the vertical federated intermediate result.
  • if the first device is a data-providing participant without label data, the first device inputs its training data into the first model for processing to obtain the model output and sends the output to the second device as an intermediate result; the second device calculates the gradient value of the prediction loss relative to this output and sends it to the first device as an intermediate result, which is the longitudinal federated intermediate result.
  • Step A102 based on the near-end optimization loss and the vertical federation intermediate result, perform local iterative update of a preset number of rounds of parameters in the local model to be trained;
  • the first device performs a local iterative update of a preset number of rounds of parameters in the model to be trained at the local end based on the near-end optimization loss and the vertical federation intermediate result.
  • the method performs each round of local iteration on the model to be trained at the local end, which will not be described in detail here.
  • the preset number of rounds may be a number set in advance as required.
  • Step A103 detecting whether the local model to be trained after updating the parameters satisfies the preset model condition
  • the preset model condition may be a preset condition, such as the convergence of prediction loss, or the round of joint parameter update reaches a predetermined round, or the duration of joint parameter update reaches a predetermined duration.
  • Step A104 if it is satisfied, the local to-be-trained model after updating the parameters is used as the local-end risk prediction model;
  • the first device may use the local-end to-be-trained model after updating the parameters as the local-end risk prediction model.
  • the second device uses the other-end to-be-trained model after updating the parameters as the other-end risk prediction model.
  • Step A105 if not satisfied, return to the step of receiving the vertical federation intermediate result of the current round of joint parameter update sent by the second device.
  • if it is detected that the preset model condition is not met, the first device returns to the above step A101, that is, performs the next round of joint parameter update.
  • the step A102 includes:
  • Step A1021 Calculate the near-end optimization loss, and calculate the gradient value corresponding to the parameter based on the near-end optimization loss, the model output of the local model to be trained in this round of local iterations, and the vertical federation intermediate result;
  • Step A1022 using the gradient value to update the parameter to complete the current round of local iteration
  • the first device can calculate the near-end optimization loss and the gradient value corresponding to each parameter according to the specific implementation process of step S10 and step S20 in the above-mentioned first embodiment, and according to the specific implementation process of the above-mentioned step S30, according to the gradient value
  • the update parameters are not described in detail in this embodiment.
  • Step A1023 detecting whether the number of local iteration rounds reaches a preset number of rounds
  • Step A1024 if so, execute the step of detecting whether the local model to be trained after updating the parameters meets the preset model conditions;
  • Step A1025 if not reached, return to the step of calculating the near-end optimization loss, and increment the number of local iteration rounds by 1.
  • after completing one round of local iteration, the first device detects whether the current number of local iteration rounds has reached the preset number of rounds; if so, the first device executes step A103; if not, it increments the number of local iteration rounds by one and returns to step A1021, that is, performs the next round of local iteration.
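  • The control flow of steps A1021–A1025 might be sketched as below; the gradient computation is a stub (a hypothetical shrinking prediction-loss term plus the proximal pull back toward the historical parameter values), not the method's actual gradient:

```python
def run_local_iterations(params, preset_rounds, learning_rate=0.1):
    """One round of joint parameter update on the local end: repeat local
    iterations, incrementing the round counter, until the preset number
    of rounds is reached (steps A1021-A1025)."""
    historical = list(params)  # parameter values before this round
    for local_round in range(1, preset_rounds + 1):
        # Stub gradient: a prediction-loss term (-0.2 * p) plus the
        # proximal term pulling back toward the historical values.
        grads = [-0.2 * p - (p - h) for p, h in zip(params, historical)]
        params = [p + learning_rate * g for p, g in zip(params, grads)]
    return params, local_round

final_params, rounds_done = run_local_iterations([1.0, -2.0], preset_rounds=5)
```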
  • an embodiment of the present application also proposes an apparatus for updating model parameters.
  • the apparatus is deployed on a first device that participates in vertical federated learning, and the first device is communicatively connected to a second device that participates in vertical federated learning.
  • the device includes:
  • the first calculation module 10 is configured to calculate a near-end optimization loss, wherein the near-end optimization loss represents that the parameter values of the parameters of the first model in the first device in this round of local iterations are compared with those in the preset history. The amount of change in the parameter value in the local iteration of the round;
  • the second calculation module 20 is configured to calculate the parameter based on the proximal optimization loss, the model output of the first model in the current local iteration, and the longitudinal federation intermediate result received from the second device corresponding gradient value;
  • the updating module 30 is configured to update the parameter by using the gradient value to complete the current round of local iteration.
  • the first computing module 10 includes:
  • the first calculation unit is configured to subtract, element by element, the parameter vector of the parameters of the first model in the first device in this round of local iteration and the parameter vector in a preset historical round of local iteration, to obtain a difference vector;
  • the second calculation unit is configured to calculate the sum of squares of each element in the difference vector, and obtain the near-end optimization loss based on the sum of squares.
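  • The two calculation units above amount to the following minimal sketch (hypothetical names):

```python
def near_end_optimization_loss(current_params, historical_params):
    """Subtract the historical-round parameter vector from the current
    one element by element, then take the sum of squares of the
    difference vector."""
    diff = [c - h for c, h in zip(current_params, historical_params)]
    return sum(d * d for d in diff)

loss = near_end_optimization_loss([1.0, 2.0, 3.0], [0.5, 2.0, 2.0])
# loss -> 1.25
```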
  • the vertical federated intermediate result is the output of the model in the second device
  • the second calculation module 20 includes:
  • a first processing unit configured to input the training data of the first device into the first model in the first device for processing, and obtain the model output of the first model in the current round of local iteration;
  • a third computing unit configured to calculate and obtain a prediction result according to the model output and the vertical federation intermediate result, and calculate and obtain a prediction loss based on the prediction result and the label data corresponding to the training data;
  • the fourth calculation unit is configured to add the prediction loss and the near-end optimization loss to obtain a total loss, and calculate the gradient value corresponding to the parameter based on the total loss.
  • the vertical federation intermediate result is the gradient value of the prediction loss in the second device relative to the output of the first model sent by the first device during the current round of joint parameter update.
  • the second computing module 20 includes:
  • the second processing unit is used to input the training data of the first device into the first model of the first device for processing, and obtain the model output of the first model in this round of local iterations;
  • a fifth calculation unit configured to calculate and obtain a first sub-gradient value of the predicted loss relative to the parameter according to the model output and the longitudinal federation intermediate result
  • the sixth calculation unit is configured to calculate the second sub-gradient value of the near-end optimization loss relative to the parameter, and add the first sub-gradient value and the second sub-gradient value to obtain the corresponding value of the parameter. gradient value.
  • the sixth calculation unit is further configured to:
  • the gradient value corresponding to the parameter is obtained by multiplying the second sub-gradient value by a preset adjustment coefficient and then adding the first sub-gradient value.
  • an embodiment of the present application further proposes a user risk prediction device, the device is deployed on a first device participating in vertical federated learning, the first device is in communication connection with a second device participating in vertical federated learning, and the device includes:
  • the federated learning module is used to jointly perform vertical federated learning with the second device based on the near-end optimization loss to obtain a local-end risk prediction model, wherein the near-end optimization loss represents the parameters of the local-end model to be trained in the current local iteration The amount of change in the parameter value in compared to the parameter value in the local iteration of the preset historical round;
  • a prediction module configured to use the local risk prediction model to predict and obtain the risk value of the user to be predicted.
  • the federated learning module includes:
  • a receiving unit configured to receive the vertical federation intermediate result of the current round of joint parameter update sent by the second device
  • a local iterative unit configured to perform local iterative update of a preset number of rounds of parameters in the model to be trained at the local end based on the near-end optimization loss and the vertical federated intermediate result;
  • a detection unit configured to detect whether the local model to be trained after updating the parameters satisfies the preset model conditions
  • a determination unit configured to use the local-end to-be-trained model after updating the parameters as the local-end risk prediction model if it is satisfied;
  • the returning unit is configured to, if not satisfied, return to the step of receiving the vertical federation intermediate result of the current round of joint parameter update sent by the second device.
  • the local iterative unit includes:
  • the calculation subunit is used to calculate the near-end optimization loss, and based on the near-end optimization loss, the model output of the local to-be-trained model in the current round of local iterations, and the intermediate results of the vertical federation, the parameters are calculated and obtained corresponding gradient value;
  • an update subunit configured to update the parameter by using the gradient value to complete the current round of local iteration
  • an execution subunit configured to execute the step of detecting whether the model to be trained at the local end after updating the parameters meets the preset model condition if it is reached;
  • the returning subunit is used for returning to the step of calculating the near-end optimization loss if not reached, and incrementing the number of local iteration rounds by 1.
  • an embodiment of the present application also proposes a computer-readable storage medium, where a model parameter update program is stored on the storage medium, and when the model parameter update program is executed by a processor, the steps of the above-mentioned model parameter update method are implemented .
  • the present application also proposes a computer program product, including a computer program, which implements the steps of the above-mentioned model parameter updating method when the computer program is executed by a processor.
  • an embodiment of the present application also proposes a computer-readable storage medium, where a user risk prediction program is stored on the storage medium, and when the user risk prediction program is executed by a processor, the steps of the user risk prediction method described above are implemented .
  • the present application also proposes a computer program product, comprising a computer program, when the computer program is executed by a processor, the steps of the above-mentioned user risk prediction method are implemented.
  • the method of the above embodiment can be implemented by means of software plus a necessary general hardware platform, and can of course also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A model parameter updating method, apparatus and device, a storage medium, and a program product. The method comprises: calculating a proximal optimization loss, where the proximal optimization loss represents the amount by which the value of a parameter of a first model in a first device changes in the current round of local iteration relative to its value in a preset historical round of local iteration (S10); calculating a gradient value corresponding to the parameter on the basis of the proximal optimization loss, the model output of the first model in the current round of local iteration, and a vertical federated intermediate result received from a second device (S20); and updating the parameter with the gradient value to complete the current round of local iteration (S30).

Description

Model Parameter Updating Method, Apparatus, Device, Storage Medium and Program Product
This application claims priority to Chinese patent application No. 202110287041.3, filed on March 17, 2021 and entitled "Model parameter updating method, apparatus, device, storage medium and program product", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of machine learning, and in particular to a model parameter updating method, apparatus, device, storage medium and program product.
Background Art
With the development of artificial intelligence, the concept of "federated learning" was proposed to solve the problem of data silos: the parties to a federation can jointly train a model and obtain model parameters without handing over their own data, thereby avoiding leakage of private data. Vertical federated learning applies when the participants' data features overlap little while their users overlap heavily; the parties take the subset of users they share, whose data features differ across parties, and jointly train a machine learning model on that subset.
In a vertical federated learning process, the party that owns the label data must communicate with the other parties many times to transmit the intermediate results the other side needs to update its parameters, such as a model output or the gradient value corresponding to a model output. The parties need multiple rounds of joint parameter update, that is, multiple communications, so the communication cost is high. To address this, schemes have been proposed in which a participant performs multiple rounds of local iteration using a single intermediate result sent by the other participants; increasing the number of local iterations reduces the number of joint parameter updates and hence the communication cost.
However, in such schemes, when a participant runs many local iterations the parameters tend to become distorted, so the model's performance cannot be guaranteed, whereas with few local iterations the communication cost cannot be effectively reduced.
Summary of the Invention
The main purpose of the present application is to provide a model parameter updating method, apparatus, device, storage medium and program product, aimed at the problem that communication cost and model performance are difficult to balance in current vertical federated learning schemes.
To achieve the above purpose, the present application provides a model parameter updating method. The method is applied to a first device participating in vertical federated learning, the first device being communicatively connected to a second device participating in vertical federated learning. The method comprises the following steps:
calculating a proximal optimization loss, where the proximal optimization loss represents the amount by which the value of a parameter of a first model in the first device changes in the current round of local iteration relative to its value in a preset historical round of local iteration;
calculating a gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the first model in the current round of local iteration, and a vertical federated intermediate result received from the second device; and
updating the parameter with the gradient value to complete the current round of local iteration.
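By way of illustration only, one local iteration implementing the three steps above can be sketched as follows. The linear first model, the squared-error prediction loss, the additive combination of the two model outputs, and the learning rate are assumptions made for the sketch, not details fixed by this application; `mu` stands for the preset adjustment coefficient described in a later embodiment.

```python
import numpy as np

def local_iteration(w, w_hist, x, fed_intermediate, y, lr=0.1, mu=0.01):
    """One local iteration on the first device (illustrative sketch).

    w:                current parameters of the first model
    w_hist:           parameter values from a preset historical local iteration
    x:                local training data held by the first device
    fed_intermediate: vertical federated intermediate result from the second
                      device (assumed here to be the second model's output)
    y:                label data (assumed: the first device holds the labels)
    """
    # Step S10: proximal optimization loss -- the squared change of the
    # parameters relative to their historical values.
    prox_loss = np.sum((w - w_hist) ** 2)

    # Step S20: gradient from the local model output plus the received
    # federated intermediate result (squared-error loss assumed).
    out = x @ w                             # first model's output this iteration
    pred = out + fed_intermediate           # combine with the second device's output
    grad_pred = x.T @ (pred - y) / len(y)   # d(prediction loss)/dw
    grad_prox = 2.0 * (w - w_hist)          # d(proximal loss)/dw
    grad = grad_pred + mu * grad_prox

    # Step S30: update the parameters to finish this round of local iteration.
    return w - lr * grad, prox_loss
```

Setting `mu=0` recovers plain local iteration without the proximal constraint, which is the baseline scheme the background section describes.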
To achieve the above purpose, the present application provides a user risk prediction method. The method is applied to a first device participating in vertical federated learning, the first device being communicatively connected to a second device participating in vertical federated learning. The method comprises the following steps:
performing vertical federated learning jointly with the second device based on a proximal optimization loss to obtain a local risk prediction model, where the proximal optimization loss represents the amount by which the values of the parameters of the local model to be trained change in the current local iteration relative to their values in a preset historical round of local iteration; and
predicting the risk value of a user to be predicted with the local risk prediction model.
To achieve the above purpose, the present application provides a model parameter updating apparatus. The apparatus is deployed on a first device participating in vertical federated learning, the first device being communicatively connected to a second device participating in vertical federated learning. The apparatus comprises:
a first calculation module, configured to calculate a proximal optimization loss, where the proximal optimization loss represents the amount by which the value of a parameter of a first model in the first device changes in the current round of local iteration relative to its value in a preset historical round of local iteration;
a second calculation module, configured to calculate a gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the first model in the current round of local iteration, and a vertical federated intermediate result received from the second device; and
an updating module, configured to update the parameter with the gradient value to complete the current round of local iteration.
To achieve the above purpose, the present application provides a user risk prediction apparatus. The apparatus is deployed on a first device participating in vertical federated learning, the first device being communicatively connected to a second device participating in vertical federated learning. The apparatus comprises:
a federated learning module, configured to perform vertical federated learning jointly with the second device based on a proximal optimization loss to obtain a local risk prediction model, where the proximal optimization loss represents the amount by which the values of the parameters of the local model to be trained change in the current local iteration relative to their values in a preset historical round of local iteration; and
a prediction module, configured to predict the risk value of a user to be predicted with the local risk prediction model.
To achieve the above purpose, the present application further provides a model parameter updating device, comprising a memory, a processor, and a model parameter update program stored on the memory and executable on the processor, where the model parameter update program, when executed by the processor, implements the steps of the model parameter updating method described above.
To achieve the above purpose, the present application further provides a user risk prediction device, comprising a memory, a processor, and a user risk prediction program stored on the memory and executable on the processor, where the user risk prediction program, when executed by the processor, implements the steps of the user risk prediction method described above.
In addition, to achieve the above purpose, the present application further proposes a computer-readable storage medium on which a model parameter update program is stored, where the model parameter update program, when executed by a processor, implements the steps of the model parameter updating method described above.
In addition, to achieve the above purpose, the present application further proposes a computer-readable storage medium on which a user risk prediction program is stored, where the user risk prediction program, when executed by a processor, implements the steps of the user risk prediction method described above.
In addition, to achieve the above purpose, the present application further proposes a computer program product comprising a computer program, where the computer program, when executed by a processor, implements the steps of the model parameter updating method described above.
In addition, to achieve the above purpose, the present application further proposes a computer program product comprising a computer program, where the computer program, when executed by a processor, implements the steps of the user risk prediction method described above.
Compared with existing schemes, in the present application the first device participating in vertical federated learning additionally computes, during local iteration, a proximal optimization loss that represents how much the values of the first model's parameters change in the current round of local iteration relative to a preset historical round of local iteration. It computes the gradient values of the first model's parameters based on the proximal optimization loss, the model output of the first model in the current round of local iteration, and the vertical federated intermediate result received from the second device, and updates the parameters with these gradient values. In other words, the proximal optimization loss is added to constrain how much the first model's parameters may change across local iterations, preventing the distortion caused by excessive parameter changes. This makes it possible to reduce communication cost by increasing the number of local iterations while still guaranteeing the model's prediction accuracy.
Brief Description of the Drawings
Fig. 1 is a schematic structural diagram of the hardware operating environment involved in the solutions of the embodiments of the present application;
Fig. 2 is a schematic flowchart of the first embodiment of the model parameter updating method of the present application;
Fig. 3 is a schematic diagram of joint parameter updating by participants according to an embodiment of the present application;
Fig. 4 is a hardware architecture diagram of vertical federated learning performed by a first device and a second device according to an embodiment of the present application;
Fig. 5 is a schematic diagram of the interaction flow of multiple rounds of joint parameter update between a first device and a second device according to an embodiment of the present application;
Fig. 6 is a schematic diagram of the functional modules of a preferred embodiment of the model parameter updating apparatus of the present application.
The realization of the purpose, functional characteristics and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description of Embodiments
It should be understood that the specific embodiments described herein are only intended to explain the present application, not to limit it.
As shown in Fig. 1, Fig. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solutions of the embodiments of the present application.
It should be noted that the model parameter updating device in the embodiments of the present application may be a smartphone, a personal computer, a server or a similar device, which is not specifically limited here; the model parameter updating device may be the first device participating in vertical federated learning.
As shown in Fig. 1, the model parameter updating device may comprise a processor 1001 (for example a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 is used to realize connection and communication among these components. The user interface 1003 may comprise a display (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may further comprise a standard wired interface and a wireless interface. The network interface 1004 may optionally comprise a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a stable non-volatile memory, such as a disk memory. Optionally, the memory 1005 may also be a storage apparatus independent of the aforementioned processor 1001.
Those skilled in the art will understand that the device structure shown in Fig. 1 does not constitute a limitation on the model parameter updating device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and a model parameter update program. The operating system is a program that manages and controls the hardware and software resources of the device and supports the running of the model parameter update program and other software or programs. In the device shown in Fig. 1, the user interface 1003 is mainly used for data communication with a client; the network interface 1004 is mainly used to establish a communication connection with the second device participating in vertical federated learning; and the processor 1001 may be used to call the model parameter update program stored in the memory 1005 and perform the following operations:
calculating a proximal optimization loss, where the proximal optimization loss represents the amount by which the value of a parameter of a first model in the first device changes in the current round of local iteration relative to its value in a preset historical round of local iteration;
calculating a gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the first model in the current round of local iteration, and a vertical federated intermediate result received from the second device; and
updating the parameter with the gradient value to complete the current round of local iteration.
Further, the step of calculating the proximal optimization loss, where the proximal optimization loss represents the amount by which the value of a parameter of the first model in the first device changes in the current round of local iteration relative to its value in a preset historical round of local iteration, comprises:
performing element-wise subtraction between the parameter vector of the first model's parameters in the current round of local iteration and the parameter vector in the preset historical round of local iteration to obtain a difference vector; and
calculating the sum of squares of the elements of the difference vector, and obtaining the proximal optimization loss based on the sum of squares.
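In code, the two steps above reduce to an element-wise subtraction followed by a sum of squares, i.e. the squared Euclidean distance between the two parameter vectors; a minimal sketch:

```python
import numpy as np

def proximal_loss(w_current, w_hist):
    """Proximal optimization loss: squared change of the parameter vector
    relative to a preset historical round of local iteration."""
    diff = w_current - w_hist     # element-wise subtraction -> difference vector
    return np.sum(diff ** 2)      # sum of squares of the difference vector's elements
```

Since the embodiment only requires a loss obtained "based on the sum of squares", a scaled variant such as `0.5 * np.sum(diff ** 2)` would equally fit this description.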
Further, when the first device is the participant that owns the label data, the vertical federated intermediate result is the output of the model in the second device, and
the step of calculating the gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the first model in the current round of local iteration and the vertical federated intermediate result received from the second device comprises:
inputting the training data of the first device into the first model in the first device for processing to obtain the model output of the first model in the current round of local iteration;
calculating a prediction result from the model output and the vertical federated intermediate result, and calculating a prediction loss based on the prediction result and the label data corresponding to the training data; and
adding the prediction loss and the proximal optimization loss to obtain a total loss, and calculating the gradient value corresponding to the parameter based on the total loss.
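A minimal sketch of this branch (the first device holds the labels) follows. The linear first model, the mean-squared-error prediction loss, and the additive combination of the two model outputs are illustrative assumptions, since this application does not fix a particular model or loss:

```python
import numpy as np

def label_party_gradient(w, w_hist, x, second_model_output, labels):
    # Model output of the first model in this round of local iteration.
    out = x @ w

    # Prediction result from the local model output and the received
    # vertical federated intermediate result (the second model's output).
    pred = out + second_model_output

    # Prediction loss against the label data (mean squared error assumed).
    pred_loss = np.mean((pred - labels) ** 2)

    # Total loss: prediction loss plus the proximal optimization loss.
    prox_loss = np.sum((w - w_hist) ** 2)
    total_loss = pred_loss + prox_loss

    # Gradient of the total loss with respect to the parameters.
    grad = 2.0 * x.T @ (pred - labels) / len(labels) + 2.0 * (w - w_hist)
    return total_loss, grad
```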
Further, when the second device is the participant that owns the label data, the vertical federated intermediate result is the gradient value of the prediction loss in the second device with respect to the output of the first model that the first device sent in the current round of joint parameter update, and
the step of calculating the gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the model in the current round of local iteration and the vertical federated intermediate result received from the second device comprises:
inputting the training data of the first device into the first model of the first device for processing to obtain the model output of the first model in the current round of local iteration;
calculating a first sub-gradient value of the prediction loss with respect to the parameter from the model output and the vertical federated intermediate result; and
calculating a second sub-gradient value of the proximal optimization loss with respect to the parameter, and adding the first sub-gradient value and the second sub-gradient value to obtain the gradient value corresponding to the parameter.
Further, the step of adding the first sub-gradient value and the second sub-gradient value to obtain the gradient value corresponding to the parameter comprises:
multiplying the second sub-gradient value by a preset adjustment coefficient and then adding the first sub-gradient value to obtain the gradient value corresponding to the parameter.
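The two sub-gradients above can be combined as follows. This sketch assumes a linear first model, so the first sub-gradient is obtained from the received gradient by the chain rule; `mu` is a hypothetical name for the preset adjustment coefficient:

```python
import numpy as np

def data_party_gradient(w, w_hist, x, grad_wrt_output, mu=0.01):
    # First sub-gradient: back-propagate the received gradient through the
    # local model. For out = x @ w, d(out)/dw = x, so by the chain rule
    # d(prediction loss)/dw = x.T @ d(prediction loss)/d(out).
    first_sub_grad = x.T @ grad_wrt_output

    # Second sub-gradient: gradient of the proximal optimization loss.
    second_sub_grad = 2.0 * (w - w_hist)

    # Multiply the second sub-gradient by the preset adjustment coefficient,
    # then add the first sub-gradient.
    return first_sub_grad + mu * second_sub_grad
```

A larger `mu` constrains the parameters more tightly to their historical values, which is how the scheme trades off drift against progress across many local iterations.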
An embodiment of the present application further proposes a user risk prediction device. The user risk prediction device is a first device participating in vertical federated learning, and the first device establishes a communication connection with a second device participating in vertical federated learning. The user risk prediction device comprises a memory, a processor, and a user risk prediction program stored on the memory and executable on the processor, where the user risk prediction program, when executed by the processor, implements the following steps:
performing vertical federated learning jointly with the second device based on a proximal optimization loss to obtain a local risk prediction model, where the proximal optimization loss represents the amount by which the values of the parameters of the local model to be trained change in the current local iteration relative to their values in a preset historical round of local iteration; and
predicting the risk value of a user to be predicted with the local risk prediction model.
Further, the step of performing vertical federated learning jointly with the second device based on the proximal optimization loss to obtain the local risk prediction model comprises:
receiving the vertical federated intermediate result of the current round of joint parameter update sent by the second device;
performing a preset number of rounds of local iterative updates on the parameters of the local model to be trained based on the proximal optimization loss and the vertical federated intermediate result;
detecting whether the local model to be trained with the updated parameters satisfies a preset model condition;
if it does, taking the local model to be trained with the updated parameters as the local risk prediction model; and
if it does not, returning to the step of receiving the vertical federated intermediate result of the current round of joint parameter update sent by the second device.
Further, the step of performing a preset number of rounds of local iterative updates on the parameters of the local model to be trained based on the proximal optimization loss and the vertical federated intermediate result comprises:
calculating the proximal optimization loss, and calculating the gradient values corresponding to the parameters based on the proximal optimization loss, the model output of the local model to be trained in the current round of local iteration, and the vertical federated intermediate result;
updating the parameters with the gradient values to complete the current round of local iteration;
detecting whether the number of local iteration rounds has reached the preset number of rounds;
if it has, performing the step of detecting whether the local model to be trained with the updated parameters satisfies the preset model condition; and
if it has not, returning to the step of calculating the proximal optimization loss and incrementing the number of local iteration rounds by 1.
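The control flow of these steps, nested inside the joint-parameter-update loop of the previous embodiment, can be sketched as follows; `first_device` and its methods are placeholders standing in for the operations described above, not an API defined by this application:

```python
def train(first_device, preset_rounds):
    """Sketch of the outer (joint update) and inner (local iteration) loops."""
    while True:
        # One round of joint parameter update: the vertical federated
        # intermediate result is received from the second device only once ...
        intermediate = first_device.receive_intermediate()

        # ... and reused for a preset number of rounds of local iteration.
        local_round = 0
        while local_round < preset_rounds:
            prox_loss = first_device.proximal_loss()
            grad = first_device.gradient(prox_loss, intermediate)
            first_device.update_parameters(grad)  # completes this local iteration
            local_round += 1                      # increment the local round count

        # After the preset rounds, check the model condition; stop if it is
        # met, otherwise start the next round of joint parameter update.
        if first_device.model_condition_met():
            return first_device.model()
```

With, say, 2 communication rounds and `preset_rounds=3`, the device runs 6 local iterations but only 2 exchanges of intermediate results, which is the communication saving the scheme targets.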
Based on the above structure, various embodiments of the model parameter updating method are proposed.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of the first embodiment of the model parameter updating method of the present application. It should be noted that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from that given here. The model parameter updating method of the present application is applied to a first device participating in vertical federated learning; the first device is communicatively connected to a second device participating in vertical federated learning, and the first and second devices may be smartphones, personal computers, servers or the like. In this embodiment, the model parameter updating method comprises:
Step S10: calculating a proximal optimization loss, where the proximal optimization loss represents the amount by which the value of a parameter of a first model in the first device changes in the current round of local iteration relative to its value in a preset historical round of local iteration.
In this embodiment, the participants in vertical federated learning fall into two categories: data-application participants, which own label data, and data-providing participants, which do not. In general there is one data-application participant and one or more data-providing participants. Each participant deploys a dataset and a machine learning model built on its own data features, and the participants' machine learning models together constitute one complete model used to accomplish model tasks such as prediction or classification. The sample dimensions of the participants' datasets are aligned, that is, the sample IDs of the datasets are the same, although the participants' data features may differ. The participants may construct such sample-aligned datasets in advance via encrypted sample alignment, which is not described in detail here.
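For illustration, sample alignment on plaintext IDs can be sketched as below; as the text notes, real deployments use encrypted sample alignment (for example, private set intersection) so that the non-overlapping sample IDs are never revealed to the other party:

```python
def align_samples(ids_a, ids_b):
    """Plaintext illustration of sample alignment: keep only the sample IDs
    both parties share, in a deterministic order so that both parties index
    their local features consistently. This is NOT the encrypted protocol;
    production systems compute this intersection under encryption."""
    return sorted(set(ids_a) & set(ids_b))
```

Each party then reorders its local dataset by the returned ID list, so that row i on every device refers to the same user.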
The machine learning model deployed by a participant may be an ordinary machine learning model, such as a linear regression model or a neural network model, or a model used in automated machine learning, such as a search network. A search network is a network used for neural architecture search (NAS). It comprises multiple cells, each corresponding to a network layer, and connection operations are set between some of the cells. Taking two cells as an example, the connection operations between them may be N preset candidate connection operations, each with a defined weight; these weights are the structure parameters of the search network, while the network-layer parameters inside the cells are its model parameters. During model training, parameter updates are performed to optimize both the structure parameters and the model parameters, and the final network structure, that is, which connection operation or operations to retain, is determined from the finally updated structure parameters. Because the network structure is determined only after the architecture search, the participants do not need to design the model's network structure in advance as they would for a traditional vertical federated learning model, which lowers the difficulty of model design.
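Under the assumption that the search network follows a DARTS-style differentiable architecture search (one common NAS formulation; the application itself does not name one), the weighted connection operations between two cells can be sketched as:

```python
import numpy as np

def mixed_connection(x, ops, alpha):
    """Weighted mixture of N candidate connection operations between two
    cells. `alpha` holds the structure parameters; during training both
    `alpha` and the ops' internal parameters are updated by gradient descent."""
    w = np.exp(alpha - np.max(alpha))
    w = w / w.sum()                          # softmax over structure parameters
    return sum(wi * op(x) for wi, op in zip(w, ops))

def derive_architecture(alpha):
    """After training, retain the connection operation with the largest
    structure-parameter weight -- this fixes the final network structure."""
    return int(np.argmax(alpha))
```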
In this embodiment, the first device may be the data-application participant that owns the label data, in which case the second device is a data-providing participant without label data and there may be multiple second devices; alternatively, the first device may be a data-providing participant without label data, in which case the second device is the data-application participant that owns the label data. For ease of distinction, the model in the first device is hereinafter called the first model and the model in the second device the second model.
The parameters of each participant's model are initialized in advance, and the participants perform multiple rounds of joint parameter update to continuously update the parameters of their respective models and improve the performance of the overall model, such as its prediction accuracy. When each participant's model is an ordinary machine learning model, the parameters updated in each round of joint parameter update are model parameters, such as the weight parameters of a neural network. When each participant's model is a search network, the parameters updated in each round may be structure parameters and/or model parameters. This embodiment does not restrict the order in which the structure parameters and the model parameters are updated. For example, the structure parameters may be updated in the first several rounds of joint parameter update and the model parameters in the later rounds; alternatively, both may be updated together in every round of joint parameter update.
During one round of joint parameter update, the participants first exchange the intermediate results used to update the parameters of their respective models (hereinafter also called vertical federation intermediate results); each participant then performs multiple rounds of local iteration based on the received intermediate results, and only after these local iterations does the next round of joint parameter update begin. That is, within one round of joint parameter update, a participant receives the intermediate result from the other participants only once, and all subsequent local iterations use that same intermediate result in their computations. An intermediate result may be a gradient or a model output. Specifically, when a participant is a data provider, the intermediate result it sends may be the output of its model; when a participant is the data application participant, the intermediate result it sends may be the computed gradient corresponding to a model output sent by a data provider. Because intermediate results rather than the raw data of the data sets are transmitted, the participants do not leak their data privacy to one another, protecting each participant's data security. Figure 3 is a schematic diagram of joint parameter update by the participants in one embodiment, in which Party K is the data application participant, Party 1 to Party K-1 are data provider participants, Net K is the model deployed in the data application participant, Net j is the model deployed in a data provider participant, Net c is the model deployed in the data application participant that computes the prediction result (Y out) from the outputs of all parties' models, N j is the output of a model, and G(N j) is the gradient value corresponding to that model output.
When performing a round of local iteration, the first device may compute a proximal optimization loss. The proximal optimization loss characterizes the amount of change between the parameter values of the first model in the current round of local iteration and their values in a preset historical round of local iteration. Minimizing this loss constrains how far the parameter values of the first model in the current round may drift from their historical values; that is, it keeps the parameter values changing only slightly in each local iteration, preventing them from becoming distorted after many rounds of local iteration. This embodiment does not restrict how the proximal optimization loss is computed, nor how it is minimized. For example, in one implementation, the proximal optimization loss may be treated as a loss function and minimized with standard loss-minimization methods, such as using a gradient descent algorithm to compute the gradient of the proximal optimization loss with respect to the parameters of the first model and updating the parameters along that gradient. In other implementations, other methods may be used to minimize the proximal optimization loss, for example randomly perturbing the parameter values of the first model, checking whether the proximal optimization loss decreases, and obtaining loss-minimizing parameter values through random trials.
The preset historical round may be a round configured in advance in the first device that is earlier than the current round of local iteration; if the current round of local iteration is round t, the preset historical round is smaller than t. Within one round of joint parameter update, the preset historical round may be fixed, meaning that every local iteration in that round computes the proximal optimization loss against the parameter values of the same historical local iteration. For example, fixing the preset historical round to 1 ensures that the parameter values of all later local iterations deviate only slightly from those of the first local iteration. Alternatively, the preset historical round need not be fixed, and a different historical round may be set for each local iteration in that round of joint parameter update. When the historical round is set per local iteration, note that if the proximal optimization loss is computed against the immediately preceding local iteration, the parameter gradient derived from that loss may be zero, in which case the proximal optimization loss imposes no constraint. Therefore, in a preferred implementation, for all or some local iterations, the current iteration's round number minus its preset historical round should be greater than 1. It should also be noted that, in some implementations, the first device need not compute the proximal optimization loss in every local iteration; for example, the first round of local iteration has no historical round, so no proximal optimization loss needs to be computed for it.
Further, in one implementation, step S10 includes:
Step S101: performing element-wise subtraction between the parameter vector of the first model's parameters in the current round of local iteration and the parameter vector in the preset historical round of local iteration in the first device to obtain a difference vector;
Step S102: computing the sum of squares of the elements of the difference vector, and obtaining the proximal optimization loss based on the sum of squares.
The first model has multiple parameters, which can be represented as a vector. The first device subtracts, element by element, the parameter vector of the preset historical round of local iteration from the parameter vector of the current round to obtain a difference vector, then computes the sum of squares of its elements. The first device may use the sum of squares directly as the proximal optimization loss, or take its square root as the loss. Note that when computing the proximal optimization loss, the first device treats the elements of the current round's parameter vector as the unknown variables of the computation.
In other implementations, the first device may use any other method capable of measuring the amount of change between vectors to compute the proximal optimization loss.
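Steps S101 and S102 can be sketched as follows; this is a minimal illustration of the squared-difference form described above, with hypothetical names.

```python
def proximal_loss(current_params, historical_params):
    # Step S101: element-wise subtraction of the historical-round
    # parameter vector from the current-round parameter vector.
    diff = [c - h for c, h in zip(current_params, historical_params)]
    # Step S102: sum of squares of the difference vector; as noted
    # above, the square root of this sum may be used instead.
    return sum(d * d for d in diff)
```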
Step S20: computing the gradient values corresponding to the parameters based on the proximal optimization loss, the model output of the first model in the current round of local iteration, and the vertical federation intermediate result received from the second device;
The first device feeds the training data of its data set into the first model to obtain a model output, and computes the gradient value of each parameter of the first model from the proximal optimization loss, that model output, and the vertical federation intermediate result received from the second device. The vertical federation intermediate result received from the second device is the intermediate result the second device sent during the current round of joint parameter update. Specifically, when the first device is the data application participant holding the label data, the vertical federation intermediate result is the output of the second model sent by the second device, and the first device computes the prediction loss from that result and the model output of the current local iteration. In one implementation, the first device adds the proximal optimization loss and the prediction loss to obtain a total loss, then computes the gradient of the total loss with respect to the parameters of the first model. In another implementation, the first device separately computes the gradients of the proximal optimization loss and of the prediction loss with respect to the parameters of the first model, then adds the two gradient values to obtain the final gradient value. When the first device is a data provider without label data and the second device is the data application participant holding the label data, the vertical federation intermediate result is the gradient value, computed by the second device, of the prediction loss with respect to the output of the first model; the first device computes the gradient of the prediction loss with respect to the parameters of the first model from the vertical federation intermediate result and the model output of the current local iteration, computes the gradient of the proximal optimization loss with respect to those parameters, and adds the two gradient values to obtain the final gradient value. Note that, throughout the embodiments of this application, computing gradient values from a loss may follow existing gradient computation methods and is not described in detail.
Step S30: updating the parameters with the gradient values to complete the current round of local iteration.
After computing the gradient value of each parameter of the first model, the first device updates each parameter with its gradient value; that is, each parameter corresponds to one gradient value, and the first device updates the parameter using that value. Specifically, the first device adjusts the parameter value obtained after the previous round of local iteration by the corresponding gradient value multiplied by a learning rate to obtain the parameter value for the current round. Once every parameter has been updated, the current round of local iteration is complete. By adding the proximal optimization loss when computing the parameter gradients and then updating the parameters along those gradients, the parameters move in the direction that minimizes the proximal optimization loss, which constrains the amount of parameter change and prevents the parameter values from drifting so far that they become distorted.
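The per-parameter update described above can be sketched as follows, assuming the conventional gradient-descent sign (each parameter moves against its gradient); function and argument names are illustrative.

```python
def local_update(params, grads, learning_rate=0.01):
    # Each parameter is adjusted by its own gradient value scaled by
    # the learning rate, completing one local iteration (step S30).
    return [p - learning_rate * g for p, g in zip(params, grads)]
```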
Further, after completing the current round of local iteration, if the first device detects that the configured number of local iterations for the current round of joint parameter update has been reached, it may proceed to the next round of joint parameter update; if not, it performs the next round of local iteration. In one implementation, a maximum number of joint parameter update rounds may be set; when that number is reached, the first device stops updating the model parameters. In another implementation, the first device may check whether the prediction loss has converged at the end of a round of joint parameter update, or at the end of a round of local iteration, and stop updating the parameters if it has. After updating stops, the first device takes the current parameter values as the final parameter values of the first model; once the parameter values of the first model are determined, the first model can be used to perform prediction tasks.
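The stopping conditions described in this paragraph might be sketched as follows; the convergence tolerance and the function name are assumptions for illustration.

```python
def should_stop(joint_round, max_joint_rounds, loss_history, tol=1e-4):
    # Stop when the maximum number of joint parameter update rounds
    # is reached, or when the prediction loss has converged.
    if joint_round >= max_joint_rounds:
        return True
    if len(loss_history) >= 2 and abs(loss_history[-1] - loss_history[-2]) < tol:
        return True
    return False
```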
Figure 4 shows the hardware architecture of the first device and the second device participating in vertical federated learning in one implementation. The two devices exchange intermediate results and, based on the results sent by the other party, each performs multiple rounds of local iteration locally; in each round of local iteration, the proximal optimization loss is additionally computed to constrain the amount of parameter change and prevent the parameter values from changing so much that they become distorted.
Compared with existing solutions, in this embodiment the first device participating in vertical federated learning additionally computes, during local iteration, a proximal optimization loss that characterizes how much the parameter values of its first model in the current round of local iteration have changed relative to their values in a preset historical round of local iteration. It computes the gradient values of the first model's parameters from the proximal optimization loss, the model output of the current local iteration, and the vertical federation intermediate result received from the second device, and updates the parameters with those gradient values. Adding the proximal optimization loss thus constrains how much the first model's parameters change during local iteration and avoids the distortion caused by excessive parameter changes, so the communication cost can be reduced by increasing the number of local iterations while the model's prediction accuracy is still guaranteed.
Further, based on the first embodiment above, a second embodiment of the model parameter update method of this application is proposed. In this embodiment, when the first device is the participant holding the label data, the vertical federation intermediate result is the output of the model in the second device, and step S20 includes:
Step S201: inputting the training data of the first device into the first model in the first device for processing, to obtain the model output of the first model in the current round of local iteration;
In this embodiment, the first device is the data application participant holding the label data and the second device is a data provider participant without label data; the vertical federation intermediate result is the output obtained by the second device, during the current round of joint parameter update, by feeding its training data into the second model. In one round of local iteration of the current joint parameter update, the first device feeds its training data into the first model for processing to obtain the model output of the first model for that local iteration.
Step S202: computing a prediction result from the model output and the vertical federation intermediate result, and computing a prediction loss from the prediction result and the label data corresponding to the training data;
The first device computes the prediction result from the model output and the vertical federation intermediate result, and computes the prediction loss from the prediction result and the label data corresponding to the training data. The way the prediction result is computed depends on the type of machine learning model. For example, when the machine learning model of the vertical federation is a linear regression model, the first device adds the model output and the vertical federation intermediate result to obtain the prediction result. As another example, when the machine learning model is a neural network, the first model in the first device consists of the two parts Net K and Net c shown in Figure 3: the first device feeds the training data into the Net K part to obtain the model output N K, then feeds N K and the vertical federation intermediate result N j into the Net c part to obtain the prediction result Y out.
The first device computes the prediction loss from the prediction result and the label data corresponding to the training data. The prediction loss can be computed with a commonly used loss function, such as the cross-entropy loss; different loss functions may be chosen depending on the machine learning model being trained.
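For the linear-regression case mentioned above, the prediction and its loss might be sketched as follows; the squared-error loss is an illustrative choice, since the text leaves the loss function open.

```python
def linear_prediction_loss(local_output, intermediate_results, label):
    # Linear-regression case: the prediction result is the sum of the
    # local model output and the intermediate results received from
    # the other parties; squared error stands in for the chosen loss.
    y_out = local_output + sum(intermediate_results)
    return (y_out - label) ** 2
```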
Step S203: adding the prediction loss and the proximal optimization loss to obtain a total loss, and computing the gradient values of the parameters based on the total loss.
The first device adds the prediction loss and the proximal optimization loss to obtain a total loss. Specifically, the two losses may be added directly, or combined as a weighted sum with weights set as needed. In one implementation, the first device obtains the total loss as the prediction loss plus the product of the proximal optimization loss and an adjustment coefficient, where the coefficient can be preset and flexibly adjusted in each round of local iteration. For example, within one round of joint parameter update, the coefficient may be initialized to 0.1 and then increased as the local iteration round grows, so that later local iterations constrain the parameter changes more strongly. The first device then computes the parameter gradient values based on the total loss; the detailed computation is not repeated here.
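The weighted combination of the two losses can be sketched as follows; the linear growth schedule for the adjustment coefficient is an assumption consistent with the example of initializing it to 0.1 and increasing it with the local iteration round.

```python
def total_loss(prediction_loss, proximal_loss, local_round, mu0=0.1):
    # Adjustment coefficient grows with the local iteration round so
    # that later local iterations constrain parameter changes harder.
    mu = mu0 * local_round  # illustrative linear schedule
    return prediction_loss + mu * proximal_loss
```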
Further, in other implementations, after computing the prediction loss and the proximal optimization loss, the first device may compute the gradient of the prediction loss with respect to the parameters of the first model, compute the gradient of the proximal optimization loss with respect to those parameters, and then add the two gradient values, or take their weighted sum, to obtain the parameter gradient values.
Figure 5 is a schematic diagram of the interaction flow in which the first device and the second device jointly perform multiple rounds of joint parameter update in one implementation.
In this embodiment, the first device computes the gradient values of the first model's parameters from the prediction loss and the proximal optimization loss, then updates the parameters with those gradient values, so the parameters are updated in the direction that minimizes both the prediction loss and the proximal optimization loss. This not only improves the model's prediction accuracy but also constrains the amount of parameter change and avoids distortion from excessive changes, so the communication cost can be reduced by increasing the number of local iterations while the model's prediction accuracy is still guaranteed.
Further, based on the first and/or second embodiments above, a third embodiment of the model parameter update method of this application is proposed. In this embodiment, when the second device is the participant holding the label data, the vertical federation intermediate result is the gradient value, computed in the second device, of the prediction loss with respect to the output of the first model that the first device sent during the current round of joint parameter update, and step S20 includes:
Step S204: inputting the training data of the first device into the first model of the first device for processing, to obtain the model output of the first model in the current round of local iteration;
In this embodiment, the first device is a data provider participant without label data and the second device is the data application participant holding the label data. During the current round of joint parameter update, the first device feeds its training data into the first model, obtains an output, and sends that output to the second device as an intermediate result; the second device computes the gradient value of the prediction loss with respect to that output and sends it back to the first device as an intermediate result, which is the vertical federation intermediate result. In one round of local iteration of the current joint parameter update, the first device feeds its training data into the first model for processing to obtain the model output of that local iteration. In one embodiment, as shown in Figure 3, the first device feeds the training data into Net j and obtains the model output N j.
Step S205: computing the first sub-gradient value of the prediction loss with respect to the parameters from the model output and the vertical federation intermediate result;
The first device computes the gradient value of the prediction loss with respect to the parameters of the first model (hereinafter called the first sub-gradient value, for ease of distinction) from the model output and the vertical federation intermediate result, following the backpropagation method. Specifically, the first sub-gradient value can be computed by the chain rule as:

∂L/∂w = G(N j) · ∂N b/∂w

where w is a parameter of the first model, N j is the intermediate result sent to the second device during the current round of joint parameter update (that is, the model output of the first model before local iteration), G(N j) is the gradient value of the prediction loss with respect to N j returned by the second device, that is, the vertical federation intermediate result, and N b is the model output of the first model in the current round of local iteration.
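A minimal sketch of step S205's chain-rule computation, assuming a scalar model output per sample; G(N j) is the value received from the second device, and the local partial derivatives of N b with respect to each parameter are assumed to come from local backpropagation. Names are illustrative.

```python
def first_sub_gradient(g_nj, dNb_dw):
    # Chain rule: for each parameter w, the prediction-loss gradient
    # is G(N_j), the gradient returned by the second device, times
    # the local partial derivative of the output N_b w.r.t. w.
    return [g_nj * d for d in dNb_dw]
```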
Step S206: computing the second sub-gradient value of the proximal optimization loss with respect to the parameters, and adding the first sub-gradient value and the second sub-gradient value to obtain the gradient values of the parameters.
After computing the proximal optimization loss, the first device computes its gradient with respect to the parameters of the first model (hereinafter called the second sub-gradient value, for ease of distinction). The first device adds a parameter's first sub-gradient value and second sub-gradient value to obtain that parameter's gradient value. Specifically, when there are multiple parameters, each parameter has its own first and second sub-gradient values, and adding them per parameter yields the gradient value of each parameter.
In this embodiment, the first device computes the gradient values of the first model's parameters from the prediction loss and the proximal optimization loss, then updates the parameters with those gradient values, so the parameters are updated in the direction that minimizes both the prediction loss and the proximal optimization loss. This not only improves the model's prediction accuracy but also constrains the amount of parameter change and avoids distortion from excessive changes, so the communication cost can be reduced by increasing the number of local iterations while the model's prediction accuracy is still guaranteed.
Further, in step S206, the step of adding the first sub-gradient value and the second sub-gradient value to obtain the parameter's gradient value includes:
Step S2061: multiplying the second sub-gradient value by a preset adjustment coefficient and then adding the first sub-gradient value to obtain the parameter's gradient value.
In one implementation, an adjustment coefficient may be set in the first device to control how strongly the parameter changes are constrained in each round of local iteration. Specifically, the first device multiplies the second sub-gradient value by the adjustment coefficient and then adds the first sub-gradient value to obtain the parameter's gradient value. The first device may adjust the coefficient according to the local iteration round; for example, within one round of joint parameter update, the coefficient may be initialized to 0.1 and then increased as the local iteration round grows, so that later local iterations constrain the parameter changes more strongly.
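Step S2061 can be sketched as follows; function and argument names are illustrative.

```python
def combine_sub_gradients(first_sub, second_sub, adjust_coef=0.1):
    # Scale the proximal-loss gradient (second sub-gradient) by the
    # preset adjustment coefficient, then add the prediction-loss
    # gradient (first sub-gradient), per parameter.
    return [f + adjust_coef * s for f, s in zip(first_sub, second_sub)]
```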
Further, based on the first, second and/or third embodiments above, a fourth embodiment of the user risk prediction method of the present application is proposed. In this embodiment, the method is applied to a first device participating in vertical federated learning; the first device is communicatively connected to a second device participating in vertical federated learning, and the first device and the second device may be devices such as smart phones, personal computers and servers. The user risk prediction method includes the following steps:
Step A10: performing vertical federated learning jointly with the second device based on a proximal optimization loss to obtain a local risk prediction model, where the proximal optimization loss characterizes the change of the parameter values of the local model to be trained in the current local iteration relative to their values in a preset historical round of local iteration.
The first device may be a data application participant or a data provider participant. The first device is deployed with a first data set constructed from the data of each user under first data features and a first model (hereinafter also called the local model to be trained); the second device is deployed with a second data set constructed from the data of each user under second data features and a second model (hereinafter also called the other-end model to be trained). The two data sets share the same user dimension; the first data features and the second data features are both data features relevant to predicting user risk, and they differ from each other. The first model and the second model are two parts of one complete machine learning model; a commonly used machine learning model, such as a linear regression model or a neural network model, may be selected as needed, and the prediction result of the model is set to a data form that can characterize the user's degree of risk, such as a risk value. The first device and the second device jointly use the first data set and the second data set to train the first model and the second model; after training is completed, the two models can be used to jointly predict a user's risk. The risk may be, for example, the credit risk before a user takes out a loan, or the risk of delinquent repayment during a loan. For example, in one embodiment, the first device is deployed at a bank, and the first data features are banking-related features, such as a user's number of historical loans and historical defaults; the second device is deployed at an e-commerce platform, and the second data features are e-commerce-related features, such as a user's historical purchase count and amount. The first device and the second device use their respective data sets to perform vertical federated learning and train a model for predicting pre-loan credit risk.
Specifically, the first device performs vertical federated learning jointly with the second device based on the proximal optimization loss to obtain the local risk prediction model: the first device may carry out each round of local iteration within each round of joint parameter update according to the model parameter updating method of the first, second or third embodiments above to update the parameters of the first model, which is not detailed again here. After multiple rounds of joint parameter updates, the first device uses the first model with the finally updated parameters as the local risk prediction model.
Step A20: predicting the risk value of a user to be predicted using the local risk prediction model.
After obtaining the local risk prediction model, the first device may use it to predict the risk value of a user to be predicted. Specifically, the second device likewise performs each round of local iteration within each round of joint parameter update according to the model parameter updating method of the above embodiments to update the parameters of the second model; after multiple rounds of joint parameter updates, the second device uses the second model with the finally updated parameters as the other-end risk prediction model (where the other end refers to the second device). The first device may then use the local risk prediction model jointly with the other-end risk prediction model in the second device to predict the risk value of the user to be predicted, where the risk value may be a value indicating the degree of the user's risk.
In one embodiment, the second device may send the other-end risk prediction model to the first device. The first device inputs the user data of the user to be predicted under the first data features into the local risk prediction model to obtain one model output, inputs the user data under the second data features into the other-end risk prediction model to obtain another model output, and obtains the risk value of the user to be predicted from the two model outputs, for example by directly adding them. In another embodiment, if the first device is the data application participant that owns the label data, the first device inputs the user data of the user to be predicted under the first data features into the local risk prediction model to obtain one model output; the second device inputs the user data under the second data features into the other-end risk prediction model to obtain another model output and sends it to the first device; the first device then calculates the risk value of the user to be predicted from the two model outputs. For example, when the local risk prediction model of the first device includes the NetK and Netc parts shown in Fig. 3, the first device inputs each model output into the Netc part for processing to obtain the risk value of the user to be predicted. In another embodiment, if the first device is a data provider participant without label data and the second device is the data application participant that owns the label data, the first device inputs the user data of the user to be predicted under the first data features into the local risk prediction model to obtain one model output and sends that output to the second device; the second device inputs the user data under the second data features into the other-end risk prediction model to obtain another model output, calculates the risk value of the user to be predicted from the two model outputs, and returns the risk value to the first device.
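The simplest of these combinations, directly adding the two model outputs, can be sketched as follows. The linear party models, weights, and feature values are purely hypothetical stand-ins for the first and second models:

```python
import numpy as np

rng = np.random.default_rng(0)
w_bank = rng.normal(size=3)   # first device: banking-feature weights (hypothetical)
w_shop = rng.normal(size=2)   # second device: e-commerce-feature weights (hypothetical)

def local_output(w, x):
    # Each party computes its model output on its own feature slice only.
    return float(w @ x)

def joint_risk_value(x_bank, x_shop):
    # Combine the two partial outputs by direct addition, as in the first
    # embodiment above; a Netc-style combiner network could replace the sum.
    return local_output(w_bank, x_bank) + local_output(w_shop, x_shop)

user_bank = np.array([2.0, 0.0, 1.0])   # e.g. historical loan count, defaults
user_shop = np.array([15.0, 3.2])       # e.g. historical purchase count, amount
risk = joint_risk_value(user_bank, user_shop)
```

Neither party sees the other's raw features; only the scalar model outputs cross the boundary, which is what enables the exchanges described in the embodiments above.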
In this embodiment, during vertical federated learning with the second device, the first device adds a proximal optimization loss that characterizes the change of the parameter values of the first model in the current local iteration relative to their values in a preset historical round of local iteration. This proximal term constrains how much the parameters of the first model may change within local iterations, avoiding the distortion caused by excessively large parameter-value changes during local iteration, so that the communication cost of user risk prediction is reduced while the accuracy of user risk prediction is still guaranteed.
Further, in one embodiment, step A10 includes:
Step A101: receiving the vertical federated intermediate result of the current round of joint parameter update sent by the second device.
When performing a round of joint parameter update, the first device receives the vertical federated intermediate result of that round sent by the second device. Specifically, if the first device is the data application participant that owns the label data, the second device, in the current round of joint parameter update, inputs its training data into the second model for processing to obtain the model output, and sends that output to the first device as the intermediate result, i.e. the vertical federated intermediate result. If the first device is a data provider participant without label data, the first device inputs its training data into the first model for processing to obtain the model output and sends that output to the second device; the second device computes the gradient value of the prediction loss with respect to that output and sends this gradient value to the first device as the intermediate result, i.e. the vertical federated intermediate result.
Step A102: performing a preset number of rounds of local iterative updates on the parameters of the local model to be trained, based on the proximal optimization loss and the vertical federated intermediate result.
The first device performs a preset number of rounds of local iterative updates on the parameters of the local model to be trained based on the proximal optimization loss and the vertical federated intermediate result; each round of local iteration may follow the model parameter updating method of the first, second or third embodiments above, which is not detailed again here. The preset number of rounds may be a quantity set in advance as required.
Step A103: detecting whether the local model to be trained with updated parameters satisfies a preset model condition.
After performing the preset number of rounds of local iterations, the first device detects whether the local model to be trained with updated parameters satisfies a preset model condition. The preset model condition may be any condition set in advance, for example that the prediction loss has converged, that the number of rounds of joint parameter updates has reached a predetermined number, or that the joint parameter updating has run for a predetermined duration.
Step A104: if the condition is satisfied, using the local model to be trained with updated parameters as the local risk prediction model.
If the preset model condition is detected to be satisfied, the first device may use the local model to be trained with updated parameters as the local risk prediction model. Correspondingly, the second device uses the other-end model to be trained with updated parameters as the other-end risk prediction model.
Step A105: if the condition is not satisfied, returning to the step of receiving the vertical federated intermediate result of the current round of joint parameter update sent by the second device.
If the preset model condition is detected not to be satisfied, the first device returns to step A101 above, i.e., performs the next round of joint parameter update.
Further, in one embodiment, step A102 includes:
Step A1021: calculating the proximal optimization loss, and calculating the gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the local model to be trained in the current local iteration, and the vertical federated intermediate result.
Step A1022: updating the parameter with the gradient value to complete the current local iteration.
Specifically, the first device may calculate the proximal optimization loss and the gradient value corresponding to each parameter according to the specific implementation of steps S10 and S20 in the first embodiment above, and may update the parameters from the gradient values according to the specific implementation of step S30 above, which is not detailed again in this embodiment.
Step A1023: detecting whether the number of local iteration rounds has reached the preset number of rounds.
Step A1024: if it has, executing the step of detecting whether the local model to be trained with updated parameters satisfies the preset model condition.
Step A1025: if it has not, returning to the step of calculating the proximal optimization loss and incrementing the number of local iteration rounds by 1.
After completing one round of local iteration, the first device detects whether the current number of local iteration rounds has reached the preset number. If it has, the first device executes step A103; if it has not, the first device increments the number of local iteration rounds by 1 and returns to step A1021, i.e., performs the next round of local iteration.
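The nested control flow of steps A101 through A105 and A1021 through A1025 can be sketched as below; the three callables are placeholders for the federated communication, the gradient-based parameter update, and the model-condition check described above, not details fixed by the embodiment:

```python
def run_joint_training(local_rounds, max_joint_rounds,
                       receive_intermediate, local_update, condition_met):
    for joint_round in range(max_joint_rounds):
        # Step A101: receive this round's vertical federated intermediate result.
        intermediate = receive_intermediate(joint_round)
        local_round = 0
        # Steps A1021-A1025: a preset number of local iterations.
        while local_round < local_rounds:
            local_update(intermediate, local_round)
            local_round += 1          # increment the local round counter by 1
        # Step A103: check the preset model condition.
        if condition_met(joint_round):
            return joint_round + 1    # Step A104: training finished
        # Step A105: otherwise fall through to the next joint round.
    return max_joint_rounds

# Toy run: the "model condition" is first met at the third joint round.
updates = []
rounds_used = run_joint_training(
    local_rounds=4,
    max_joint_rounds=10,
    receive_intermediate=lambda r: r,
    local_update=lambda inter, lr: updates.append((inter, lr)),
    condition_met=lambda r: r >= 2,
)
```

Each joint round costs one communication exchange, so raising `local_rounds` is what trades extra local computation for fewer exchanges, as discussed above.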
In addition, an embodiment of the present application also proposes a model parameter updating apparatus. Referring to Fig. 6, the apparatus is deployed on a first device participating in vertical federated learning, the first device being communicatively connected to a second device participating in vertical federated learning. The apparatus includes:
a first calculation module 10, configured to calculate a proximal optimization loss, where the proximal optimization loss characterizes the change of the parameter values of the parameters of the first model in the first device in the current local iteration relative to their values in a preset historical round of local iteration;
a second calculation module 20, configured to calculate the gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the first model in the current local iteration, and the vertical federated intermediate result received from the second device; and
an updating module 30, configured to update the parameter with the gradient value to complete the current local iteration.
Further, the first calculation module 10 includes:
a first calculation unit, configured to perform element-wise subtraction between the parameter vector of the parameters of the first model in the first device in the current local iteration and the parameter vector in the preset historical round of local iteration to obtain a difference vector; and
a second calculation unit, configured to calculate the sum of squares of the elements of the difference vector and to obtain the proximal optimization loss based on that sum of squares.
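A minimal sketch of these two units, assuming the parameters are held in NumPy vectors (all names hypothetical):

```python
import numpy as np

def proximal_loss(current_params, historical_params):
    # First unit: element-wise subtraction yields the difference vector.
    diff = np.asarray(current_params) - np.asarray(historical_params)
    # Second unit: the proximal optimization loss is the sum of squares of
    # the difference vector's elements (its squared Euclidean norm).
    return float(np.sum(diff ** 2))

w_now = np.array([0.9, -0.1, 0.4])    # current local-iteration parameters
w_hist = np.array([1.0, 0.0, 0.5])    # parameters from the preset historical round
loss = proximal_loss(w_now, w_hist)
```

The loss is zero only when the two vectors coincide, so minimizing it pulls the current parameters back toward the historical round's values.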
Further, when the first device is the participant that owns the label data, the vertical federated intermediate result is the output of the model in the second device, and the second calculation module 20 includes:
a first processing unit, configured to input the training data of the first device into the first model in the first device for processing, to obtain the model output of the first model in the current local iteration;
a third calculation unit, configured to calculate a prediction result from the model output and the vertical federated intermediate result, and to calculate a prediction loss based on the prediction result and the label data corresponding to the training data; and
a fourth calculation unit, configured to add the prediction loss and the proximal optimization loss to obtain a total loss, and to calculate the gradient value corresponding to the parameter based on the total loss.
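For concreteness, the total-loss gradient of the fourth calculation unit might be computed as follows for a linear first model with a squared-error prediction loss; the model form, the loss choice, and the weighting factor `mu` are assumptions made for this sketch, not details fixed by the embodiment:

```python
import numpy as np

def total_loss_gradient(w, w_hist, x, partner_output, label, mu=1.0):
    # Prediction result: local model output plus the vertical federated
    # intermediate result (the second device's model output).
    pred = float(x @ w) + partner_output
    pred_grad = 2.0 * (pred - label) * x   # gradient of the squared-error prediction loss
    prox_grad = 2.0 * (w - w_hist)         # gradient of the proximal optimization loss
    return pred_grad + mu * prox_grad      # gradient of the total loss w.r.t. w

w = np.array([0.2, -0.3])
w_hist = np.zeros(2)
grad = total_loss_gradient(w, w_hist, x=np.array([1.0, 2.0]),
                           partner_output=0.5, label=1.0)
```

Because the total loss is a plain sum, its gradient decomposes into the two sub-gradients, which is what the sixth calculation unit below exploits.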
Further, when the second device is the participant that owns the label data, the vertical federated intermediate result is the gradient value of the prediction loss in the second device with respect to the output of the first model sent by the first device in the current round of joint parameter update, and the second calculation module 20 includes:
a second processing unit, configured to input the training data of the first device into the first model of the first device for processing, to obtain the model output of the first model in the current local iteration;
a fifth calculation unit, configured to calculate a first sub-gradient value of the prediction loss with respect to the parameter from the model output and the vertical federated intermediate result; and
a sixth calculation unit, configured to calculate a second sub-gradient value of the proximal optimization loss with respect to the parameter, and to add the first sub-gradient value and the second sub-gradient value to obtain the gradient value corresponding to the parameter.
Further, the sixth calculation unit is further configured to:
multiply the second sub-gradient value by a preset adjustment coefficient and then add the first sub-gradient value to obtain the gradient value corresponding to the parameter.
In addition, an embodiment of the present application also proposes a user risk prediction apparatus. The apparatus is deployed on a first device participating in vertical federated learning, the first device being communicatively connected to a second device participating in vertical federated learning. The apparatus includes:
a federated learning module, configured to perform vertical federated learning jointly with the second device based on a proximal optimization loss to obtain a local risk prediction model, where the proximal optimization loss characterizes the change of the parameter values of the local model to be trained in the current local iteration relative to their values in a preset historical round of local iteration; and
a prediction module, configured to predict the risk value of a user to be predicted using the local risk prediction model.
Further, the federated learning module includes:
a receiving unit, configured to receive the vertical federated intermediate result of the current round of joint parameter update sent by the second device;
a local iteration unit, configured to perform a preset number of rounds of local iterative updates on the parameters of the local model to be trained based on the proximal optimization loss and the vertical federated intermediate result;
a detection unit, configured to detect whether the local model to be trained with updated parameters satisfies a preset model condition;
a determination unit, configured to use the local model to be trained with updated parameters as the local risk prediction model if the condition is satisfied; and
a returning unit, configured to return to the step of receiving the vertical federated intermediate result of the current round of joint parameter update sent by the second device if the condition is not satisfied.
Further, the local iteration unit includes:
a calculation subunit, configured to calculate the proximal optimization loss, and to calculate the gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the local model to be trained in the current local iteration, and the vertical federated intermediate result;
an updating subunit, configured to update the parameter with the gradient value to complete the current local iteration;
a detection subunit, configured to detect whether the number of local iteration rounds has reached the preset number of rounds;
an execution subunit, configured to execute the step of detecting whether the local model to be trained with updated parameters satisfies the preset model condition if it has; and
a returning subunit, configured to return to the step of calculating the proximal optimization loss and increment the number of local iteration rounds by 1 if it has not.
In addition, an embodiment of the present application also proposes a computer-readable storage medium on which a model parameter update program is stored; when executed by a processor, the model parameter update program implements the steps of the model parameter updating method described above. The present application also proposes a computer program product including a computer program that, when executed by a processor, implements the steps of the model parameter updating method described above. For the embodiments of the model parameter updating device, computer-readable storage medium and computer program product of the present application, reference may be made to the embodiments of the model parameter updating method of the present application, which are not repeated here.
In addition, an embodiment of the present application also proposes a computer-readable storage medium on which a user risk prediction program is stored; when executed by a processor, the user risk prediction program implements the steps of the user risk prediction method described above. The present application also proposes a computer program product including a computer program that, when executed by a processor, implements the steps of the user risk prediction method described above. For the embodiments of the user risk prediction device, computer-readable storage medium and computer program product of the present application, reference may be made to the embodiments of the user risk prediction method of the present application, which are not repeated here.
It should be noted that, herein, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or apparatus including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that includes that element.
The above serial numbers of the embodiments of the present application are for description only and do not imply any ranking of the embodiments.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware alone, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (20)

  1. A model parameter updating method, wherein the method is applied to a first device participating in vertical federated learning, the first device being communicatively connected to a second device participating in vertical federated learning, and the method comprises the following steps:
    calculating a proximal optimization loss, wherein the proximal optimization loss characterizes the change of the parameter values of parameters of a first model in the first device in a current local iteration relative to their values in a preset historical round of local iteration;
    calculating a gradient value corresponding to the parameters based on the proximal optimization loss, a model output of the first model in the current local iteration, and a vertical federated intermediate result received from the second device; and
    updating the parameters with the gradient value to complete the current local iteration.
  2. The model parameter updating method according to claim 1, wherein the step of calculating a proximal optimization loss, wherein the proximal optimization loss characterizes the change of the parameter values of the parameters of the first model in the first device in the current local iteration relative to their values in a preset historical round of local iteration, comprises:
    performing element-wise subtraction between the parameter vector of the parameters of the first model in the first device in the current local iteration and the parameter vector in the preset historical round of local iteration to obtain a difference vector; and
    calculating the sum of squares of the elements of the difference vector, and obtaining the proximal optimization loss based on the sum of squares.
  3. The model parameter updating method according to claim 1 or 2, wherein, when the first device is the participant holding the label data, the vertical federated intermediate result is the output of the model in the second device, and
    the step of calculating the gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the first model in the current round of local iteration, and the vertical federated intermediate result received from the second device comprises:
    inputting the training data of the first device into the first model in the first device for processing, to obtain the model output of the first model in the current round of local iteration;
    calculating a prediction result according to the model output and the vertical federated intermediate result, and calculating a prediction loss based on the prediction result and the label data corresponding to the training data;
    adding the prediction loss and the proximal optimization loss to obtain a total loss, and calculating the gradient value corresponding to the parameter based on the total loss.
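A sketch of the label-holding party's gradient in claim 3, assuming (for illustration only) linear local models, a sigmoid prediction, and a cross-entropy prediction loss — the claim itself does not fix the model family or loss function, and `mu` is an assumed proximal coefficient:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def label_party_gradient(X_a, y, w_a, w_a_hist, z_b, mu=0.1):
    """Gradient at the label-holding first device (claim 3, sketched):
    its local model output is combined with the second device's output
    (the vertical federated intermediate result) to form the prediction,
    and the total loss adds a proximal term on the local parameters."""
    z_a = X_a @ w_a                            # model output of the first model
    pred = sigmoid(z_a + z_b)                  # prediction from both parties' outputs
    grad_pred = X_a.T @ (pred - y) / len(y)    # gradient of the cross-entropy prediction loss
    grad_prox = 2.0 * mu * (w_a - w_a_hist)    # gradient of mu * ||w_a - w_a_hist||^2
    return grad_pred + grad_prox               # gradient of the total loss
```

Under this sketch, initializing both parties' outputs at zero gives a prediction of 0.5 per sample, and the resulting gradient reduces to the familiar logistic-regression form.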
  4. The model parameter updating method according to claim 1 or 2, wherein, when the second device is the participant holding the label data, the vertical federated intermediate result is the gradient value, computed in the second device, of the prediction loss with respect to the output of the first model sent by the first device in the current round of joint parameter updating, and
    the step of calculating the gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the model in the current round of local iteration, and the vertical federated intermediate result received from the second device comprises:
    inputting the training data of the first device into the first model of the first device for processing, to obtain the model output of the first model in the current round of local iteration;
    calculating a first sub-gradient value of the prediction loss with respect to the parameter according to the model output and the vertical federated intermediate result;
    calculating a second sub-gradient value of the proximal optimization loss with respect to the parameter, and adding the first sub-gradient value and the second sub-gradient value to obtain the gradient value corresponding to the parameter.
  5. The model parameter updating method according to claim 4, wherein the step of adding the first sub-gradient value and the second sub-gradient value to obtain the gradient value corresponding to the parameter comprises:
    multiplying the second sub-gradient value by a preset adjustment coefficient and then adding the first sub-gradient value, to obtain the gradient value corresponding to the parameter.
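Claims 4 and 5 together can be sketched as below, again assuming a linear local model for illustration; `upstream_grad` stands for the received intermediate result (the label party's gradient of the prediction loss with respect to the first model's output), and `mu` and `coeff` are an assumed proximal coefficient and the preset adjustment coefficient of claim 5:

```python
import numpy as np

def non_label_party_gradient(X_b, w_b, w_b_hist, upstream_grad, mu=0.1, coeff=1.0):
    """Gradient at the party without label data (claims 4-5, sketched):
    the received upstream gradient is back-propagated through the local
    model by the chain rule, then combined with the proximal sub-gradient."""
    # First sub-gradient: chain rule through the local output z = X_b @ w_b.
    grad_pred = X_b.T @ upstream_grad
    # Second sub-gradient: derivative of the proximal term mu * ||w_b - w_b_hist||^2.
    grad_prox = 2.0 * mu * (w_b - w_b_hist)
    # Claim 5: scale the proximal sub-gradient by the preset adjustment coefficient.
    return grad_pred + coeff * grad_prox
```

The adjustment coefficient lets the party trade off fidelity to the shared objective (first sub-gradient) against stability of its local parameters (second sub-gradient).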
  6. A user risk prediction method, wherein the method is applied to a first device participating in vertical federated learning, the first device is communicatively connected to a second device participating in vertical federated learning, and the method comprises the following steps:
    performing vertical federated learning jointly with the second device based on a proximal optimization loss to obtain a local risk prediction model, wherein the proximal optimization loss characterizes the change of the parameter value of a parameter of the local model to be trained in the current local iteration relative to the parameter value in a preset historical round of local iteration;
    predicting a risk value of a user to be predicted by using the local risk prediction model.
  7. The user risk prediction method according to claim 6, wherein the step of performing vertical federated learning jointly with the second device based on the proximal optimization loss to obtain the local risk prediction model comprises:
    receiving a vertical federated intermediate result of the current round of joint parameter updating sent by the second device;
    performing a preset number of rounds of local iterative updating on the parameters of the local model to be trained based on the proximal optimization loss and the vertical federated intermediate result;
    detecting whether the local model to be trained with the updated parameters satisfies a preset model condition;
    if so, taking the local model to be trained with the updated parameters as the local risk prediction model;
    if not, returning to the step of receiving the vertical federated intermediate result of the current round of joint parameter updating sent by the second device.
  8. The user risk prediction method according to claim 6 or 7, wherein the step of performing the preset number of rounds of local iterative updating on the parameters of the local model to be trained based on the proximal optimization loss and the vertical federated intermediate result comprises:
    calculating the proximal optimization loss, and calculating gradient values corresponding to the parameters based on the proximal optimization loss, the model output of the local model to be trained in the current round of local iteration, and the vertical federated intermediate result;
    updating the parameters by using the gradient values to complete the current round of local iteration;
    detecting whether the number of local iteration rounds reaches the preset number of rounds;
    if so, performing the step of detecting whether the local model to be trained with the updated parameters satisfies the preset model condition;
    if not, incrementing the number of local iteration rounds by 1 and returning to the step of calculating the proximal optimization loss.
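The inner loop of claim 8 can be sketched as a fixed number of gradient steps per joint round. This is an illustrative skeleton, not the patented procedure: `grad_fn` stands for any combined-gradient computation (prediction part plus proximal part, as in claims 3-5), and the learning rate `lr` is an assumption.

```python
import numpy as np

def local_iterations(w, w_hist, grad_fn, preset_rounds, lr=0.1):
    """Runs the preset number of local iteration rounds (claim 8, sketched).
    grad_fn(w, w_hist) returns the gradient of the total objective for the
    current parameters; each gradient step completes one local iteration."""
    for _ in range(preset_rounds):
        w = w - lr * grad_fn(w, w_hist)
    return w
```

With a purely proximal gradient `grad_fn = lambda w, wh: 2 * (w - wh)`, each step shrinks the distance to the historical parameters by a factor of `1 - 2 * lr`, which makes the stabilizing effect of the proximal term easy to see.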
  9. A model parameter updating apparatus, wherein the apparatus is deployed on a first device participating in vertical federated learning, the first device is communicatively connected to a second device participating in vertical federated learning, and the apparatus comprises:
    a first calculation module, configured to calculate a proximal optimization loss, wherein the proximal optimization loss characterizes the change of the parameter value of a parameter of a first model in the first device in the current round of local iteration relative to the parameter value in a preset historical round of local iteration;
    a second calculation module, configured to calculate a gradient value corresponding to the parameter based on the proximal optimization loss, the model output of the first model in the current round of local iteration, and a vertical federated intermediate result received from the second device;
    an updating module, configured to update the parameter by using the gradient value, so as to complete the current round of local iteration.
  10. A model parameter updating device, wherein the model parameter updating device comprises: a memory, a processor, and a model parameter updating program stored on the memory and executable on the processor, wherein the model parameter updating program, when executed by the processor, implements the steps of the model parameter updating method according to claim 1.
  11. A model parameter updating device, wherein the model parameter updating device comprises: a memory, a processor, and a model parameter updating program stored on the memory and executable on the processor, wherein the model parameter updating program, when executed by the processor, implements the steps of the model parameter updating method according to claim 2.
  12. A model parameter updating device, wherein the model parameter updating device comprises: a memory, a processor, and a model parameter updating program stored on the memory and executable on the processor, wherein the model parameter updating program, when executed by the processor, implements the steps of the model parameter updating method according to claim 3.
  13. A user risk prediction apparatus, wherein the apparatus is deployed on a first device participating in vertical federated learning, the first device is communicatively connected to a second device participating in vertical federated learning, and the apparatus comprises:
    a federated learning module, configured to perform vertical federated learning jointly with the second device based on a proximal optimization loss to obtain a local risk prediction model, wherein the proximal optimization loss characterizes the change of the parameter value of a parameter of the local model to be trained in the current local iteration relative to the parameter value in a preset historical round of local iteration;
    a prediction module, configured to predict a risk value of a user to be predicted by using the local risk prediction model.
  14. A user risk prediction device, wherein the user risk prediction device comprises: a memory, a processor, and a model parameter updating program stored on the memory and executable on the processor, wherein the model parameter updating program, when executed by the processor, implements the steps of the user risk prediction method according to claim 6.
  15. A user risk prediction device, wherein the user risk prediction device comprises: a memory, a processor, and a model parameter updating program stored on the memory and executable on the processor, wherein the model parameter updating program, when executed by the processor, implements the steps of the user risk prediction method according to claim 7.
  16. A user risk prediction device, wherein the user risk prediction device comprises: a memory, a processor, and a model parameter updating program stored on the memory and executable on the processor, wherein the model parameter updating program, when executed by the processor, implements the steps of the user risk prediction method according to claim 8.
  17. A computer-readable storage medium, wherein a model parameter updating program is stored on the computer-readable storage medium, and the model parameter updating program, when executed by a processor, implements the steps of the model parameter updating method according to any one of claims 1 to 5.
  18. A computer-readable storage medium, wherein a model parameter updating program is stored on the computer-readable storage medium, and the model parameter updating program, when executed by a processor, implements the steps of the user risk prediction method according to any one of claims 6 to 9.
  19. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the model parameter updating method according to any one of claims 1 to 5.
  20. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the user risk prediction method according to any one of claims 6 to 9.
PCT/CN2021/094936 2021-03-17 2021-05-20 Model parameter updating method, apparatus and device, storage medium, and program product WO2022193432A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110287041.3A CN113011603A (en) 2021-03-17 2021-03-17 Model parameter updating method, device, equipment, storage medium and program product
CN202110287041.3 2021-03-17

Publications (1)

Publication Number Publication Date
WO2022193432A1 true WO2022193432A1 (en) 2022-09-22

Family

ID=76409316

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/094936 WO2022193432A1 (en) 2021-03-17 2021-05-20 Model parameter updating method, apparatus and device, storage medium, and program product

Country Status (2)

Country Link
CN (1) CN113011603A (en)
WO (1) WO2022193432A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116186782A (en) * 2023-04-17 2023-05-30 北京数牍科技有限公司 Federal graph calculation method and device and electronic equipment
CN116205313A (en) * 2023-04-27 2023-06-02 数字浙江技术运营有限公司 Federal learning participant selection method and device and electronic equipment
CN116610958A (en) * 2023-06-20 2023-08-18 河海大学 Unmanned aerial vehicle group reservoir water quality detection oriented distributed model training method and system
CN117151208A (en) * 2023-08-07 2023-12-01 大连理工大学 Asynchronous federal learning parameter updating method based on self-adaptive learning rate, electronic equipment and storage medium
CN117575291A (en) * 2024-01-15 2024-02-20 湖南科技大学 Federal learning data collaborative management method based on edge parameter entropy

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330759B (en) * 2022-03-08 2022-08-02 富算科技(上海)有限公司 Training method and system for longitudinal federated learning model
WO2024036526A1 (en) * 2022-08-17 2024-02-22 华为技术有限公司 Model scheduling method and apparatus
CN116128072B (en) * 2023-01-20 2023-08-25 支付宝(杭州)信息技术有限公司 Training method, device, equipment and storage medium of risk control model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754105A (en) * 2017-11-07 2019-05-14 华为技术有限公司 A kind of prediction technique and terminal, server
CN111210003A (en) * 2019-12-30 2020-05-29 深圳前海微众银行股份有限公司 Longitudinal federated learning system optimization method, device, equipment and readable storage medium
CN111242316A (en) * 2020-01-09 2020-06-05 深圳前海微众银行股份有限公司 Longitudinal federated learning model training optimization method, device, equipment and medium
CN111860864A (en) * 2020-07-23 2020-10-30 深圳前海微众银行股份有限公司 Longitudinal federal modeling optimization method, device and readable storage medium
WO2020234984A1 (en) * 2019-05-21 2020-11-26 NEC Corporation Learning device, learning method, computer program, and recording medium


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116186782A (en) * 2023-04-17 2023-05-30 北京数牍科技有限公司 Federal graph calculation method and device and electronic equipment
CN116205313A (en) * 2023-04-27 2023-06-02 数字浙江技术运营有限公司 Federal learning participant selection method and device and electronic equipment
CN116205313B (en) * 2023-04-27 2023-08-11 数字浙江技术运营有限公司 Federal learning participant selection method and device and electronic equipment
CN116610958A (en) * 2023-06-20 2023-08-18 河海大学 Unmanned aerial vehicle group reservoir water quality detection oriented distributed model training method and system
CN117151208A (en) * 2023-08-07 2023-12-01 大连理工大学 Asynchronous federal learning parameter updating method based on self-adaptive learning rate, electronic equipment and storage medium
CN117151208B (en) * 2023-08-07 2024-03-22 大连理工大学 Asynchronous federal learning parameter updating method based on self-adaptive learning rate, electronic equipment and storage medium
CN117575291A (en) * 2024-01-15 2024-02-20 湖南科技大学 Federal learning data collaborative management method based on edge parameter entropy
CN117575291B (en) * 2024-01-15 2024-05-10 湖南科技大学 Federal learning data collaborative management method based on edge parameter entropy

Also Published As

Publication number Publication date
CN113011603A (en) 2021-06-22

Similar Documents

Publication Publication Date Title
WO2022193432A1 (en) Model parameter updating method, apparatus and device, storage medium, and program product
CN112202928B (en) Credible unloading cooperative node selection system and method for sensing edge cloud block chain network
US10891161B2 (en) Method and device for virtual resource allocation, modeling, and data prediction
WO2022016964A1 (en) Vertical federated modeling optimization method and device, and readable storage medium
CN113408743B (en) Method and device for generating federal model, electronic equipment and storage medium
CN107230133B (en) Data processing method, equipment and computer storage medium
US20170206361A1 (en) Application recommendation method and application recommendation apparatus
US11748452B2 (en) Method for data processing by performing different non-linear combination processing
CN110837653B (en) Label prediction method, apparatus and computer readable storage medium
WO2022048195A1 (en) Longitudinal federation modeling method, apparatus, and device, and computer readable storage medium
CN111797999A (en) Longitudinal federal modeling optimization method, device, equipment and readable storage medium
WO2023103864A1 (en) Node model updating method for resisting bias transfer in federated learning
CN110889759A (en) Credit data determination method, device and storage medium
WO2023217127A1 (en) Causation determination method and related device
CN110795768A (en) Model learning method, device and system based on private data protection
CN110874638B (en) Behavior analysis-oriented meta-knowledge federation method, device, electronic equipment and system
CN112292696A (en) Determining action selection guidelines for an execution device
CN115270001A (en) Privacy protection recommendation method and system based on cloud collaborative learning
CN112100642A (en) Model training method and device for protecting privacy in distributed system
CN112861165A (en) Model parameter updating method, device, equipment, storage medium and program product
WO2022188534A1 (en) Information pushing method and apparatus
CN113592593B (en) Training and application method, device, equipment and storage medium of sequence recommendation model
CN114760308A (en) Edge calculation unloading method and device
CN111510473B (en) Access request processing method and device, electronic equipment and computer readable medium
CN107766944B (en) System and method for optimizing system function flow by utilizing API analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21931017

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21931017

Country of ref document: EP

Kind code of ref document: A1