CN116432040B - Federated learning-based model training method, device, medium, and electronic device

Info

Publication number: CN116432040B
Authority: CN (China)
Prior art keywords: value, model, training, local, global
Legal status: Active (granted)
Application number: CN202310706543.4A
Other languages: Chinese (zh)
Other versions: CN116432040A (application publication)
Inventors: 马平 (Ma Ping), 兰春嘉 (Lan Chunjia)
Current and original assignee: Shanghai Lingshuzhonghe Information Technology Co., Ltd.
Filing: application CN202310706543.4A filed by Shanghai Lingshuzhonghe Information Technology Co., Ltd.
Publication events: publication of application CN116432040A; application granted; publication of grant CN116432040B

Classifications

    • G06F 18/214 — Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 21/602 — Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity; protecting data; providing cryptographic facilities or services
    • G06F 21/6245 — Protecting access to data via a platform, e.g. using keys or access control rules; protecting personal data, e.g. for financial or medical purposes
    • G06N 3/098 — Computing arrangements based on biological models; neural networks; learning methods; distributed learning, e.g. federated learning


Abstract

The embodiments of the application disclose a federated learning-based model training method, device, medium, and electronic device. The method is executed by a training participant of a global total model and comprises the following steps: training a local sub-model with a local data set so that the local sub-model outputs a local prediction value; sharing the local prediction value within the training participation group of the global total model, and determining a joint prediction value of the global total model from the local prediction values shared by the other participants in the group; determining a global loss value of the global total model from the joint prediction value; and determining the latest gradient value of the local sub-model based on the global loss value of the global total model, and updating the local sub-model based on the latest gradient value so as to train the global total model. This technical solution simplifies the federated-learning-based model training method, reduces its computational complexity and communication complexity, and improves its practicality.

Description

Federated learning-based model training method, device, medium, and electronic device
Technical Field
The application relates to the field of computer technology, and in particular to a federated learning-based model training method, device, medium, and electronic device.
Background
Federated learning is a distributed machine learning technique. Its core idea is to perform distributed model training among multiple data sources that each hold a local data set, exchanging only intermediate results related to model parameters rather than the local data sets themselves, and to construct a global total model over the virtually fused data, thereby balancing data privacy protection with shared computation over data value. Federated learning can be classified into horizontal federated learning, vertical federated learning, and federated transfer learning.
Vertical federated learning focuses on protecting, during the model training phase, the training participants' local data sets and the sub-model parameters (e.g., gradients) of their local sub-models. In the related art, federated-learning-based model training methods have high computational complexity and communication complexity and perform poorly in practical applications.
Disclosure of Invention
The application provides a federated learning-based model training method, device, medium, and electronic device, suitable for training a regression model by federated learning without a central coordinator. It simplifies the federated-learning-based model training method, reduces its computational complexity and communication complexity, and improves its practicality.
According to a first aspect of the present application, there is provided a federated learning-based model training method, performed by a training participant of a global total model, the method comprising:
training a local sub-model with a local data set so that the local sub-model outputs a local prediction value;
sharing the local prediction value within the training participation group of the global total model, and determining a joint prediction value of the global total model from the local prediction values shared by the other participants in the group;
determining a global loss value of the global total model from the joint prediction value; and
determining the latest gradient value of the local sub-model based on the global loss value of the global total model, and updating the local sub-model based on the latest gradient value so as to train the global total model.
According to a second aspect of the present application, there is provided a federated learning-based model training apparatus, configured as a training participant of a global total model, the apparatus comprising:
a local prediction value determining module, configured to train a local sub-model with a local data set so that the local sub-model outputs a local prediction value;
a joint prediction value determining module, configured to share the local prediction value within the training participation group of the global total model and determine a joint prediction value of the global total model from the local prediction values shared by the other participants in the group;
a global loss value determining module, configured to determine a global loss value of the global total model from the joint prediction value; and
a local sub-model updating module, configured to determine the latest gradient value of the local sub-model based on the global loss value of the global total model and update the local sub-model based on the latest gradient value so as to train the global total model.
According to a third aspect of the present application, embodiments provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the federated learning-based model training method according to the embodiments of the present application.
According to a fourth aspect of the present application, embodiments provide an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the federated learning-based model training method according to the embodiments of the present application is implemented.
In the technical solution of the embodiments of the application, a local sub-model is trained with a local data set so that it outputs a local prediction value; the local prediction value is shared within the training participation group of the global total model, and a joint prediction value of the global total model is determined from the local prediction values shared by the other participants in the group; a global loss value of the global total model is determined from the joint prediction value; and the latest gradient value of the local sub-model is determined based on the global loss value, and the local sub-model is updated based on the latest gradient value so as to train the global total model. This ensures the data privacy security of the training participants and avoids leakage of their local data sets, sub-model parameters, and the like, while simplifying the federated-learning-based model training method, reducing its computational complexity and communication complexity, and improving its practicality.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below cover only some embodiments of the present application; other drawings can be derived from them by a person skilled in the art without inventive effort.
FIG. 1 is a flowchart of a federated learning-based model training method provided according to Embodiment 1;
FIG. 2 is a flowchart of a federated learning-based model training method provided according to Embodiment 2;
FIG. 3 is a flowchart of a federated learning-based model training method provided according to Embodiment 3;
FIG. 4 is a schematic structural diagram of a federated learning-based model training apparatus provided according to Embodiment 4 of the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided according to Embodiment 5 of the present application.
Detailed Description
To help those skilled in the art better understand the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," "target," and the like in the description, claims, and drawings of the present application are used to distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so denoted are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described. Furthermore, the terms "comprise," "include," and "have," and any variants thereof, are intended to cover non-exclusive inclusion: a process, method, system, article, or apparatus that comprises a list of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to it.
Embodiment 1
Fig. 1 is a flowchart of a federated learning-based model training method provided according to Embodiment 1. This embodiment is applicable to training a regression model by federated learning without a central coordinator. The method may be performed by a federated learning-based model training apparatus, which may be deployed at a training participant of the global total model, implemented in hardware and/or software, and integrated into an electronic device running the system.
As shown in fig. 1, the method includes:
s110, training a local sub-model by utilizing a local data set so that the local sub-model outputs a local predicted value;
The local sub-model is deployed at a training participant, which trains it with a local data set. Each local sub-model corresponds to one training participant, and the global total model can be obtained by aggregating the training participants' local sub-models. The global total model has multiple training participants; their specific number is not limited here and may be determined according to the actual situation.
The local data set is used to train the local sub-model; to protect the training participant's privacy, it is retained by that participant and not disclosed externally. Optionally, model training is based on vertical federated learning: the local data sets of different training participants contain the same users, but for the same user the feature dimensions in different participants' local data sets differ.
Optionally, business scenarios to which the global total model is applicable include, but are not limited to: loan amount prediction, house price prediction, investment risk analysis, crop fertilization analysis, student learning ability assessment, patient organ function prediction, and the like.
The identity types of the training participants are related to the application scenario of the global total model. For example, if the global total model is applied to a loan amount prediction scenario, i.e., the global total model is a loan amount prediction model, the training participants may include a tax authority, a lending bank, an e-commerce platform, and a credit bureau. The tax authority's local data set contains user feature dimensions for assessing repayment ability; the lending bank's contains user feature dimensions for assessing loan potential; the e-commerce platform's contains user feature dimensions for assessing consumption level; and the credit bureau's contains user feature dimensions for assessing credit rating.
It will be appreciated that, to bring the global total model to the desired accuracy, the training participant needs to train the local sub-model multiple times. Optionally, before training the local sub-model with the local data set, the training participant initializes it; specifically, it initializes the sub-model parameters, such as the model gradient, the model learning rate, or the number of training rounds. At the initialization stage, the local sub-models and sub-model parameters of the training participants in the training participation group differ from one another.
The training participant trains the local sub-model with the local data set, and each time one round of training completes, the local sub-model outputs a local prediction value. Because the local sub-model is trained only on the training participant's local data set, it is sensitive only to the features of that data set; the local prediction value therefore depends only on the user feature dimensions included in the local data set. For example, if the global total model is applied to loan amount prediction and the training participant is the e-commerce platform, whose local data set contains user feature dimensions for assessing consumption level, then the local prediction value output by its local sub-model depends only on the user's consumption level; that is, the e-commerce platform's local prediction predicts the loan amount from the consumption level alone.
However, the loan amount depends not only on consumption level but also on loan potential, repayment ability, and credit rating.
Optionally, all training participants of the global total model train their local sub-models with their local data sets, so each training participant has a corresponding local prediction value. For example, besides the e-commerce platform, the other training participants, such as the tax authority, the lending bank, and the credit bureau, also train their local sub-models with their local data sets and predict the loan amount from feature dimensions such as the user's repayment ability, loan potential, and credit rating, respectively.
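For concreteness, the following is a minimal sketch of such a local sub-model, assuming a linear sub-model consistent with the gradient formulas of Embodiment 3; the class name, feature sizes, and initialization are illustrative rather than prescribed by the patent:

```python
import numpy as np

class LocalSubModel:
    """Minimal linear sub-model held by a single training participant.

    Under vertical federated learning, the participant sees only its own
    feature columns, so its local prediction reflects those features alone
    (e.g., the e-commerce platform's consumption-level features).
    """

    def __init__(self, n_features: int, learning_rate: float = 0.05, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.coef = rng.normal(scale=0.01, size=n_features)  # sub-model coefficients w_i
        self.learning_rate = learning_rate                   # model learning rate

    def predict(self, X_local: np.ndarray) -> np.ndarray:
        # Local prediction value: this participant's contribution to the joint prediction.
        return X_local @ self.coef

# One training round's local prediction on a batch of local samples.
X_local = np.random.default_rng(1).normal(size=(8, 5))  # 8 samples, 5 local features
y_local = LocalSubModel(n_features=5).predict(X_local)
```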
S120, sharing a local predicted value in a training participation group of a global total model, and determining a joint predicted value of the global total model according to the local predicted values shared by other participants in the training participation group;
each training participation group of the global total model has a local prediction value corresponding to the training participation group. The user characteristic dimensions corresponding to the local predicted values of different training participation groups are different. That is, the training engagement sets of the global ensemble model are all predicted from the feature dimensions of the respective local datasets.
To obtain the joint prediction value of the global total model, the local prediction value of each training participant needs to be comprehensively considered.
The training participation group corresponds to the global total model, the training participants are members in the training participation group, and the members in the training participation group participate in the training process of the global total model together. The training participants share the local predicted values in the training participation group of the global total model, so that the local predicted values can be acquired by other participants in the training participation group, and the other participants in the training participation group share the local predicted values of the other participants in the training participation group.
The training participants determine the joint predicted value of the global total model according to the local predicted value of the training participants and the local predicted value shared by other participants in the training participation group.
The combined predicted value comprehensively considers the user characteristic dimension included by the training participants in the training participation group, and has higher prediction accuracy compared with the local predicted value of the training participants.
Optionally, the training participants perform addition operation on the local predicted value and the local predicted value shared by other participants in the training participation group, and determine the joint predicted value of the global total model according to the addition operation result.
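Expressed as code, the addition operation above amounts to the following sketch (ignoring, for the moment, how the values are shared securely; variable names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in local prediction vectors of four participants P1..P4 over the same samples.
y_locals = [rng.normal(size=8) for _ in range(4)]
# Joint prediction value of the global total model: element-wise sum of local predictions.
y_joint = np.sum(y_locals, axis=0)
```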
S130, determining a global loss value of the global total model according to the joint prediction value.
Optionally, the global loss value of the global total model is determined from the joint prediction value and a target statistic. Specifically, the difference between the joint prediction value and the target statistic is determined, and the global loss value of the global total model is determined from that difference.
The global loss value of the global total model measures how well the joint prediction value of the global total model matches the target statistic.
Optionally, target participants within the training participation group of the global total model are determined according to the training participants' available computing power and whether they hold the target statistic. The target participants collect the local prediction values of the other participants in the group, and compute the joint prediction value of the global total model from their own local prediction values and those collected from the others. There may be one target participant or several; the specific number is not limited here and may be determined according to the actual situation. It will be appreciated that, with multiple target participants, the computation of the joint prediction value can be decomposed into tasks completed jointly by the target participants. For example, suppose the training participation group contains four training participants, denoted P1, P2, P3, and P4, where P2 is the training participant holding the target statistic. P1 and P2 are determined to be the target participants and undertake the core computing task of determining the joint prediction value. Specifically, P1, P2, P3, and P4 each split their local prediction value into a local shared value and a local retained value. Optionally, the training participants share their local shared values within the group according to a preset rule; the preset rule may be that P1 and P2 exchange local shared values with each other, while P3 and P4 send their local shared values to P2. P1, P2, P3, and P4 each keep their own local retained value. P1 determines a joint prediction value from its local retained value and the local shared value obtained from P2; P2 determines one from its local retained value and the local shared values obtained from P1, P3, and P4; P3 and P4 each determine one from their local retained values alone. Notably, combining the joint prediction values determined by P1, P2, P3, and P4 yields the joint prediction value of the global total model; what each participant actually determines is a fragment of the global total model's prediction value.
Accordingly, the global loss value may also be determined by the target participants. That is, every training participant in the group trains its local sub-model with its local data set, while the joint prediction value and global loss value of the global total model are computed by the target participants, which are determined within the group according to available computing power and possession of the target statistic. Optionally, a target participant shares the joint prediction value or global loss value it determines within the training participation group, so that the other participants can update their local sub-models based on the global loss value.
S140, determining the latest gradient value of the local sub-model based on the global loss value of the global total model, and updating the local sub-model based on the latest gradient value to train the global total model.
The global loss value of the global total model is used to determine the latest gradient value of the local sub-model: the latest gradient value depends on the global loss value and determines the direction in which the sub-model parameters of the local sub-model are updated.
Updating the local sub-model based on the latest gradient value drives the joint prediction value of the global total model toward the target statistic, so that the prediction accuracy of the global total model converges to the desired level.
The latest gradient value is related to the training participant's local sub-model and its sub-model parameters. Each training participant in the training participation group determines its own latest gradient value based on the global loss value.
In the technical solution of this embodiment, a local sub-model is trained with a local data set so that it outputs a local prediction value; the local prediction value is shared within the training participation group of the global total model, and a joint prediction value of the global total model is determined from the local prediction values shared by the other participants in the group; a global loss value of the global total model is determined from the joint prediction value; and the latest gradient value of the local sub-model is determined based on the global loss value, and the local sub-model is updated based on the latest gradient value so as to train the global total model. This ensures the data privacy security of the training participants and avoids leakage of their local data sets and sub-model parameters, while simplifying the federated-learning-based model training method, reducing its computational complexity and communication complexity, and improving its practicality.
In an optional embodiment, updating the local sub-model based on the latest gradient value so as to train the global total model comprises: determining the previous gradient value of the local sub-model from the previous round of training; determining the gradient difference between adjacent iterations from the previous gradient value and the latest gradient value; and, if the gradient difference is greater than or equal to a preset threshold, updating the local sub-model with the latest gradient value and continuing to train it, until the gradient difference between adjacent iterations is smaller than the preset threshold.
The previous round of training is the one adjacent to the current round; the previous gradient value of the local sub-model in that round is determined, and the gradient difference between adjacent iterations is computed from the previous and latest gradient values. This difference measures the training progress of the local sub-model, and the preset threshold is the criterion for judging whether training is complete.
Optionally, the gradient difference between adjacent iterations is compared with the preset threshold. If the difference is smaller than the threshold, the local sub-model has converged and training is complete, and the training participant may stop training it. If the difference is greater than or equal to the threshold, the local sub-model has not converged and the training participant must continue training: specifically, it replaces the previous gradient value with the latest gradient value and continues training the local sub-model based on the latest gradient value, until the gradient difference between adjacent iterations falls below the preset threshold, i.e., until the local sub-model converges. This gives the training participants a practical way to decide whether the local sub-model is fully trained and offers technical support for federated-learning-based model training.
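A sketch of this convergence test follows; the patent does not fix how the gradient difference is measured, so the L2 norm here is an assumption:

```python
import numpy as np

def converged(grad_prev: np.ndarray, grad_new: np.ndarray, threshold: float) -> bool:
    """True when the gradient change between adjacent iterations falls below the threshold."""
    return np.linalg.norm(grad_new - grad_prev) < threshold  # assumed L2 norm

# e.g. stop training once successive gradients differ by less than 1e-4:
# if converged(last_gradient, latest_gradient, 1e-4): stop training
```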
In an optional embodiment, sharing the local prediction value within the training participation group of the global total model and determining the joint prediction value of the global total model from the local prediction values shared by the other participants comprises: splitting the local prediction value into a local shared value and a local retained value; sharing the local shared value within the training participation group and obtaining the local shared values of the other participants in the group; and determining the joint prediction value of the global total model from the local retained value and the other participants' local shared values.
The training participant splits its local prediction value into a local shared value, which is shared with other participants in the group, and a local retained value, which it keeps. It obtains the local shared values of the other participants and determines the joint prediction value of the global total model from its local retained value and those shared values.
Optionally, the training participants share their local shared values within the group according to a preset rule, which determines to which participants in the group each local shared value is sent. The preset rule is determined according to the actual situation and is not limited here. Continuing the example above, the training participation group contains four training participants, denoted P1, P2, P3, and P4, where P2 is the participant holding the target statistic. Based on available computing power and possession of the target statistic, P1 and P2 are determined to be the target participants. P1, P2, P3, and P4 each split their local prediction value into a local shared value and a local retained value. The preset rule may be that P1 and P2 exchange local shared values with each other, while P3 and P4 send theirs to P2; P1, P2, P3, and P4 each keep their own local retained value. P1 determines a joint prediction value from its local retained value and the local shared value obtained from P2; P2 determines one from its local retained value and the local shared values obtained from P1, P3, and P4; P3 and P4 each determine one from their local retained values. Combining the joint prediction values determined by P1, P2, P3, and P4 yields the joint prediction value of the global total model; what each participant actually determines is a fragment of the global total model's prediction value.
With this technical solution, the joint prediction value of the global total model is stored in a distributed manner across the training participants of the group, which effectively safeguards data privacy during model training, prevents leakage of the joint prediction value through collusion among training participants, and spreads out the computational load of computing it.
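The split into a local shared value and a local retained value can be realized with additive secret sharing, sketched below under the assumption that fragments recombine by addition, as in the summation steps above; the function name is illustrative:

```python
import numpy as np

def split_additive(value: np.ndarray, rng: np.random.Generator):
    """Split a local prediction into (shared, retained) with shared + retained == value.

    The shared fragment is a random mask, so neither fragment alone
    reveals the local prediction value.
    """
    shared = rng.normal(size=value.shape)
    retained = value - shared
    return shared, retained

rng = np.random.default_rng(42)
y_local = np.array([3.0, 1.5, -2.0])
shared, retained = split_additive(y_local, rng)
assert np.allclose(shared + retained, y_local)  # fragments recombine exactly
```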
Embodiment 2
Fig. 2 is a flowchart of a federated learning-based model training method provided according to Embodiment 2. This embodiment further refines the foregoing embodiment, in particular the operation of "determining a global loss value of the global total model from the joint prediction value".
As shown in fig. 2, the method includes:
s210, training a local sub-model by utilizing a local data set so that the local sub-model outputs a local predicted value;
s220, sharing a local predicted value in a training participation group of a global total model, and determining a joint predicted value of the global total model according to the local predicted values shared by other participants in the training participation group;
the training participants in the training participation group all have corresponding joint predicted values, and the joint predicted values of the global total model can be determined only by integrating the joint predicted values determined by the training participants. The joint predictions determined by the training participants are actually fragments of the predictions of the global total model.
S230, splitting the joint prediction value into a joint shared value and a joint retained value;
The training participant splits the joint prediction value it has determined into a joint shared value and a joint retained value.
Notably, the training participants split their joint prediction values in plaintext form, which reduces the computational complexity of federated-learning-based model training and balances data privacy security with training efficiency.
S240, sharing the joint shared value within the training participation group of the global total model, and obtaining the joint shared values of the other participants in the group;
The training participant shares its joint shared value within the training participation group and keeps its joint retained value, and it obtains the joint shared values of the other participants in the group.
S250, determining a loss-value fragment of the global total model from the joint retained value and the other participants' joint shared values;
The training participant determines a loss-value fragment of the global total model from its own joint retained value and the joint shared values of the other participants.
Optionally, the training participant sums its own joint retained value and the other participants' joint shared values to determine its loss-value fragment of the global total model.
Optionally, the training participants share their joint shared values within the group according to a preset rule, which determines to which participants each joint shared value is sent. The preset rule is related to the available computing power of the training participants in the group and to whether they hold the target statistic; it is determined according to the actual situation and is not limited here. Continuing the example above, the training participation group contains four training participants, denoted P1, P2, P3, and P4, where P2 is the participant holding the target statistic. Based on available computing power and possession of the target statistic, P1 and P2 are determined to be the target participants. The preset rule may be that P1 and P2 exchange joint shared values, while P3 and P4 each send their joint shared value to P2; P1, P2, P3, and P4 each keep their joint retained value. P1 determines a loss-value fragment from its joint retained value and the joint shared value obtained from P2; P2 determines one from its joint retained value and the joint shared values obtained from P1, P3, and P4. Optionally, P3 and P4 determine loss-value fragments from their joint retained values.
Optionally, to further improve the efficiency of computing the loss-value fragments and spread the computational load, P4, while sending its joint shared value to P2, sends its joint retained value to P3; P3 then determines a loss-value fragment from its own joint retained value and the joint retained value shared by P4. That is, P4 does not take part in determining loss-value fragments, which balances the load among the training participants during training and lowers the computing-power requirements on them.
S260, determining the global loss value of the global total model from the participant's own loss-value fragment and the loss-value fragments obtained from the other participants in the training participation group.
The loss-value fragments determined by the training participants in the group are combined to determine the global loss value of the global total model.
Optionally, the loss-value fragments determined by the training participants in the group are summed.
Optionally, a target participant is determined within the group, and the target participant determines the global loss value of the global total model from the summation result and the target statistic. The target participant then shares the global loss value with the other participants in the group, so that all training participants can update their local sub-models based on the global loss value of the global total model.
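A minimal sketch of this aggregation step, under the assumption (consistent with the gradient formulas in Embodiment 3 below) that the global loss value takes the form of a per-sample residual between the reassembled joint prediction and the target statistic; all names are illustrative:

```python
import numpy as np

def global_loss(loss_fragments: list[np.ndarray], y_target: np.ndarray) -> np.ndarray:
    """Sum the participants' loss-value fragments and compare with the target statistic."""
    y_joint = np.sum(loss_fragments, axis=0)  # summation over all fragments
    return y_joint - y_target                 # assumed residual form of the global loss
```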
S270, determining the latest gradient value of the local sub-model based on the global loss value of the global total model, and updating the local sub-model based on the latest gradient value to train the global total model.
The training participants in the training participation group each determine the latest gradient value of their local sub-model based on the global loss value of the global total model, and each updates its local sub-model based on that latest gradient value.
In the technical solution of this embodiment, each training participant splits its joint prediction value into a joint shared value and a joint retained value, shares the joint shared value within the training participation group, and obtains the other participants' joint shared values. A loss-value fragment of the global total model is determined from the joint retained value and the other participants' joint shared values, and the global loss value of the global total model is determined from the participant's own loss-value fragment and those obtained from the other participants. The global loss value is then used to update the local sub-models so as to train the global total model. This effectively safeguards data privacy during model training and prevents the joint prediction value from being leaked through collusion among training participants.
In an optional embodiment, determining the global loss value of the global total model from the participant's own loss-value fragment and those obtained from the other participants comprises: encrypting the participant's loss-value fragment to obtain a loss-value ciphertext; sharing the loss-value ciphertext within the training participation group and obtaining the loss-value ciphertexts of the other participants; and computing the global loss value of the global total model from the participant's own loss-value fragment and the loss-value ciphertexts obtained from the other participants.
Optionally, the training participant encrypts its loss-value fragment with a homomorphic encryption algorithm to obtain a loss-value ciphertext, shares it with the other participants in the group, and obtains theirs in turn. The loss-value ciphertexts shared within the group are summed, the global loss value in ciphertext form is determined from the summation result, and the training participant decrypts it to obtain the global loss value in plaintext.
Because the loss-value ciphertexts are produced by a homomorphic encryption algorithm, operating on the encrypted data yields, after decryption, the same result as performing the same operation on the unencrypted data. The global loss value of the global total model can therefore be determined directly from the loss-value ciphertexts.
Optionally, target participants are determined within the group according to available computing power, and only the target participants encrypt their loss-value fragments with the homomorphic encryption algorithm; the other participants may share their loss-value fragments in plaintext within the group without encrypting them.
The global loss value of the global total model is then determined from the loss-value ciphertexts shared by the target participants and the loss-value plaintexts shared by the other participants; the global loss value so determined is in ciphertext form and is shared within the group by the target participants. Having the target participants perform the encryption of the loss-value fragments reduces the computational and communication complexity of the federated-learning-based model training method.
In this technical solution, the loss-value fragments of the global total model are encrypted before the loss-value ciphertexts are shared within the training participation group and used to determine the global loss value, which effectively safeguards the training participants' data privacy during model training.
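The patent does not name a particular homomorphic scheme. The sketch below uses the Paillier cryptosystem through the third-party python-paillier package (`phe`) as a stand-in, since it supports exactly the ciphertext-plus-plaintext addition described above:

```python
import numpy as np
from phe import paillier  # pip install phe (python-paillier), additively homomorphic

public_key, private_key = paillier.generate_paillier_keypair()

# A target participant encrypts its loss-value fragment element-wise.
frag_target = [1.25, -0.5, 3.0]
cipher = [public_key.encrypt(v) for v in frag_target]

# Another participant adds its plaintext fragment directly onto the ciphertexts;
# Paillier supports ciphertext + plaintext and ciphertext + ciphertext addition.
frag_other = [0.75, 0.5, -1.0]
cipher_sum = [c + p for c, p in zip(cipher, frag_other)]

# Only the key holder recovers the summed loss values in plaintext.
plain_sum = [private_key.decrypt(c) for c in cipher_sum]
assert np.allclose(plain_sum, np.add(frag_target, frag_other))
```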
Embodiment 3
Fig. 3 is a flowchart of a federated learning-based model training method provided according to Embodiment 3. This embodiment further refines the foregoing embodiments, in particular the operation of determining the latest gradient value of the local sub-model based on the global loss value of the global total model.
As shown in fig. 3, the method includes:
s310, training a local sub-model by utilizing a local data set so that the local sub-model outputs a local predicted value;
s320, sharing a local predicted value in a training participation group of a global total model, and determining a joint predicted value of the global total model according to the local predicted values shared by other participants in the training participation group;
s330, determining a global loss value of the global total model according to the joint prediction value;
s340, determining a gradient left factor of the training participant according to the global loss value of the global total model and the sample characteristic value of the local data set;
Wherein the gradient left factor corresponds to the training participant, the gradient left factor being used to determine the latest gradient value of the local sub-model.
The training participants in the training participation group determine the gradient left factors of the local sub-model according to the global loss value of the global total model and the sample characteristic values of the local data set.
Optionally, the training participant determines the gradient left factor based on:
wherein, the liquid crystal display device comprises a liquid crystal display device,representing a local submodel M i Gradient left factor of (c), local submodel M i With training participants P i In a corresponding manner,with training participants P i Correspondingly, as the sample characteristic value of the local data set,representing a global loss value. Optionally, the global loss value is in the form of ciphertext, and the global loss value is determined based on loss value fragments, wherein the loss value fragments can be obtained based on homomorphic encryption algorithm.
S350, determining the gradient right factor of the training participant from the sub-model coefficients of the local sub-model and the sample feature values of the local data set.
Optionally, each training participant in the training participation group multiplies its sub-model coefficients by the regularization coefficient and determines its gradient right factor from the multiplication result. Specifically, the gradient right factor of training participant P_i is determined as

grad_i^right = λ · w_i

where λ denotes the regularization coefficient, w_i denotes the sub-model coefficients of the local sub-model M_i, and grad_i^right is the gradient right factor of P_i.
S360, determining the latest gradient value of the local sub-model from the gradient right factor and the gradient left factor, and updating the local sub-model based on the latest gradient value so as to train the global total model.
Optionally, each training participant in the training participation group adds its gradient right factor to its gradient left factor and determines the latest gradient value of its local sub-model from the result of the addition. Specifically, the latest gradient value is determined as

g_i^(j) = grad_i^left + grad_i^right

where g_i^(j) denotes the latest gradient value of training participant P_i in the j-th round of training, and grad_i^left and grad_i^right denote P_i's gradient left factor and gradient right factor, respectively.
Optionally, training participant P_i updates the sub-model coefficients of its local sub-model based on the latest gradient value. Specifically, P_i determines the latest coefficients of the local sub-model according to the formula

w_i^(j+1) = w_i^(j) − η · g_i^(j)

and updates the sub-model parameters of the local sub-model with the latest coefficients, where η denotes the model learning rate.
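Putting S340 to S360 together, one participant's plaintext update step can be sketched as follows (ciphertext handling of the gradient left factor is omitted; symbols follow the formulas above):

```python
import numpy as np

def gradient_step(X_i: np.ndarray, L: np.ndarray, w_i: np.ndarray,
                  lam: float, eta: float):
    """One plaintext update of participant P_i's sub-model coefficients."""
    grad_left = X_i.T @ L       # gradient left factor: sample features x global loss
    grad_right = lam * w_i      # gradient right factor: regularization x coefficients
    g = grad_left + grad_right  # latest gradient value g_i^(j)
    return w_i - eta * g, g     # latest coefficients w_i^(j+1) and gradient
```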
In the technical solution of this embodiment, the gradient left factor of the training participant is determined from the global loss value of the global total model and the sample feature values of the local data set; the gradient right factor is determined from the sub-model coefficients of the local sub-model and the sample feature values of the local data set; and the latest gradient value of the local sub-model is determined from the gradient left factor and the gradient right factor. This provides a practical way to determine model gradient values and offers technical support for federated-learning-based model training.
In an optional embodiment, determining the latest gradient value of the local sub-model from the gradient right factor and the gradient left factor comprises: if the training participant's gradient left factor is in ciphertext form, decrypting it to obtain the gradient left factor in plaintext form; and determining the latest gradient value of the local sub-model from the gradient right factor and the plaintext gradient left factor.
The gradient left factor is determined from the global loss value, which is in turn determined from loss-value fragments that may have been encrypted; accordingly, both the global loss value and the gradient left factor may be in ciphertext form. If the training participant's gradient left factor is in ciphertext form, it is decrypted to obtain the plaintext gradient left factor. Because the gradient right factor is determined from the sub-model coefficients of the local sub-model, which are the training participant's own data, the gradient right factor is in plaintext form.
Optionally, the training participant sums the gradient right factor and the plaintext gradient left factor and determines the latest gradient value of the local sub-model from the summation result.
This technical solution provides a practical way to determine the latest gradient value and offers technical support for federated-learning-based model training.
In a specific embodiment, under the loan amount prediction business scenario, the global total model is a loan amount prediction model. Its training participants include a tax authority, a lending bank, an e-commerce platform, and a credit bureau, denoted P1, P2, P3, and P4, respectively, where P2 is the training participant holding the target statistic. The federated-learning-based model training method provided by the application may comprise the following phases: an initialization phase, a joint prediction phase, and a training iteration phase.
In the initialization phase, each training participant of the global total model initializes its local sub-model; specifically, each participant P_i (i = 1, 2, 3, 4) initializes the sub-model parameters of its local sub-model, such as the sub-model coefficients w_i, the model learning rate η, and the regularization coefficient λ.
a joint prediction phase comprising the steps of: 1) Training participants in a j-th iterationOutputs a local prediction value of a local sub-model of (a) Is a sample feature value of the local data set. 2) Training participants share local predictors, and in particular, training participants share local predictors based on secret sharing algorithmsSplitting intoAndtwo portions of the fragments, wherein,the method comprises the steps of carrying out a first treatment on the surface of the The training participants then share in the training participation group. Specifically, the target participants in the training participation set are determined based on available computing power and whether the target statistics are included, and the target participants can beAndandexchange with each otherAndfeeding ofThe method comprises the steps of carrying out a first treatment on the surface of the 3) The joint prediction value is calculated in a distributed manner, specifically,a joint prediction value is calculated, wherein,is thatIs thatA kind of electronic deviceBased on the formulaWherein, the method comprises the steps of,representation ofRespectively represent P 1 ,P 3 And P 4 A kind of electronic deviceAccording toDetermining a joint prediction value, wherein,. As can be seenThe joint predictor determined by the training participants is a predictor fragment of the global predictor. And comprehensively training predicted value fragments determined by the participants, and determining the global loss value of the global total model.
Training iteration stage: 1) Training participants share joint predictions, and in particular, training participants share joint predictions based on a secret sharing algorithmSplitting intoAndtwo portions of the fragments, wherein,. In particular, the method comprises the steps of,andexchanging the joint sharing values with each other; Sharing the joint sharing value toRespectively sharing the joint sharing and joint reservation values to. 2) The global loss value is calculated in a distributed manner, first,calculating loss value fragments:wherein, the method comprises the steps of, wherein,is thatCalculating loss value fragments:wherein, the method comprises the steps of, wherein,is thatIs used to determine the joint reserved value of (a),andrespectively isAndis a joint sharing value of (1);based on=Loss value patches are calculated. Wherein, the liquid crystal display device comprises a liquid crystal display device,is that. Alternatively to this, the method may comprise,andand encrypting the loss value fragments obtained by calculation by using the public key of the opposite party based on the homomorphic encryption algorithm to obtain a loss value ciphertext.Andthe obtained loss value ciphertext is respectively usedAndand (3) representing. Next to this, the process is carried out,andthe loss value ciphertext is shared among the training participation sets, and in particular,andthe fragments of the loss value are exchanged with each other,sharing loss value fragments to the same timeAnd. 3) Calculating global loss values, in particular, byBased onA global loss value is calculated, wherein,the value of the global loss is represented as,the fragments of the loss value in the form of plaintext,is the slaveThe loss value fragments in the form of the ciphertext obtained,is the slaveThe obtained fragments of the loss value in the plaintext form.Based onCalculating a global loss value; then, by training participantsThe calculated global loss value is sent to other participants in the training participation group Andso that all training participants in the training participation group can obtain the global loss value. Notably, the training participantsAndthe calculation results in a global loss value in ciphertext form. By training participantsThe undetermined global loss value ciphertext is shared to the training participation groupAndcan ensure that except for the training participation groupBeyond thatAndare all based onLoss value fragmentation determined based on homomorphic encryption algorithms. This is advantageous in simplifying the subsequent data decryption process. 4) Calculating gradient left factors, specifically, training participants P in training participant group 1 ,P 2 ,P 3 And P 4 Training participants are based on formulasThe gradient left factors are each calculated. Notably, training participant P 1 , P 3 And P 4 Are all based on P 1 The determined global loss value calculates a gradient left factor. Training participant P 2 The gradient left factor is calculated based on the global loss value determined by itself. 5) Decryption of gradient left factor, gradient leftThe factor is determined based on the global loss value of the ciphertext form, and the gradient left factor is the ciphertext form. 
5) Decrypting the gradient left factor. Because the gradient left factor is determined based on the global loss value in ciphertext form, the gradient left factor is itself in ciphertext form. To decrypt it, the training participant that encrypted the global loss value is determined, the gradient left factor is sent to that training participant, and the gradient left factor is decrypted by it. Specifically, P1, P3 and P4 send their gradient left factors to P2, and decryption by P2 yields gradient left factors in plaintext form; P2 sends its gradient left factor to P1, and decryption by P1 yields a gradient left factor in plaintext form. The decrypted gradient left factors in plaintext form are then fed back by P1 to P2 and by P2 to P1, P3 and P4 respectively. 6) Calculating the gradient right factor. Each training participant calculates its gradient right factor from the sub-model coefficients of its local sub-model and the sample feature values of its local data set. 7) Calculating the latest gradient value. Each training participant calculates the latest gradient value of its local sub-model in the current iteration from the gradient left factor and the gradient right factor. 8) Updating the sub-model coefficients. Each training participant updates the sub-model coefficients of its local sub-model according to w ← w − η·g, where η denotes the model learning rate, w denotes the sub-model coefficients and g denotes the latest gradient value. 9) Determining whether training is complete. Specifically, the gradient value difference between adjacent iterations is determined as Δg_j = |g_j − g_(j−1)|, where g_j denotes the latest gradient value, g_(j−1) denotes the last gradient value, and the subscript j denotes the j-th iteration. If Δg_j < ε, the iterative training ends after the sub-model coefficients are updated; otherwise, after updating the sub-model coefficients, the local sub-model continues to be trained in the next iteration. Here ε is a preset threshold whose specific value is determined according to actual service requirements and is not limited herein.
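To make steps 7) to 9) concrete, the following minimal sketch assembles a gradient from its left and right factors, applies the coefficient update, and tests the stopping criterion; the elementwise product of the two factors, the learning rate and the threshold are illustrative assumptions, as the embodiment leaves the exact formulas open.

```python
import numpy as np

def training_step(w, grad_left, grad_right, g_prev, lr=0.01, eps=1e-4):
    """One iteration of steps 7)-9): assemble the gradient, update the
    coefficients, and test the gradient difference between adjacent iterations."""
    g = grad_left * grad_right                    # latest gradient value
    w = w - lr * g                                # w <- w - eta * g
    converged = np.linalg.norm(g - g_prev) < eps  # stop when the difference is small
    return w, g, converged

w, g_prev = np.array([0.5, -0.2]), np.zeros(2)
w, g_prev, done = training_step(w, grad_left=np.array([0.3, 0.3]),
                                grad_right=np.array([1.0, 2.0]), g_prev=g_prev)
```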
According to the technical scheme, the data privacy security of the training participants is guaranteed and leakage of the training participants' local data sets and sub-model parameters is avoided; at the same time, the federal learning-based model training method is simplified, its computational complexity and communication complexity are reduced, and its practicability is improved.
Example IV
Fig. 4 is a schematic structural diagram of a model training device based on federal learning according to a fourth embodiment of the present application. The present embodiment is applicable to the case of training a regression model based on federal learning without a central coordinator. The apparatus may be implemented in software and/or hardware and may be integrated in an electronic device such as a smart terminal.
As shown in fig. 4, the apparatus may include: a local predictor determination module 410, a joint predictor determination module 420, a global penalty determination module 430, and a local submodel update module 440.
A local prediction value determining module 410, configured to train a local sub-model with a local data set, so that the local sub-model outputs a local prediction value;
the joint prediction value determining module 420 is configured to share a local prediction value in a training participation group of a global total model, and determine a joint prediction value of the global total model according to the local prediction values shared by other participants in the training participation group;
A global loss value determining module 430, configured to determine a global loss value of the global total model according to the joint prediction value;
the local sub-model updating module 440 is configured to determine a latest gradient value of the local sub-model based on the global loss value of the global total model, and update the local sub-model based on the latest gradient value to train the global total model.
According to the technical scheme, the local sub-model is trained by utilizing the local data set, so that the local sub-model outputs a local prediction value; the local prediction value is shared in the training participation group of the global total model, and the joint prediction value of the global total model is determined according to the local prediction values shared by other participants in the training participation group; the global loss value of the global total model is determined according to the joint prediction value; and the latest gradient value of the local sub-model is determined based on the global loss value of the global total model, and the local sub-model is updated based on the latest gradient value to train the global total model. The data privacy security of the training participants is thus guaranteed and leakage of their local data sets and sub-model parameters is avoided; at the same time, the federal learning-based model training method is simplified, its computational complexity and communication complexity are reduced, and its practicability is improved.
Optionally, the joint prediction value determining module 420 includes: the local predicted value segmentation sub-module is used for carrying out segmentation processing on the local predicted value and dividing the local predicted value into a local shared value and a local reserved value; the local sharing value sharing sub-module is used for sharing the local sharing value in the training participation group of the global total model and obtaining the local sharing value of other participants in the training participation group; and the joint prediction value determination submodule is used for determining the joint prediction value of the global total model according to the local reserved value and the local sharing value of the other participants.
Optionally, the global loss value determination module 430 includes: the joint prediction value segmentation sub-module is used for carrying out segmentation processing on the joint prediction value and dividing the joint prediction value into a joint sharing value and a joint reserved value; the joint sharing value sharing sub-module is used for sharing the joint sharing value in a training participation group of the global total model and acquiring joint sharing values of other participants in the training participation group; the loss value fragment determining submodule is used for determining loss value fragments of the global total model according to the joint reserved value and the joint sharing value of the other participants; and the global loss value determining submodule is used for determining the global loss value of the global total model according to the loss value fragments of the global total model and the loss value fragments obtained from other participants in the training participation group.
Optionally, the global loss value determining submodule includes: the loss value encryption unit is used for encrypting the loss value fragments of the global total model to obtain a loss value ciphertext; the loss value ciphertext sharing unit is used for sharing the loss value ciphertext in the training participation group and acquiring the loss value ciphertext of other participants in the training participation group; and the global loss value determining unit is used for calculating loss value fragments of the global total model and loss value ciphertext acquired from other participants in the training participation group to obtain the global loss value of the global total model.
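As a sketch of the encryption, sharing and aggregation performed by these units, the following example uses the python-paillier (`phe`) package as a stand-in additively homomorphic scheme; the embodiment does not name a concrete algorithm, and the fragment values and party roles below are illustrative assumptions.

```python
from phe import paillier

# P2 generates a key pair; P1 encrypts its loss value fragment under P2's
# public key, as in the optional homomorphic-encryption step.
pub_p2, priv_p2 = paillier.generate_paillier_keypair()
cipher_p1 = pub_p2.encrypt(0.42)               # P1's loss value ciphertext

# Additive homomorphism lets plaintext fragments from other participants be
# accumulated onto the ciphertext without decrypting it.
global_loss_cipher = cipher_p1 + 0.13 + 0.27   # fragments from P3 and P4

# Only the key holder P2 can recover the aggregated global loss value.
print(priv_p2.decrypt(global_loss_cipher))     # -> 0.82 (approximately)
```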
Optionally, the local submodel updating module 440 includes: the gradient left factor determining submodule is used for determining the gradient left factor of the training participant according to the global loss value of the global total model and the sample characteristic value of the local data set; the gradient right factor determining submodule is used for determining the gradient right factor of the training participant according to the submodel coefficient of the local submodel and the sample characteristic value of the local data set; and the latest gradient value determining submodule is used for determining the latest gradient value of the local submodel according to the gradient right factor and the gradient left factor.
Optionally, the latest gradient value determining submodule includes: the gradient left factor decryption unit is used for decrypting the gradient left factor if the gradient left factor of the training participant is in a ciphertext form, so as to obtain the gradient left factor in a plaintext form; and the latest gradient value determining unit is used for determining the latest gradient value of the local sub-model according to the gradient right factor and the gradient left factor in a plaintext form.
Optionally, the local sub-model updating module 440 may further include: the last gradient value determining submodule is used for determining a last gradient value of the local submodel in the last training; a gradient value difference value determining submodule, configured to determine a gradient value difference value between adjacent iterations according to the previous gradient value and the latest gradient value; and the local sub-model updating sub-module is used for updating the local sub-model by using the latest gradient value if the gradient value difference value is greater than or equal to a preset threshold value, and continuing training the local sub-model to train the global total model until the gradient value difference value between adjacent iterations is smaller than the preset threshold value.
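To show how the four modules above could cooperate in one training round, the following toy, single-participant sketch chains them together; the class name, method bodies and the plain linear sub-model are hypothetical simplifications that elide the secret sharing and encryption steps described earlier.

```python
import numpy as np

class FederatedLinearTrainer:
    """Toy single-participant view of one training round; names are illustrative."""

    def __init__(self, X, y, lr=0.05):
        self.X, self.y, self.lr = X, y, lr
        self.w = np.zeros(X.shape[1])             # sub-model coefficients

    def determine_local_prediction(self):         # cf. module 410
        return self.X @ self.w

    def determine_joint_prediction(self, u, peer_fragments):  # cf. module 420
        return u + sum(peer_fragments)            # combine own value with shared fragments

    def determine_global_loss(self, joint):       # cf. module 430
        return joint - self.y                     # residual as a stand-in loss signal

    def update_local_submodel(self, residual):    # cf. module 440
        g = self.X.T @ residual / len(self.X)     # latest gradient value
        self.w -= self.lr * g                     # w <- w - eta * g
        return g

trainer = FederatedLinearTrainer(np.array([[1.0, 2.0], [0.5, 1.5]]),
                                 np.array([1.0, 0.8]))
u = trainer.determine_local_prediction()
joint = trainer.determine_joint_prediction(u, peer_fragments=[np.zeros(2)])
gradient = trainer.update_local_submodel(trainer.determine_global_loss(joint))
```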
The model training device based on federal learning provided by the embodiment of the application can execute the model training method based on federal learning provided by any embodiment of the application, and has the corresponding functional modules and beneficial effects for executing the model training method based on federal learning.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision and disclosure of the user data involved all comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
Example five
Fig. 5 illustrates a schematic diagram of an electronic device 510 that can be used to implement an embodiment of the present application. The electronic device 510 includes at least one processor 511, and a memory, such as a Read Only Memory (ROM) 512, a Random Access Memory (RAM) 513, etc., communicatively coupled to the at least one processor 511, wherein the memory stores computer programs executable by the at least one processor, and the processor 511 may perform various suitable actions and processes in accordance with the computer programs stored in the Read Only Memory (ROM) 512 or the computer programs loaded from the storage unit 518 into the Random Access Memory (RAM) 513. In the RAM 513, various programs and data required for the operation of the electronic device 510 can also be stored. The processor 511, the ROM 512, and the RAM 513 are connected to each other by a bus 514. An input/output (I/O) interface 515 is also connected to bus 514.
Various components in the electronic device 510 are connected to the I/O interface 515, including: an input unit 516 such as a keyboard, a mouse, etc.; an output unit 517 such as various types of displays, speakers, and the like; a storage unit 518 such as a magnetic disk, optical disk, etc.; and a communication unit 519 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 519 allows the electronic device 510 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The processor 511 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the processor 511 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 511 performs the various methods and processes described above, such as the federal learning-based model training method.
In some embodiments, the federal learning-based model training method can be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 518. In some embodiments, some or all of the computer program may be loaded and/or installed onto the electronic device 510 via the ROM 512 and/or the communication unit 519. When the computer program is loaded into RAM 513 and executed by processor 511, one or more steps of the federal learning-based model training method described above may be performed. Alternatively, in other embodiments, processor 511 may be configured to perform the federal learning-based model training method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above can be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present application may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present application, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the defects of high management difficulty and weak service scalability found in traditional physical hosts and VPS services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present application are achieved, and the present application is not limited herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (8)

1. A model training method based on federal learning, performed by a training participant of a global total model, the method comprising:
training a local sub-model by utilizing a local data set so that the local sub-model outputs a local predicted value;
sharing a local predicted value in a training participation group of a global total model, and determining a joint predicted value of the global total model according to the local predicted values shared by other participants in the training participation group;
Determining a global loss value of the global total model according to the joint predicted value;
determining a latest gradient value of a local sub-model based on a global loss value of the global total model, and updating the local sub-model based on the latest gradient value to train the global total model; the global total model is a loan amount prediction model and is used for predicting the loan amount; the training participants of the global total model include: tax administration, loan bank, e-commerce platform and credit investigation organization;
wherein determining a global loss value of the global total model according to the joint prediction value comprises: performing slicing processing on the joint predicted value, and dividing the joint predicted value into a joint sharing value and a joint reserved value; sharing the joint sharing value in a training participation group of the global total model, and acquiring joint sharing values of other participants in the training participation group; determining loss value fragments of a global total model according to the joint reserved value and the joint sharing value of the other participants; determining a global loss value of the global total model according to the loss value fragments of the global total model and loss value fragments obtained from other participants in the training participation group;
Wherein determining the latest gradient value of the local sub-model based on the global loss value of the global total model comprises: determining a gradient left factor of a training participant according to the global loss value of the global total model and the sample characteristic value of the local data set; determining a gradient right factor of the training participant according to the sub-model coefficient of the local sub-model and the sample characteristic value of the local data set; and determining the latest gradient value of the local sub-model according to the gradient right factor and the gradient left factor.
2. The method of claim 1, wherein determining a global loss value of the global total model from loss value fragments of the global total model and loss value fragments obtained from other participants in the training participation group comprises:
encrypting the loss value fragments of the global total model to obtain a loss value ciphertext;
sharing the loss value ciphertext in the training participation group, and acquiring the loss value ciphertext of other participants in the training participation group;
and calculating the loss value fragments of the global total model and loss value ciphertext obtained from other participants in the training participation group to obtain the global loss value of the global total model.
3. The method of claim 1, wherein determining the latest gradient value of the local sub-model according to the gradient right factor and the gradient left factor comprises:
if the gradient left factor of the training participant is in a ciphertext form, decrypting the gradient left factor to obtain a gradient left factor in a plaintext form;
and determining the latest gradient value of the local sub-model according to the gradient right factor and the gradient left factor in a plaintext form.
4. The method of claim 1, wherein sharing local predictors within a training participation group of a global overall model and determining joint predictors for the global overall model based on local predictors shared by other participants within the training participation group comprises:
performing slicing processing on the local predicted value, and dividing the local predicted value into a local shared value and a local reserved value;
sharing the local sharing value in a training participation group of the global total model, and acquiring local sharing values of other participants in the training participation group;
and determining a joint prediction value of the global total model according to the local reserved value and the local sharing value of the other participants.
5. The method of claim 1, wherein updating a local sub-model based on the latest gradient values to train the global total model comprises:
determining a last gradient value of the local sub-model in the last training;
determining a gradient value difference value between adjacent iterations according to the last gradient value and the latest gradient value;
if the gradient value difference value is greater than or equal to a preset threshold value, updating the local sub-model by using the latest gradient value, and continuing training the local sub-model to train the global total model until the gradient value difference value between adjacent iterations is smaller than the preset threshold value.
6. A model training device based on federal learning, characterized in that the device comprises:
the local prediction value determining module is used for training the local sub-model by utilizing the local data set so that the local sub-model outputs a local prediction value;
the joint prediction value determining module is used for sharing the local prediction value in the training participation group of the global total model and determining the joint prediction value of the global total model according to the local prediction value shared by other participants in the training participation group;
The global loss value determining module is used for determining a global loss value of the global total model according to the joint predicted value;
the local sub-model updating module is used for determining the latest gradient value of the local sub-model based on the global loss value of the global total model and updating the local sub-model based on the latest gradient value so as to train the global total model; the global total model is a loan amount prediction model and is used for predicting the loan amount; the training participants of the global total model include: tax administration, loan bank, e-commerce platform and credit investigation organization;
wherein, the global loss value determining module includes: the joint prediction value segmentation sub-module is used for carrying out segmentation processing on the joint prediction value and dividing the joint prediction value into a joint sharing value and a joint reserved value; the joint sharing value sharing sub-module is used for sharing the joint sharing value in a training participation group of the global total model and acquiring joint sharing values of other participants in the training participation group; the loss value fragment determining submodule is used for determining loss value fragments of the global total model according to the joint reserved value and the joint sharing value of the other participants; the global loss value determining submodule is used for determining the global loss value of the global total model according to the loss value fragments of the global total model and the loss value fragments obtained from other participants in the training participation group;
The local sub-model updating module comprises: the gradient left factor determining submodule is used for determining the gradient left factor of the training participant according to the global loss value of the global total model and the sample characteristic value of the local data set; the gradient right factor determining submodule is used for determining the gradient right factor of the training participant according to the submodel coefficient of the local submodel and the sample characteristic value of the local data set; and the latest gradient value determining submodule is used for determining the latest gradient value of the local submodel according to the gradient right factor and the gradient left factor.
7. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the federal learning-based model training method according to any of claims 1-5.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the federal learning-based model training method according to any one of claims 1-5 when the computer program is executed by the processor.
CN202310706543.4A 2023-06-15 2023-06-15 Model training method, device and medium based on federal learning and electronic equipment Active CN116432040B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310706543.4A CN116432040B (en) 2023-06-15 2023-06-15 Model training method, device and medium based on federal learning and electronic equipment

Publications (2)

Publication Number Publication Date
CN116432040A CN116432040A (en) 2023-07-14
CN116432040B (en) 2023-09-01

Family

ID=87092963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310706543.4A Active CN116432040B (en) 2023-06-15 2023-06-15 Model training method, device and medium based on federal learning and electronic equipment

Country Status (1)

Country Link
CN (1) CN116432040B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117648999B * 2024-01-30 2024-04-23 Shanghai Lingshuzhonghe Information Technology Co., Ltd. Federal learning regression model loss function evaluation method and device and electronic equipment

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20220188775A1 (en) * 2020-12-15 2022-06-16 International Business Machines Corporation Federated learning for multi-label classification model for oil pump management

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288094A (en) * 2019-06-10 2019-09-27 深圳前海微众银行股份有限公司 Model parameter training method and device based on federation's study
US11017322B1 (en) * 2021-01-28 2021-05-25 Alipay Labs (singapore) Pte. Ltd. Method and system for federated learning
CN114330759A (en) * 2022-03-08 2022-04-12 富算科技(上海)有限公司 Training method and system for longitudinal federated learning model
CN115460617A (en) * 2022-08-02 2022-12-09 北京邮电大学 Network load prediction method and device based on federal learning, electronic equipment and medium
CN116245247A (en) * 2023-03-13 2023-06-09 联通(广东)产业互联网有限公司 Safety prediction method, platform, equipment and medium based on federal learning
CN116167088A (en) * 2023-03-17 2023-05-26 山东云海国创云计算装备产业创新中心有限公司 Method, system and terminal for privacy protection in two-party federal learning

Non-Patent Citations (1)

Title
Research on the Privacy Security of Blockchain-based Federated Learning; Zhang Peng; China Master's Theses Full-text Database, Information Science and Technology Series; pp. 1-49 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant