CN115965093A

CN115965093A - Model training method and device, storage medium and electronic equipment

Info

Publication number: CN115965093A
Application number: CN202111176309.2A
Authority: CN
Inventors: 刘洋; 蔡权伟; 吴烨
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Beijing ByteDance Network Technology Co Ltd
Priority date: 2021-10-09
Filing date: 2021-10-09
Publication date: 2023-04-14

Abstract

The disclosure relates to a model training method, a model training device, a storage medium and an electronic device, which are used for reducing computing resources, memory resources and communication resources in a joint training process under the condition of protecting data privacy and improving the joint training efficiency. The method comprises the following steps: acquiring sample data and determining a label value corresponding to the sample data; determining a predicted value corresponding to the sample data through a joint training model; determining a first gradient and a second gradient of model parameters of the joint training model according to the label value and the predicted value; if the first gradient and the second gradient meet a preset gradient condition, performing differential privacy processing on the first gradient to obtain a first target gradient, and performing differential privacy processing on the second gradient to obtain a second target gradient; and sending the first target gradient and the second target gradient to the other end participating in the joint training.

Description

Model training method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of machine learning technologies, and in particular, to a model training method and apparatus, a storage medium, and an electronic device.

Background

Federated learning (Federated Learning) is a "privacy protection + distributed" machine learning technique. It addresses how to jointly train a global model on virtually "aggregated" data while protecting privacy when sensitive data is held by multiple independent organizations, groups, or individuals. During training of the global model, homomorphic encryption is generally used to protect data privacy. However, homomorphic encryption has high computational complexity and suffers from ciphertext expansion, so model training consumes a large amount of computing, memory, and communication resources.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

In a first aspect, the present disclosure provides a model training method, the method comprising:

acquiring sample data and determining a label value corresponding to the sample data;

determining a predicted value corresponding to the sample data through a joint training model;

determining a first gradient and a second gradient of model parameters of the joint training model according to the label value and the predicted value, wherein a derivative corresponding to the first gradient is different from a derivative corresponding to the second gradient;

if the first gradient and the second gradient meet a preset gradient condition, performing differential privacy processing on the first gradient to obtain a first target gradient, and performing differential privacy processing on the second gradient to obtain a second target gradient;

and sending the first target gradient and the second target gradient to the other end participating in the joint training, wherein the first target gradient and the second target gradient are used for adjusting the parameters of the joint training model by the other end.

In a second aspect, the present disclosure also provides a model training apparatus, the apparatus comprising:

the first determining module is used for acquiring sample data and determining a label value corresponding to the sample data;

the second determination module is used for determining a predicted value corresponding to the sample data through a joint training model;

a third determining module, configured to determine a first gradient and a second gradient of a model parameter of the joint training model according to the label value and the predicted value, where a derivative corresponding to the first gradient is different from a derivative corresponding to the second gradient;

the first processing module is used for carrying out differential privacy processing on the first gradient to obtain a first target gradient and carrying out differential privacy processing on the second gradient to obtain a second target gradient when the first gradient and the second gradient both meet a preset gradient condition;

and the sending module is used for sending the first target gradient and the second target gradient to the other end participating in the joint training, and the first target gradient and the second target gradient are used for adjusting the parameters of the joint training model by the other end.

In a third aspect, the present disclosure also provides a computer readable medium having stored thereon a computer program which, when executed by a processing apparatus, performs the steps of the method of the first aspect.

In a fourth aspect, the present disclosure also provides an electronic device, including:

a storage device having a computer program stored thereon;

processing means for executing the computer program in the storage means to carry out the steps of the method of the first aspect.

Through the above technical solution, data can be privacy-protected during joint training by means of differential privacy. Compared with homomorphic encryption, differential privacy operates on plaintext, so it reduces computational complexity, improves joint training efficiency, and reduces the memory and communication resources consumed during joint training. In addition, before the differential privacy processing, it can be determined whether the first gradient and the second gradient satisfy a preset gradient condition, and if they do, the differential privacy processing is performed on the first gradient and the second gradient. This reduces the data distortion caused by differential privacy processing, so the joint training effect is improved while data privacy is guaranteed. Furthermore, the gradients that the other end participating in the joint training uses to adjust the model parameters are calculated from the original data, which improves the accuracy of model training compared with having the other end estimate the gradients.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:

FIG. 1 is a schematic diagram of data content that each participant has in federated learning;

FIG. 2 is a flow chart illustrating a method of model training according to an exemplary embodiment of the present disclosure;

FIG. 3 is a diagram illustrating data at one end and at the other end of a model training method according to an exemplary embodiment of the present disclosure;

FIG. 4 is a block diagram illustrating a model training apparatus in accordance with an exemplary embodiment of the present disclosure;

FIG. 5 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence of the functions performed by the devices, modules or units. It is further noted that references to "a", "an", and "the" modifications in the present disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

Federated learning (Federated Learning) is a "privacy protection + distributed" machine learning technique. It addresses how to jointly train a global model on virtually "aggregated" data while protecting privacy when sensitive data is held by multiple independent organizations, groups, or individuals. The concept of federated learning extends to horizontal federated learning, vertical federated learning, and federated transfer learning. In horizontal federated learning, the data sets of the parties overlap largely in features, but their samples are complementary. In vertical federated learning, the data sets of the parties overlap largely in samples, but their features are complementary; it suits scenarios in which features from multiple domains such as finance, social networking, games, and education serve the label of a single business party.

For example, a vertical federated learning scenario may include participant A, participant B, and participant C. Participant A is a social media company that holds social features of a large population, as shown in Table 1. Participant B is an online education company that holds education-related features of a large population, as shown in Table 2. Participant C is a credit company that holds credit records and a small number of related features for a small portion of the population, as shown in Table 3.

TABLE 1

TABLE 2

TABLE 3

The data owned by each participant is also illustrated in FIG. 1.

If participant C wants to jointly train a model using the user features held by participants A and B together with its own credit features and default labels, the data of each participant (such as features and labels) must be protected from leaking to the other participants during training. After training, participant C can use the trained model to predict default for a large population, judge from the prediction result whether a new user will default, and make subsequent decisions accordingly, thereby reducing the bad debt rate.

In the related art, homomorphic encryption (Homomorphic Encryption) is usually used to protect data privacy during joint training. Homomorphic encryption is one of the core technologies in the field of secure multi-party computation, and homomorphism is a property of certain cryptosystems. Specifically, if a cryptosystem is homomorphic, operations on ciphertexts (such as addition or multiplication) in the ciphertext space map to corresponding operations in the plaintext space. Let $\langle x \rangle$ denote the ciphertext of plaintext $x$. For example, the well-known RSA cryptosystem is multiplicatively homomorphic in its unpadded form, i.e., it satisfies $\langle x_1 x_2 \rangle = \langle x_1 \rangle \langle x_2 \rangle$.
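
The multiplicative property of unpadded RSA can be checked with a few lines of Python. This is a minimal sketch for illustration only: the tiny primes, the exponents and the function name `encrypt` are assumptions chosen for readability and do not appear in the disclosure, and such parameters offer no security.

```python
# Toy, unpadded ("textbook") RSA with deliberately tiny primes; illustration only.
p, q = 61, 53
n = p * q        # modulus, 3233
e = 17           # public exponent, coprime with (p - 1) * (q - 1) = 3120
d = 2753         # private exponent, since 17 * 2753 = 46801 = 15 * 3120 + 1


def encrypt(x: int) -> int:
    """Return the ciphertext <x> = x^e mod n."""
    return pow(x, e, n)


x1, x2 = 7, 11
lhs = encrypt((x1 * x2) % n)              # <x1 * x2>
rhs = (encrypt(x1) * encrypt(x2)) % n     # <x1> * <x2>, multiplied in ciphertext space
product = pow(rhs, d, n)                  # decrypting the ciphertext product

print(lhs == rhs, product)
```

Multiplying the two ciphertexts and decrypting the result recovers the product of the plaintexts without either plaintext being exposed during the multiplication.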

However, the homomorphic encryption method has high computational complexity, and therefore, more computational resources and memory resources need to be consumed in the joint model training process. Moreover, since joint training involves multiple participants, the homomorphic encrypted data needs to be transmitted between the corresponding participants, and the homomorphic encryption method has a problem of expansion of the ciphertext, so that more communication resources need to be consumed for the transmission of the homomorphic encrypted data between the participants.

In view of this, the present disclosure provides a model training method to reduce computational resources, memory resources, and communication resources in a joint training process under the condition of protecting data privacy, and improve joint training efficiency.

FIG. 2 is a flow chart illustrating a method of model training according to an exemplary embodiment of the present disclosure. Referring to fig. 2, the model training method can be applied to one end participating in joint training, including:

Step 201: acquire sample data and determine a label value corresponding to the sample data.

Step 202: determine a predicted value corresponding to the sample data through a joint training model.

Step 203: determine a first gradient and a second gradient of model parameters of the joint training model according to the label value and the predicted value, wherein the derivative corresponding to the first gradient is different from the derivative corresponding to the second gradient.

Step 204: if the first gradient and the second gradient meet a preset gradient condition, perform differential privacy processing on the first gradient to obtain a first target gradient, and perform differential privacy processing on the second gradient to obtain a second target gradient.

Step 205: send the first target gradient and the second target gradient to the other end participating in the joint training, wherein the first target gradient and the second target gradient are used by the other end to adjust the parameters of the joint training model.
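
The flow of steps 203 to 205 at the label-holding end can be sketched as follows. This is a hedged illustration, not the disclosed implementation: the half squared-error loss, the variance-based gradient condition, the Laplace noise with scale `1 / epsilon`, and all names (`label_holder_step`, `var_threshold`, `epsilon`) are assumptions introduced here. What happens when the condition is not met is not stated in this passage, so the sketch simply returns `None`.

```python
import numpy as np


def label_holder_step(labels, preds, var_threshold, epsilon, rng):
    """One round at the end that holds the labels (steps 203 to 205).

    Assumptions (not from the source): half squared-error loss, a variance
    condition, and Laplace noise of scale 1/epsilon as the privacy processing.
    """
    labels = np.asarray(labels, dtype=float)
    preds = np.asarray(preds, dtype=float)

    # Step 203: first and second derivatives of the assumed loss 0.5 * (pred - label)^2.
    g = preds - labels
    h = np.ones_like(preds)

    # Step 204: perturb the gradients only if both meet the preset condition.
    if np.var(g) <= var_threshold and np.var(h) <= var_threshold:
        g_target = g + rng.laplace(0.0, 1.0 / epsilon, size=g.shape)
        h_target = h + rng.laplace(0.0, 1.0 / epsilon, size=h.shape)
        # Step 205: these two arrays are what would be sent to the other end.
        return g_target, h_target
    return None  # behaviour in this case is not specified in this passage


rng = np.random.default_rng(0)
sent = label_holder_step([1, 0, 1, 0], [0.5] * 4, var_threshold=1.0, epsilon=1.0, rng=rng)
withheld = label_holder_step([1, 0, 1, 0], [0.5] * 4, var_threshold=0.1, epsilon=1.0, rng=rng)
```

With the sample inputs, the variance of the first gradient is 0.25, so the gradients are perturbed and returned under the threshold 1.0 and withheld under the threshold 0.1.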

Through the above technical solution, data can be privacy-protected during joint training by means of differential privacy processing. Compared with homomorphic encryption, differential privacy operates on plaintext, so it reduces computational complexity, improves joint training efficiency, and reduces the memory and communication resources consumed during joint training. In addition, before the differential privacy processing, it can be determined whether the first gradient and the second gradient satisfy a preset gradient condition, and if they do, the differential privacy processing is performed on the first gradient and the second gradient. This reduces the data distortion caused by differential privacy processing, so the joint training effect is improved while data privacy is guaranteed.

In addition, in the related art the first participant calculates the first gradient, while the second participant predicts the second gradient and sends it to the first participant. In contrast, in the model training method provided by the present disclosure, the first participant calculates both the first gradient and the second gradient from the real data, applies local differential privacy to them, and then sends them to the second participant. That is, the gradients that the other end participating in the joint training uses to adjust the model parameters are calculated from the original data, which improves the accuracy of model training compared with having the other end estimate the gradients.

In order to make the model training method provided by the present disclosure more understandable to those skilled in the art, the above steps are exemplified in detail below.

It should first be understood that each end participating in the joint training may maintain a global model, i.e., the joint training model in the embodiments of the present disclosure. During training, either end can synchronize its model training results to the other end based on privacy-protected data. The end that holds the true label values of the sample data (such as the first participant in the present disclosure) may apply privacy protection to the calculation results of the loss function and send them to the other end (such as the second participant in the present disclosure) for model training. The end that holds the data features (such as the second participant in the present disclosure) may synchronize the model information obtained after model training to the other end (such as the first participant in the present disclosure). In this way, federated learning can be achieved.

Illustratively, the joint training model may be a gradient-boosted tree model, such as XGBoost. Of course, the joint training model may also be any other type of machine learning model, which is not limited in the present disclosure. The training participants of the joint training model may include a first participant and a second participant. The first participant, i.e., the end that performs the model training method of the present disclosure, may hold the true label values of the sample data. An example is participant C above, which holds default labels indicating whether a user has defaulted, together with related features of a small portion of the population. The second participant, i.e., the other end participating in the joint training, may hold data features associated with the sample data. Examples are participant A above, which holds social features associated with the sample data, and participant B, which holds education features of the sample data.

It should be understood that the other end participating in the joint training may include every participant of the joint training model other than the end that performs the above model training method. Therefore, in a possible manner, the other end may be one participant or multiple participants, which is not limited in the embodiments of the present disclosure. For convenience of explanation, a single other end is taken as an example below, and it is assumed that the end performing the above model training method holds no data features and only the true label values, while the other end holds the data features. That is, the data content of the one end and the other end participating in the joint training may be as shown in FIG. 3.

Illustratively, the predicted value represents the prediction result of the joint training model for the sample data, and the label value represents the true result of the sample data. A loss function is therefore calculated from the predicted value and the label value, and the joint training model is adjusted according to the result of the loss function so that the predicted value of the model approaches the true value, which achieves the aim of model training.

In a possible manner, the model information synchronized by the other end may be obtained first, where the model information includes sample space information corresponding to each split node in the joint training model after the other end has adjusted the model parameters of the joint training model. Correspondingly, determining the predicted value corresponding to the sample data through the joint training model may be: updating the sample space information corresponding to each split node in the joint training model according to the model information, determining a target predicted value of the joint training model for the sample data based on the updated joint training model, and determining the sum of the target predicted value and the historical predicted value of the sample data as the predicted value of the sample data.

In a possible manner, the model information may be obtained by substituting the received first target gradient and the second target gradient into a splitting gain calculation formula of the joint training model at the other end, and then adjusting parameters of the joint training model according to a calculation result of the splitting gain calculation formula.

For example, after receiving the first target gradients and the second target gradients corresponding to a plurality of sample data, the other end may determine a first target total gradient, a second target total gradient, a third target total gradient, and a fourth target total gradient from them. The first target total gradient is the sum of the first target gradients of the sample data whose feature values are smaller than a target feature value. The second target total gradient is the sum of the first target gradients of the sample data whose feature values are greater than or equal to the target feature value. The third target total gradient is the sum of the second target gradients of the sample data whose feature values are smaller than the target feature value. The fourth target total gradient is the sum of the second target gradients of the sample data whose feature values are greater than or equal to the target feature value. The target feature value is the feature value at each binning point obtained by equal-frequency binning of the feature values of the plurality of sample data. The four target total gradients can then be substituted into the splitting gain calculation formula of the joint training model, and the parameters of the joint training model are adjusted according to the result of the formula, so as to obtain the model information to be synchronized.

For example, equal-frequency binning is first performed on each data feature $j$ ($j = 1, \dots, m$) held by the other end, resulting in $k$ binning points: $s_{j1} < \dots < s_{jk}$. Taking the age feature shown in Table 4 as an example, equal-frequency binning with $k = 2$ yields the binning points $s_1 = 23$ and $s_2 = 55$.

TABLE 4

Age	Bin number
15	0
16	0
23	1
50	1
55	2
60	2

In this case, the target feature values $s_{jk}$ include the two age feature values 23 and 55.
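
The Table 4 example can be reproduced with a short sketch. The rule used here, taking the first value of each bin after the lowest one as a binning point, is one simple way to realise equal-frequency binning and is an assumption; the function names are hypothetical.

```python
def equal_frequency_bin_points(values, k):
    """Return k binning points that split the sorted values into k + 1 equal-count bins."""
    ordered = sorted(values)
    n = len(ordered)
    # The b-th binning point is the first value of the b-th bin (b = 1..k).
    return [ordered[(b * n) // (k + 1)] for b in range(1, k + 1)]


def bin_number(value, points):
    """Bin index of a value: the number of binning points that are <= value."""
    return sum(1 for s in points if value >= s)


ages = [15, 16, 23, 50, 55, 60]
points = equal_frequency_bin_points(ages, k=2)
bins = [bin_number(a, points) for a in ages]

print(points)  # [23, 55]
print(bins)    # [0, 0, 1, 1, 2, 2]
```

Each of the three bins receives two of the six ages, which matches the bin numbers listed in Table 4.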

After the other end receives the first target gradients and the second target gradients, it may collect the set of all samples whose feature value is smaller than $s_{jk}$:

$$I_L = \{\, i \mid x_{ij} < s_{jk} \,\}$$

where $x_{ij}$ denotes the value of feature $j$ for sample $i$. It then calculates the sum of the first target gradients $\tilde{g}_i$ of these samples:

$$G_L = \sum_{i \in I_L} \tilde{g}_i$$

and the sum of the second target gradients $\tilde{h}_i$ of these samples:

$$H_L = \sum_{i \in I_L} \tilde{h}_i$$

Likewise, it may collect the set of all samples whose feature value is greater than or equal to $s_{jk}$:

$$I_R = \{\, i \mid x_{ij} \ge s_{jk} \,\}$$

and calculate the sum of the first target gradients of these samples:

$$G_R = \sum_{i \in I_R} \tilde{g}_i$$

and the sum of the second target gradients of these samples:

$$H_R = \sum_{i \in I_R} \tilde{h}_i$$

That is, the first target total gradient is $G_L$, the second target total gradient is $G_R$, the third target total gradient is $H_L$, and the fourth target total gradient is $H_R$.
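
The four target total gradients for one candidate split can be accumulated in a single pass, as in the sketch below. The names `G_L`, `G_R`, `H_L` and `H_R` (first to fourth target total gradient, in that order) and the sample values are assumptions made for illustration.

```python
def target_total_gradients(feature_values, g_target, h_target, split_value):
    """Return (G_L, G_R, H_L, H_R) for one candidate split value.

    G_L / H_L sum the first / second target gradients of samples whose feature
    value is below split_value; G_R / H_R sum those at or above it.
    """
    G_L = G_R = H_L = H_R = 0.0
    for x, g, h in zip(feature_values, g_target, h_target):
        if x < split_value:
            G_L += g
            H_L += h
        else:
            G_R += g
            H_R += h
    return G_L, G_R, H_L, H_R


ages = [15, 16, 23, 50, 55, 60]
g_target = [0.5, -0.5, 0.25, 0.25, -0.75, 0.5]   # made-up received first target gradients
h_target = [1.0] * 6                             # made-up received second target gradients
totals = target_total_gradients(ages, g_target, h_target, split_value=23)
print(totals)  # (0.0, 0.25, 2.0, 4.0)
```

Because the received gradients are plaintext values, the sums need no decryption step.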

Then, the first target total gradient, the second target total gradient, the third target total gradient and the fourth target total gradient may be substituted into a splitting gain calculation formula of the joint training model, and a parameter of the joint training model may be adjusted according to a result of the splitting gain calculation formula. For example, the joint training model is a decision tree model, and the first target overall gradient, the second target overall gradient, the third target overall gradient, and the fourth target overall gradient may be substituted into the following splitting gain calculation formula:

where λ is a preset regularization coefficient, λ ≥ 0.

Finally, the split point with the largest gain may be selected as the best split point:

Thus, the other end may perform model training based on the received first target gradient and second target gradient. Because privacy is protected through differential privacy, which operates on plaintext, the other end can use the first target gradient and the second target gradient directly for model training without any decryption operation. Compared with homomorphic encryption, this reduces computational complexity and improves model training efficiency.
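
The splitting gain formula itself is not reproduced in this text. As a hedged illustration, the sketch below assumes the commonly used XGBoost-style gain, G_L²/(H_L + λ) + G_R²/(H_R + λ) − (G_L + G_R)²/(H_L + H_R + λ), without any constant factor or complexity penalty; the disclosed formula may differ. The function names and candidate values are hypothetical.

```python
def split_gain(G_L, G_R, H_L, H_R, lam=1.0):
    """Assumed XGBoost-style gain of splitting a node into a left and a right child."""
    def score(G, H):
        return G * G / (H + lam)
    return score(G_L, H_L) + score(G_R, H_R) - score(G_L + G_R, H_L + H_R)


def best_split(candidates, lam=1.0):
    """candidates maps each split value to its (G_L, G_R, H_L, H_R) tuple."""
    return max(candidates, key=lambda s: split_gain(*candidates[s], lam=lam))


candidates = {
    23: (-1.0, 1.0, 2.0, 4.0),   # made-up totals for the binning point 23
    55: (0.5, -0.5, 4.0, 2.0),   # made-up totals for the binning point 55
}
gain_23 = split_gain(*candidates[23], lam=1.0)
chosen = best_split(candidates, lam=1.0)
print(chosen)  # 23
```

A strictly positive `lam` also keeps the denominators away from zero when a child receives a second-gradient total of zero.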

For example, in practical applications, the joint training model is a decision tree model, and the model training process may include a plurality of iterative processes. Correspondingly, in each iteration process, model information synchronized at the other end is obtained, and the model information may include sample space information corresponding to each split node in the joint training model after the other end adjusts parameters of the joint training model. Correspondingly, in each iteration process, the sample space information corresponding to each split node in the joint training model can be updated according to the model information, the target predicted value of the joint training model to the sample data in the iteration process is determined based on the updated joint training model, and then the sum of the target predicted value and the historical predicted value of the joint training model to the sample data in each iteration process before the iteration process is determined as the predicted value of the joint training model to the sample data in the iteration process.

For example, in the first training process, the predicted value may be a preset value, for example 0.5, which is not limited in the embodiments of the present disclosure. The model training method provided by the present disclosure may then be executed with this initial predicted value and the label value of the sample data to obtain a first target gradient and a second target gradient, which are sent to the second participant (i.e., the other end participating in the joint training). After the second participant adjusts the parameters of the joint training model according to the first target gradient and the second target gradient, it can input its data features into the joint training model to obtain the sample space information corresponding to each split node in the joint training model, and synchronize the sample space information to the first participant (i.e., the end executing the model training method of the present disclosure). The first participant may then calculate the weight value (i.e., the predicted value) of each leaf node in the joint training model that it maintains, according to the synchronized sample space information and the first gradient and second gradient corresponding to each sample data in the sample space information.

In the second training process, the first participant may determine the predicted value of the joint training model for the sample data in the current iteration as the sum of the leaf node weight value in the joint training model that it maintains and the historical predicted value of the sample data obtained previously (i.e., the predicted value of 0.5 from the first iteration). The model training method provided by the present disclosure is then executed again with this predicted value and the label value to obtain a first target gradient and a second target gradient, which are sent to the second participant for model training. By analogy, in each iteration the predicted value can be determined from the historical predicted values of all previous iterations, and the iterative training stops when a preset number of iterations is reached or the model parameters meet a preset condition. In this way, joint training between the first participant and the second participant is achieved.
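
The accumulation of predicted values across iterations can be sketched as follows. The leaf weight formula −G/(H + λ) is the usual XGBoost form and is an assumption here, as is the representation of the synchronized sample space information as a list `leaf_of_sample` giving the leaf each sample falls into; all names are hypothetical.

```python
def leaf_weight(g, h, lam=1.0):
    """Assumed XGBoost-style leaf weight -G / (H + lam) over the samples in one leaf."""
    return -sum(g) / (sum(h) + lam)


def update_predictions(history, leaf_of_sample, g, h, lam=1.0):
    """Add this round's leaf weight to each sample's accumulated (historical) prediction."""
    members = {}
    for i, leaf in enumerate(leaf_of_sample):
        members.setdefault(leaf, []).append(i)
    weights = {
        leaf: leaf_weight([g[i] for i in idx], [h[i] for i in idx], lam)
        for leaf, idx in members.items()
    }
    return [history[i] + weights[leaf_of_sample[i]] for i in range(len(history))]


labels = [1.0, 1.0, 0.0, 0.0]
history = [0.5] * 4                          # preset predicted value for the first round
g = [p - y for p, y in zip(history, labels)] # first gradients under half squared error
h = [1.0] * 4                                # second gradients under half squared error
leaf_of_sample = [0, 0, 1, 1]                # made-up sample space information
preds = update_predictions(history, leaf_of_sample, g, h, lam=0.0)
print(preds)  # [1.0, 1.0, 0.0, 0.0]
```

The first participant uses its own unperturbed gradients for the leaf weights, which is consistent with the passage above; only the values sent to the second participant carry differential privacy noise.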

It should be appreciated that in the above-described joint training process, privacy protection of the data is required due to the involvement of multiple participants. Specifically, a first gradient and a second gradient of model parameters of the joint training model can be determined according to the label value and the predicted value, and then privacy protection processing is performed on the first gradient and the second gradient.

The gradient is a vector: the directional derivative of a function at a given point attains its maximum along the direction of the gradient. In other words, at that point the function changes fastest along the gradient direction, and the maximum rate of change equals the modulus of the gradient.

In a possible manner, a loss function of the joint training model may be calculated according to the label value and the predicted value, and then a first derivative of the loss function is determined as a first gradient and a second derivative of the loss function is determined as a second gradient.

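For example, according to the label value $y_i$ of the i-th sample and the predicted value $\hat{y}_i^{(t)}$ of the i-th sample in the t-th training process, the loss function of the joint training model is calculated as:

$$l\left(y_i, \hat{y}_i^{(t)}\right)$$

Further, the first derivative of the loss function is calculated as:

$$g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t)}\right)}{\partial\, \hat{y}_i^{(t)}}$$

and the second derivative of the loss function is calculated as:

$$h_i = \frac{\partial^2 l\left(y_i, \hat{y}_i^{(t)}\right)}{\partial \left(\hat{y}_i^{(t)}\right)^2}$$

The embodiments of the present disclosure do not limit the form of the loss function $l$, which may be, for example, the mean square error, i.e.:

$$l\left(y_i, \hat{y}_i^{(t)}\right) = \frac{1}{2}\left(y_i - \hat{y}_i^{(t)}\right)^2$$

Accordingly, the first and second derivatives are:

$$g_i = \hat{y}_i^{(t)} - y_i, \qquad h_i = 1.$$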
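
For the mean square error case, the first and second derivatives can be checked numerically with finite differences. The sketch assumes the loss carries a factor of one half, which is the form under which the second derivative equals 1; the function names are hypothetical.

```python
def loss(y, y_hat):
    """Half squared error, assumed so that the second derivative equals 1."""
    return 0.5 * (y_hat - y) ** 2


def first_gradient(y, y_hat):
    """Analytic first derivative of the loss with respect to the prediction."""
    return y_hat - y


def second_gradient(y, y_hat):
    """Analytic second derivative of the loss with respect to the prediction."""
    return 1.0


y, y_hat, eps = 1.0, 0.5, 1e-4
# Central finite differences approximate the two derivatives.
numeric_g = (loss(y, y_hat + eps) - loss(y, y_hat - eps)) / (2 * eps)
numeric_h = (loss(y, y_hat + eps) - 2 * loss(y, y_hat) + loss(y, y_hat - eps)) / eps ** 2

print(first_gradient(y, y_hat), second_gradient(y, y_hat))  # -0.5 1.0
```

The numerical estimates agree with the analytic values up to floating-point error, which confirms that the first gradient depends on the label while the second gradient is constant for this loss.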

In the embodiment of the disclosure, after the first gradient and the second gradient are obtained, a data privacy protection mode can be performed by replacing a homomorphic encryption mode with a differential privacy mode, so that computing resources, memory resources and communication resources in a joint training process are reduced under the condition of protecting the data privacy, and the joint training efficiency is improved.

However, the inventors found through research that differential privacy may distort the data excessively. Therefore, before data privacy protection is performed, it may first be determined whether the first gradient and the second gradient satisfy a preset gradient condition. If they do, differential privacy processing is performed on the first gradient to obtain a first target gradient and on the second gradient to obtain a second target gradient. In this way, the data distortion caused by differential privacy processing can be reduced, the accuracy of the joint training result is preserved while data privacy is protected, and the joint training effect is improved.

In a possible manner, the preset gradient condition may include any one of the following conditions: the variance of the first gradient is less than or equal to a first preset threshold, and the variance of the second gradient is less than or equal to the first preset threshold; the standard deviation of the first gradient is less than or equal to a second preset threshold, and the standard deviation of the second gradient is less than or equal to the second preset threshold; the information entropy of the first gradient is less than or equal to a third preset threshold, and the information entropy of the second gradient is less than or equal to the third preset threshold.
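The three alternative conditions above can be sketched as a single check. This is a minimal sketch under stated assumptions: the statistics are computed over the per-sample gradient values, the information entropy is estimated from a histogram with a hypothetical number of bins, and the names `shannon_entropy` and `meets_gradient_condition` are illustrative. The disclosure does not specify how the statistics are estimated.

```python
import math
from collections import Counter
from statistics import pstdev, pvariance


def shannon_entropy(values, num_bins=10):
    """Histogram-based Shannon entropy in bits (the binning is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all values identical: no uncertainty
    width = (hi - lo) / num_bins
    # Map each value to a bin index, clamping the maximum into the last bin.
    counts = Counter(min(int((v - lo) / width), num_bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def meets_gradient_condition(g, h, threshold, statistic="variance"):
    """Return True if both gradients are at or below the preset threshold."""
    statistics_by_name = {
        "variance": pvariance,
        "std": pstdev,
        "entropy": shannon_entropy,
    }
    if statistic not in statistics_by_name:
        raise ValueError(f"unknown statistic: {statistic}")
    measure = statistics_by_name[statistic]
    return measure(g) <= threshold and measure(h) <= threshold
```

A single threshold is applied to both gradients, mirroring the wording of each condition, in which the first and second gradients are compared against the same preset threshold.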

The first preset threshold, the second preset threshold, and the third preset threshold may be set according to actual requirements, which is not limited in the embodiments of the present disclosure. It should be understood that the larger a threshold is, the lower the requirement for data fidelity, and conversely, the smaller a threshold is, the higher the requirement for data fidelity.

It should be appreciated that differential privacy processing may be understood as adding noise to the data such that other users cannot discern whether a given sample is in the data set. Thus, the differential privacy processing of the first gradient may be achieved by applying noise to the first gradient, i.e. the first target gradient may be: $\tilde{g}_i = g_i + \text{noise}$.

Likewise, the differential privacy processing of the second gradient may be achieved by applying noise to the second gradient, i.e. the second target gradient may be: $\tilde{h}_i = h_i + \text{noise}$.

In a possible manner, a preset correspondence between noise types and privacy budgets may be obtained first, and then the target noise type for performing the differential privacy processing on the first gradient and the second gradient may be determined according to the preset privacy budget and the preset correspondence. Finally, differential privacy processing is performed on the first gradient based on the noise corresponding to the target noise type to obtain a first target gradient, and on the second gradient based on the noise corresponding to the target noise type to obtain a second target gradient.
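The lookup of a target noise type from a preset privacy budget can be sketched as follows. The table contents, the budget values and the names `PRESET_CORRESPONDENCE` and `select_noise_type` are hypothetical; the disclosure only states that such a correspondence is preset, not what it contains.

```python
import math

# Hypothetical preset correspondence: noise type -> privacy budget assigned to it.
PRESET_CORRESPONDENCE = {
    "laplace": 1.0,
    "gaussian": 3.0,
}


def select_noise_type(privacy_budget, correspondence=PRESET_CORRESPONDENCE):
    """Return the noise type whose assigned budget matches the given budget."""
    for noise_type, budget in correspondence.items():
        # Compare with a tolerance, since budgets are floating-point values.
        if math.isclose(privacy_budget, budget):
            return noise_type
    raise KeyError(f"no noise type registered for privacy budget {privacy_budget}")


target_noise_type = select_noise_type(1.0)
```

Matching on an exact budget is only one possible reading; a range-based table would serve equally well if each noise type covers an interval of budgets.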

For example, a preset correspondence between noise types and privacy budgets may be set first, such as setting a first privacy budget for a first noise type, a second privacy budget for a second noise type, and so on. In the subsequent process, the target noise type for performing the differential privacy processing on the first gradient and the second gradient can be determined from the preset correspondence according to the given privacy budget. For example, based on a given privacy budget, the target noise type is determined to be Laplace noise. In this case, the variance of the first gradient is:

(ε represents a preset privacy budget) and the variance of the second gradient is:

If the variance of the first gradient is less than or equal to the first preset threshold and the variance of the second gradient is less than or equal to the first preset threshold, differential privacy processing is performed on the first gradient based on zero-mean Laplace-distributed noise to obtain a first target gradient, and on the second gradient based on zero-mean Laplace-distributed noise to obtain a second target gradient.
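A minimal sketch of perturbing a gradient with zero-mean Laplace noise is given below. It assumes the standard Laplace mechanism, in which the noise scale equals a sensitivity divided by the privacy budget ε; the `sensitivity` parameter and all function names are assumptions, since the disclosure does not state how the scale is chosen. Under that assumption the noise variance is twice the squared scale.

```python
import random


def laplace_noise(scale, rng):
    """Draw one zero-mean Laplace sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_noise_variance(epsilon, sensitivity):
    """Variance of Laplace noise with scale = sensitivity / epsilon (assumed)."""
    scale = sensitivity / epsilon
    return 2.0 * scale ** 2


def laplace_perturb(values, epsilon, sensitivity, rng=None):
    """Add independent zero-mean Laplace noise to every gradient value."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


# Example: perturb three first-gradient values with privacy budget 1.
first_target_gradient = laplace_perturb([-0.2, 0.3, -0.5], epsilon=1.0, sensitivity=1.0)
```

Because the variance shrinks as ε grows, a larger privacy budget yields less distortion, which is why a variance threshold can act as the preset gradient condition in this example.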

In other possible manners, if the first gradient and the second gradient do not satisfy the preset gradient condition, homomorphic encryption processing or secret sharing processing may be performed on the first gradient to obtain a first target gradient, and homomorphic encryption processing or secret sharing processing may be performed on the second gradient to obtain a second target gradient.

It should be understood that, if the first gradient and the second gradient do not satisfy the preset gradient condition, it indicates that the data distortion is too large due to the differential privacy-based method, and therefore, a homomorphic encryption method or a secret sharing method may be used to perform data privacy protection, so as to ensure that data is not damaged. On the contrary, if the first gradient and the second gradient meet the preset gradient condition, it is indicated that the data distortion is not too large in a differential privacy-based mode, so that the data privacy protection can be performed in a differential privacy-based mode, and the training efficiency is improved.

Therefore, in the joint training process, if using local differential privacy is expected to increase the precision loss, the method switches to homomorphic encryption or secret sharing to preserve accuracy. Conversely, if using local differential privacy does not increase the precision loss, local differential privacy is used to increase speed. That is, the embodiments of the present disclosure may adaptively switch between a lossless but slow homomorphic encryption or secret sharing manner and a lossy but fast local differential privacy manner, so that the model accuracy can be improved while efficient calculation is ensured.
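The adaptive switching described above can be sketched as a small dispatch routine. This sketch assumes the variance-based condition and takes the two protection routines as caller-supplied functions, because the disclosure does not fix a particular differential privacy mechanism, homomorphic encryption scheme or secret sharing scheme. All names are hypothetical.

```python
from statistics import pvariance


def protect_gradients(g, h, threshold, dp_protect, fallback_protect):
    """Choose the protection mode from the preset gradient condition.

    dp_protect and fallback_protect are caller-supplied callables that take a
    list of gradient values and return its protected form.
    Returns (mode, first_target_gradient, second_target_gradient).
    """
    if pvariance(g) <= threshold and pvariance(h) <= threshold:
        # Condition met: fast, lossy local differential privacy.
        return "differential_privacy", dp_protect(g), dp_protect(h)
    # Condition not met: slower but lossless encryption or secret sharing.
    return "encryption_or_secret_sharing", fallback_protect(g), fallback_protect(h)
```

Passing the routines in as arguments keeps the switching logic independent of any specific cryptographic library.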

After the first target gradient and the second target gradient are obtained, they may be sent to a second participant of the joint training model (i.e., the other end participating in the joint training), so that the second participant adjusts parameters of the joint training model that it maintains based on the first target gradient and the second target gradient, and synchronizes model information of the adjusted joint training model to the first participant (i.e., the end executing the model training method in the present disclosure). Therefore, federated learning based on local differential privacy can be achieved, and the joint training efficiency is improved. Moreover, because the first target gradient and the second target gradient are obtained by local differential privacy processing, the privacy of the corresponding true values $g_i$ and $h_i$ is protected. In addition, since $g_i$ and $h_i$ contain the true label value $y_i$, the privacy of the true label value is also protected.

Through any of the above model training methods, the privacy of the first gradient and of the second gradient can be protected by local differential privacy, which in turn protects the privacy of the true label value. Secondly, local differential privacy is a pure plaintext calculation, so the calculation efficiency can be greatly improved compared with high-complexity cryptographic methods. Moreover, the receiver does not need to decrypt, which simplifies the calculation flow and thereby improves the joint training efficiency.

Taking XGBoost as an example of the joint training model, and training on data from the same public data set, the model training method provided by the present disclosure takes 70 seconds to train each tree, whereas the SecureBoost algorithm based on homomorphic encryption takes 421 seconds to train each tree, i.e. roughly six times as long. Therefore, the model training method provided by the present disclosure can greatly improve the joint training efficiency.

In addition, tests were performed on data from the same public data set. With a non-privacy-protecting method, in which the data is sent to a single end for model training, the model AUC is 0.675. With the model training method provided by the present disclosure, using the Duchi method for differential privacy with the preset privacy budget ε set to 1, 3, 5, 8 and 10 respectively, the model AUC is as shown in Table 5.

TABLE 5

Privacy budget ε	1	3	5	8	10
Model AUC	0.699	0.699	0.695	0.686	0.691

As can be seen from Table 5, compared with the non-privacy-protecting method (AUC 0.675), the model training method provided by the present disclosure not only protects data privacy but also achieves a higher model AUC (0.686 to 0.699) at every tested privacy budget. That is to say, the model training method provided by the present disclosure can improve the accuracy of the model while ensuring data privacy and efficient computation.
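For reference, the one-dimensional mechanism commonly attributed to Duchi et al. can be sketched as below. This is an assumption about what "the Duchi method" refers to in the test above; the disclosure gives no formula for it. The sketch also assumes each gradient value has already been scaled or clipped into the interval [-1, 1], and the function name is hypothetical.

```python
import math
import random


def duchi_perturb(value, epsilon, rng=None):
    """Perturb one value in [-1, 1] under epsilon-local differential privacy.

    The output is one of two values, +C or -C, with
    C = (e^eps + 1) / (e^eps - 1), chosen so that the expectation of the
    output equals the input value.
    """
    if not -1.0 <= value <= 1.0:
        raise ValueError("value must lie in [-1, 1]")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    e = math.exp(epsilon)
    magnitude = (e + 1.0) / (e - 1.0)
    # Probability of reporting +C grows linearly with the input value.
    p_positive = 0.5 + value * (e - 1.0) / (2.0 * e + 2.0)
    return magnitude if rng.random() < p_positive else -magnitude
```

Because each report is unbiased, sums of many perturbed gradients approximate the sums of the true gradients, which is the quantity a tree-based model such as XGBoost consumes when evaluating splits.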

Based on the same inventive concept, the present disclosure further provides a model training apparatus, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. Referring to fig. 4, the model training apparatus 400 is applied to a first participant of a joint training model, and includes:

a first determining module 401, configured to acquire sample data and determine a tag value corresponding to the sample data;

a second determining module 402, configured to determine a predicted value corresponding to the sample data through a joint training model;

a third determining module 403, configured to determine a first gradient and a second gradient of the model parameter of the joint training model according to the label value and the predicted value, where a derivative corresponding to the first gradient is different from a derivative corresponding to the second gradient;

a first processing module 404, configured to, when both the first gradient and the second gradient meet a preset gradient condition, perform differential privacy processing on the first gradient to obtain a first target gradient, and perform differential privacy processing on the second gradient to obtain a second target gradient;

a sending module 405, configured to send the first target gradient and the second target gradient to another end participating in joint training, where the first target gradient and the second target gradient are used by the other end to adjust parameters of the joint training model.

Optionally, the apparatus 400 further comprises:

a second processing module, configured to, when the first gradient and the second gradient do not meet the preset gradient condition, perform homomorphic encryption processing or secret sharing processing on the first gradient to obtain a first target gradient, and perform homomorphic encryption processing or secret sharing processing on the second gradient to obtain a second target gradient.

Optionally, the preset gradient condition includes any one of the following conditions:

the variance of the first gradient is less than or equal to a first preset threshold, and the variance of the second gradient is less than or equal to the first preset threshold;

the standard deviation of the first gradient is less than or equal to a second preset threshold, and the standard deviation of the second gradient is less than or equal to the second preset threshold;

the information entropy of the first gradient is less than or equal to a third preset threshold, and the information entropy of the second gradient is less than or equal to the third preset threshold.

Optionally, the first processing module 404 is configured to:

acquiring a preset corresponding relation between a noise type and a privacy budget;

determining a target noise type for performing differential privacy processing on the first gradient and the second gradient according to a preset privacy budget and the preset corresponding relation;

and performing differential privacy processing on the first gradient based on the noise corresponding to the target noise type to obtain a first target gradient, and performing differential privacy processing on the second gradient based on the noise corresponding to the target noise type to obtain a second target gradient.

Optionally, the apparatus further comprises:

an acquisition module, configured to acquire model information synchronized by the other end, where the model information includes sample space information corresponding to each split node in the joint training model after the other end adjusts the parameters of the joint training model;

the second determining module 402 is configured to:

updating sample space information corresponding to each splitting node in the joint training model according to the model information, and determining a target predicted value of the sample data based on the updated joint training model;

and determining the sum of the target predicted value and the historical predicted value of the sample data as the predicted value of the sample data.
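The two steps above can be sketched for a tree-based joint training model. The sketch assumes that the synchronized sample space information reduces, for the newest tree, to a mapping from each leaf to the indices of the samples falling into it, together with a weight per leaf. This data layout and the names `target_predictions` and `accumulate_predictions` are hypothetical.

```python
def target_predictions(leaf_samples, leaf_weights, num_samples):
    """Target predicted value of each sample under the newly updated tree.

    leaf_samples maps a leaf identifier to the sample indices in that leaf;
    leaf_weights maps the same identifier to the leaf's output weight.
    """
    predictions = [0.0] * num_samples
    for leaf_id, sample_ids in leaf_samples.items():
        for i in sample_ids:
            predictions[i] = leaf_weights[leaf_id]
    return predictions


def accumulate_predictions(historical, target):
    """Predicted value = historical predicted value + target predicted value."""
    if len(historical) != len(target):
        raise ValueError("prediction lists must have the same length")
    return [past + new for past, new in zip(historical, target)]


# Example: three samples, a tree with two leaves.
new_tree = target_predictions({"left": [0, 2], "right": [1]},
                              {"left": 0.5, "right": -0.25}, num_samples=3)
predicted = accumulate_predictions([1.0, 1.0, 0.0], new_tree)
```

The accumulated predicted value is what the next training round would use, together with the label value, to compute the next first and second gradients.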

Optionally, the model information is obtained by substituting the received first target gradient and the second target gradient into a splitting gain calculation formula of the joint training model at the other end, and then adjusting parameters of the joint training model according to a calculation result of the splitting gain calculation formula.
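As a hedged illustration of how the other end might use the received target gradients, the sketch below evaluates the split gain in the form commonly used by XGBoost. The disclosure does not reproduce the split gain calculation formula here, so the formula, the regularization parameters `reg_lambda` and `gamma`, and the function name are assumptions.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Gain of splitting a node into left and right children.

    Each argument is a list of (possibly perturbed) per-sample gradients:
    g_* are first target gradients, h_* are second target gradients.
    """
    def score(g_sum, h_sum):
        # Structure score of a node with summed gradients g_sum and h_sum.
        return g_sum * g_sum / (h_sum + reg_lambda)

    gl, hl = sum(g_left), sum(h_left)
    gr, hr = sum(g_right), sum(h_right)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma


# Example: two samples go left, two go right.
gain = split_gain([-1.0, -1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0])
```

Only sums of gradients enter the formula, so zero-mean noise added to individual gradients tends to average out within each candidate child node.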

With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.

Based on the same inventive concept, the disclosed embodiments also provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of any of the above-described model training methods.

Based on the same inventive concept, an embodiment of the present disclosure further provides an electronic device, including:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to implement the steps of any of the above-described model training methods.

Referring now to FIG. 5, a block diagram of an electronic device 500 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 5, electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.

Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 501.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, communication may be carried out using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring sample data and determining a label value corresponding to the sample data; determining a predicted value corresponding to the sample data through a joint training model; determining a first gradient and a second gradient of model parameters of the joint training model according to the label value and the predicted value, wherein a derivative corresponding to the first gradient is different from a derivative corresponding to the second gradient; if the first gradient and the second gradient meet a preset gradient condition, performing differential privacy processing on the first gradient to obtain a first target gradient, and performing differential privacy processing on the second gradient to obtain a second target gradient; and sending the first target gradient and the second target gradient to the other end participating in the joint training, wherein the first target gradient and the second target gradient are used for the other end to adjust the parameters of the joint training model.

Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present disclosure may be implemented by software or hardware. Wherein the name of a module does not in some cases constitute a limitation on the module itself.

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Example 1 provides, in accordance with one or more embodiments of the present disclosure, a model training method, comprising:

Example 2 provides the method of example 1, further comprising, in accordance with one or more embodiments of the present disclosure:

if the first gradient and the second gradient do not meet the preset gradient condition, performing homomorphic encryption processing or secret sharing processing on the first gradient to obtain a first target gradient, and performing homomorphic encryption processing or secret sharing processing on the second gradient to obtain a second target gradient.

Example 3 provides the method of example 1 or 2, the preset gradient condition comprising any one of:

Example 4 provides the method of example 1 or 2, wherein performing differential privacy processing on the first gradient to obtain a first target gradient and performing differential privacy processing on the second gradient to obtain a second target gradient, according to one or more embodiments of the present disclosure, includes:

acquiring a preset corresponding relation between the noise type and the privacy budget;

Example 5 provides the method of example 1 or 2, further comprising, in accordance with one or more embodiments of the present disclosure:

obtaining synchronous model information of the other end, wherein the model information comprises sample space information corresponding to each splitting node in the joint training model after the other end adjusts the parameters of the joint training model;

the determining the predicted value corresponding to the sample data through the joint training model comprises:

updating sample space information corresponding to each split node in the joint training model according to the model information, and determining a target predicted value of the joint training model to the sample data based on the updated joint training model;

Example 6 provides the method of example 5, and the model information is obtained by substituting the received first target gradient and the second target gradient into a split gain calculation formula of the joint training model by the other end, and then adjusting parameters of the joint training model according to a calculation result of the split gain calculation formula.

Example 7 provides, in accordance with one or more embodiments of the present disclosure, a model training apparatus, the apparatus comprising:

a third determining module, configured to determine, according to the label value and the predicted value, a first gradient and a second gradient of a model parameter of the joint training model, where a derivative corresponding to the first gradient is different from a derivative corresponding to the second gradient;

Example 8 provides the apparatus of example 7, further comprising, in accordance with one or more embodiments of the present disclosure:

Example 9 provides, in accordance with one or more embodiments of the present disclosure, a computer-readable medium having stored thereon a computer program that, when executed by a processing device, performs the steps of the method of any of examples 1-6.

Example 10 provides, in accordance with one or more embodiments of the present disclosure, an electronic device comprising:

a storage device having a computer program stored thereon;

processing means for executing the computer program in the storage means to carry out the steps of the method of any of examples 1-6.

The foregoing description is only of the preferred embodiments of the disclosure and is illustrative of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the features described above or their equivalents without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims. With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Claims

1. A method of model training, the method comprising:

2. The method of claim 1, further comprising:

3. The method according to claim 1 or 2, wherein the preset gradient condition comprises any one of the following conditions:

4. The method according to claim 1 or 2, wherein performing differential privacy processing on the first gradient to obtain a first target gradient and performing differential privacy processing on the second gradient to obtain a second target gradient comprises:

5. The method according to claim 1 or 2, characterized in that the method further comprises:

obtaining synchronous model information of the other end, wherein the model information comprises sample space information corresponding to each splitting node in the joint training model after the model parameters of the joint training model are adjusted by the other end;

updating sample space information corresponding to each splitting node in the joint training model according to the model information, and determining a target predicted value of the joint training model to the sample data based on the updated joint training model;

6. The method according to claim 5, wherein the model information is obtained by the other end adjusting parameters of the joint training model according to a calculation result of a split gain calculation formula after the other end substitutes the received first target gradient and the received second target gradient into the split gain calculation formula of the joint training model.

7. A model training apparatus, the apparatus comprising:

8. The apparatus of claim 7, further comprising:

9. A computer-readable medium, on which a computer program is stored, characterized in that the program, when being executed by processing means, carries out the steps of the method of any one of claims 1 to 6.

10. An electronic device, comprising:

a storage device having a computer program stored thereon;

processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 6.