CN112183757A - Model training method, device and system

Model training method, device and system

Info

Publication number
CN112183757A
CN112183757A
Authority
CN
China
Prior art keywords
training
feature
training participant
model
subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910599381.2A
Other languages
Chinese (zh)
Other versions
CN112183757B (en)
Inventor
陈超超
李梁
王力
周俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Original Assignee
Advanced New Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Advanced New Technologies Co Ltd
Priority to CN201910599381.2A
Publication of CN112183757A
Application granted
Publication of CN112183757B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure provides methods and apparatus for training a linear/logistic regression model. A feature sample set is subjected to vertical-to-horizontal segmentation conversion to obtain a conversion feature sample subset for each training participant. A current predicted value is obtained based on the current conversion sub-models and conversion feature sample subsets of the training participants. At a first training participant, a prediction difference and a first model update quantity are determined, the first model update quantity is decomposed, and one first partial model update quantity is sent to a second training participant. At the second training participant, a second model update quantity is obtained based on the prediction difference and the corresponding conversion feature sample subset, the second model update quantity is decomposed, and one second partial model update quantity is sent to the first training participant. At each training participant, the respective conversion sub-model is updated based on the respective partial model update quantities. When the loop end condition is met, the respective sub-models are determined based on the conversion sub-models of the training participants.

Description

Model training method, device and system
Technical Field
The present disclosure relates generally to the field of machine learning, and more particularly, to methods, apparatuses, and systems for collaborative training of linear/logistic regression models via multiple training participants using a vertically-segmented training set.
Background
Linear regression models and logistic regression models are widely used regression/classification models in the field of machine learning. In many cases, multiple model training participants (e.g., e-commerce companies, courier companies, and banks) each possess a different portion of the feature data used to train a linear/logistic regression model. These participants generally want to use each other's data jointly to train a common linear/logistic regression model, but are unwilling to provide their own data to the other participants, in order to prevent their data from being leaked.
In view of this situation, machine learning methods capable of protecting data security have been proposed, which allow a linear/logistic regression model to be trained cooperatively by multiple model training participants for their joint use while keeping each participant's data secure. However, the model training efficiency of existing privacy-preserving machine learning methods is low.
Disclosure of Invention
In view of the above, the present disclosure provides a method, an apparatus, and a system for collaborative training of a linear/logistic regression model via a plurality of training participants, which can improve the efficiency of model training while ensuring the security of respective data of the plurality of training participants.
According to an aspect of the present disclosure, there is provided a method for collaborative training of a linear/logistic regression model via first and second training participants, each training participant having one sub-model of the linear/logistic regression model, the first training participant having a first subset of feature samples and labeled values, the second training participant having a second subset of feature samples, the first and second subsets of feature samples being obtained by vertical segmentation of a set of feature samples, the method being performed by the first training participant, the method comprising: carrying out model conversion processing on the submodels of all the training participants to obtain conversion submodels of all the training participants; the following loop process is executed until a loop end condition is satisfied: performing vertical-horizontal segmentation conversion on the feature sample set to obtain a conversion feature sample subset at each training participant; obtaining current predicted values for the feature sample set using secret sharing matrix multiplication based on current transformation submodels and transformation feature sample subsets of respective training participants; determining a prediction difference value between the current prediction value and the corresponding mark value; determining a first model update quantity using the prediction difference and a subset of transformed feature samples at the first training participant; decomposing the first model update quantity into two first part model update quantities, and sending one first part model update quantity to the second training participant; and receiving a second partial model update quantity from the second training participant, the second partial model update quantity being obtained by decomposing a second model update quantity at the second training participant, the second model update quantity being obtained by performing secret sharing matrix multiplication on the prediction difference and the conversion feature sample subset at the second training participant; updating the current transition submodel at the first training participant based on the remaining first partial model update quantity and the received second partial model update quantity, wherein the updated transition submodel of each training participant is used as the current transition submodel for the next cycle process when the cycle process is not ended; determining a sub-model of the first training participant based on the conversion sub-models of the first and second training participants when the loop end condition is satisfied.
According to another aspect of the present disclosure, there is provided a method for collaborative training of a linear/logistic regression model via first and second training participants, each training participant having one sub-model of the linear/logistic regression model, the first training participant having a first subset of feature samples and labeled values, the second training participant having a second subset of feature samples, the first and second subsets of feature samples being obtained by vertical segmentation of a set of feature samples, the method being performed by the second training participant, the method comprising: carrying out model conversion processing on the submodels of all the training participants to obtain conversion submodels of all the training participants; the following loop process is executed until a loop end condition is satisfied: performing vertical-horizontal segmentation conversion on the feature sample set to obtain a conversion feature sample subset at each training participant; obtaining current predicted values for the feature sample set using secret sharing matrix multiplication based on current transformation submodels and transformation feature sample subsets of respective training participants; receiving a first partial model update quantity from the first training participant, the first partial model update quantity being a result of a decomposition of a first model update quantity at the first training participant, the first model update quantity being determined at the first training participant using a prediction difference and a subset of transformed feature samples at the first training participant, wherein the prediction difference is a difference between the current prediction value and a corresponding marker value; performing secret sharing matrix multiplication on the prediction difference and the transformed feature sample subset at the second training participant to obtain a second model update; decomposing the second model update quantity into two second partial model update quantities, and sending one second partial model update quantity to the first training participant; and updating the current transition submodel of the second training participant based on the remaining second partial model update amount and the received first partial model update amount, wherein, when the cycle process is not finished, the updated transition submodel of each training participant is used as the current transition submodel of the next cycle process; determining a sub-model of the second training participant based on the conversion sub-models of the first and second training participants when the loop end condition is satisfied.
According to another aspect of the present disclosure, there is provided an apparatus for collaborative training of a linear/logistic regression model via first and second training participants, each training participant having one sub-model of the linear/logistic regression model, the first training participant having a first subset of feature samples and labeled values, the second training participant having a second subset of feature samples, the first and second subsets of feature samples being obtained by vertical segmentation of a set of feature samples, the apparatus being located on the first training participant side, the apparatus comprising: the model conversion unit is configured to perform model conversion processing on the submodels of the training participants to obtain conversion submodels of the training participants; the sample conversion unit is configured to perform vertical-horizontal segmentation conversion on the feature sample set to obtain a conversion feature sample subset at each training participant; a prediction value obtaining unit configured to obtain a current prediction value for the feature sample set using secret sharing matrix multiplication based on a current conversion submodel and a conversion feature sample subset of each training participant; a prediction difference determination unit configured to determine a prediction difference between the current prediction value and a corresponding marker value; a model update amount determination unit configured to determine a first model update amount using the prediction difference value and the first converted feature sample subset; a model update amount decomposition unit configured to decompose the first model update amount into two first partial model update amounts; a model update amount transmitting/receiving unit configured to transmit a first partial model update amount to the second training participant and receive a second partial model update amount from the second training participant, the second partial model update amount being obtained by decomposing a second model update amount at the second training participant, the second model update amount being obtained by performing secret sharing matrix multiplication on the prediction difference value and the second conversion feature sample subset; a model updating unit configured to update a current converter model at the first training participant based on the remaining first partial model update amount and the received second partial model update amount; and a model determination unit configured to determine a sub-model of the first training participant based on conversion sub-models of the first training participant and the second training participant when the cycle end condition is satisfied, wherein the sample conversion unit, the predicted value acquisition unit, the prediction difference value determination unit, the model update amount decomposition unit, the model update amount transmission/reception unit, and the model update unit cyclically perform operations until the cycle end condition is satisfied, wherein the updated conversion sub-models of the respective training participants are used as a current conversion sub-model of a next cycle process when a cycle process is not ended.
According to another aspect of the present disclosure, there is provided an apparatus for collaborative training of a linear/logistic regression model via first and second training participants, each training participant having one sub-model of the linear/logistic regression model, the first training participant having a first subset of feature samples and labeled values, the second training participant having a second subset of feature samples, the first and second subsets of feature samples being obtained by vertical segmentation of a set of feature samples, the apparatus being located on the side of the second training participant, the apparatus comprising: the model conversion unit is configured to perform model conversion processing on the submodels of the training participants to obtain conversion submodels of the training participants; the sample conversion unit is configured to perform vertical-horizontal segmentation conversion on the feature sample set to obtain a conversion feature sample subset at each training participant; a prediction value obtaining unit configured to obtain a current prediction value for the feature sample set using secret sharing matrix multiplication based on a current conversion submodel and a conversion feature sample subset of each training participant; a model update amount receiving unit configured to receive a first partial model update amount from the first training participant, the first partial model update amount being obtained by decomposing a first model update amount at the first training participant, the first model update amount being determined at the first training participant using a prediction difference value and a conversion feature sample subset at the first training participant, wherein the prediction difference value is a difference value between the current prediction value and a corresponding label value; a second model update amount determination unit configured to perform secret sharing matrix multiplication on the prediction difference and the transformed feature sample subset at the second training participant to obtain a second model update amount; a model update amount decomposition unit configured to decompose the second model update amount into two second partial model update amounts; a model update amount sending unit configured to send a second partial model update amount to the first training participant; a model updating unit configured to update a current sub-model of the second training participant based on the remaining second partial model update amount and the received first partial model update amount; and a model determination unit configured to determine a sub-model of the second training participant based on conversion sub-models of the first training participant and the second training participant when the cycle end condition is satisfied, wherein the sample conversion unit, the predicted value acquisition unit, the model update amount reception unit, the model update amount determination unit, the model update amount decomposition unit, the model update amount transmission unit, and the model update unit cyclically perform operations until the cycle end condition is satisfied, wherein the updated conversion sub-models of the respective training participants are used as a current conversion sub-model of a next cycle process when a cycle process is not ended.
According to another aspect of the present disclosure, there is provided a system for collaborative training of a linear/logistic regression model via first and second training participants, each training participant having one sub-model of the linear/logistic regression model, the first training participant having a first subset of feature samples and labeled values, the second training participant having a second subset of feature samples, the first and second subsets of feature samples being obtained by vertical segmentation of a set of feature samples, the system comprising: a first training participant device comprising means for co-training a linear/logistic regression model via first and second training participants as described above; and a second training participant device comprising means for co-training the linear/logistic regression model via the first and second training participants as described above.
According to another aspect of the present disclosure, there is provided a computing device comprising: at least one processor, and a memory coupled with the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method performed on the first training participant side as described above.
According to another aspect of the present disclosure, there is provided a machine-readable storage medium storing executable instructions that, when executed, cause the at least one processor to perform a training method as described above performed on a first training participant side.
According to another aspect of the present disclosure, there is provided a computing device comprising: at least one processor, and a memory coupled with the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform a training method performed on a second training participant side as described above.
According to another aspect of the present disclosure, there is provided a machine-readable storage medium storing executable instructions that, when executed, cause the at least one processor to perform a training method as described above performed on a second training participant side.
With the scheme of the embodiments of the present disclosure, the model parameters of the linear/logistic regression model can be trained without leaking the secret data of the training participants, and the workload of model training grows only linearly, rather than exponentially, with the number of feature samples used for training. Compared with the prior art, the scheme of the embodiments of the present disclosure therefore improves the efficiency of model training while ensuring the security of each training participant's data.
Drawings
A further understanding of the nature and advantages of the present disclosure may be realized by reference to the following drawings. In the drawings, similar components or features may have the same reference numerals.
FIG. 1 shows a schematic diagram of an example of vertically sliced data according to an embodiment of the present disclosure;
FIG. 2 illustrates an architectural diagram showing a system for collaborative training of a linear/logistic regression model via two training participants, according to an embodiment of the present disclosure;
FIG. 3 shows a flow diagram of a method for collaborative training of a linear/logistic regression model via two training participants, in accordance with an embodiment of the present disclosure;
FIG. 4 shows a flow diagram of one example of a model transformation process, according to an embodiment of the present disclosure;
FIG. 5 shows a flow diagram of one example of a feature sample set transformation process, in accordance with an embodiment of the present disclosure;
FIG. 6 shows a flow diagram of a predictive value acquisition process according to an embodiment of the disclosure;
FIG. 7 shows a flowchart of one example of a secret-shared-matrix multiplication with a trusted initializer according to an embodiment of the disclosure;
FIG. 8 shows a flowchart of one example of untrusted initializer secret sharing matrix multiplication according to an embodiment of the present disclosure;
FIG. 9 illustrates a block diagram of an apparatus for collaborative training of a linear/logistic regression model via two training participants, in accordance with an embodiment of the present disclosure;
FIG. 10 shows a block diagram of one example of a prediction value acquisition unit according to an embodiment of the present disclosure;
FIG. 11 illustrates a block diagram of an apparatus for collaborative training of a linear/logistic regression model via two training participants, in accordance with an embodiment of the present disclosure;
FIG. 12 shows a schematic diagram of a computing device for collaborative training of a linear/logistic regression model via two training participants, in accordance with an embodiment of the present disclosure;
FIG. 13 shows a schematic diagram of a computing device for collaborative training of a linear/logistic regression model via two training participants, in accordance with an embodiment of the present disclosure.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and thereby implement the subject matter described herein, and are not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as needed. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with respect to some examples may also be combined in other examples.
As used herein, the term "include" and its variants mean open-ended terms in the sense of "including, but not limited to". The term "based on" means "based at least in part on". The terms "one embodiment" and "an embodiment" mean "at least one embodiment". The term "another embodiment" means "at least one other embodiment". The terms "first," "second," and the like may refer to different or the same objects. Other definitions, whether explicit or implicit, may be included below. The definition of a term is consistent throughout the specification unless the context clearly dictates otherwise.
The secret sharing method is a cryptographic technology for decomposing and storing a secret, and divides the secret into a plurality of secret shares in a proper manner, each secret share is owned and managed by one of a plurality of participants, a single participant cannot recover the complete secret, and only a plurality of participants cooperate together can the complete secret be recovered. The secret sharing method aims to prevent the secret from being too concentrated so as to achieve the purposes of dispersing risks and tolerating intrusion.
Secret sharing methods can be roughly divided into two categories: secret sharing with a trusted initializer and secret sharing without a trusted initializer. In the secret sharing method with a trusted initializer, the trusted initializer is required to perform parameter initialization (often generating random numbers meeting certain conditions) for each participant in the multi-party secure computation. After initialization is completed, the trusted initializer destroys the data and withdraws; the data is not needed in the subsequent multi-party secure computation process.
Secret sharing matrix multiplication with a trusted initializer is applicable to the following situation: the complete secret data is the product of a first set of secret shares and a second set of secret shares, and each participant has one of the first secret shares and one of the second secret shares. Through secret sharing matrix multiplication with a trusted initializer, each participant obtains a part of the complete secret data, the sum of the parts obtained by all participants is the complete secret data, and each participant discloses its obtained part to the remaining participants, so that every participant can obtain the complete secret data without disclosing the secret shares it owns, thereby ensuring the data security of every participant.
Secret sharing matrix multiplication without a trusted initializer is another of the secret sharing methods. It is applicable to the case where the complete secret is the product of a first secret share and a second secret share, and the two participants own the first and second secret shares, respectively. Through secret sharing matrix multiplication without a trusted initializer, each of the two parties generates and discloses data that differs from the secret share it owns, but the sum of the data disclosed by the two parties equals the product of the secret shares they own (i.e., the complete secret). The two parties can therefore recover the complete secret by cooperating in the secret sharing matrix multiplication without disclosing their own secret shares, which guarantees the data security of both parties.
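As an informal illustration of the additive secret sharing used throughout this disclosure, the following Python sketch (numpy assumed; names are illustrative) splits a matrix-valued secret into two random shares whose sum recovers the secret:

import numpy as np

def split_into_shares(secret, rng):
    """Additively decompose a secret matrix into two random shares."""
    share_1 = rng.normal(size=secret.shape)   # random mask held by one participant
    share_2 = secret - share_1                # remainder held by the other participant
    return share_1, share_2

rng = np.random.default_rng(0)
secret = np.arange(6.0).reshape(2, 3)         # the complete secret
s1, s2 = split_into_shares(secret, rng)

# A single share reveals nothing useful on its own; only the sum recovers the secret.
assert np.allclose(s1 + s2, secret)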
In the present disclosure, the training sample set used in the linear/logistic regression model training scheme is a vertically sliced training sample set. The term "vertically slicing the training sample set" refers to dividing the training sample set into a plurality of training sample subsets according to module/function (or some specified rule), where each training sample subset contains a part of the training sub-samples of every training sample in the training sample set, and all of the training sub-samples of a training sample taken together constitute that training sample. In one example, assume that a training sample includes the label y0 and a number of attributes; after vertical slicing, the training participant Alice owns y0 and a first part of the attributes of that training sample, and the training participant Bob owns the remaining attributes. In another example, the attributes may be divided between the training participants Alice and Bob in a different manner. Besides these two examples, other partitioning scenarios are possible and are not listed here.
Suppose a sample x described by d attributes (also called features) is given, with x^T = (x1; x2; ...; xd), where xi is the value of x on the i-th attribute and T denotes transposition. The linear regression model is Y = Wx and the logistic regression model is Y = 1/(1 + e^(-Wx)), where Y is the predicted value and W is the model parameter of the linear/logistic regression model (i.e., the model described in this disclosure), which is composed of the sub-models of the training participants; W_P refers to the sub-model at a training participant P in the present disclosure. In this disclosure, attribute value samples are also referred to as feature data samples.
In the present disclosure, each training participant has a different portion of the data of the training samples used to train the linear/logistic regression model. For example, taking two training participants, assume that the training sample set includes 100 training samples, each of which contains a plurality of feature values and a labeled actual value. The data owned by the first participant may then be some of the feature values and the labeled actual value of each of the 100 training samples, and the data owned by the second participant may be the remaining feature values of each of the 100 training samples.
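A minimal illustrative sketch of such a vertical split (Python/numpy assumed; the sample count, column split, and names are illustrative): each party holds all rows but only some columns, and the label column stays with the first participant.

import numpy as np

rng = np.random.default_rng(1)
n_samples, n_features = 100, 5
X = rng.normal(size=(n_samples, n_features))   # full feature sample set
y = rng.integers(0, 2, size=n_samples)         # label values

# Vertical segmentation: split by feature columns, not by samples.
X_A = X[:, :3]   # first participant (Alice): first 3 features of every sample, plus the labels y
X_B = X[:, 3:]   # second participant (Bob): remaining 2 features of every sample

# Every training sample is jointly described by the two parties' column blocks.
assert np.allclose(np.hstack([X_A, X_B]), X)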
For any matrix multiplication described in this disclosure, whether one or more of the matrices participating in the multiplication needs to be transposed is determined as the case may be, so that the matrix multiplication rule is satisfied and the matrix multiplication computation can be completed.
Embodiments of a method, apparatus, and system for collaborative training of a linear/logistic regression model via two training participants according to the present disclosure are described in detail below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of an example of a vertically sliced training sample set according to an embodiment of the present disclosure. In Fig. 1, 2 data parties Alice and Bob are shown; the case of more data parties is similar. Each of the data parties Alice and Bob owns a part of the training sub-samples of every training sample in the training sample set, and for each training sample, the parts owned by Alice and Bob combined together constitute the complete content of that training sample. For example, assume that the content of a certain training sample includes a label (hereinafter referred to as "label value") y0 and attribute features (hereinafter referred to as "feature samples"); after vertical slicing, the training participant Alice owns y0 and a part of the feature samples of that training sample, and the training participant Bob owns the remaining feature samples.
Fig. 2 shows an architectural schematic diagram illustrating a system 1 for collaborative training of a linear/logistic regression model via two training participants (hereinafter referred to as model training system 1) according to an embodiment of the present disclosure.
As shown in fig. 2, the model training system 1 comprises a first training participant device 10 and a second training participant device 20. The first training participant device 10 and the second training participant device 20 may communicate with each other over a network 30 such as, but not limited to, the internet or a local area network. In the present disclosure, the first training participant device 10 and the second training participant device 20 are collectively referred to as training participant devices. Wherein the first training participant device 10 owns the tag value and the second training participant device 20 does not own the tag value.
In the present disclosure, the linear/logistic regression model to be trained is decomposed into 2 sub-models, one for each training participant device. The training sample set used for model training is located at the first training participant device 10 and the second training participant device 20; the training sample set is vertically sliced as described above and comprises a feature data set and corresponding label values, i.e., the X0 and y0 shown in Fig. 1. The sub-model and the corresponding training samples owned by each training participant are a secret of that training participant and cannot be learned, or cannot be completely learned, by the other training participant.
In the present disclosure, the linear/logistic regression model and the sub-model of each training participant are represented using a weight vector W and a weight sub-vector Wi, respectively, where i denotes the serial number or identifier of the training participant (e.g., A and B). The feature data set is represented using a feature matrix X, and the predicted values and the label values are represented using a predicted value vector Ŷ and a label value vector Y, respectively.
In performing model training, the first training participant device 10 and the second training participant device 20 together perform a secret shared matrix multiplication using the respective subset of training samples and the respective submodels to obtain predicted values for the set of training samples to cooperatively train the linear/logistic regression model. The specific training process for the model will be described in detail below with reference to fig. 3 to 8.
In the present disclosure, the first training participant device 10 and the second training participant device 20 may be any suitable computing device with computing capabilities. The computing devices include, but are not limited to: personal computers, server computers, workstations, desktop computers, laptop computers, notebook computers, mobile computing devices, smart phones, tablet computers, cellular phones, Personal Digital Assistants (PDAs), handheld devices, messaging devices, wearable computing devices, consumer electronics, and so forth.
FIG. 3 shows a flow diagram of a method for collaborative training of a linear/logistic regression model via two training participants, in accordance with an embodiment of the present disclosure. In the training method shown in FIG. 3, a first training participant Alice has a sub-model W_A of the linear/logistic regression model, and a second training participant Bob has a sub-model W_B of the linear/logistic regression model. The first training participant Alice has a first feature sample subset X_A and the label values Y, and the second training participant Bob has a second feature sample subset X_B. The first feature sample subset X_A and the second feature sample subset X_B are obtained by vertically slicing the feature sample set X used for model training.
As shown in FIG. 3, first, at block 301, the first training participant Alice and the second training participant Bob initialize their sub-model parameters, i.e., the weight sub-vectors W_A and W_B, to obtain initial values of their sub-model parameters, and initialize the number of executed training cycles t to zero. Here, it is assumed that the end condition of the loop process is that a predetermined number of training cycles has been performed, for example, T training cycles.
After initialization as above, at block 302, model transformation processes are performed on the respective initial sub-models at Alice and Bob, respectively, to obtain transformation sub-models.
FIG. 4 shows a flow diagram of one example of a model transformation process, according to an embodiment of the present disclosure.
As shown in FIG. 4, at block 410, at Alice, the sub-model W_A owned by Alice is decomposed into W_A1 and W_A2. Here, in the decomposition process of the sub-model W_A, the attribute value of each element of W_A is decomposed into 2 partial attribute values, and 2 new elements are obtained from the decomposed partial attribute values. The resulting 2 new elements are then assigned to W_A1 and W_A2, respectively, thereby obtaining W_A1 and W_A2.
Next, at block 420, at Bob, the sub-model W_B owned by Bob is decomposed into W_B1 and W_B2.
Then, at block 430, Alice sends W_A2 to Bob, and at block 440, Bob sends W_B1 to Alice.
Next, at block 450, at Alice, W_A1 and W_B1 are concatenated to obtain the conversion sub-model W_A'. The dimension of the resulting conversion sub-model W_A' is equal to the dimension of the feature sample set used for model training. At block 460, at Bob, W_A2 and W_B2 are concatenated to obtain the conversion sub-model W_B'. Likewise, the dimension of the resulting conversion sub-model W_B' is equal to the dimension of the feature sample set used for model training.
Returning to FIG. 3, after the model conversion is completed as above, at block 303, the first training participant Alice and the second training participant Bob cooperate to perform vertical-to-horizontal slicing conversion on the first feature sample subset X_A and the second feature sample subset X_B to obtain a first conversion feature sample subset X_A' and a second conversion feature sample subset X_B'. Each feature sample in the resulting first conversion feature sample subset X_A' and second conversion feature sample subset X_B' has the complete feature content of a training sample, i.e., the subsets are similar to feature sample subsets obtained by horizontally slicing the feature sample set.
Fig. 5 shows a flow diagram of a feature sample set transformation process according to an embodiment of the present disclosure.
As shown in FIG. 5, at block 510, at Alice, the first feature sample subset X_A is decomposed into X_A1 and X_A2. At block 520, at Bob, the second feature sample subset X_B is decomposed into X_B1 and X_B2. The decomposition process for the feature sample subsets X_A and X_B is exactly the same as the decomposition process for the sub-model W_A. Then, at block 530, Alice sends X_A2 to Bob, and at block 540, Bob sends X_B1 to Alice.
Next, at block 550, at Alice, X_A1 and X_B1 are concatenated to obtain the first conversion feature sample subset X_A'. The dimension of the resulting first conversion feature sample subset X_A' is equal to the dimension of the feature sample set X used for model training. At block 560, at Bob, X_A2 and X_B2 are concatenated to obtain the second conversion feature sample subset X_B'. The dimension of the conversion feature sample subset X_B' is also the same as the dimension of the feature sample set X.
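The "decompose, exchange, concatenate" pattern shared by the model conversion of FIG. 4 and the feature sample conversion of FIG. 5 can be illustrated with the following Python sketch (numpy assumed; shapes and names are illustrative). After the conversion, the two parties hold full-width additive shares of the complete feature sample set:

import numpy as np

rng = np.random.default_rng(2)
n = 4
X_A = rng.normal(size=(n, 3))   # Alice's vertical slice (she also holds the labels)
X_B = rng.normal(size=(n, 2))   # Bob's vertical slice

# Each party additively decomposes its slice into two shares.
X_A1 = rng.normal(size=X_A.shape); X_A2 = X_A - X_A1   # Alice keeps X_A1, sends X_A2 to Bob
X_B1 = rng.normal(size=X_B.shape); X_B2 = X_B - X_B1   # Bob keeps X_B2, sends X_B1 to Alice

# Each party concatenates its local share with the received share.
X_A_conv = np.hstack([X_A1, X_B1])   # conversion feature sample subset held by Alice
X_B_conv = np.hstack([X_A2, X_B2])   # conversion feature sample subset held by Bob

# The two conversion subsets are full-width additive shares of the whole set X.
assert np.allclose(X_A_conv + X_B_conv, np.hstack([X_A, X_B]))

# The sub-models W_A and W_B are converted in the same way, yielding W_A' and W_B'
# whose dimension equals that of the full feature sample set (illustrative names).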
After the vertical-to-horizontal slicing conversion is performed on the first feature sample subset X_A and the second feature sample subset X_B as above, the operations of blocks 304 through 314 are performed in a loop until a loop end condition is satisfied.
Specifically, at block 304, based on the current conversion sub-models W_A' and W_B' of the training participants and the respective conversion feature sample subsets X_A' and X_B' of the training participants, the current predicted value Ŷ of the linear/logistic regression model to be trained for the feature sample set X is obtained using secret sharing matrix multiplication. How the current predicted value Ŷ is obtained using secret sharing matrix multiplication will be described below with reference to FIGS. 6 to 8.
After the current predicted value Ŷ is obtained, at block 305, at the first training participant Alice, the prediction difference E between the current predicted value Ŷ and the corresponding label value Y is determined. Here, E is a column vector, Y is a column vector representing the label values of the training samples X, and Ŷ is a column vector representing the current predicted values for the training samples X. If the training sample set X contains only a single training sample, then E, Y and Ŷ are column vectors having only a single element. If the training sample set X contains multiple training samples, then E, Y and Ŷ are column vectors having multiple elements, where each element of Ŷ is the current predicted value of the corresponding training sample, each element of Y is the label value of the corresponding training sample, and each element of E is the difference between the label value of the corresponding training sample and its current predicted value.
Then, at block 306, at Alice, a first model update quantity TMP1 = X_A' * E is determined using the prediction difference E and the first conversion feature sample subset X_A'. Then, at block 307, at Alice, the first model update quantity TMP1 is decomposed into TMP1 = TMP1_A + TMP1_B. Here, the decomposition process for TMP1 is the same as the decomposition process described above and is not repeated. Subsequently, at block 308, Alice sends TMP1_B to Bob.
Then, at block 309, Alice and Bob perform secret sharing matrix multiplication on the prediction difference E and the second conversion feature sample subset X_B' to calculate a second model update quantity TMP2 = X_B' * E. Then, at block 310, at Bob, the second model update quantity TMP2 is decomposed into TMP2 = TMP2_A + TMP2_B. Subsequently, at block 311, Bob sends TMP2_A to Alice.
Next, at block 312, at Alice, the current conversion sub-model W_A' at Alice is updated based on TMP1_A and TMP2_A. Specifically, TMP_A = TMP1_A + TMP2_A is calculated first, and TMP_A is then used to update the current conversion sub-model W_A', for example using the following equation (1):
W_A'(n+1) = W_A'(n) - (α/S) * TMP_A (1)
where W_A'(n) is the current conversion sub-model at Alice, W_A'(n+1) is the updated conversion sub-model at Alice, α is the learning rate, and S is the number of training samples used in this round of the model training process, i.e., the batch size of this round of the model training process.
At block 313, at Bob, the current conversion sub-model W_B' at Bob is updated based on TMP1_B and TMP2_B. Specifically, TMP_B = TMP1_B + TMP2_B is calculated first, and TMP_B is then used to update the current conversion sub-model W_B', for example using the following equation (2):
W_B'(n+1) = W_B'(n) - (α/S) * TMP_B (2)
where W_B'(n) is the current conversion sub-model at Bob, W_B'(n+1) is the updated conversion sub-model at Bob, α is the learning rate, and S is the number of training samples used in this round of the model training process, i.e., the batch size of this round of the model training process.
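The per-round arithmetic of blocks 305 to 313 can be sanity-checked with the plaintext Python sketch below (numpy assumed; all names, the sign convention E = Ŷ - Y, and the learning rate are illustrative, and the secret sharing matrix multiplications are replaced by direct products, so this shows only the arithmetic, not the privacy protection):

import numpy as np

rng = np.random.default_rng(3)
S, d = 8, 5                                  # batch size and total feature dimension
X_conv_A = rng.normal(size=(S, d))           # Alice's conversion feature subset X_A'
X_conv_B = rng.normal(size=(S, d))           # Bob's conversion feature subset X_B'
W_conv_A = rng.normal(size=d)                # Alice's conversion sub-model W_A'
W_conv_B = rng.normal(size=d)                # Bob's conversion sub-model W_B'
Y = rng.normal(size=S)                       # label values (held by Alice)
alpha = 0.1                                  # learning rate (illustrative value)

X = X_conv_A + X_conv_B                      # full feature set (never materialized in the protocol)
Y_hat = X @ (W_conv_A + W_conv_B)            # current predicted values (block 304, via secret sharing)
E = Y_hat - Y                                # prediction difference (block 305, at Alice)

TMP1 = X_conv_A.T @ E                        # first model update quantity (block 306, at Alice)
TMP2 = X_conv_B.T @ E                        # second model update quantity (block 309, via secret sharing)

TMP1_A = rng.normal(size=d); TMP1_B = TMP1 - TMP1_A   # block 307: Alice keeps TMP1_A, sends TMP1_B
TMP2_A = rng.normal(size=d); TMP2_B = TMP2 - TMP2_A   # block 310: Bob keeps TMP2_B, sends TMP2_A

W_conv_A -= (alpha / S) * (TMP1_A + TMP2_A)  # block 312: Alice updates W_A'
W_conv_B -= (alpha / S) * (TMP1_B + TMP2_B)  # block 313: Bob updates W_B'

# Summed over the two participants, this is one ordinary gradient step on W = W_A' + W_B'.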
Then, at block 314, it is determined whether a predetermined number of cycles has been reached, i.e., whether a cycle end condition has been reached. If a predetermined number of cycles (e.g., T) is reached, block 315 is entered. If the predetermined number of cycles has not been reached, flow returns to the operation of block 302 to perform a next training cycle in which the updated submodel obtained by the respective training participant in the current cycle is used as the current submodel for the next training cycle.
At block 315, sub-models (i.e., trained sub-models) at Alice and Bob are determined based on the updated transition sub-models of Alice and Bob, respectively.
Specifically, after W_A' and W_B' have been trained as described above, Alice sends W_A'[|A|:] to Bob, and Bob sends W_B'[0:|A|] to Alice. Here, W_A'[|A|:] denotes the vector components of W_A' from dimension |A| onwards, and W_B'[0:|A|] denotes the vector components of W_B' before dimension |A|, i.e., the components from 0 to |A|. For example, if W = [0, 1, 2, 3, 4] and |A| = 2, then W[0:|A|] = [0, 1] and W[|A|:] = [2, 3, 4]. Then, at Alice, W_A = W_A'[0:|A|] + W_B'[0:|A|] is calculated, and at Bob, W_B = W_A'[|A|:] + W_B'[|A|:] is calculated, thereby obtaining the trained sub-models W_A and W_B at Alice and Bob.
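A small illustrative sketch (Python/numpy assumed; dimensions and names are illustrative) of the reconstruction at block 315, where each party recovers its own sub-model by slicing at position |A| and summing:

import numpy as np

rng = np.random.default_rng(4)
dim_A, dim_B = 2, 3                          # |A| features at Alice, the rest at Bob
W_conv_A = rng.normal(size=dim_A + dim_B)    # trained conversion sub-model W_A' at Alice
W_conv_B = rng.normal(size=dim_A + dim_B)    # trained conversion sub-model W_B' at Bob

# Alice sends W_A'[|A|:] to Bob; Bob sends W_B'[0:|A|] to Alice.
W_A = W_conv_A[:dim_A] + W_conv_B[:dim_A]    # Alice's final sub-model over her own features
W_B = W_conv_A[dim_A:] + W_conv_B[dim_A:]    # Bob's final sub-model over his own features

# Together they equal the full model W = W_A' + W_B'.
assert np.allclose(np.concatenate([W_A, W_B]), W_conv_A + W_conv_B)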
It is to be noted here that, in the above example, the end condition of the training loop process is that the predetermined number of loops is reached. In other examples of the disclosure, the end condition of the training loop process may also be that the determined prediction difference is within a predetermined range, i.e., each element E_i of the prediction difference E is within a predetermined range, for example, each element E_i of the prediction difference E is less than a predetermined threshold, or the mean of the prediction difference E is less than a predetermined threshold. Accordingly, the operations of block 314 in FIG. 3 may be performed after the operations of block 305.
Here, it is to be noted that when X_i is a single feature sample, X_i is a feature vector (column vector or row vector) consisting of multiple attributes, and E is a single prediction difference. When X_i consists of multiple feature samples, X_i is a feature matrix in which the attributes of each feature sample form one column (or row) of X_i, and E is a prediction difference vector. When X_i * E is calculated, what is multiplied by each element of E is, within the matrix X_i, the set of feature values of the respective samples for a certain feature. For example, assuming E is a column vector, each element of E is multiplied by one row of the matrix X_i, and the elements of that row represent the feature values of a certain feature for the respective samples.
Fig. 6 shows a flowchart of a predictive value acquisition process according to an embodiment of the present disclosure.
As shown in FIG. 6, first, at block 601, at Alice, Z_A1 = X_A' * W_A' is calculated using the first conversion feature sample subset X_A' and the current conversion sub-model W_A'. At block 602, at Bob, Z_B1 = X_B' * W_B' is calculated using the second conversion feature sample subset X_B' and the current conversion sub-model W_B'.
Then, at block 603, Alice and Bob calculate Z_2 = X_A' * W_B' and Z_3 = X_B' * W_A' using secret sharing matrix multiplication. Here, the secret sharing matrix multiplication may be either secret sharing matrix multiplication with a trusted initializer or secret sharing matrix multiplication without a trusted initializer. The two variants are described below with reference to FIGS. 7 and 8, respectively.
Next, at block 604, at Alice, Z_2 is decomposed into Z_A2 and Z_B2. At block 605, at Bob, Z_3 is decomposed into Z_A3 and Z_B3. Here, the decomposition process for Z_2 and Z_3 is the same as the decomposition process described above for the feature sample subsets and is not repeated here.
Then, at block 606, Alice sends Z_B2 to Bob, and at block 607, Bob sends Z_A3 to Alice.
Next, at block 608, at Alice, Z_A = Z_A1 + Z_A2 + Z_A3 is calculated. At block 609, at Bob, Z_B = Z_B1 + Z_B2 + Z_B3 is calculated. Then, at block 610, Bob sends Z_B to Alice, and at block 611, Alice sends Z_A to Bob.
After Z_B and Z_A are respectively received, at block 612, at Alice and Bob, the predicted value Ŷ is obtained from Z_A + Z_B.
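The decomposition of the predicted value in FIG. 6 can be checked with the following plaintext Python sketch (numpy assumed; names illustrative; the two cross terms are computed directly here, whereas in the protocol they are obtained via secret sharing matrix multiplication):

import numpy as np

rng = np.random.default_rng(5)
S, d = 6, 4
X_conv_A = rng.normal(size=(S, d)); X_conv_B = rng.normal(size=(S, d))   # X_A', X_B'
W_conv_A = rng.normal(size=d);      W_conv_B = rng.normal(size=d)        # W_A', W_B'

Z_A1 = X_conv_A @ W_conv_A           # block 601, computed locally at Alice
Z_B1 = X_conv_B @ W_conv_B           # block 602, computed locally at Bob
Z_2  = X_conv_A @ W_conv_B           # block 603, via secret sharing matrix multiplication
Z_3  = X_conv_B @ W_conv_A           # block 603, via secret sharing matrix multiplication

Z = Z_A1 + Z_B1 + Z_2 + Z_3          # blocks 604-612 split, exchange and re-sum these terms

# Equals the linear-regression predicted value on the full data.
assert np.allclose(Z, (X_conv_A + X_conv_B) @ (W_conv_A + W_conv_B))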
FIG. 7 illustrates a flowchart of one example of secret sharing matrix multiplication with a trusted initializer according to an embodiment of the disclosure. The computation of Z_2 = X_A' * W_B' shown in FIG. 7 is taken as an example, where X_A' is the conversion sample subset at Alice (hereinafter referred to as the feature matrix) and W_B' is the conversion sub-model at Bob (hereinafter referred to as the weight vector).
As shown in FIG. 7, first, at the trusted initializer 30, 2 random weight vectors W_R,1 and W_R,2, 2 random feature matrices X_R,1 and X_R,2, and 2 random label value vectors Y_R,1 and Y_R,2 are generated, where Y_R,1 + Y_R,2 = (X_R,1 + X_R,2) * (W_R,1 + W_R,2). Here, the dimension of the random weight vectors is the same as that of the conversion sub-models (weight vectors) of the training participants, the dimension of the random feature matrices is the same as that of the conversion sample subsets (feature matrices), and the dimension of the random label value vectors is the same as that of the label value vector.
The trusted initializer 30 then, at block 701, sends the generated W_R,1, X_R,1 and Y_R,1 to Alice, and at block 702, sends the generated W_R,2, X_R,2 and Y_R,2 to Bob.
Next, at block 703, at Alice, the feature matrix X_A' is decomposed into 2 feature sub-matrices, i.e., feature sub-matrices X_A1' and X_A2'.
For example, assume the feature matrix X_A' includes two feature samples S1 and S2, each of which includes 3 attribute values, where S1 = [a1^1, a2^1, a3^1] and S2 = [a1^2, a2^2, a3^2]. After the feature matrix X_A' is decomposed into 2 feature sub-matrices X_A1' and X_A2', the first feature sub-matrix X_A1' includes the feature sub-samples [a11^1, a21^1, a31^1] and [a11^2, a21^2, a31^2], and the second feature sub-matrix X_A2' includes the feature sub-samples [a12^1, a22^1, a32^1] and [a12^2, a22^2, a32^2], where a11^1 + a12^1 = a1^1, a21^1 + a22^1 = a2^1, a31^1 + a32^1 = a3^1, a11^2 + a12^2 = a1^2, a21^2 + a22^2 = a2^2, and a31^2 + a32^2 = a3^2.
Then, at block 704, Alice sends the decomposed feature sub-matrix X_A2' to Bob.
At block 705, at Bob, the weight vector W_B' is decomposed into 2 weight sub-vectors W_B1' and W_B2'. The decomposition process for the weight vector is the same as the decomposition process described above. At block 706, Bob sends the weight sub-vector W_B1' to Alice.
Then, at each training participant, a weight sub-vector difference E and a feature sub-matrix difference D at that training participant are determined based on the training participant's weight sub-vector, corresponding feature sub-matrix, and the received random weight vector and random feature matrix. For example, at block 707, at Alice, its weight sub-vector difference E1 = W_B1' - W_R,1 and feature sub-matrix difference D1 = X_A1' - X_R,1 are determined. At block 708, at Bob, its weight sub-vector difference E2 = W_B2' - W_R,2 and feature sub-matrix difference D2 = X_A2' - X_R,2 are determined.
After each training participant determines its weight sub-vector difference Ei and feature sub-matrix difference Di, at block 709, Alice sends D1 and E1 to the training cooperator Bob. At block 710, the training cooperator Bob sends D2 and E2 to Alice.
Then, at block 711, at each training participant, the weight sub-vector differences and the feature sub-matrix differences of the training participants are summed to obtain a total weight sub-vector difference E and a total feature sub-matrix difference D, respectively. For example, as shown in FIG. 7, D = D1 + D2 and E = E1 + E2.
Then, at each training participant, the predicted value vector Zi is calculated based on the received random weight vector W_R,i, random feature matrix X_R,i, random label value vector Y_R,i, the total weight sub-vector difference E, and the total feature sub-matrix difference D.
In one example of the present disclosure, at each training participant, the training participant's random label value vector, the product of the total weight sub-vector difference and the training participant's random feature matrix, and the product of the total feature sub-matrix difference and the training participant's random weight vector may be summed to obtain the corresponding predicted value vector (the first calculation). Alternatively, the training participant's random label value vector, the product of the total weight sub-vector difference and the training participant's random feature matrix, the product of the total feature sub-matrix difference and the training participant's random weight vector, and the product of the total weight sub-vector difference and the total feature sub-matrix difference may be summed to obtain the corresponding predicted value vector (the second calculation).
It should be noted here that only one of the predicted value vectors calculated at the training participants includes the product of the total weight sub-vector difference and the total feature sub-matrix difference. In other words, only one training participant computes its predicted value vector using the second calculation, while the remaining training participants compute their corresponding predicted value vectors using the first calculation.
For example, at block 712, at Alice, the corresponding predicted value vector Z1 = Y_R,1 + E * X_R,1 + D * W_R,1 + D * E is calculated. At block 713, at Bob, the corresponding predicted value vector Z2 = Y_R,2 + E * X_R,2 + D * W_R,2 is calculated.
Note that, in FIG. 7, the Z1 calculated at Alice includes D * E. In other examples of the disclosure, D * E may instead be included in the Z2 calculated by Bob, in which case D * E is not included in the Z1 calculated at Alice. In other words, only one of the Zi calculated at the training participants contains D * E.
Alice then sends Z1 to Bob at block 714. At block 715, Bob sends Z2 to Alice.
Then, at blocks 716 and 717, the training participants compute the sum Z = Z1 + Z2 to obtain the secret sharing matrix multiplication result.
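The arithmetic of the FIG. 7 protocol can be checked with the following plaintext Python sketch (numpy assumed; the randomness of the trusted initializer and of both participants is simulated in one process, and names and shapes are illustrative, so this demonstrates correctness of the reconstruction Z = Z1 + Z2 only, not the security properties):

import numpy as np

rng = np.random.default_rng(6)
S, d = 5, 3
X = rng.normal(size=(S, d))          # X_A', the feature matrix held by Alice
W = rng.normal(size=d)               # W_B', the weight vector held by Bob

# Trusted initializer: random data with Y_R1 + Y_R2 = (X_R1 + X_R2) @ (W_R1 + W_R2)
X_R1 = rng.normal(size=(S, d)); X_R2 = rng.normal(size=(S, d))
W_R1 = rng.normal(size=d);      W_R2 = rng.normal(size=d)
Y_R = (X_R1 + X_R2) @ (W_R1 + W_R2)
Y_R1 = rng.normal(size=S);      Y_R2 = Y_R - Y_R1

# Blocks 703-706: additive splits, one piece exchanged.
X1 = rng.normal(size=(S, d)); X2 = X - X1        # X_A1' kept by Alice, X_A2' sent to Bob
W1 = rng.normal(size=d);      W2 = W - W1        # W_B1' sent to Alice, W_B2' kept by Bob

# Blocks 707-711: differences against the initializer's randomness, then totals.
D = (X1 - X_R1) + (X2 - X_R2)    # D = D1 + D2
E = (W1 - W_R1) + (W2 - W_R2)    # E = E1 + E2

# Blocks 712-713: only Alice's share carries the D @ E term.
Z1 = Y_R1 + X_R1 @ E + D @ W_R1 + D @ E
Z2 = Y_R2 + X_R2 @ E + D @ W_R2

# Blocks 714-717: the shares sum to the desired product X_A' @ W_B'.
assert np.allclose(Z1 + Z2, X @ W)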
FIG. 8 illustrates a flowchart of one example of secret sharing matrix multiplication without a trusted initializer according to an embodiment of the present disclosure. In FIG. 8, the calculation process of X_A' * W_B' between the training participants Alice and Bob is explained as an example.
As shown in FIG. 8, first, at block 801, if the number of rows of X_A' at Alice (hereinafter referred to as the first feature matrix) is not even, and/or the number of columns of the current sub-model parameter W_B' at Bob (hereinafter referred to as the first weight sub-matrix) is not even, dimension-completion processing is performed on the first feature matrix X_A' and/or the first weight sub-matrix W_B' so that the number of rows of X_A' is even and/or the number of columns of W_B' is even. For example, the dimension-completion processing may be performed by adding a row of 0 values at the end of X_A' and/or adding a column of 0 values at the end of W_B'. In the following description, it is assumed that the first weight sub-matrix W_B' has dimension I*J and the first feature matrix X_A' has dimension J*K, where J is an even number.
The operations of blocks 802 to 804 are then performed at Alice to obtain a random feature matrix X1 and second and third feature matrices X2 and X3. Specifically, at block 802, a random feature matrix X1 is generated. Here, the dimension of the random feature matrix X1 is the same as that of the first feature matrix X_A', i.e., X1 has dimension J*K. At block 803, the random feature matrix X1 is subtracted from the first feature matrix X_A' to obtain a second feature matrix X2. The dimension of the second feature matrix X2 is J*K. At block 804, the even-row sub-matrix X1_e of the random feature matrix X1 is subtracted from the odd-row sub-matrix X1_o of the random feature matrix X1 to obtain a third feature matrix X3. The dimension of the third feature matrix X3 is j*K, where j = J/2.
Further, the operations of blocks 805 to 807 are performed at Bob to obtain a random weight sub-matrix W_B1 and second and third weight sub-matrices W_B2 and W_B3. Specifically, at block 805, a random weight sub-matrix W_B1 is generated. Here, the dimension of the random weight sub-matrix W_B1 is the same as that of the first weight sub-matrix W_B', i.e., W_B1 has dimension I*J. At block 806, the first weight sub-matrix W_B' and the random weight sub-matrix W_B1 are summed to obtain a second weight sub-matrix W_B2. The dimension of the second weight sub-matrix W_B2 is I*J. At block 807, the odd-column sub-matrix W_B1_o of the random weight sub-matrix W_B1 is added to the even-column sub-matrix W_B1_e of the random weight sub-matrix W_B1 to obtain a third weight sub-matrix W_B3. The dimension of the third weight sub-matrix W_B3 is I*j, where j = J/2.
Then, at block 808, Alice sends the generated second feature matrix X2 and third feature matrix X3 to Bob, and at block 809, Bob sends the second weight submatrix W_B2 and the third weight submatrix W_B3 to Alice.
Next, at block 810, at Alice, a matrix calculation is performed based on the equation Y1 = W_B2*(2*X_A' - X1) - W_B3*(X3 + X1_e) to obtain the first matrix product Y1, and at block 812, the first matrix product Y1 is sent to Bob.
At block 811, at Bob, the second matrix product Y2 is computed based on the equation Y2 = (W_B' + 2*W_B1)*X2 + (W_B3 + W_B1_o)*X3, and at block 813, the second matrix product Y2 is sent to Alice.
Then, at blocks 814 and 815, the first matrix product Y1 and the second matrix product Y2 are summed at Alice and Bob, respectively, to obtain X_A'*W_B' = Y_B = Y1 + Y2.
Here, FIGS. 6 to 8 show the calculation process of the current predicted value Y = W*X in the case of a linear regression model. In the case of a logistic regression model, W*X may first be determined according to the procedure shown in FIGS. 6 to 8 and then substituted into the logistic regression model Y = 1/(1 + e^(-W*X)), thereby calculating the current predicted value.
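As a minimal illustration of this last step, the sketch below applies the logistic function to a reconstructed product W*X to obtain the current predicted value; for linear regression the product is used directly. The function name is illustrative only.

```python
import numpy as np

def current_prediction(wx, model="logistic"):
    """wx: the reconstructed W*X obtained via secret sharing matrix multiplication."""
    if model == "linear":
        return wx                          # linear regression: Y = W*X
    return 1.0 / (1.0 + np.exp(-wx))       # logistic regression: Y = 1 / (1 + e^(-W*X))
```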
By using the linear/logistic regression model training method disclosed in FIGS. 3 to 8, the model parameters of the linear/logistic regression model can be trained without leaking the secret data of any of the training participants. Moreover, the workload of model training grows only linearly, rather than exponentially, with the number of feature samples used for training, so the efficiency of model training can be improved while the security of each training participant's data is guaranteed.
Fig. 9 shows a schematic diagram of an apparatus for collaboratively training a linear/logistic regression model via two training participants (hereinafter referred to as model training apparatus) 900 according to an embodiment of the present disclosure. Each training participant has one sub-model of the linear/logistic regression model; the first training participant (Alice) has a first feature sample subset and the label values; the second training participant (Bob) has a second feature sample subset; the first and second feature sample subsets are obtained by vertically slicing the feature sample set used for model training; and the model training apparatus 900 is located on the first training participant side.
As shown in fig. 9, the model training apparatus 900 includes a model conversion unit 910, a sample conversion unit 920, a prediction value acquisition unit 930, a prediction difference determination unit 940, a model update amount determination unit 950, a model update amount decomposition unit 960, a model update amount transmission/reception unit 970, a model update unit 980, and a model determination unit 990.
The model transformation unit 910 is configured to perform a model transformation process on the sub-models of the respective training participants to obtain transformation sub-models of the respective training participants. The operation of the model conversion unit 910 may refer to the operation of block 302 described above with reference to fig. 3 and the operation described with reference to fig. 4.
At the time of model training, the sample conversion unit 920, the predicted value acquisition unit 930, the predicted difference value determination unit 940, the model update amount determination unit 950, the model update amount decomposition unit 960, the model update amount transmission/reception unit 970, and the model update unit 980 are configured to cyclically perform operations until a cycle end condition is satisfied. The loop-ending condition may include: reaching a predetermined cycle number; or the determined prediction difference is within a predetermined range. When the loop process is not finished, the updated conversion submodel of each training participant is used as the current conversion submodel of the next loop process.
Specifically, in each iteration process, the sample conversion unit 920 is configured to perform vertical-horizontal slicing conversion on the feature sample set to obtain a converted feature sample subset at each training participant. The operation of the sample conversion unit 920 may refer to the process described above with reference to fig. 5.
The prediction value obtaining unit 930 is configured to obtain a current prediction value for the feature sample set using secret sharing matrix multiplication based on the current conversion submodel and the conversion feature sample subset of the respective training participants. The operation of the prediction value acquisition unit 930 may refer to the operation of the block 304 described above with reference to fig. 3 and the operation described with reference to fig. 6 to 8.
The prediction difference determination unit 940 is configured to determine a prediction difference between the current prediction value and the corresponding flag value. The operation of the prediction difference determination unit 940 may refer to the operation of the block 305 described above with reference to fig. 3.
The model update amount determination unit 950 is configured to determine a first model update amount using the prediction difference and the subset of transformed feature samples at the first training participant. The operation of the model update amount determination unit 950 may refer to the operation of the block 306 described above with reference to fig. 3.
The model update amount decomposition unit 960 is configured to decompose the first model update amount into two first partial model update amounts. The operation of the model update amount decomposition unit 960 may refer to the operation of block 307 described above with reference to fig. 3.
The model update amount transmitting/receiving unit 970 is configured to transmit a first partial model update amount to the second training participant and receive a second partial model update amount from the second training participant, the second partial model update amount being obtained by decomposing a second model update amount at the second training participant, the second model update amount being obtained by performing secret sharing matrix multiplication on the prediction difference and the conversion feature sample subset at the second training participant. The operation of the model update amount transmitting/receiving unit 970 may refer to the operation of block 308/311 described above with reference to fig. 3.
The model update unit 980 is configured to update the current conversion submodel at the first training participant based on the remaining first partial model update amount and the received second partial model update amount. The operation of the model update unit 980 may refer to the operation of block 312 described above with reference to fig. 3.
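The cooperation of units 950 to 980 can be sketched as follows in Python. The gradient-style form of the update amount, the learning rate and the plain additive split are assumptions made only for illustration; the exact formulas are those of blocks 306 to 312 described earlier in the disclosure.

```python
import numpy as np

rng = np.random.default_rng(1)

def split_update(u):
    """Decompose a model update amount into two additive partial update amounts."""
    part = rng.standard_normal(u.shape)
    return part, u - part

# At Alice (first training participant): prediction difference e and converted
# feature sample subset X_alice are assumed to be already available.
e = rng.standard_normal((1, 5))            # prediction difference (placeholder values)
X_alice = rng.standard_normal((4, 5))      # converted feature sample subset (placeholder)
G1 = e @ X_alice.T                         # first model update amount (assumed gradient form)

G1_keep, G1_send = split_update(G1)        # unit 960: decompose into two partial updates
# Unit 970: G1_send goes to Bob; G2_recv is the partial update received from Bob,
# produced there by decomposing Bob's own secret-shared second model update amount.
G2_recv = rng.standard_normal(G1.shape)    # placeholder for the received share

alpha = 0.01                               # learning rate (assumed)
W_alice = rng.standard_normal((1, 4))      # current conversion submodel at Alice
W_alice = W_alice - alpha * (G1_keep + G2_recv)   # unit 980: update the conversion submodel
```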
The model determination unit 990 is configured to determine the sub-model of the first training participant based on the conversion sub-models of the first training participant and the second training participant when the loop end condition is fulfilled. The operation of the model determination unit 990 may refer to the operation of block 315 described above with reference to fig. 3.
In one example of the present disclosure, the sample conversion unit 920 may include a sample decomposition module (not shown), a sample transmission/reception module (not shown), and a sample stitching module (not shown). The sample decomposition module is configured to decompose the first subset of feature samples into two first partial subsets of feature samples. The sample sending/receiving module is configured to send a first partial feature sample subset to the second training participant and receive a second partial feature sample subset from the second training participant, the second partial feature sample subset being obtained by decomposing the feature sample subset at the second training participant. The sample stitching module is configured to stitch the remaining first subset of partial feature samples and the received second subset of partial feature samples to obtain a first subset of transformed feature samples.
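Under the assumption that the decomposition is a plain additive random split, the decompose/send/splice steps above make the two converted feature sample subsets additive shares of the full vertically partitioned feature matrix, as the following sketch shows.

```python
import numpy as np

rng = np.random.default_rng(2)

def additive_split(m):
    """Decompose a feature sample subset into two partial feature sample subsets."""
    part = rng.standard_normal(m.shape)
    return part, m - part

# Vertically partitioned data: Alice holds features 0..1, Bob holds features 2..3.
X_A = rng.standard_normal((5, 2))   # first feature sample subset (at Alice)
X_B = rng.standard_normal((5, 2))   # second feature sample subset (at Bob)

X_A_keep, X_A_send = additive_split(X_A)   # Alice decomposes and sends one part to Bob
X_B_keep, X_B_send = additive_split(X_B)   # Bob decomposes and sends one part to Alice

# Splicing (horizontal concatenation) yields the converted feature sample subsets.
X_alice_conv = np.hstack([X_A_keep, X_B_send])   # at Alice
X_bob_conv = np.hstack([X_A_send, X_B_keep])     # at Bob

# Together they form additive shares of the full feature matrix.
assert np.allclose(X_alice_conv + X_bob_conv, np.hstack([X_A, X_B]))
```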
Fig. 10 shows a block diagram of one example of a prediction value acquisition unit (hereinafter referred to as "prediction value acquisition unit 1000") according to an embodiment of the present disclosure. As shown in fig. 10, the prediction value acquisition unit 1000 may include a first calculation module 1010, a second calculation module 1020, a matrix product decomposition module 1030, a matrix product transmission/reception module 1040, a first summation module 1050, a sum value transmission/reception module 1060, and a second summation module 1070.
The first calculation module 1010 is configured to calculate a first matrix product of the conversion submodel (W_A') of the first training participant and the conversion feature sample subset (X_A') of the first training participant. The operation of the first calculation module 1010 may refer to the operation of block 601 described above with reference to fig. 6.
The second calculation module 1020 is configured to calculate, using secret sharing matrix multiplication, a second matrix product between the conversion submodel (W_B') of the second training participant and the conversion feature sample subset (X_A') of the first training participant and between the conversion submodel (W_A') of the first training participant and the conversion feature sample subset (X_B') of the second training participant. The operation of the second calculation module 1020 may refer to the operation of block 603 described above with reference to fig. 6 and the operations described with reference to figs. 7-8.
The matrix product decomposition module 1030 is configured to decompose the calculated second matrix product to obtain 2 second partial matrix products. The operation of the matrix product decomposition module 1030 may refer to the operation of block 604 described above with reference to fig. 6.
The matrix product transmit/receive module 1040 is configured to transmit a second partial matrix product to the second training participant and receive a third partial matrix product from the second training participant. The third partial matrix product is obtained by decomposing the third matrix product at the second training participant. The third matrix product is the matrix product, calculated by the second training participant, of the conversion submodel (W_A') of the first training participant and the conversion feature sample subset (X_B') of the second training participant. The operation of the matrix product transmit/receive module 1040 may refer to the operations of blocks 606 and 607 described above with reference to fig. 6.
The first summation module 1050 is configured to sum the first matrix product, the second partial matrix product, and the third partial matrix product to obtain a first matrix product-sum value at the first training participant. The operation of the first summing module 1050 may refer to the operation of block 608 described above with reference to fig. 6.
The sum value transmission/reception module 1060 is configured to receive a second matrix product sum value (Z_B) obtained at the second training participant and to send the first matrix product sum value (Z_A) obtained at the first training participant to the second training participant. The operation of the sum value transmission/reception module 1060 may refer to the operation of block 610/611 described above with reference to fig. 6.
The second summation module 1070 is configured to sum the resulting first and second matrix product-sum values to obtain the current predicted values of the linear/logistic regression model for the feature sample set. The operation of the second summing module 1070 may refer to the operation of block 612 described above with reference to fig. 6.
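Putting the modules of fig. 10 together: each participant computes one local product, the two cross products are obtained as secret shares, and the exchanged sum values reconstruct the prediction. The sketch below assumes the conversion submodels and conversion feature sample subsets are additive shares of the converted model and converted samples, and it stands in for the secret-shared cross multiplications of figs. 7 to 8 with a simple additive split so that the example stays self-contained.

```python
import numpy as np

rng = np.random.default_rng(3)

def additive_shares(m):
    """Split a matrix into two additive shares."""
    s = rng.standard_normal(m.shape)
    return s, m - s

W = rng.standard_normal((1, 4)); W_a, W_b = additive_shares(W)   # conversion submodels
X = rng.standard_normal((4, 5)); X_a, X_b = additive_shares(X)   # conversion feature subsets

# First matrix products, computed locally (block 601 and its counterpart).
P_aa = W_a @ X_a            # at Alice
P_bb = W_b @ X_b            # at Bob

# Cross products W_b@X_a and W_a@X_b are obtained as secret shares via figs. 7-8;
# here they are simply split additively to keep the sketch self-contained.
P_ba_a, P_ba_b = additive_shares(W_b @ X_a)   # second matrix product, shared
P_ab_a, P_ab_b = additive_shares(W_a @ X_b)   # third matrix product, shared

# Matrix product sum values (block 608 and its counterpart), then exchange and sum.
Z_A = P_aa + P_ba_a + P_ab_a
Z_B = P_bb + P_ba_b + P_ab_b
Y = Z_A + Z_B               # current predicted value (linear case)

assert np.allclose(Y, W @ X)
```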
In one example of the present disclosure, the second calculation module 1020 may be configured to: calculate, using trusted initializer secret sharing matrix multiplication, the second matrix product between the conversion submodel (W_B') of the second training participant and the conversion feature sample subset (X_A') of the first training participant and between the conversion submodel (W_A') of the first training participant and the conversion feature sample subset (X_B') of the second training participant. The operations of the second calculation module 1020 may refer to the operations performed at the first training participant described above with reference to fig. 7.
In another example of the present disclosure, the second calculation module 1020 may be configured to: calculate, using untrusted initializer secret sharing matrix multiplication, the second matrix product between the conversion submodel (W_B') of the second training participant and the conversion feature sample subset (X_A') of the first training participant and between the conversion submodel (W_A') of the first training participant and the conversion feature sample subset (X_B') of the second training participant. The operations of the second calculation module 1020 may refer to the operations performed at the first training participant described above with reference to fig. 8.
Fig. 11 shows a schematic diagram of an apparatus (hereinafter referred to as model training apparatus) 1100 for collaboratively training a linear/logistic regression model via two training participants, according to an embodiment of the present disclosure. Each training participant has one sub-model of the linear/logistic regression model; the first training participant (Alice) has a first feature sample subset and the label values; the second training participant (Bob) has a second feature sample subset; the first and second feature sample subsets are obtained by vertically slicing the feature sample set used for model training; and the model training apparatus 1100 is located on the second training participant side.
As shown in fig. 11, the model training apparatus 1100 includes a model conversion unit 1110, a sample conversion unit 1120, a prediction value acquisition unit 1130, a model update amount receiving unit 1140, a model update amount determining unit 1150, a model update amount decomposition unit 1160, a model update amount sending unit 1170, a model update unit 1180, and a model determining unit 1190.
The model transformation unit 1110 is configured to perform a model transformation process on the sub-models of the respective training participants to obtain transformed sub-models of the respective training participants. The operation of the model conversion unit 1110 may refer to the operation of block 302 described above with reference to fig. 3 and the operation described with reference to fig. 4.
In performing model training, the sample conversion unit 1120, the predicted value acquisition unit 1130, the model update amount reception unit 1140, the model update amount determination unit 1150, the model update amount decomposition unit 1160, the model update amount transmission unit 1170, and the model update unit 1180 are configured to perform operations in a loop until a loop end condition is satisfied. The loop-ending condition may include: reaching a predetermined cycle number; or the determined prediction difference is within a predetermined range. When the loop process is not finished, the updated conversion submodel of each training participant is used as the current conversion submodel of the next loop process.
In particular, during each iteration, the sample conversion unit 1120 is configured to perform a vertical-horizontal slicing conversion on the feature sample set to obtain a converted feature sample subset at each training participant. The operation of the sample conversion unit 1120 may refer to the process described above with reference to fig. 5. Also, the sample conversion unit 1120 may have the same structure as the sample conversion unit 920.
The predictor obtaining unit 1130 is configured to obtain a current predicted value for the feature sample set using secret sharing matrix multiplication based on the current conversion submodel and the conversion feature sample subset of the respective training participants. Here, the predictor acquisition unit 1130 may be configured to obtain the current predicted value for the feature sample set using trusted initializer secret sharing matrix multiplication or untrusted initializer secret sharing matrix multiplication. The operation of the predictor acquisition unit 1130 may refer to the operation of block 304 described above with reference to fig. 3. The predictor acquisition unit 1130 may employ the same structure as the predicted value acquisition unit 930 (i.e., the structure shown in fig. 10). Accordingly, the second calculation module in the predictor acquisition unit 1130 is configured to calculate the second matrix product between the conversion submodel (W_B') of the second training participant and the conversion feature sample subset (X_A') of the first training participant and between the conversion submodel (W_A') of the first training participant and the conversion feature sample subset (X_B') of the second training participant.
The model update amount receiving unit 1140 is configured to receive a first partial model update amount from a first training participant, the first partial model update amount being obtained by decomposing the first model update amount at the first training participant, the first model update amount being determined at the first training participant using a prediction difference value and a conversion feature sample subset at the first training participant, wherein the prediction difference value is a difference value between a current prediction value and a corresponding label value. The operation of the model update amount receiving unit 1140 may refer to the operation of block 308 described above with reference to fig. 3.
The second model update amount determination unit 1150 is configured to perform a secret sharing matrix multiplication on the prediction difference and the transformed feature sample subset at the second training participant to obtain a second model update amount. The operation of the second model update amount determination unit 1150 may refer to the operation of block 309 described above with reference to fig. 3. Here, the second model update amount determination unit 1150 may be implemented using the second calculation module 1020 described in fig. 10. That is, the second model update amount determination unit 1150 may be configured to perform a trusted initializer secret sharing matrix multiplication or a non-trusted initializer secret sharing matrix multiplication on the prediction difference and the transformed feature sample subset at the second training participant to obtain the second model update amount.
The model update amount decomposition unit 1160 is configured to decompose the second model update amount into two second partial model update amounts. The operation of the model update amount decomposition unit 1160 may refer to the operation of block 310 described above with reference to fig. 3.
The model update amount transmitting unit 1170 is configured to transmit a second partial model update amount to the first training participant. The operation of the model update amount transmitting unit 1170 may refer to the operation of the block 311 described above with reference to fig. 3.
The model update unit 1180 is configured to update the current conversion submodel of the second training participant based on the remaining second partial model update quantity and the received first partial model update quantity. The operation of the model update unit 1180 may refer to the operation of block 313 described above with reference to fig. 3.
The model determination unit 1190 is configured to determine a sub-model of the second training participant based on the transformed sub-models of the first and second training participants when the loop end condition is fulfilled. The operation of model determination unit 1190 may refer to the operation of block 315 described above with reference to fig. 3.
Embodiments of a model training method, apparatus and system according to the present disclosure are described above with reference to fig. 1 through 11. The above model training device can be implemented by hardware, or can be implemented by software, or a combination of hardware and software.
FIG. 12 illustrates a hardware block diagram of a computing device 1200 for implementing collaborative training of a linear/logistic regression model via two training participants, according to an embodiment of the disclosure. As shown in fig. 12, computing device 1200 may include at least one processor 1210, storage (e.g., non-volatile storage) 1220, memory 1230, and a communication interface 1240, and the at least one processor 1210, storage 1220, memory 1230, and communication interface 1240 are connected together via a bus 1260. The at least one processor 1210 executes at least one computer-readable instruction (i.e., the elements described above as being implemented in software) stored or encoded in memory.
In one embodiment, computer-executable instructions are stored in the memory that, when executed, cause the at least one processor 1210 to: carrying out model conversion processing on the submodels of all the training participants to obtain conversion submodels of all the training participants; the following loop process is executed until a loop end condition is satisfied: performing vertical-horizontal segmentation conversion on the feature sample set to obtain a conversion feature sample subset at each training participant; obtaining current predicted values for the feature sample set using secret shared matrix multiplication based on the current transformation submodel and the transformation feature sample subset for each training participant; determining a prediction difference value between the current prediction value and the corresponding mark value; determining a first model update quantity using the prediction difference and the subset of transformed feature samples at the first training participant; decomposing the first model updating quantity into two first part model updating quantities, and sending one first part model updating quantity to a second training participant; and receiving a second partial model update quantity from the second training participant, the second partial model update quantity being obtained by decomposing the second model update quantity at the second training participant, the second model update quantity being obtained by performing secret sharing matrix multiplication on the prediction difference and the conversion feature sample subset at the second training participant; updating a current transition submodel at the first training participant based on the remaining first partial model update quantity and the received second partial model update quantity, wherein the updated transition submodel of each training participant is used as the current transition submodel for the next cycle process when the cycle process is not ended; when the loop ending condition is met, determining a sub-model of the first training participant based on the conversion sub-models of the first training participant and the second training participant.
It should be understood that the computer-executable instructions stored in the memory, when executed, cause the at least one processor 1210 to perform the various operations and functions described above in connection with fig. 1-11 in the various embodiments of the present disclosure.
FIG. 13 illustrates a hardware block diagram of a computing device 1300 for implementing collaborative training of a linear/logistic regression model via two training participants, according to an embodiment of the disclosure. As shown in fig. 13, computing device 1300 may include at least one processor 1310, storage (e.g., non-volatile storage) 1320, memory 1330, and communication interface 1340, and the at least one processor 1310, storage 1320, memory 1330, and communication interface 1340 are connected together via a bus 1360. The at least one processor 1310 executes at least one computer-readable instruction (i.e., the elements described above as being implemented in software) stored or encoded in memory.
In one embodiment, computer-executable instructions are stored in the memory that, when executed, cause the at least one processor 1310 to: carrying out model conversion processing on the submodels of all the training participants to obtain conversion submodels of all the training participants; the following loop process is executed until a loop end condition is satisfied: performing vertical-horizontal segmentation conversion on the feature sample set to obtain a conversion feature sample subset at each training participant; obtaining current predicted values for the feature sample set using secret shared matrix multiplication based on the current transformation submodel and the transformation feature sample subset for each training participant; receiving a first partial model update quantity from a first training participant, the first partial model update quantity being obtained by decomposing the first model update quantity at the first training participant, the first model update quantity being determined at the first training participant using a prediction difference value and a conversion feature sample subset at the first training participant, wherein the prediction difference value is a difference value between a current prediction value and a corresponding label value; performing secret sharing matrix multiplication on the prediction difference and the conversion feature sample subset at the second training participant to obtain a second model update quantity; decomposing the second model updating quantity into two second part model updating quantities, and sending one second part model updating quantity to the first training participant; and updating the current transition submodel of the second training participant based on the remaining second partial model update quantity and the received first partial model update quantity, wherein, when the cycle process is not finished, the updated transition submodel of each training participant is used as the current transition submodel of the next cycle process; determining a sub-model of a second training participant based on the conversion sub-models of the first and second training participants when the loop end condition is satisfied.
It should be understood that the computer-executable instructions stored in the memory, when executed, cause the at least one processor 1310 to perform the various operations and functions described above in connection with fig. 1-11 in the various embodiments of the present disclosure.
According to one embodiment, a program product, such as a machine-readable medium (e.g., a non-transitory machine-readable medium), is provided. The machine-readable medium may have instructions (i.e., the elements described above as being implemented in software) that, when executed by a machine, cause the machine to perform the various operations and functions described above in connection with figs. 1-11 in the various embodiments of the present disclosure. Specifically, a system or apparatus equipped with a readable storage medium may be provided, where software program code implementing the functions of any of the above embodiments is stored on the readable storage medium, and a computer or processor of the system or apparatus is caused to read out and execute the instructions stored in the readable storage medium.
In this case, the program code itself read from the readable medium can realize the functions of any of the above-described embodiments, and thus the machine-readable code and the readable storage medium storing the machine-readable code form part of the present invention.
Examples of the readable storage medium include floppy disks, hard disks, magneto-optical disks, optical disks (e.g., CD-ROMs, CD-R, CD-RWs, DVD-ROMs, DVD-RAMs, DVD-RWs), magnetic tapes, nonvolatile memory cards, and ROMs. Alternatively, the program code may be downloaded from a server computer or from the cloud via a communications network.
It will be understood by those skilled in the art that various changes and modifications may be made in the above-disclosed embodiments without departing from the spirit of the invention. Accordingly, the scope of the invention should be determined from the following claims.
It should be noted that not all steps and units in the above flows and system structure diagrams are necessary, and some steps or units may be omitted according to actual needs. The execution order of the steps is not fixed, and can be determined as required. The apparatus structures described in the above embodiments may be physical structures or logical structures, that is, some units may be implemented by the same physical entity, or some units may be implemented by a plurality of physical entities, or some units may be implemented by some components in a plurality of independent devices.
In the above embodiments, the hardware units or modules may be implemented mechanically or electrically. For example, a hardware unit, module or processor may comprise permanently dedicated circuitry or logic (such as a dedicated processor, FPGA or ASIC) to perform the corresponding operations. The hardware units or processors may also include programmable logic or circuitry (e.g., a general purpose processor or other programmable processor) that may be temporarily configured by software to perform the corresponding operations. The specific implementation (mechanical, or dedicated permanent, or temporarily set) may be determined based on cost and time considerations.
The detailed description set forth above in connection with the appended drawings describes exemplary embodiments but does not represent all embodiments that may be practiced or fall within the scope of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous" over other embodiments. The detailed description includes specific details for the purpose of providing an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described embodiments.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (21)

1. A method for collaborative training of a linear/logistic regression model via first and second training participants, each training participant having one sub-model of the linear/logistic regression model, the first training participant having a first subset of feature samples and labeled values, the second training participant having a second subset of feature samples, the first and second subsets of feature samples being obtained by vertically slicing the set of feature samples, the method being performed by the first training participant, the method comprising:
carrying out model conversion processing on the submodels of all the training participants to obtain conversion submodels of all the training participants;
the following loop process is executed until a loop end condition is satisfied:
performing vertical-horizontal segmentation conversion on the feature sample set to obtain a conversion feature sample subset at each training participant;
obtaining current predicted values for the feature sample set using secret sharing matrix multiplication based on current transformation submodels and transformation feature sample subsets of respective training participants;
determining a prediction difference value between the current prediction value and the corresponding mark value;
determining a first model update quantity using the prediction difference and a subset of transformed feature samples at the first training participant;
decomposing the first model update quantity into two first part model update quantities, and sending one first part model update quantity to the second training participant; and
receiving a second partial model update quantity from the second training participant, the second partial model update quantity being obtained by decomposing a second model update quantity at the second training participant, the second model update quantity being obtained by performing secret sharing matrix multiplication on the prediction difference and the conversion feature sample subset at the second training participant;
updating the current transition submodel at the first training participant based on the remaining first partial model update quantity and the received second partial model update quantity, wherein the updated transition submodel of each training participant is used as the current transition submodel for the next cycle process when the cycle process is not ended;
determining a sub-model of the first training participant based on the conversion sub-models of the first and second training participants when the loop end condition is satisfied.
2. The method of claim 1, wherein performing a vertical-to-horizontal slicing transform on the feature sample set to obtain transformed feature sample subsets at each training participant comprises:
decomposing the first subset of feature samples into two first partial subsets of feature samples;
sending one first partial feature sample subset to the second training participant;
receiving a second partial feature sample subset from the second training participant, the second partial feature sample subset being derived by decomposing the feature sample subset at the second training participant; and
and splicing the remaining first part of feature sample subset and the received second part of feature sample subset to obtain a conversion feature sample subset at the first training participant.
3. The method of claim 1 or 2, wherein using secret sharing matrix multiplication to obtain current predictors for the set of feature samples based on current transformation submodels and transformation feature sample subsets of respective training participants comprises:
obtaining current predictors for the feature sample set using a trusted initializer secret sharing matrix multiplication based on the current transformation submodel and the transformation feature sample subset for each training participant.
4. The method of claim 1 or 2, wherein using secret sharing matrix multiplication to obtain current predictors for the set of feature samples based on current transformation submodels and transformation feature sample subsets of respective training participants comprises:
obtaining current predictors for the feature sample set using untrusted initializer secret sharing matrix multiplication based on current transformation submodels and transformation feature sample subsets of respective training participants.
5. The method of any of claims 1 to 4, wherein the loop end condition comprises:
reaching a predetermined number of cycles; or
the prediction difference being within a predetermined range.
6. A method for collaborative training of a linear/logistic regression model via first and second training participants, each training participant having one sub-model of the linear/logistic regression model, the first training participant having a first subset of feature samples and labeled values, the second training participant having a second subset of feature samples, the first and second subsets of feature samples being obtained by vertically slicing the set of feature samples, the method being performed by the second training participant, the method comprising:
carrying out model conversion processing on the submodels of all the training participants to obtain conversion submodels of all the training participants;
the following loop process is executed until a loop end condition is satisfied:
performing vertical-horizontal segmentation conversion on the feature sample set to obtain a conversion feature sample subset at each training participant;
obtaining current predicted values for the feature sample set using secret sharing matrix multiplication based on current transformation submodels and transformation feature sample subsets of respective training participants;
receiving a first partial model update quantity from the first training participant, the first partial model update quantity being a result of a decomposition of a first model update quantity at the first training participant, the first model update quantity being determined at the first training participant using a prediction difference and a subset of transformed feature samples at the first training participant, wherein the prediction difference is a difference between the current prediction value and a corresponding marker value;
performing secret sharing matrix multiplication on the prediction difference and the transformed feature sample subset at the second training participant to obtain a second model update;
decomposing the second model update quantity into two second partial model update quantities, and sending one second partial model update quantity to the first training participant; and
updating the current transition submodel of the second training participant based on the remaining second partial model update quantity and the received first partial model update quantity, wherein the updated transition submodel of each training participant is used as the current transition submodel of the next cycle process when the cycle process is not finished;
determining a sub-model of the second training participant based on the conversion sub-models of the first and second training participants when the loop end condition is satisfied.
7. The method of claim 6, wherein performing a vertical-to-horizontal slicing transform on the feature sample set to obtain transformed feature sample subsets at each training participant comprises:
decomposing the second subset of feature samples into two second partial subsets of feature samples;
sending one second partial feature sample subset to the first training participant;
receiving a first partial feature sample subset from the first training participant, the first partial feature sample subset being derived by decomposing a feature sample subset at the first training participant; and
and splicing the remaining second part of the feature sample subset and the received first part of the feature sample subset to obtain a conversion feature sample subset at the second training participant.
8. The method of claim 6 or 7, wherein using secret sharing matrix multiplication to obtain current predictors for the set of feature samples based on current transformation submodels and transformation feature sample subsets of respective training participants comprises:
obtaining current predicted values for the feature sample set using a trusted initializer secret sharing matrix multiplication based on a current transformation submodel and a transformation feature sample subset of each training participant; or
Obtaining current predictors for the feature sample set using untrusted initializer secret sharing matrix multiplication based on current transformation submodels and transformation feature sample subsets of respective training participants.
9. The method of claim 6 or 7, wherein performing a secret sharing matrix multiplication on the predicted difference and the transformed feature sample subset at the second training participant to obtain a second model update quantity comprises:
performing trusted initializer secret sharing matrix multiplication on the prediction difference and the conversion feature sample subset at the second training participant to obtain a second model updating amount; or
Performing untrusted initializer secret sharing matrix multiplication on the prediction difference and the transformed feature sample subset at the second training participant to obtain a second model update.
10. An apparatus for collaborative training of a linear/logistic regression model via first and second training participants, each training participant having a sub-model of the linear/logistic regression model, the first training participant having a first subset of feature samples and labeled values, the second training participant having a second subset of feature samples, the first and second subsets of feature samples being obtained by vertical segmentation of a set of feature samples, the apparatus being located on the first training participant side, the apparatus comprising:
the model conversion unit is configured to perform model conversion processing on the submodels of the training participants to obtain conversion submodels of the training participants;
the sample conversion unit is configured to perform vertical-horizontal segmentation conversion on the feature sample set to obtain a conversion feature sample subset at each training participant;
a prediction value obtaining unit configured to obtain a current prediction value for the feature sample set using secret sharing matrix multiplication based on a current conversion submodel and a conversion feature sample subset of each training participant;
a prediction difference determination unit configured to determine a prediction difference between the current prediction value and a corresponding marker value;
a model update amount determination unit configured to determine a first model update amount using the prediction difference and a subset of transformed feature samples at a first training participant;
a model update amount decomposition unit configured to decompose the first model update amount into two first partial model update amounts;
a model update amount transmitting/receiving unit configured to transmit a first partial model update amount to the second training participant and receive a second partial model update amount from the second training participant, the second partial model update amount being obtained by decomposing a second model update amount at the second training participant, the second model update amount being obtained by performing secret sharing matrix multiplication on the prediction difference value and a conversion feature sample subset of the second training participant;
a model updating unit configured to update a current conversion submodel at the first training participant based on the remaining first partial model update amount and the received second partial model update amount; and
a model determination unit configured to determine a sub-model of the first training participant based on the conversion sub-models of the first and second training participants when the loop end condition is satisfied,
wherein the sample conversion unit, the predicted value acquisition unit, the predicted difference value determination unit, the model update amount decomposition unit, the model update amount transmission/reception unit, and the model update unit cyclically execute operations until a cycle end condition is satisfied,
and when the loop process is not finished, the updated conversion submodel of each training participant is used as the current conversion submodel of the next loop process.
11. The apparatus of claim 10, wherein the sample conversion unit comprises:
a sample decomposition module configured to decompose the first subset of feature samples into two first partial subsets of feature samples;
a sample sending/receiving module configured to send a first partial feature sample subset to the second training participant and receive a second partial feature sample subset from the second training participant, the second partial feature sample subset being obtained by decomposing the feature sample subset at the second training participant; and
a sample stitching module configured to stitch the remaining first partial subset of feature samples and the received second partial subset of feature samples to obtain a transformed subset of feature samples at the first training participant.
12. The apparatus according to claim 10 or 11, wherein the prediction value acquisition unit is configured to:
obtaining current predicted values for the feature sample set using a trusted initializer secret sharing matrix multiplication based on a current transformation submodel and a transformation feature sample subset of each training participant; or
Obtaining current predictors for the feature sample set using untrusted initializer secret sharing matrix multiplication based on current transformation submodels and transformation feature sample subsets of respective training participants.
13. An apparatus for collaborative training of a linear/logistic regression model via first and second training participants, each training participant having a sub-model of the linear/logistic regression model, the first training participant having a first subset of feature samples and labeled values, the second training participant having a second subset of feature samples, the first and second subsets of feature samples being obtained by vertical segmentation of a set of feature samples, the apparatus being located on the side of the second training participant, the apparatus comprising:
the model conversion unit is configured to perform model conversion processing on the submodels of the training participants to obtain conversion submodels of the training participants;
the sample conversion unit is configured to perform vertical-horizontal segmentation conversion on the feature sample set to obtain a conversion feature sample subset at each training participant;
a prediction value obtaining unit configured to obtain a current prediction value for the feature sample set using secret sharing matrix multiplication based on a current conversion submodel and a conversion feature sample subset of each training participant;
a model update amount receiving unit configured to receive a first partial model update amount from the first training participant, the first partial model update amount being obtained by decomposing a first model update amount at the first training participant, the first model update amount being determined at the first training participant using a prediction difference value and a conversion feature sample subset at the first training participant, wherein the prediction difference value is a difference value between the current prediction value and a corresponding label value;
a second model update amount determination unit configured to perform secret sharing matrix multiplication on the prediction difference and the transformed feature sample subset at the second training participant to obtain a second model update amount;
a model update amount decomposition unit configured to decompose the second model update amount into two second partial model update amounts;
a model update amount sending unit configured to send a second partial model update amount to the first training participant;
a model updating unit configured to update a current conversion submodel of the second training participant based on the remaining second partial model update amount and the received first partial model update amount; and
a model determination unit configured to determine a sub-model of the second training participant based on the conversion sub-models of the first and second training participants when the loop end condition is satisfied,
wherein the sample conversion unit, the predicted value acquisition unit, the model update amount reception unit, the model update amount determination unit, the model update amount decomposition unit, the model update amount transmission unit, and the model update unit cyclically execute operations until a cycle end condition is satisfied,
and when the loop process is not finished, the updated conversion submodel of each training participant is used as the current conversion submodel of the next loop process.
14. The apparatus of claim 13, wherein the sample conversion unit comprises:
a sample decomposition module configured to decompose the second subset of feature samples into two second partial subsets of feature samples;
a sample sending/receiving module configured to send a second partial feature sample subset to the first training participant and receive a first partial feature sample subset from the first training participant, the first partial feature sample subset being obtained by decomposing the feature sample subset at the first training participant; and
a sample stitching module configured to stitch the remaining second subset of partial feature samples and the received first subset of partial feature samples to obtain a transformed subset of feature samples at the second training participant.
15. The apparatus according to claim 13 or 14, wherein the prediction value acquisition unit is configured to:
obtaining current predicted values for the feature sample set using a trusted initializer secret sharing matrix multiplication based on a current transformation submodel and a transformation feature sample subset of each training participant; or
Obtaining current predictors for the feature sample set using untrusted initializer secret sharing matrix multiplication based on current transformation submodels and transformation feature sample subsets of respective training participants.
16. The apparatus of claim 13 or 14, wherein the model update amount determination unit is configured to:
performing trusted initializer secret sharing matrix multiplication on the prediction difference and the conversion feature sample subset at the second training participant to obtain a second model updating amount; or
Performing untrusted initializer secret sharing matrix multiplication on the prediction difference and the transformed feature sample subset at the second training participant to obtain a second model update.
17. A system for collaborative training of a linear/logistic regression model via first and second training participants, each training participant having a sub-model of the linear/logistic regression model, the first training participant having a first subset of feature samples and labeled values, the second training participant having a second subset of feature samples, the first and second subsets of feature samples obtained by vertically slicing the set of feature samples, the system comprising:
a first training participant device comprising the apparatus of any one of claims 10 to 12; and
a second training participant device comprising an apparatus as claimed in any one of claims 13 to 16.
18. A computing device, comprising:
at least one processor, and
a memory coupled with the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method of any of claims 1-5.
19. A machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method of any one of claims 1 to 5.
20. A computing device, comprising:
at least one processor, and
a memory coupled with the at least one processor, the memory storing instructions that, when executed by the at least one processor, cause the at least one processor to perform the method of any of claims 6 to 9.
21. A machine-readable storage medium storing executable instructions that, when executed, cause the machine to perform the method of any one of claims 6 to 9.
CN201910599381.2A 2019-07-04 2019-07-04 Model training method, device and system Active CN112183757B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910599381.2A CN112183757B (en) 2019-07-04 2019-07-04 Model training method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910599381.2A CN112183757B (en) 2019-07-04 2019-07-04 Model training method, device and system

Publications (2)

Publication Number Publication Date
CN112183757A true CN112183757A (en) 2021-01-05
CN112183757B CN112183757B (en) 2023-10-27

Family

ID=73915881

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910599381.2A Active CN112183757B (en) 2019-07-04 2019-07-04 Model training method, device and system

Country Status (1)

Country Link
CN (1) CN112183757B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114662156A (en) * 2022-05-25 2022-06-24 蓝象智联(杭州)科技有限公司 Longitudinal logistic regression modeling method based on anonymized data
WO2024093573A1 (en) * 2022-10-30 2024-05-10 抖音视界有限公司 Method and apparatus for training machine learning model, device, and medium

Patent Citations (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101815081A (en) * 2008-11-27 2010-08-25 Peking University Distributed calculation logic comparison method
CN102135989A (en) * 2011-03-09 2011-07-27 Beihang University Normalized matrix-factorization-based incremental collaborative filtering recommendation method
KR20130067345A (en) * 2011-12-13 2013-06-24 Industry-University Cooperation Foundation Hanyang University Method for learning task skills and robot using the same
US10152676B1 (en) * 2013-11-22 2018-12-11 Amazon Technologies, Inc. Distributed training of models using stochastic gradient descent
US20150193694A1 (en) * 2014-01-06 2015-07-09 Cisco Technology, Inc. Distributed learning in a computer network
US20150193695A1 (en) * 2014-01-06 2015-07-09 Cisco Technology, Inc. Distributed model training
CN105450394A (en) * 2015-12-30 2016-03-30 China Agricultural University Share updating method and device based on threshold secret sharing
CN107025205A (en) * 2016-01-30 2017-08-08 Huawei Technologies Co., Ltd. Method and apparatus for training a model in a distributed system
US20170270671A1 (en) * 2016-03-16 2017-09-21 International Business Machines Corporation Joint segmentation and characteristics estimation in medical images
US20170372201A1 (en) * 2016-06-22 2017-12-28 Massachusetts Institute Of Technology Secure Training of Multi-Party Deep Neural Network
US20180316502A1 (en) * 2017-04-27 2018-11-01 Factom Data Reproducibility Using Blockchains
US20180373834A1 (en) * 2017-06-27 2018-12-27 Hyunghoon Cho Secure genome crowdsourcing for large-scale association studies
CN109214404A (en) * 2017-07-07 2019-01-15 Alibaba Group Holding Ltd. Training sample generation method and device based on privacy protection
US20190042763A1 (en) * 2017-08-02 2019-02-07 Alibaba Group Holding Limited Model training method and apparatus based on data sharing
US20190114530A1 (en) * 2017-10-13 2019-04-18 Panasonic Intellectual Property Corporation Of America Prediction model sharing method and prediction model sharing system
CN109754060A (en) * 2017-11-06 2019-05-14 Alibaba Group Holding Ltd. Training method and device for a neural network machine learning model
WO2019100724A1 (en) * 2017-11-24 2019-05-31 Huawei Technologies Co., Ltd. Method and device for training multi-label classification model
KR20190072770A (en) * 2017-12-18 2019-06-26 Kyung Hee University Industry-Academic Cooperation Foundation Method of performing encryption and decryption based on reinforcement learning, and client and server system performing the same
CN109165515A (en) * 2018-08-10 2019-01-08 Shenzhen Qianhai WeBank Co., Ltd. Federated learning-based model parameter acquisition method, system, and readable storage medium
CN109255247A (en) * 2018-08-14 2019-01-22 Alibaba Group Holding Ltd. Secure computation method and device, and electronic equipment
CN109214436A (en) * 2018-08-22 2019-01-15 Alibaba Group Holding Ltd. Prediction model training method and device for a target scenario
CN109583468A (en) * 2018-10-12 2019-04-05 Alibaba Group Holding Ltd. Training sample acquisition method, sample prediction method, and corresponding apparatus
CN109409125A (en) * 2018-10-12 2019-03-01 Nanjing University of Posts and Telecommunications Privacy-preserving data collection and regression analysis method
CN109299161A (en) * 2018-10-31 2019-02-01 Alibaba Group Holding Ltd. Data selection method and device
CN109635462A (en) * 2018-12-17 2019-04-16 Shenzhen Qianhai WeBank Co., Ltd. Federated learning-based model parameter training method, apparatus, device, and medium
CN109640095A (en) * 2018-12-28 2019-04-16 University of Science and Technology of China Video encryption system combining quantum key distribution
CN109840588A (en) * 2019-01-04 2019-06-04 Ping An Technology (Shenzhen) Co., Ltd. Neural network model training method and device, computer equipment, and storage medium
WO2019072316A2 (en) * 2019-01-11 2019-04-18 Alibaba Group Holding Limited A distributed multi-party security model training framework for privacy protection
WO2019072315A2 (en) * 2019-01-11 2019-04-18 Alibaba Group Holding Limited A logistic regression modeling scheme using secret sharing
CN109871702A (en) * 2019-02-18 2019-06-11 Shenzhen Qianhai WeBank Co., Ltd. Federated model training method, system, device, and computer-readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou Hongwei; Xu Songlin; Yuan Jinhui: "Multi-secret sharing for general access structures implemented with a neural network", Computer Engineering and Design, no. 20 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114662156A (en) * 2022-05-25 2022-06-24 Lanxiang Zhilian (Hangzhou) Technology Co., Ltd. Longitudinal logistic regression modeling method based on anonymized data
WO2024093573A1 (en) * 2022-10-30 2024-05-10 Douyin Vision Co., Ltd. Method and apparatus for training machine learning model, device, and medium

Also Published As

Publication number Publication date
CN112183757B (en) 2023-10-27

Similar Documents

Publication Publication Date Title
CN111523673B (en) Model training method, device and system
CN111062487B (en) Machine learning model feature screening method and device based on data privacy protection
WO2021103901A1 (en) Multi-party security calculation-based neural network model training and prediction methods and device
CN111079939B (en) Machine learning model feature screening method and device based on data privacy protection
CN111061963B (en) Machine learning model training and predicting method and device based on multi-party safety calculation
CN112052942B (en) Neural network model training method, device and system
CN111523556B (en) Model training method, device and system
CN112132270B (en) Neural network model training method, device and system based on privacy protection
CN111738438B (en) Method, device and system for training neural network model
CN110929887B (en) Logistic regression model training method, device and system
CN111523134B (en) Homomorphic encryption-based model training method, device and system
CN111523674B (en) Model training method, device and system
CN112101531A (en) Neural network model training method, device and system based on privacy protection
Zhang et al. PPNNP: A privacy-preserving neural network prediction with separated data providers using multi-client inner-product encryption
CN112183757B (en) Model training method, device and system
CN114186256A (en) Neural network model training method, device, equipment and storage medium
CN112183759B (en) Model training method, device and system
CN110874481B (en) GBDT model-based prediction method and GBDT model-based prediction device
CN111737756B (en) XGB model prediction method, device and system performed through two data owners
CN111523675B (en) Model training method, device and system
CN111738453B (en) Business model training method, device and system based on sample weighting
CN115564447A (en) Credit card transaction risk detection method and device
CN112183565B (en) Model training method, device and system
CN112966809B (en) Privacy protection-based two-party model prediction method, device and system
CN112183566B (en) Model training method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40044587)
GR01 Patent grant