Disclosure of Invention
In order to solve the technical problem, the invention provides a security model prediction method based on secret sharing, which comprises the following steps:
receiving a first set of random numbers from a third party;
generating a shared computation prediction result using the first set of random numbers, a model coefficient vector, and a vector from a data provider; and
performing model prediction using the shared computation prediction result.
Optionally, the generating the shared computation prediction result comprises:
generating an intermediate model vector using the model coefficient vector and the first set of random numbers;
sending the intermediate model vector to the data provider and receiving an intermediate data vector from the data provider;
generating an intermediate data value using the intermediate data vector from the data provider and the first set of random numbers;
receiving an intermediate model value from the data provider; and
generating the shared computation prediction result using the intermediate model value and the intermediate data value.
Optionally, the shared computation prediction result is a sum of the intermediate model value and the intermediate data value.
Optionally, the method further comprises:
generating a second shared computation prediction result using the model coefficient vector and a locally stored additional data vector; and
performing model prediction using the shared computation prediction result and the second shared computation prediction result.
Optionally, the method further comprises:
generating a second shared computation prediction result using the first set of random numbers, the model coefficient vector, and a vector from a second data provider; and
performing model prediction using the shared computation prediction result and the second shared computation prediction result.
Optionally, the model prediction uses a logistic regression model and/or a linear regression model.
An embodiment of the present application further provides a security model prediction method based on secret sharing, which comprises the following steps:
receiving a second set of random numbers from a third party;
generating an intermediate data vector using the second set of random numbers and a data vector;
sending the intermediate data vector to a data demander and receiving an intermediate model vector from the data demander;
generating an intermediate data value using the intermediate model vector and the second set of random numbers; and
providing the intermediate data value to the data demander for model prediction.
Embodiments of the present application further provide an apparatus for secret sharing based security model prediction, comprising:
a receiving module configured to receive a first set of random numbers from a third party;
a prediction vector generation module configured to generate a shared computation prediction result using the first set of random numbers, a model coefficient vector, and a vector from a data provider; and
a model prediction module configured to perform model prediction using the shared computation prediction result.
Optionally, the receiving module is further configured to receive an intermediate data vector and an intermediate model value from the data provider;
the prediction vector generation module is further configured to:
generate an intermediate model vector using the model coefficient vector and the first set of random numbers;
generate an intermediate data value using the intermediate data vector and the first set of random numbers; and
generate the shared computation prediction result using the intermediate model value and the intermediate data value;
the apparatus further includes a transmission module configured to transmit the intermediate model vector to the data provider.
Optionally, the shared computation prediction result is a sum of the intermediate model value and the intermediate data value.
Optionally, the prediction vector generation module is further configured to:
generate a second shared computation prediction result using the model coefficient vector and a locally stored additional data vector, wherein model prediction is performed using the shared computation prediction result and the second shared computation prediction result.
Optionally, the prediction vector generation module is further configured to:
generate a second shared computation prediction result using the first set of random numbers, the model coefficient vector, and a vector from a second data provider, wherein model prediction is performed using the shared computation prediction result and the second shared computation prediction result.
Optionally, the model prediction uses a logistic regression model and/or a linear regression model.
Embodiments of the present application further provide an apparatus for secret sharing-based security model prediction, including:
a receiving module configured to receive a second set of random numbers from a third party and to receive an intermediate model vector from a data demander;
a prediction vector generation module configured to generate an intermediate data vector using the second set of random numbers and a data vector, and to generate an intermediate data value using the intermediate model vector and the second set of random numbers; and
a transmission module configured to send the intermediate data vector to a data demander and to provide the intermediate data value to the data demander for model prediction.
An embodiment of the present application further provides a security model prediction apparatus based on secret sharing, including:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
receive a first set of random numbers from a third party;
generate a shared computation prediction result using the first set of random numbers, a model coefficient vector, and a vector from a data provider; and
perform model prediction using the shared computation prediction result.
An embodiment of the present application further provides a security model prediction apparatus based on secret sharing, including:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
receive a second set of random numbers from a third party;
generate an intermediate data vector using the second set of random numbers and a data vector;
send the intermediate data vector to a data demander and receive an intermediate model vector from the data demander;
generate an intermediate data value using the intermediate model vector and the second set of random numbers; and
provide the intermediate data value to the data demander for model prediction.
The invention provides a secure, decentralized model prediction method, which achieves the following technical advantages:
1. Data never leaves each party's own boundary: no trusted third party is required to perform data fusion, and no party's data needs to be deployed at or introduced into any other party, yet model prediction can still be completed.
2. In combination with secret sharing, the data privacy of all cooperating parties is protected. Each party computes on split data, so a partner never exposes its own plaintext data to the other party; only unidentifiable split values are exchanged and computed on, and the final calculation result is nevertheless exact.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described herein, and thus the present invention is not limited to the specific embodiments disclosed below.
FIG. 1 is an architecture diagram of a multi-party data collaboration system based on secret sharing, in accordance with aspects of the present invention.
As shown in fig. 1, the multi-party data collaboration system based on secret sharing of the present invention includes a data demander (also referred to as the model party), a data provider (also referred to as the data party), and a third party (an impartial third party, for example a judicial agency or a government agency).
The data demander possesses a model whose coefficient vector is W = {ω1, ω2, …, ωn}, and the data provider possesses a data vector X = {x1, x2, …, xn}. The third party generates a series of random numbers and distributes them to the data provider and the data demander, respectively. The data demander computes using its model coefficients and the random numbers distributed to it, the data provider computes using its own data and the random numbers distributed to it, the two parties exchange their calculation results for further processing, and the results are then aggregated to obtain the model prediction result.
The technical solution of the present invention is illustrated by four specific embodiments below.
Example one
Referring to FIG. 2, one embodiment of a data demander in data collaboration with a data provider is illustrated, in accordance with aspects of the present invention.
In step 201, a third party generates random number sets R1 and R2.
For example, R1 = {a, c0} and R2 = {b, c1}, where a and b are random vectors, c0 and c1 are random numbers, c = a×b, and c = c0 + c1. Here a×b is a vector multiplication (inner product).
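By way of illustration only, the following is a minimal Python sketch of how such random number sets could be generated. It is a sketch under assumptions: plain floating-point arithmetic and numpy are used for readability, whereas a real deployment would work over a finite ring with a cryptographically secure random source.

```python
import numpy as np

def generate_random_sets(n, rng=None):
    """Sketch: generate R1 = (a, c0) and R2 = (b, c1) with c = a x b, c = c0 + c1."""
    rng = rng or np.random.default_rng()
    a = rng.standard_normal(n)          # random vector a (for the data demander)
    b = rng.standard_normal(n)          # random vector b (for the data provider)
    c = float(a @ b)                    # c = a x b (vector multiplication)
    c0 = float(rng.standard_normal())   # random additive share of c
    c1 = c - c0                         # so that c0 + c1 = c
    return (a, c0), (b, c1)             # R1 for the demander, R2 for the provider
```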
At step 202, the third party sends the random number sets R1 and R2 to the data demander and the data provider, respectively.
In step 203, the data demander computes an intermediate model vector e using the random number set R1 and the model coefficient vector W = {ω1, ω2, …, ωn}. For example, e = W - a.
At step 204, the data provider computes an intermediate data vector f using the random number set R2 and the data vector X = {x1, x2, …, xn}. For example, f = X - b.
In steps 205 and 206, the data demander and the data provider exchange the results calculated in steps 203 and 204.
Specifically, the data demander may send the calculation result e to the data provider at step 205, and the data provider sends the calculation result f to the data demander at step 206.
Note that although step 205 precedes step 206 in fig. 2, their order may be swapped, or they may be performed simultaneously.
In step 207, the data demander computes using the random number set R1 and the intermediate data vector f provided by the data provider in step 206, to obtain an intermediate data value z0. For example, z0 = a×f + c0, where a×f is a vector multiplication.
At step 208, the data provider computes using the random number set R2 and the intermediate model vector e provided by the data demander in step 205, to obtain an intermediate model value z1. For example, z1 = e×X + c1, where e×X is a vector multiplication.
At step 209, the data provider sends z1 to the data demander.
At step 210, the data demander aggregates z0 and z1 to obtain the product of model coefficients and data, W×X, which is also referred to herein as the shared computation prediction result.
For example, z = z0 + z1 = a×f + c0 + e×X + c1
= a×(X - b) + (W - a)×X + c
= a×X - a×b + W×X - a×X + a×b
= W×X
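The whole exchange of steps 201 - 210 can be simulated in a few lines to check that the aggregation indeed recovers W×X. This is a sketch only: generate_random_sets is the hypothetical helper from the earlier sketch, and in practice e, f, and z1 would be sent over a network rather than passed within one process.

```python
import numpy as np

def simulate_shared_prediction(W, X, rng=None):
    """Simulate steps 201 - 210 locally; returns z = z0 + z1, which equals W x X."""
    (a, c0), (b, c1) = generate_random_sets(len(W), rng)  # steps 201 - 202
    e = W - a            # step 203: intermediate model vector (demander)
    f = X - b            # step 204: intermediate data vector (provider)
    # steps 205 - 206: e and f are exchanged between the two parties
    z0 = a @ f + c0      # step 207: intermediate data value (demander)
    z1 = e @ X + c1      # step 208: intermediate model value (provider)
    return z0 + z1       # steps 209 - 210: provider sends z1; demander aggregates

W = np.array([0.5, -1.2, 0.3])
X = np.array([1.0, 2.0, 3.0])
assert np.isclose(simulate_shared_prediction(W, X), W @ X)   # z == W x X
```

Note that neither party ever sees the other's plaintext: the demander only observes f = X - b, and the provider only observes e = W - a, both of which are masked by random vectors known only to the third party.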
In step 211, model prediction is performed using the shared computation prediction result obtained in step 210.
For example, for a Logistic Regression model, the prediction y = 1/(1 + e^(-(ω×x + λ))) is calculated,
where ω and λ are model coefficients provided by the model party, and x is the input required for the calculation, i.e., private data belonging to the data provider.
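As a sketch of this final step (assuming the sigmoid form reconstructed above, with z standing for the shared computation prediction result ω×x and lam for the coefficient λ):

```python
import numpy as np

def logistic_predict(z, lam):
    """Step 211 for logistic regression: y = 1 / (1 + e^-(z + lam))."""
    return 1.0 / (1.0 + np.exp(-(z + lam)))
```

Because z already equals ω×x, this step is purely local to the data demander and requires no further interaction with the data provider.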
Example two
In the embodiment illustrated in FIG. 2, the data demander provides only model information. In some cases, the data demander has both model information W and data information X'.
In this case, steps 201 - 209 are the same as in the embodiment illustrated in fig. 2 and are not described here again. Only the differences from the process of fig. 2 are described below.
At step 210, the data demander calculates an additional intermediate data value z0':
z0' = W×X'.
In step 211, the data demander aggregates z0, z1, and z0' to obtain a shared computation prediction result:
z = z0 + z1 + z0' = W×X + W×X'.
At step 212, model prediction is performed using W×X + W×X'.
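Relative to the simulation sketch given after step 210 of example one, only one extra, purely local term is needed. In this sketch, X_prime is a hypothetical name for the demander's locally stored additional data vector:

```python
def simulate_with_local_data(W, X, X_prime, rng=None):
    """Example two: shared result for X plus a purely local product for X_prime."""
    z = simulate_shared_prediction(W, X, rng)  # z0 + z1 = W x X (steps 201 - 209)
    z0_prime = W @ X_prime                     # step 210: local, nothing is shared
    return z + z0_prime                        # step 211: W x X + W x X_prime
```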
Example three
The above illustrates an embodiment in which a data demander collaborates with one data provider. In some cases, a data demander may need data from multiple data providers for model prediction, in which case the data demander needs to collaborate with multiple data providers. Fig. 3 illustrates an example of data collaboration between one data demander and two data providers (data provider 1 and data provider 2).
In this embodiment, the data demander has models WA = {ωA1, ωA2, …, ωAn} and WB = {ωB1, ωB2, …, ωBn}, data provider 1 has data XA = {xA1, xA2, …, xAn}, and data provider 2 has data XB = {xB1, xB2, …, xBn}. The shared computation prediction results in model prediction are WA×XA and WB×XB.
In step 301, a third party generates a first group of random number sets {R1, R2} and a second group of random number sets {R1', R2'}, where the first group is used for data collaboration between the data demander and data provider 1, and the second group is used for data collaboration between the data demander and data provider 2.
Specifically, R1 = {a, c0} and R2 = {b, c1}, where c = a×b and c = c0 + c1; R1' = {a', c0'} and R2' = {b', c1'}, where a, b and a', b' are random vectors, c0, c1 and c0', c1' are random numbers, and c' = a'×b', c' = c0' + c1'. Note that a×b and a'×b' are vector multiplications.
In step 302, the third party provides the random number sets R1 and R1' to the data demander, provides R2 to data provider 1, and provides R2' to data provider 2.
In step 303, the data demander calculates e and e'.
Specifically, e = WA - a and e' = WB - a'.
In steps 304 and 305, data provider 1 and data provider 2 respectively calculate f = XA - b and f' = XB - b'.
In steps 306 - 309, the data demander, data provider 1, and data provider 2 exchange the results calculated in steps 303 - 305.
Specifically, the data demander transmits the calculation result e to data provider 1 at step 306, and transmits the calculation result e' to data provider 2 at step 307.
Data provider 1 transmits the calculation result f to the data demander at step 308, and data provider 2 transmits the calculation result f' to the data demander at step 309.
Note that a specific order of steps 306 - 309 is shown in FIG. 3, but the order of these steps may be changed, or some of them may be performed simultaneously.
At step 310, the data demander computes using the random number set R1 and the calculation result f provided by data provider 1 in step 308, to obtain the first intermediate data value z0. For example, z0 = a×f + c0.
The data demander also computes using the random number set R1' and the calculation result f' provided by data provider 2 in step 309, to obtain the second intermediate data value z0'. For example, z0' = a'×f' + c0'.
In step 311, data provider 1 computes using the random number set R2 and the calculation result e provided by the data demander in step 306, to obtain a first intermediate model value z1. For example, z1 = e×XA + c1.
At step 312, data provider 2 computes using the random number set R2' and the calculation result e' provided by the data demander in step 307, to obtain a second intermediate model value z1'. For example, z1' = e'×XB + c1'.
At steps 313 and 314, data provider 1 sends z1 to the data demander and data provider 2 sends z1' to the data demander.
In step 315, the data demander aggregates z0 and z1 to obtain the product of model coefficients and data WA×XA, and aggregates z0' and z1' to obtain the product of model coefficients and data WB×XB.
For example, z = z0 + z1 = a×f + c0 + e×XA + c1
= a×(XA - b) + (WA - a)×XA + c
= a×XA - a×b + WA×XA - a×XA + a×b
= WA×XA
z' = z0' + z1' = a'×f' + c0' + e'×XB + c1'
= a'×(XB - b') + (WB - a')×XB + c'
= a'×XB - a'×b' + WB×XB - a'×XB + a'×b'
= WB×XB
At step 316, model prediction is performed using the results obtained in step 315 (also referred to as shared computation prediction results).
In one embodiment, the models WA and WB may be the same; in other words, the data demander uses a single model W = WA = WB and data from the two data providers for model prediction.
Note that the process of data collaboration between one data demander and two data providers is depicted in fig. 3 in a particular order, but other orders of the steps are possible. The steps of data collaboration between the data demander and data provider 1 and the steps of data collaboration between the data demander and data provider 2 are independent and can each be performed at different times. For example, the data collaboration between the data demander and data provider 1 may be completed before or after the data collaboration between the data demander and data provider 2, or some steps of the two processes may be interleaved in time. Some steps may also be split; for example, the calculations of e and e' in step 303 may be performed separately.
While data collaboration between one data demander and two data providers is illustrated above, the process is also applicable to data collaboration between one data demander and more than two data providers, which operates similarly to the process illustrated in fig. 3.
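Since the two pairings are independent, example three can be sketched by running the single-provider simulation from example one once per provider, each run drawing its own group of random number sets (again illustrative only; simulate_shared_prediction is the hypothetical helper from the earlier sketch):

```python
def simulate_two_providers(W_A, X_A, W_B, X_B, rng=None):
    """Example three: returns (W_A x X_A, W_B x X_B), as aggregated in step 315."""
    z = simulate_shared_prediction(W_A, X_A, rng)        # uses the group {R1, R2}
    z_prime = simulate_shared_prediction(W_B, X_B, rng)  # uses the group {R1', R2'}
    return z, z_prime                                    # step 316: model prediction
```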
It should be noted that although the present invention is illustrated with a logistic regression model, other models can also be applied to the present invention, such as a linear regression model y = ω×x + e, and so on. Further, two specific random number generation methods are described above, but other random number generation methods are also within the scope of the present invention, and those skilled in the art can devise suitable random number generation methods according to actual needs.
FIG. 4 illustrates one example of a secret sharing based data collaboration method performed by a data demander in accordance with aspects of the present invention.
Referring to fig. 4, in step 401, a first set of random numbers from a third party is received.
This step may correspond to steps 201, 202 described above with reference to fig. 2, and/or steps 301, 302 described with reference to fig. 3.
At step 402, a shared computation prediction result is generated using the first set of random numbers, a model coefficient vector, and a vector from a data provider.
This step may correspond to steps 203 - 210 described above with reference to fig. 2, and/or steps 303 - 315 described with reference to fig. 3.
At step 403, model prediction is performed using the shared computation prediction result.
This step may correspond to step 211 described above with reference to fig. 2, and/or step 316 described with reference to fig. 3.
FIG. 5 illustrates one example of a secret sharing based data collaboration method performed by a data demander in accordance with aspects of the present invention.
Referring to FIG. 5, at step 501, a first set of random numbers R1 is received from a third party.
In particular, the third party may generate a set of random numbers R = {a, b, c0, c1}, where c = a×b and c = c0 + c1; the first set of random numbers R1 = {a, c0} is provided to the data demander, and R2 = {b, c1} is provided to the data provider.
In another example, the third party may generate a set of random numbers R = {a, b, c0, c1}, where a = a0 + a1 and b = b0 + b1; in this case as well, the first set of random numbers R1 = {a, c0} may be provided to the data demander, and R2 = {b, c1} may be provided to the data provider.
In step 502, an intermediate model vector e is generated using the model coefficient vector W and the first set of random numbers R1. For example, e = W - a.
At step 503, the intermediate model vector e is sent to the data provider and the intermediate data vector f is received from the data provider.
In step 504, an intermediate data value z0 is generated using the intermediate data vector f and the first set of random numbers R1.
At step 505, an intermediate model value z1 is received from a data provider.
At step 506, a shared computation prediction result is generated using the intermediate model value z1 and the intermediate data value z0. For example, z = z0 + z1.
In step 507, model prediction is performed using the shared computation prediction result.
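A sketch of the demander side of fig. 5 as a small class follows. This is a hypothetical API: the class name, method names, and message transport are assumptions, not the claimed apparatus itself.

```python
import numpy as np

class DataDemander:
    """Sketch of steps 501 - 507 from the data demander's side."""

    def __init__(self, W):
        self.W = np.asarray(W)                   # model coefficient vector

    def receive_random_set(self, a, c0):         # step 501: R1 = {a, c0}
        self.a, self.c0 = np.asarray(a), c0

    def intermediate_model_vector(self):         # step 502: e = W - a
        return self.W - self.a

    def intermediate_data_value(self, f):        # step 504: z0 = a x f + c0
        return self.a @ f + self.c0

    def shared_prediction_result(self, z0, z1):  # step 506: z = z0 + z1 = W x X
        return z0 + z1
```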
Fig. 6 illustrates an example method of secret sharing based data collaboration performed by a data provider, in accordance with aspects of the present invention.
In step 601, a second set of random numbers R2 is received from a third party.
At step 602, an intermediate data vector f is generated using the second set of random numbers R2 and the data vector X.
At step 603, the intermediate data vector f is sent to the data demander and an intermediate model vector e is received from the data demander.
In step 604, an intermediate data value z1 is generated using the intermediate model vector e and the second set of random numbers R2.
At step 605, the intermediate data value z1 is provided to the data demander for model prediction.
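The matching provider side of fig. 6, with the same caveats (a hypothetical sketch, not the claimed apparatus):

```python
import numpy as np

class DataProvider:
    """Sketch of steps 601 - 605 from the data provider's side."""

    def __init__(self, X):
        self.X = np.asarray(X)                # private data vector

    def receive_random_set(self, b, c1):      # step 601: R2 = {b, c1}
        self.b, self.c1 = np.asarray(b), c1

    def intermediate_data_vector(self):       # step 602: f = X - b
        return self.X - self.b

    def intermediate_data_value(self, e):     # step 604: z1 = e x X + c1
        return e @ self.X + self.c1
```

Wiring the two sketches together reproduces example one: the demander sends intermediate_model_vector() to the provider, the provider returns intermediate_data_vector() and its intermediate_data_value(), and shared_prediction_result() yields W×X.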
FIG. 7 illustrates a block diagram of a data demander in accordance with aspects of the invention.
Specifically, the data demander (the model party) may include a receiving module 701, a prediction vector generation module 702, a model prediction module 703, a transmission module 704, and a memory 705, wherein the memory 705 stores the model coefficients.
The receiving module 701 may be configured to receive a first set of random numbers from a third party, and to receive an intermediate data vector and/or an intermediate model value from the data provider.
The prediction vector generation module 702 may be configured to generate a shared computation prediction result using the first set of random numbers, the model coefficient vector, and a vector from a data provider.
In particular, the prediction vector generation module 702 may be configured to generate an intermediate model vector using the model coefficient vector and the first set of random numbers, generate an intermediate data value using the intermediate data vector and the first set of random numbers, and generate the shared computation prediction result using the intermediate model value and the intermediate data value.
The prediction vector generation module 702 may alternatively be configured to generate the intermediate model vector using the model coefficient vector and the first set of random numbers, and to generate the shared computation prediction result using the intermediate data vector from the data provider and the intermediate model vector.
The model prediction module 703 may be configured to perform model prediction using the shared computation prediction result.
The transmission module 704 may be configured to transmit the intermediate model vector to the data provider.
Fig. 8 illustrates a block diagram of a data provider in accordance with aspects of the invention.
Specifically, the data provider may include: a receiving module 801, a prediction vector generation module 802, a transmission module 803, and a memory 804, where the memory 804 may store private data.
The receiving module 801 may be configured to receive a second set of random numbers from a third party and to receive an intermediate model vector from a data demander.
The prediction vector generation module 802 may be configured to generate an intermediate data vector using the second set of random numbers and a data vector, and to generate an intermediate data value using the intermediate model vector and the second set of random numbers.
The transmission module 803 may be configured to send the intermediate data vector to the data demander and provide the intermediate data value to the data demander for model prediction.
Compared with the prior art, the invention has the following advantages:
1) The private data of each party is protected from leakage. The data held by each party never leaves that party's own computation boundary; the parties complete the computation by exchanging only locally masked values. Although an impartial third party participates, it merely distributes the random numbers and does not take part in the specific calculation process.
2) The integration cost is low. As a pure software scheme, it has no extra hardware requirements beyond a basic server, introduces no additional hardware security vulnerabilities, and the computation can be completed online.
3) The calculation is completely lossless, and the accuracy of the result is unaffected.
4) The algorithm itself is not restricted. Calculation results are returned in real time; the four arithmetic operations (addition, subtraction, multiplication, and division) are supported, and mixed arithmetic calculations are not limited.
5) The secret-sharing secure multi-party computation algorithm obtains the final result through intermediate splitting, conversion, and result aggregation, without retaining keys or similar information. Provided the third party distributing the random numbers is impartial, the intermediate values produced during the calculation cannot be used to deduce the original plaintext.
The illustrations set forth herein in connection with the figures describe example configurations and are not intended to represent all examples that may be implemented or fall within the scope of the claims. The term "exemplary" as used herein means "serving as an example, instance, or illustration," and does not mean "preferred" or "advantageous over other examples. The detailed description includes specific details to provide an understanding of the described technology. However, the techniques may be practiced without these specific details. In some instances, well-known structures and devices are shown in block diagram form in order to avoid obscuring the concepts of the described examples.
In the drawings, similar components or features may have the same reference numerals. Further, various components of the same type may be distinguished by following the reference label by a dash and a second label that distinguishes among the similar components. If only the first reference label is used in the specification, the description is applicable to any one of the similar components having the same first reference label irrespective of the second reference label.
The various illustrative blocks and modules described in connection with the disclosure herein may be implemented or performed with a general purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration).
The functions described herein may be implemented in hardware, software executed by a processor, firmware, or any combination thereof. If implemented in software executed by a processor, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Other examples and implementations are within the scope of the disclosure and the following claims. For example, due to the nature of software, the functions described above may be implemented using software executed by a processor, hardware, firmware, hard-wired, or any combination thereof. Features that implement functions may also be physically located at various locations, including being distributed such that portions of functions are implemented at different physical locations. In addition, as used herein, including in the claims, "or" as used in a list of items (e.g., a list of items accompanied by a phrase such as "at least one of" or "one or more of") indicates an inclusive list, such that, for example, a list of at least one of A, B or C means a or B or C or AB or AC or BC or ABC (i.e., a and B and C). Also, as used herein, the phrase "based on" should not be read as referring to a closed condition set. For example, an exemplary step described as "based on condition a" may be based on both condition a and condition B without departing from the scope of the present disclosure. In other words, the phrase "based on," as used herein, should be interpreted in the same manner as the phrase "based, at least in part, on.
Computer-readable media includes both non-transitory computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. Non-transitory storage media may be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, non-transitory computer-readable media can comprise RAM, ROM, electrically erasable programmable read-only memory (EEPROM), Compact Disk (CD) ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other non-transitory medium that can be used to carry or store desired program code means in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a web site, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk (disk) and disc (disc), as used herein, includes CD, laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above are also included within the scope of computer-readable media.
The description herein is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.