CN115510502B - PCA method and system for privacy protection - Google Patents


Info

Publication number
CN115510502B
Authority
CN
China
Prior art keywords
server
data points
data
client
covariance matrix
Legal status
Active
Application number
CN202211473530.9A
Other languages
Chinese (zh)
Other versions
CN115510502A (en)
Inventor
王小伟
张旭
吴睿振
孙华锦
王凛
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202211473530.9A
Publication of CN115510502A
Application granted
Publication of CN115510502B
Priority to PCT/CN2023/110312 (WO2024109149A1)
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/602 Providing cryptographic facilities or services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0816 Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
    • H04L9/085 Secret sharing or secret splitting, e.g. threshold schemes
    • H04L9/0861 Generation of secret information including derivation or calculation of cryptographic keys or passwords

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Storage Device Security (AREA)

Abstract

The invention relates to the field of information technology, and in particular to a privacy-preserving PCA method and system. The method comprises the following steps: each client splits the sum of its local data points, the maximum and minimum values of the data points, and the number of data points, and sends the resulting shares to servers A and B respectively; servers A and B compute secret-shared values of the overall covariance matrix; server [IMAGE002] adds noise to its computation result and sends the noise-added result to server [IMAGE003], which thereby obtains the noise-added overall covariance matrix; server [IMAGE003] performs singular value decomposition on this noisy covariance matrix and takes the eigenvectors corresponding to the [IMAGE004] eigenvalues of largest absolute value; the eigenvectors are sent to the clients, which use them to reduce the dimensionality of their local data. By reducing data dimensionality in federated learning, the invention can effectively improve the training speed of federated learning.

Description

PCA method and system for privacy protection
Technical Field
The present invention relates to the field of information technologies, and in particular, to a privacy preserving PCA method, system, terminal, and storage medium.
Background
With the development of artificial intelligence, machine learning is widely applied in fields such as recommendation systems, spam filtering, and face recognition. In recent years, people have paid increasing attention to personal privacy, and conventional machine learning easily leads to the disclosure of personal private information. In addition, for reasons of business confidentiality, enterprises often allow data to remain only in their own hands, which inevitably creates data islands. Federated learning, a machine learning technique based on cloud computing, enables multiple parties to learn jointly while protecting users' private information, and is widely applied in government systems, medical analysis, financial risk control, digital advertising, logistics management, and other fields.
In federated learning, data may originate from a wide variety of terminal devices whose computing power, network bandwidth, and available participation time can differ greatly. This device heterogeneity can seriously affect the efficiency of federated learning. Researchers have proposed many solutions to it, such as client (node) selection, data dimensionality reduction, gradient compression, model partitioning, and asynchronous or semi-synchronous federated learning. If most clients in the network have weak computing power, data dimensionality reduction is the more suitable approach.
Principal component analysis (PCA) is one of the most widely used data dimensionality reduction techniques. Through an orthogonal transformation, it recombines a set of correlated variables into a new set of mutually uncorrelated variables and then retains a few principal components to reduce dimensionality. PCA has also been applied in federated learning with attention to protecting users' private information. One existing scheme computes the covariance matrix with homomorphic encryption and performs the eigendecomposition of the covariance matrix with garbled circuits, which is computationally inefficient. Another realizes secure PCA by combining homomorphic encryption with differential privacy; because homomorphic encryption is used for the matrix operations, its efficiency is also low. A third scheme protects users' private information by adding shared noise to the local covariance matrices through a noise-sharing scheme. To limit distortion of the principal components, the total noise must be kept within a certain bound, but when the number of clients is large, the noise allocated to each client becomes very small, creating a risk of leaking user privacy. Moreover, the normalization method it adopts is not a standard one and may scale different data points inconsistently, which can distort the PCA result.
Disclosure of Invention
To solve the above technical problems in the prior art, the invention provides a privacy-preserving PCA method, terminal, and storage medium. Local covariance matrix information is sent, by secret sharing, to two servers that do not communicate with each other; one server adds noise to its computation result and sends it to the other server, which can then compute the noise-added overall covariance matrix. User privacy is thus protected while adding relatively little noise. In addition, for data normalization, differential privacy and the OT protocol are combined to obtain the mean and the range of the data points securely, and the data are then normalized with this mean and range.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
In a first aspect, one embodiment of the present invention provides a privacy-preserving PCA method comprising the following steps:
each client splits the sum of its local data points, the maximum and minimum values of the data points, and the number of data points, and sends the shares to servers A and B respectively;
servers [SMS_2] obtain the number and the mean of all data points from the received sums of local data points, maximum and minimum values, and numbers of data points; server [SMS_3] then uses the number and mean of all data points, together with the oblivious transfer (OT) technique, to obtain the range of all data points, and the clients normalize their data points using the mean and the range;
each client computes a local covariance matrix from its normalized data, splits it into two parts by additive secret sharing, and sends the two parts to servers A and B respectively;
servers A and B compute secret-shared values of the overall covariance matrix; server [SMS_6] adds noise to its computation result; server [SMS_7] sends the noise-added result to server [SMS_8]; server [SMS_9] thereby obtains the noise-added overall covariance matrix;
server [SMS_10] performs singular value decomposition on the noisy covariance matrix and takes the eigenvectors corresponding to the [SMS_11] eigenvalues of largest absolute value;
the eigenvectors are sent to the clients, which use them to reduce the dimensionality of their local data.
As a further aspect of the present invention, servers A and B do not communicate with each other.
As a further aspect of the invention, the numbers of data points of the clients are [SMS_13], respectively. Denote [SMS_14]. Let the data of the [SMS_15]-th client be [SMS_16], where [SMS_17]; that is, all data have dimension [SMS_18].
As a further aspect of the invention, before each client splits the sum of its local data points, the maximum and minimum values of the data points, and the number of data points and sends the shares to servers A and B respectively, the method further includes:
servers A and B each perform key exchange with every client through the Diffie-Hellman (DH) protocol to establish the keys used for data transmission;
let the key between server [SMS_21] and the [SMS_22]-th client be [SMS_23], and the key between server [SMS_24] and the [SMS_25]-th client be [SMS_26].
As a further aspect of the present invention, the step in which servers [SMS_27] obtain the number and the mean of all data points from the received sums of local data points, maximum and minimum values, and numbers of data points, server [SMS_28] obtains the range of all data points from the number and mean together with the OT technique, and the clients normalize their data points with the mean and the range, comprises the following steps:
each client splits the sum of its local data points [SMS_29], the maximum [SMS_30] and minimum [SMS_31] of its data points, and its number of data points [SMS_32] into two parts by additive secret sharing, encrypts the two parts [SMS_33] with the respective keys, and sends them to servers A and B respectively;
servers A and B decrypt the encrypted data sent by the clients and sum the decrypted shares of the data-point sums and data-point counts, obtaining secret-shared values of the sum of all data points and of the total number of data points;
server [SMS_37] sends its computed secret-shared values of the sum of all data points [SMS_40] and of the number of data points [SMS_41] to server [SMS_38]; server [SMS_39] then computes the sum of all data points [SMS_42] and the number of data points [SMS_43], from which it determines the mean of the data points [SMS_36]; finally, it sends the mean and the number of data points to each client;
server [SMS_44] obtains the range of all data points using 1-out-of-N OT and sends the range to each client;
after receiving the mean, number, and range of all data points, each client normalizes its data and then uniformly divides the coordinates of the data points by [SMS_45].
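A minimal sketch of the client-side normalization follows, assuming per-coordinate mean normalization x' = (x - mean) / (max - min), which is how we read the "average normalization" named among the beneficial effects. The extra uniform division by [SMS_45] is left out because that formula is not recoverable from the text, and the function name `mean_normalize` is hypothetical.

```python
import numpy as np

def mean_normalize(X, mean, value_range):
    """Mean-normalize each coordinate of the rows of X (hypothetical helper).

    X           -- (k, d) array of one client's local data points
    mean        -- (d,) global mean of all data points (received from the server)
    value_range -- (d,) global range, max - min, of each coordinate
    """
    value_range = np.asarray(value_range, dtype=float)
    # Guard against constant coordinates (zero range) to avoid division by zero.
    safe_range = np.where(value_range == 0, 1.0, value_range)
    return (np.asarray(X, dtype=float) - mean) / safe_range

# Two hypothetical clients; the global statistics are computed in the clear
# here only to check the result.
X1 = np.array([[1.0, 10.0], [3.0, 30.0]])
X2 = np.array([[5.0, 20.0]])
all_X = np.vstack([X1, X2])
global_mean = all_X.mean(axis=0)
global_range = all_X.max(axis=0) - all_X.min(axis=0)
Z1 = mean_normalize(X1, global_mean, global_range)
Z2 = mean_normalize(X2, global_mean, global_range)
```

After this step every coordinate of the pooled data has mean 0 and range 1, so no single feature dominates the covariance matrix because of its units.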
As a further aspect of the present invention, server [SMS_46] obtains the range of all data points using 1-out-of-N OT through the following steps:
a. Server [SMS_47] computes the value of [SMS_48], and server [SMS_49] computes [SMS_50]; server [SMS_51] then sends [SMS_52] to server [SMS_53].
b. Server [SMS_54] makes a preliminary comparison of [SMS_55] and [SMS_56].
c. Server [SMS_57] uses [SMS_58] to compare [SMS_59] with [SMS_60], and thereby determines the relative size of [SMS_61] and [SMS_62].
d. Using the secret-shared value of the maximum corresponding to the current maximum index together with [SMS_63], steps a to c are repeated to obtain the index of the maximum of [SMS_64], and so on until [SMS_65], which gives the index of the maximum value in [SMS_66]; denote it [SMS_67].
e. In a manner similar to steps a to d, the index of the minimum value in [SMS_68] is obtained; denote it [SMS_69].
f. Server [SMS_70] computes [SMS_71], and server [SMS_72] computes [SMS_73]; server [SMS_74] then sends [SMS_75] to server [SMS_76].
g. Server [SMS_77] computes the range of the first coordinate of the data points, i.e. [SMS_78].
i. Following steps a to g, the range of each coordinate of the data points is obtained in turn.
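Steps a to g amount to a linear sweep over the clients that keeps the index of the current extreme, followed by one subtraction. The sketch below shows this for a single coordinate, assuming each server holds additive shares of every client's local maximum and minimum (step S201) and that in steps f and g each server subtracts its own shares before one server adds the two partial results; both points are our reading of the formula placeholders. The secure comparison of steps a to c is replaced by a plaintext stand-in, so the sketch shows only the control flow, not the privacy guarantee. All names are hypothetical.

```python
import numpy as np

def share(x, rng, scale=100.0):
    """Additively split each entry of x into two shares with x == a + b."""
    x = np.asarray(x, dtype=float)
    a = rng.normal(scale=scale, size=x.shape)
    return a, x - a

def index_of_extreme(a_sh, b_sh, want_max=True):
    """Steps a-d (or e): sweep the candidates, keeping the index of the current extreme.

    The comparison reconstructs the difference in the clear. It only stands in
    for the sign test / OT comparison of steps a-c and is NOT secure.
    """
    best = 0
    for j in range(1, len(a_sh)):
        u = a_sh[j] - a_sh[best]   # difference of one server's shares
        v = b_sh[j] - b_sh[best]   # difference of the other server's shares
        diff = u + v               # equals value[j] - value[best]
        better = diff > 0 if want_max else diff < 0
        if better:
            best = j
    return best

def coordinate_range(amax, bmax, amin, bmin):
    """Steps a-g for one coordinate: locate the extreme indices, then max - min."""
    i_max = index_of_extreme(amax, bmax, want_max=True)
    i_min = index_of_extreme(amin, bmin, want_max=False)
    part_a = amax[i_max] - amin[i_min]   # computed locally by one server
    part_b = bmax[i_max] - bmin[i_min]   # computed locally by the other server
    return part_a + part_b               # recombined on one server (steps f-g)

rng = np.random.default_rng(0)
client_max = np.array([4.0, 9.0, 7.0])    # first coordinate of each client's local maximum
client_min = np.array([-1.0, 2.0, -3.0])  # first coordinate of each client's local minimum
amax, bmax = share(client_max, rng)
amin, bmin = share(client_min, rng)
r = coordinate_range(amax, bmax, amin, bmin)
```

Repeating `coordinate_range` for every coordinate corresponds to the final step; the sweep needs only (number of clients - 1) comparisons per extreme.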
As a further aspect of the present invention, server [SMS_79] makes the preliminary comparison of [SMS_80] and [SMS_81] as follows:
if [SMS_82] and [SMS_83] are both greater than or equal to 0, then [SMS_84], and the process goes to step d;
if [SMS_85] and [SMS_86] are both less than or equal to 0, then [SMS_87], and the process goes to step d;
otherwise, the process goes to step c.
As a further aspect of the present invention, server [SMS_88] uses [SMS_89] to compare [SMS_90] with [SMS_91], and thereby determines the relative size of [SMS_92] and [SMS_93], as follows:
if [SMS_94], then [SMS_95], and the process goes to step d;
if [SMS_96], then [SMS_97], and the process goes to step d.
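The case split of steps b and c can be made concrete under one assumption that the placeholders hide. We assume step a leaves one server with u and the other with v such that the difference of the two compared values is u + v, and that only the sign of v (the sign function from the notation section) is passed across. Cases I and II then settle the comparison from the two signs alone. Only when the signs differ is the magnitude comparison of step c needed, and the method performs that comparison with 1-out-of-N OT. In this sketch the magnitude comparison is a plaintext stand-in, and all names are hypothetical.

```python
def sign(t):
    """Sign function: 1 if t > 0, -1 if t < 0, 0 otherwise."""
    return int(t > 0) - int(t < 0)

def plain_abs_cmp(p, q):
    """Plaintext stand-in (NOT secure) for the OT-based comparison of |p| and |q|."""
    return sign(abs(p) - abs(q))

def greater_or_equal(u, v, abs_cmp=plain_abs_cmp):
    """Return True iff u + v >= 0, i.e. the first compared value is not smaller."""
    su, sv = sign(u), sign(v)
    if su >= 0 and sv >= 0:   # step b, case I: the sum cannot be negative
        return True
    if su <= 0 and sv <= 0:   # step b, case II: one part is negative, none positive
        return False
    c = abs_cmp(u, v)         # step c: signs differ, the larger magnitude decides
    if c > 0:
        return u > 0
    if c < 0:
        return v > 0
    return True               # |u| == |v|, so u + v == 0
```

In this reading the expensive OT-based comparison is invoked only in case III, which is presumably why the preliminary sign test comes first.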
As a further aspect of the invention, the step in which each client computes a local covariance matrix from its normalized data, splits it into two parts by additive secret sharing, and sends the two parts to servers A and B respectively specifically comprises:
each client computes its local covariance matrix, decomposes it into the sum of two matrices by additive secret sharing, encrypts each matrix with the corresponding key, and sends the encrypted matrices to servers A and B.
As a further aspect of the invention, the local covariance matrix is computed according to [SMS_101], where k denotes the number of data points of each client. The matrix [SMS_104] is then decomposed as [SMS_108]; [SMS_103] is encrypted with the key [SMS_102] and sent to server [SMS_105], and [SMS_100] is encrypted with the key [SMS_106] and sent to server [SMS_107].
As a further aspect of the present invention, servers A and B each decrypt the secret-shared values of the local covariance matrices sent by the clients and sum them to obtain their secret-shared values of the overall covariance matrix.
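The covariance step can be illustrated with a short sketch. It assumes each client's local matrix is the unscaled product ZᵀZ of its normalized data and that the overall covariance matrix is the sum of these divided by the total number of data points N; the exact formula [SMS_101] is not recoverable. Real-valued random masks stand in for whatever share domain the method uses, encryption is omitted, and all names are hypothetical.

```python
import numpy as np

def split_matrix(C, rng, scale=10.0):
    """Additive secret sharing of a matrix: C == C_a + C_b with C_a random."""
    C_a = rng.normal(scale=scale, size=C.shape)
    return C_a, C - C_a

rng = np.random.default_rng(42)
d = 3
# Already-normalized local data of three hypothetical clients.
clients = [rng.normal(size=(k, d)) for k in (4, 6, 5)]
N = sum(Z.shape[0] for Z in clients)

shares_a, shares_b = [], []
for Z in clients:
    local = Z.T @ Z                     # local (unscaled) covariance contribution
    C_a, C_b = split_matrix(local, rng)
    shares_a.append(C_a)                # would be encrypted and sent to one server
    shares_b.append(C_b)                # would be encrypted and sent to the other

# Each server sums only the shares it received; dividing by N is linear,
# so each server can apply it to its own sum.
S_a = sum(shares_a) / N
S_b = sum(shares_b) / N
```

Neither `S_a` nor `S_b` alone reveals the overall matrix, but their sum equals the pooled second-moment matrix, which is the covariance matrix when the data were centered with the global mean.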
As a further aspect of the present invention, server [SMS_110] adds a symmetric noise matrix satisfying the Gaussian mechanism to its secret-shared value of the overall covariance matrix and sends the result to server [SMS_111].
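The noise step can be sketched as follows. The sketch assumes the classic Gaussian-mechanism calibration sigma = Delta * sqrt(2 ln(1.25 / delta)) / epsilon (valid for epsilon < 1); the privacy parameters and the sensitivity Delta are illustrative placeholders, since the text does not state them here. Sampling only the upper triangle and mirroring it keeps the noise matrix, and therefore the noisy covariance matrix, symmetric.

```python
import math
import numpy as np

def gaussian_sigma(epsilon, delta, sensitivity):
    """Classic Gaussian-mechanism noise scale; assumes 0 < epsilon < 1."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def symmetric_gaussian_noise(d, sigma, rng):
    """d x d symmetric matrix with i.i.d. N(0, sigma^2) entries on and above the diagonal."""
    upper = np.triu(rng.normal(scale=sigma, size=(d, d)))
    return upper + np.triu(upper, k=1).T   # mirror the strict upper triangle

rng = np.random.default_rng(7)
sigma = gaussian_sigma(epsilon=0.5, delta=1e-5, sensitivity=0.01)  # illustrative values
E = symmetric_gaussian_noise(4, sigma, rng)
S_own = np.eye(4)          # hypothetical share of the overall covariance matrix
noisy_share = S_own + E    # what would be sent to the other server
```

Because the noise is added to a share rather than to the reconstructed matrix, the receiving server only ever sees the noisy total.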
As a further aspect of the present invention, server [SMS_114] adds its own secret-shared value of the covariance matrix to the noise-added secret-shared value received from server [SMS_116], obtaining the noise-added overall covariance matrix. It then performs SVD on this matrix to obtain a set of eigenvalues arranged in descending order, [SMS_121], where [SMS_113] denotes a diagonal matrix and [SMS_115] denotes the individual eigenvalues. The eigenvectors corresponding to the [SMS_118] largest eigenvalues are taken, [SMS_120], where [SMS_112] is the eigenvector corresponding to [SMS_117], and [SMS_119] is sent to each client.
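A minimal sketch of the component selection, using NumPy's SVD on a hypothetical reconstructed matrix. For a symmetric matrix the singular values are the absolute values of the eigenvalues, so the first k left singular vectors are the eigenvectors belonging to the k eigenvalues of largest absolute value, which matches the abstract. If noise makes an eigenvalue negative, its left singular vector is still an eigenvector; only its sign relation to the right singular vector changes. The function name is hypothetical.

```python
import numpy as np

def top_k_components(C_noisy, k):
    """Return the d x k singular vectors for the k largest singular values, and those values."""
    U, s, _ = np.linalg.svd(C_noisy)   # s is returned in descending order
    return U[:, :k], s[:k]

# Hypothetical noise-added overall covariance matrix (own share + received noisy share).
C_noisy = np.array([[4.0, 1.0, 0.0],
                    [1.0, 3.0, 0.0],
                    [0.0, 0.0, 0.5]])
W, top = top_k_components(C_noisy, k=2)
```

Here the two retained values are (7 + sqrt(5)) / 2 and (7 - sqrt(5)) / 2, the eigenvalues of the upper-left 2 x 2 block.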
As a further aspect of the present invention, the data dimensionality reduction uses the formula [SMS_122].
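The formula [SMS_122] is not recoverable. A common form of PCA dimensionality reduction, assumed here, projects each client's normalized data onto the received components, Y = Z W. The matrix W below is a random orthonormal stand-in for the components sent by the server, and all names are hypothetical.

```python
import numpy as np

def reduce_dimension(Z, W):
    """Project normalized local data Z (k x d) onto the components W (d x r)."""
    return Z @ W

rng = np.random.default_rng(3)
Z_local = rng.normal(size=(5, 4))               # hypothetical normalized client data
W, _ = np.linalg.qr(rng.normal(size=(4, 2)))    # stand-in for the received d x r components
Y_local = reduce_dimension(Z_local, W)
```

Each client performs this projection locally, so only the d x r component matrix, never the reduced data, needs to travel from the server.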
In a second aspect, another embodiment of the present invention provides a privacy-preserving PCA system that applies the privacy-preserving PCA method described above.
In a third aspect, another embodiment of the present invention provides a terminal comprising a memory storing a computer program and a processor that implements the steps of the privacy-preserving PCA method when loading and executing the computer program.
In a fourth aspect, another embodiment of the present invention provides a storage medium storing a computer program which, when loaded and executed by a processor, performs the steps of the privacy-preserving PCA method.
The technical solution provided by the invention has the following beneficial effects:
in the privacy-preserving PCA method, system, terminal, and storage medium provided by the invention, data preprocessing uses mean normalization, which effectively eliminates the influence of differing units among features; in this process, secret sharing and the OT technique are combined to compute the mean and the range of the data points securely, effectively protecting users' private data. Secret sharing and differential privacy are used to split the local covariance matrices and to add noise to the secret-shared value of the overall covariance matrix, so that both the local covariance matrices and the overall covariance matrix are privacy-protected. In addition, secret-shared values are transmitted between the clients and the servers under encryption, which prevents an eavesdropper from obtaining them. By reducing data dimensionality in federated learning, the invention can effectively improve the training speed of federated learning.
These and other aspects of the invention will be more readily apparent from the following description of the embodiments. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other embodiments may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a privacy-preserving PCA method according to an embodiment of the present invention;
FIG. 2 is a flowchart of step S20 of the privacy-preserving PCA method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
In the figures: processor 701, communication interface 702, memory 703, communication bus 704.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In particular, embodiments of the present invention are further described below with reference to the accompanying drawings.
The symbols used are as follows:
[SMS_127] denotes the set [SMS_133].
[SMS_144] is read as "[SMS_125] modulo [SMS_131]" and denotes the remainder of [SMS_135] divided by [SMS_139]; [SMS_124] denotes that [SMS_138] divided by [SMS_141] leaves the same remainder for both sides.
[SMS_148] denotes the set of real numbers, and [SMS_126] denotes [SMS_132]-dimensional Euclidean space. In general, lowercase letters (e.g. [SMS_140]) denote scalars, [SMS_147] denote vectors, and uppercase letters (e.g. [SMS_129]) denote matrices.
[SMS_134] denotes the 2-norm of a vector: if [SMS_149], then [SMS_150].
[SMS_123] denotes the spectral norm of a matrix, i.e. [SMS_137], where [SMS_143] denotes the largest eigenvalue of [SMS_146].
[SMS_128] denotes the bitwise exclusive OR of [SMS_136] and [SMS_142].
[SMS_145] denotes the sign function, which takes the value [SMS_130].
Referring to FIG. 1, which is a flowchart of a privacy-preserving PCA method according to an embodiment of the present invention, the method includes steps S10 to S60. The method is applied to a system comprising servers and a plurality of clients [SMS_151].
S10: each client splits the sum of its local data points, the maximum and minimum values of the data points, and the number of data points, and sends the shares to servers A and B respectively, where servers A and B do not communicate with each other.
The numbers of data points of the clients are [SMS_154], respectively. Denote [SMS_155]. Let the data of the [SMS_156]-th client be [SMS_157], where [SMS_158]; that is, all data have dimension [SMS_159].
In this embodiment, before step S10 (each client splitting the sum of its local data points, the maximum and minimum values of the data points, and the number of data points, and sending the shares to servers A and B), the method further includes:
servers A and B each perform key exchange with every client through the DH protocol to establish the keys used for data transfer.
Let the key between server [SMS_162] and the [SMS_163]-th client be [SMS_164], and the key between server [SMS_165] and the [SMS_166]-th client be [SMS_167].
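The key establishment can be sketched with textbook finite-field Diffie-Hellman. The group parameters below (the Mersenne prime 2**127 - 1 and base 3) are toy values chosen only so that the example runs quickly and are not secure; a deployment would use a standardized group or an elliptic-curve exchange such as X25519, plus authentication. Hashing the shared secret with SHA-256 into a symmetric key is also our assumption; the text only states that DH is used.

```python
import hashlib
import secrets

# Toy public parameters (NOT secure): the Mersenne prime 2**127 - 1 and base 3.
P = 2**127 - 1
G = 3

def keypair():
    """Return (private exponent in [2, P - 2], public value G**private mod P)."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)

def derive_key(own_private, peer_public):
    """Compute the DH shared secret and hash it into a 256-bit symmetric key."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

# One server and one client agree on a key. Repeating this for every client
# and for each of the two servers yields one key per (server, client) pair.
server_priv, server_pub = keypair()
client_priv, client_pub = keypair()
k_server = derive_key(server_priv, client_pub)
k_client = derive_key(client_priv, server_pub)
```

Only the public values cross the network; both sides end up with the same 32-byte key, which would then protect the secret shares sent in S201.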
S20: servers [SMS_168] obtain the number and the mean of all data points from the received sums of local data points, maximum and minimum values, and numbers of data points; server [SMS_169] then uses the number and mean of all data points, together with the OT technique, to obtain the range of all data points, and the clients normalize their data points with the mean and the range.
In this embodiment, referring to FIG. 2, step S20 (servers [SMS_170] obtaining the number and mean of all data points, server [SMS_171] obtaining their range with the OT technique, and the clients normalizing their data points with the mean and the range) comprises the following steps:
S201: each client computes the sum of its local data points, the maximum and minimum values of its data points, and its number of data points, splits each into two parts by additive secret sharing, encrypts the two parts with the respective keys, and sends them to servers A and B respectively.
Taking the [SMS_173]-th client as an example, the client first computes the sum of its local data points and the maximum and minimum of its data points, i.e. [SMS_174], [SMS_175], [SMS_176], where T denotes transposition.
Then, by secret sharing, [SMS_177] are each split into the sum of two vectors, i.e. [SMS_178], [SMS_179], [SMS_180], and the number of data points is likewise split, giving [SMS_181].
Finally, [SMS_183] is encrypted with the key [SMS_182] and sent to server [SMS_184], and [SMS_186] is encrypted with the key [SMS_185] and sent to server [SMS_187].
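Step S201 on a single client can be sketched as follows. Real-valued random masks stand in for the additive secret sharing (practical schemes often share fixed-point values in a finite ring instead), the mask scale is arbitrary, and the encryption with the per-server keys is omitted. All names are hypothetical.

```python
import numpy as np

def split(x, rng, scale=1000.0):
    """Additively split x into two shares with x == share_a + share_b."""
    x = np.asarray(x, dtype=float)
    share_a = rng.normal(scale=scale, size=x.shape)
    return share_a, x - share_a

rng = np.random.default_rng(11)
X_i = np.array([[1.0, 4.0], [2.0, -1.0], [0.5, 3.0]])   # hypothetical local data, k = 3, d = 2
stats = {
    "sum": X_i.sum(axis=0),
    "max": X_i.max(axis=0),
    "min": X_i.min(axis=0),
    "count": np.array(float(X_i.shape[0])),
}
to_server_a, to_server_b = {}, {}
for name, value in stats.items():
    to_server_a[name], to_server_b[name] = split(value, rng)
# In the method, each dictionary would now be encrypted with the key this
# client shares with the corresponding server before being sent.
```

Each server receives values that look random on their own; only the sum of the two shares recovers a statistic.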
S202: servers A and B decrypt the encrypted data sent by the clients and sum the decrypted shares of the data-point sums and data-point counts, obtaining secret-shared values of the sum of all data points and of the total number of data points.
After receiving the data sent by the [SMS_189]-th client, server [SMS_188] decrypts it with the key [SMS_190] to obtain [SMS_191]; it then sums the data of all clients to obtain its secret-shared values of the sum of all data points and of the number of data points, [SMS_192].
Similarly, server [SMS_193] obtains the other part of the secret-shared values of the sum of all data points and of the number of data points, [SMS_194].
S203, server
Figure SMS_195
Transmitting the calculated secret sharing values of all data points and the number of the data points to a server
Figure SMS_196
Then by the server
Figure SMS_197
And calculating to obtain all data points and the number of the data points, then calculating the average value of the data points according to the data points and finally transmitting the average value and the number of the data points to each client.
After server [formula] transmits its information to server [formula], server [formula] can compute the overall data-point sum and count, [formula], and then the data mean, [formula].
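Steps S201 to S203 can be sketched with plain floating-point masking. This is a minimal illustration under stated assumptions, not the patent's implementation:

- each share is a real number masked with a uniform random value, over an arbitrarily chosen masking range;
- the DH-keyed encryption between client and server is omitted;
- the helper names (client_shares, server_aggregate, reconstruct_mean) are hypothetical.

```python
import random

# Sketch of S201-S203 (assumptions: real-valued shares masked with a uniform
# random value; the per-server encryption with DH keys is omitted).

def share(value, scale=1e6):
    """Split a real number into two additive shares: value = a + b."""
    a = random.uniform(-scale, scale)
    return a, value - a

def share_vec(vec):
    pairs = [share(v) for v in vec]
    return [a for a, _ in pairs], [b for _, b in pairs]

def client_shares(points):
    """S201: sum, coordinate-wise max/min and count, each split for servers A and B."""
    dim = len(points[0])
    cols = [[p[c] for p in points] for c in range(dim)]
    s_a, s_b = share_vec([sum(col) for col in cols])
    mx_a, mx_b = share_vec([max(col) for col in cols])
    mn_a, mn_b = share_vec([min(col) for col in cols])
    k_a, k_b = share(float(len(points)))
    return ({"sum": s_a, "max": mx_a, "min": mn_a, "count": k_a},
            {"sum": s_b, "max": mx_b, "min": mn_b, "count": k_b})

def server_aggregate(messages):
    """S202: one server adds up the sum and count shares received from all clients."""
    dim = len(messages[0]["sum"])
    sum_share = [sum(m["sum"][c] for m in messages) for c in range(dim)]
    return sum_share, sum(m["count"] for m in messages)

def reconstruct_mean(agg_a, agg_b):
    """S203: the receiving server opens the overall sum and count, then the mean."""
    total = [x + y for x, y in zip(agg_a[0], agg_b[0])]
    count = round(agg_a[1] + agg_b[1])
    return [t / count for t in total], count

if __name__ == "__main__":
    clients = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 0.0]]]
    msgs = [client_shares(pts) for pts in clients]
    mean, n = reconstruct_mean(server_aggregate([m[0] for m in msgs]),
                               server_aggregate([m[1] for m in msgs]))
    print(mean, n)  # approximately [3.0, 2.0] and 3
```

Only the overall sum and count are ever opened in this sketch. The max/min shares it produces are left for the range computation of S204.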
S204. Server [formula] finds the range of all data points using 1-out-of-N OT and then sends it to each client.
Specifically, server [formula] finds the range of all data points using 1-out-of-N OT through the following steps:
a. Server [formula] computes the value of [formula], and server [formula] computes [formula]. Server [formula] then sends [formula] to server [formula].
b. Server [formula] makes a preliminary comparison of [formula] and [formula], distinguishing the following cases:
I. If [formula] and [formula] are both greater than or equal to 0, then [formula]; go to step d.
II. If [formula] and [formula] are both less than or equal to 0, then [formula]; go to step d.
III. Otherwise, go to step c.
c. Server [formula] uses the method [formula] to compare [formula] and [formula], and from this determines the relative size of [formula] and [formula], distinguishing the following cases:
I. If [formula], then [formula]; go to step d.
II. If [formula], then [formula]; go to step d.
d. Take the secret-shared values of the maximum at the current maximum index (i.e., whichever value is currently the largest) together with [formula], and repeat steps a-c to obtain the index of the maximum of [formula]. Continue in this way until [formula]; this yields the index of the maximum value in [formula], denoted [formula].
e. By a method similar to steps a-d, the index of the minimum value in [formula] is obtained, denoted [formula].
f. Server [formula] computes [formula], and server [formula] computes [formula]. Server [formula] then sends [formula] to server [formula].
g. Server [formula] computes the range of the first coordinate of the data points, i.e., [formula].
i. Following steps a-g, the range of each coordinate of the data points is obtained in turn.
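Steps a to g are only partly recoverable from the extracted text, so what follows is our reading rather than the patent's exact procedure.

Assume each value is split as u = u_a + u_b between servers A and B. Then u - v = alpha + beta, where alpha = u_a - v_a is computed by server A and beta = u_b - v_b by server B. If alpha and beta have the same sign, the comparison is settled (cases I and II). Only opposite signs need the secure comparison of step c.

In the sketch, secure_compare is a hypothetical placeholder evaluated in the clear, standing in for the OT-based method. The other helper names are also hypothetical.

```python
import random

# Sketch of steps a-g of S204 under our reading (assumption): each value is
# split as u = u_a + u_b; server A holds the _a parts, server B the _b parts.

def split(x, scale=100.0):
    r = random.uniform(-scale, scale)
    return r, x - r

def secure_compare(x_held_by_a, y_held_by_b):
    """Placeholder for the OT-based comparison of step c, evaluated in the clear."""
    return x_held_by_a >= y_held_by_b

def shared_geq(u, v):
    """True if u_a + u_b >= v_a + v_b, using the case split of steps b and c."""
    alpha = u[0] - v[0]   # computed locally by server A
    beta = u[1] - v[1]    # computed locally by server B; u - v = alpha + beta
    if alpha >= 0 and beta >= 0:         # case I: u >= v
        return True
    if alpha <= 0 and beta <= 0:         # case II: u < v (equality was caught by case I)
        return False
    return secure_compare(alpha, -beta)  # case III: opposite signs

def shared_argmax(shares):
    """Step d: sweep the clients, keeping the index of the running maximum."""
    best = 0
    for i in range(1, len(shares)):
        if not shared_geq(shares[best], shares[i]):
            best = i
    return best

def shared_argmin(shares):
    """Step e: the same sweep for the running minimum."""
    best = 0
    for i in range(1, len(shares)):
        if not shared_geq(shares[i], shares[best]):
            best = i
    return best

def shared_range(max_shares, min_shares):
    """Steps f-g for one coordinate: only the difference max - min is opened."""
    i_max, i_min = shared_argmax(max_shares), shared_argmin(min_shares)
    part_a = max_shares[i_max][0] - min_shares[i_min][0]  # server A, sent to B
    part_b = max_shares[i_max][1] - min_shares[i_min][1]  # server B
    return part_a + part_b
```

In this sketch the comparatively expensive secure comparison runs only when the two local differences disagree in sign. Only the final range is opened, not the individual extremes.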
S205. Once each client has received the same mean, number, and range of all data points, it normalizes its data, i.e., [formula]. It then uniformly divides the coordinates of the data points by [formula] to obtain its normalized data.
Because the Gaussian mechanism requires the 2-norm of each row of data to be at most 1, the coordinates of the data points are uniformly divided here by [formula]. After this processing, the data of the [formula]-th client are [formula], where [formula].
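The following is a sketch of the S205 normalization. It assumes the divisor applied after mean normalization is sqrt(n), with n the number of attributes; the divisor itself is lost in the extraction.

The reasoning behind that choice:

- A point and the global mean both lie between the global minimum and maximum of each coordinate.
- So each mean-normalized coordinate lies in [-1, 1], and the row's 2-norm is at most sqrt(n).
- Dividing by sqrt(n) therefore brings it to at most 1, as the Gaussian mechanism requires.

normalize_client is a hypothetical name.

```python
import math

# Sketch of S205 (assumptions: per-coordinate mean normalization, then a
# uniform division by sqrt(n), where n is the number of attributes).

def normalize_client(points, mean, value_range):
    n = len(mean)
    scale = math.sqrt(n)
    out = []
    for p in points:
        # A zero range means the coordinate is constant everywhere; map it to 0.
        row = [(p[c] - mean[c]) / (value_range[c] or 1.0) / scale for c in range(n)]
        out.append(row)
    return out
```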
S30. Each client computes its local covariance matrix from the normalized data, splits it into two parts by additive secret sharing, and sends the two parts to servers A and B, respectively.
Specifically, step S30 comprises the following:
Each client computes its local covariance matrix and decomposes it into the sum of two matrices by additive secret sharing. It encrypts each matrix with the corresponding key and sends them to servers A and B.
The local covariance matrix is computed as [formula], where k denotes the number of data points held by each client, k = j. The matrix [formula] is then decomposed as [formula]. [formula] is encrypted with key [formula] and sent to server [formula], and [formula] is encrypted with key [formula] and sent to server [formula].
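This sketch covers S30 and the share summation performed by the servers in S40. It assumes the unscaled local matrix C_i = X_i^T X_i; the exact formula, including any 1/k factor, is lost in the extraction. With that choice the overall matrix is simply the sum of the local ones. Summing shares on each server, and then across the two servers, therefore recovers it.

Function names are hypothetical, and encryption is omitted.

```python
import random

# Sketch of S30 plus the share summation of S40 (assumption: the local matrix
# is the unscaled C_i = X_i^T X_i, so the overall matrix is the sum of the C_i).

def local_covariance(rows):
    """C_i = sum over the client's rows x of the outer product x^T x."""
    n = len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) for b in range(n)] for a in range(n)]

def split_matrix(m, scale=1e3):
    """Additively share a matrix element-wise: m = m_a + m_b."""
    m_a = [[random.uniform(-scale, scale) for _ in row] for row in m]
    m_b = [[x - y for x, y in zip(row, row_a)] for row, row_a in zip(m, m_a)]
    return m_a, m_b

def add_matrices(*ms):
    """Element-wise sum of equally sized matrices (used by each server)."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*ms)]
```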
S40. Server [formula] computes the secret-shared values of the overall covariance matrix, and server [formula] adds noise to its result. Server [formula] transmits the noised result to server [formula], and server [formula] obtains the noised overall covariance matrix.
In an embodiment of the present invention, servers [formula] each decrypt the secret-shared values of the local covariance matrices sent by the clients. Each sums them to obtain its secret-shared values of the overall covariance matrix.
Server [formula] decrypts [formula] with key [formula] to obtain [formula]. Server [formula] decrypts [formula] with key [formula] and then obtains [formula].
In an embodiment of the invention, server [formula] adds a symmetric noise matrix satisfying the Gaussian mechanism to its secret share of the overall covariance matrix. It then transmits the result to server [formula].
Given the differential privacy requirement [formula], [formula] is obtained. Noise is drawn from the distribution [formula] to generate the upper-triangular elements of a matrix, from which a symmetric noise matrix is formed, denoted [formula]. Server [formula] then computes [formula] and sends it to server [formula].
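The noise step can be sketched as follows. The sketch assumes the standard Gaussian-mechanism calibration sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon, with sensitivity 1. That classical bound is stated for epsilon in (0, 1).

Noise is drawn i.i.d. for the upper triangle, including the diagonal, and then mirrored, so the added matrix is symmetric. The function names are hypothetical.

```python
import math
import random

# Sketch of the S40 noise step (assumption: classical Gaussian-mechanism
# calibration with L2 sensitivity 1; names are hypothetical).

def gaussian_sigma(epsilon, delta, sensitivity=1.0):
    """sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def symmetric_noise(n, sigma, rng=random):
    """Draw the upper triangle (with diagonal) i.i.d. N(0, sigma^2) and mirror it."""
    e = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            e[i][j] = e[j][i] = rng.gauss(0.0, sigma)
    return e

def add_noise_to_share(share, epsilon, delta, rng=random):
    """Server side: add one symmetric noise matrix to its share of the overall matrix."""
    noise = symmetric_noise(len(share), gaussian_sigma(epsilon, delta), rng)
    return [[s + z for s, z in zip(rs, rz)] for rs, rz in zip(share, noise)]
```

Only one server adds noise, so the reconstructed matrix carries exactly one copy of the noise.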
S50. Server [formula] performs singular value decomposition (SVD) on the noised covariance matrix. It obtains the [formula] eigenvalues of largest absolute value together with their corresponding eigenvectors.
In an embodiment of the present invention, server [formula] sums its own secret share of the covariance matrix and the noised secret share received from server [formula], obtaining the noised overall covariance matrix. SVD then yields a set of eigenvalues arranged in descending order, [formula], where [formula] denotes a diagonal matrix and [formula] the individual eigenvalues. The eigenvectors corresponding to the [formula] largest eigenvalues are taken, [formula], where [formula] is the eigenvector corresponding to [formula]. Finally, [formula] is sent to each client.
Server [formula] obtains [formula]. Since the covariance matrix [formula] is symmetric and [formula] is also symmetric, [formula] is symmetric as well.
S60. The eigenvectors are sent to the clients, and each client reduces the dimension of its local data using the formula [formula].
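The noisy covariance matrix is symmetric. For a symmetric matrix, the SVD and the eigendecomposition coincide up to signs, and the singular values are the absolute values of the eigenvalues. This is why S50 ranks by absolute value.

The minimal standard-library sketch below substitutes power iteration with deflation for a full SVD routine, which the source does not specify. Power iteration naturally converges to the eigenvalue of largest absolute value. The helper names top_k_eigen and project are hypothetical, and the projection assumes S60 computes Y_i = X_i V_k.

```python
import math
import random

# Sketch of S50/S60 (assumptions: power iteration with deflation stands in for
# SVD; S60 projects as Y_i = X_i V_k). Names are hypothetical.

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_k_eigen(sym, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric matrix, ranked by absolute eigenvalue."""
    rng = random.Random(seed)
    n = len(sym)
    m = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = mat_vec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]  # deflation
    return pairs

def project(rows, vectors):
    """S60: each client maps its rows onto the k eigenvectors it received."""
    return [[sum(a * b for a, b in zip(r, v)) for v in vectors] for r in rows]
```

Power iteration converges slowly when the leading eigenvalues are close in magnitude. A production system would call a vetted linear-algebra routine instead.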
Secret sharing is an important technique in secure multi-party computation. It is widely used for privacy protection because it is relatively simple to use. The present invention uses only two-party additive secret sharing, which is briefly described below.
In additive secret sharing, a value [formula] is randomly split into the sum of two values [formula], i.e., [formula], and [formula] are held by participants [formula], respectively. The value [formula] can be recovered only by combining the data of participants [formula].
Without loss of generality, let participant [formula] hold data [formula] and the other participant [formula] hold data [formula]. Additive secret sharing can then be applied as follows.
1. [formula] generates a random number [formula], computes [formula], and sends [formula] to [formula].
2. [formula] generates a random number [formula], computes [formula], and sends [formula] to [formula].
3. [formula] computes [formula], and [formula] computes [formula].
At this point [formula] and [formula] have been computed. Only [formula] and [formula] need to be sent to the requesting party, which obtains the result [formula]. Neither [formula] nor [formula] is leaked during this process, so data privacy is well protected.
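The three steps can be realized as below. Since the exact formulas are lost, this is a sketch of one common instantiation under these assumptions:

- each participant masks its own value with a fresh random number, keeps the difference, and sends the mask to the peer;
- arithmetic is done modulo 2^64, so that each share is uniformly random and reveals nothing on its own. The original may instead work with real numbers.

The names MOD, mask and secure_sum are hypothetical.

```python
import secrets

# Sketch of the three-step sum protocol (assumptions: integer values, arithmetic
# modulo 2**64; the original may use real numbers). Names are hypothetical.
MOD = 2 ** 64

def mask(own_value):
    """Steps 1-2: keep own_value - r locally and send the fresh random r to the peer."""
    r = secrets.randbelow(MOD)
    return (own_value - r) % MOD, r

def secure_sum(x, y):
    kept_1, sent_to_2 = mask(x)      # participant holding x
    kept_2, sent_to_1 = mask(y)      # participant holding y
    z1 = (kept_1 + sent_to_1) % MOD  # step 3 at the first participant
    z2 = (kept_2 + sent_to_2) % MOD  # step 3 at the second participant
    # The requesting party receives only z1 and z2; each alone is uniformly random.
    return z1, z2, (z1 + z2) % MOD
```

The masks cancel in the final sum: z1 + z2 = (x - r1 + r2) + (y - r2 + r1) = x + y.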
In addition, if data are eavesdropped during transmission, data privacy is still at risk, so the sender can encrypt the data before sending them. For computational simplicity, a conventional symmetric encryption algorithm can be used. For key distribution, we use the Diffie-Hellman (DH) key exchange method, which is briefly described next.
Alice and Bob want to share a key for symmetric encryption, but the communication channel between them is insecure: all information passing through it is seen by an adversary, Eve. How can they exchange information so that Eve does not learn the key [formula]?
The security of the DH algorithm rests on the difficulty of computing discrete logarithms. The scheme below requires the notion of a primitive root, defined as follows.
Definition 1: If the least positive power [formula] for which [formula] holds satisfies [formula], then [formula] is called a primitive root of [formula], where [formula] is Euler's totient function.
Thus, for any integer [formula] and a primitive root [formula] of a prime [formula], there is a unique power [formula] such that [formula]. The discrete logarithm problem is the task of computing [formula] given [formula], and it is computationally hard.
The DH protocol proceeds as follows:
1. Alice and Bob first agree on [formula] and [formula], where [formula] is a large prime and [formula] is a primitive root of [formula]. [formula] and [formula] are public, so Eve also knows their values.
2. Alice picks a private integer [formula], which she reveals to no one, and sends Bob the result [formula]. Eve also sees the value of [formula].
3. Similarly, Bob picks a private integer [formula] and sends Alice the result [formula]. Eve also sees what [formula] is.
4. Alice computes [formula], and Bob computes [formula].
Alice and Bob now share a common key [formula]. Eve sees [formula], but because discrete logarithms are hard to compute she cannot learn the specific values of [formula] and [formula]. Eve therefore does not know the key [formula].
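Definition 1 and the four DH steps can be checked with a toy example.

- For a prime p, g is a primitive root exactly when g^((p-1)/q) is not 1 mod p for every prime factor q of p - 1. The sketch uses this test.
- It then runs the exchange with the textbook parameters p = 23, g = 5. These are far too small for real use; a deployment would use a standardized large group.

The function names are hypothetical.

```python
import secrets

# Toy illustration of Definition 1 and the DH steps; p = 23, g = 5 are textbook
# values and offer no security.

def is_primitive_root(g, p):
    """For prime p: g is a primitive root iff g^((p-1)/q) != 1 mod p for every prime q | p-1."""
    if g % p == 0:
        return False
    phi, m, factors, d = p - 1, p - 1, set(), 2
    while d * d <= m:
        while m % d == 0:
            factors.add(d)
            m //= d
        d += 1
    if m > 1:
        factors.add(m)
    return all(pow(g, phi // q, p) != 1 for q in factors)

def dh_exchange(p, g):
    a = secrets.randbelow(p - 2) + 1   # Alice's private integer
    b = secrets.randbelow(p - 2) + 1   # Bob's private integer
    big_a = pow(g, a, p)               # sent to Bob; Eve sees it
    big_b = pow(g, b, p)               # sent to Alice; Eve sees it
    return pow(big_b, a, p), pow(big_a, b, p)  # Alice's key, Bob's key
```

Both sides end with g^(ab) mod p, and the private integers a and b never cross the channel.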
Differential privacy is an effective defense against differential attacks. It adds a suitable amount of noise to a statistical result, which ensures that the result does not change significantly when one record in the dataset is modified (added or deleted).
Definition 2: [formula]-differential privacy. Let datasets [formula] and [formula] differ in at most one record (neighboring datasets). Given an algorithm [formula], if for all [formula] we have [formula], then algorithm [formula] is said to satisfy [formula]-differential privacy. The parameter [formula] is the privacy budget; the smaller [formula] is, the higher the level of privacy protection. When the parameter [formula], [formula] is said to satisfy [formula]-differential privacy.
For an algorithm [formula], the quantity [formula] taken over all pairs of neighboring datasets [formula] is called the [formula] sensitivity of [formula].
We use [formula] to denote the [formula]-th row of matrix [formula], and assume [formula], i.e., the 2-norm of each row is at most 1. The set of all matrices satisfying this condition is denoted [formula].
The Gaussian mechanism is a privacy protection mechanism commonly used in differential privacy. The following theorem applies to it.
Theorem 1: Let [formula] be a vector-valued function, and let [formula]. The Gaussian mechanism draws noise from the random distribution [formula] and adds it to each output of [formula]; this guarantees [formula]-differential privacy.
The function of interest is [formula], which can be regarded as a [formula]-dimensional vector. Since [formula], its sensitivity is at most 1, so we can directly choose [formula].
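The sensitivity bound can be checked in one line. This worked derivation rests on two assumptions not visible in the extracted text:

- neighboring datasets differ by adding or removing a single row r with ‖r‖₂ ≤ 1;
- the function is f(A) = A^T A, measured in the Frobenius (ℓ2) norm.

With A' equal to A plus the extra row r, A'^T A' = A^T A + r^T r, so:

$$
\|A'^{\top}A' - A^{\top}A\|_F = \|r^{\top}r\|_F = \sqrt{\textstyle\sum_{i,j} r_i^2 r_j^2} = \|r\|_2^2 \le 1,
\qquad
\sigma = \frac{\Delta_2 f\,\sqrt{2\ln(1.25/\delta)}}{\varepsilon} = \frac{\sqrt{2\ln(1.25/\delta)}}{\varepsilon}.
$$

Restricting attention to the upper triangle, which is all a symmetric noise matrix needs, can only reduce this norm, so the bound of 1 still applies. The σ expression is the standard Gaussian-mechanism calibration, stated for ε in (0, 1). The extracted text does not show whether exactly this constant is used here.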
Oblivious transfer (OT) is one of the most basic protocols in secure multi-party computation; schemes such as garbled circuits and zero-knowledge proof protocols can be built from it. We introduce the 1-out-of-N OT protocol directly. It solves the following problem:
Alice holds [formula] values [formula], and Bob wants to learn one of them, [formula]. By executing the OT protocol, Bob obtains the value of [formula] but cannot obtain the values of [formula]. Alice, meanwhile, does not learn which value Bob obtained, i.e., she does not know the value of [formula].
The 1-out-of-N OT can be implemented as follows:
1. Preparation stage. The protocol operates on a group [formula] whose order is the large prime [formula]; all results in this protocol are taken modulo [formula]. A primitive root [formula] of group [formula] is selected, together with a random oracle function [formula] (e.g., SHA-1). The parameters [formula] are shared by Alice and Bob.
2. Initialization stage: Alice selects [formula] random numbers [formula], then selects a random number [formula], computes [formula], and sends [formula] to Bob. Alice also precomputes [formula]. (Because discrete logarithms are hard to compute, Bob cannot obtain the value of the discrete logarithm [formula] of [formula].)
3. Online computation stage:
a. Bob selects a random number [formula], sets [formula], and sends [formula] to Alice. (Alice cannot learn the value of [formula].)
b. Alice computes [formula] and then [formula]. She then selects a random string [formula], where [formula] is chosen long enough that two different inputs yield different hash values. She encrypts each [formula] as [formula] and sends the encryption results together with [formula] to Bob.
c. Bob computes [formula] and then uses [formula] to decrypt, obtaining [formula].
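On our reading, the stages above follow the Naor-Pinkas style of 1-out-of-N OT; most formulas were lost in extraction, so this cannot be confirmed. The sketch implements that construction:

- the sender publishes C_1..C_{N-1} and g^r;
- the receiver hides its choice in PK_0;
- each message is masked with a hash of PK_i^r.

Assumptions:

- a toy modulus p = 2^127 - 1 (a Mersenne prime) with base 3, not checked to be a primitive root; correctness does not depend on that;
- SHA-256 in place of the SHA-1 mentioned above;
- messages of at most 32 bytes with no trailing zero bytes.

The parameters are for illustration only and are not secure. All function names are hypothetical.

```python
import hashlib
import secrets

# Toy group (assumption): arithmetic modulo the Mersenne prime 2**127 - 1 with
# base 3. Illustrative only, not a vetted cryptographic group.
P = 2 ** 127 - 1
G = 3

def H(x, salt, i):
    """Random-oracle stand-in: SHA-256 over (group element, random string R, index)."""
    return hashlib.sha256(x.to_bytes(16, "big") + salt + i.to_bytes(4, "big")).digest()

def xor(a, b):
    return bytes(s ^ t for s, t in zip(a, b))

def sender_init(n):
    """Initialization: random C_1..C_{N-1}, random r, g^r; precompute C_i^r."""
    cs = [secrets.randbelow(P - 2) + 2 for _ in range(n - 1)]
    r = secrets.randbelow(P - 2) + 1
    return cs, pow(G, r, P), r, [pow(c, r, P) for c in cs]

def receiver_choose(sigma, cs):
    """Step a: PK_sigma = g^k; PK_0 = C_sigma / PK_sigma (or g^k itself if sigma == 0)."""
    k = secrets.randbelow(P - 2) + 1
    pk_sigma = pow(G, k, P)
    pk0 = pk_sigma if sigma == 0 else cs[sigma - 1] * pow(pk_sigma, -1, P) % P
    return k, pk0

def sender_encrypt(messages, pk0, r, cs_r):
    """Step b: PK_i^r = C_i^r / PK_0^r, then mask each message with H(PK_i^r, R, i)."""
    pk0_r = pow(pk0, r, P)
    inv = pow(pk0_r, -1, P)
    keys = [pk0_r] + [c_r * inv % P for c_r in cs_r]
    salt = secrets.token_bytes(16)  # the random string R, sent in the clear
    cts = [xor(H(key, salt, i), m.ljust(32, b"\0"))
           for i, (key, m) in enumerate(zip(keys, messages))]
    return salt, cts

def receiver_decrypt(sigma, k, gr, salt, cts):
    """Step c: (g^r)^k equals PK_sigma^r, so only message sigma can be unmasked."""
    return xor(H(pow(gr, k, P), salt, sigma), cts[sigma]).rstrip(b"\0")

def run_ot(messages, sigma):
    cs, gr, r, cs_r = sender_init(len(messages))
    k, pk0 = receiver_choose(sigma, cs)
    salt, cts = sender_encrypt(messages, pk0, r, cs_r)
    return receiver_decrypt(sigma, k, gr, salt, cts)
```

For sigma of 1 or more, PK_sigma^r = C_sigma^r / (C_sigma / g^k)^r = g^(kr), which is exactly the value Bob computes as (g^r)^k. Bob does not know the discrete logarithms of the other PK_i, so their masks stay hidden from him.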
This protocol can then be used to compare two numbers securely. Suppose Alice holds data [formula], Bob holds data [formula], and [formula]. The sizes of [formula] can be compared with the following steps.
1. Alice constructs [formula] plaintext messages [formula].
2. Bob obtains [formula] through 1-out-of-N OT, from which he learns the result [formula].
Bob then sends the comparison result to Alice.
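Under two assumptions, the comparison reduces to a single OT in which Bob's choice index is y:

- x and y are integers in {0, ..., N-1}; the exact range condition in the extracted text is lost;
- Alice's i-th message encodes how x compares with i.

The sketch uses an idealized OT stand-in (hypothetical helper ideal_ot) to stay short. A real deployment would replace it with an actual 1-out-of-N OT. The other names are also hypothetical.

```python
# Sketch of the OT-based comparison (assumptions: x, y are integers in
# {0, ..., N-1}; message i encodes how x compares with i). Names are hypothetical.

def ideal_ot(messages, choice):
    """Idealized 1-out-of-N OT: the receiver learns messages[choice] and nothing else."""
    return messages[choice]

def alice_messages(x, n):
    """Message i is 1 if x > i, 0 if x == i, -1 if x < i."""
    return [(x > i) - (x < i) for i in range(n)]

def compare_via_ot(x, y, n):
    """Bob uses his value y as the OT choice index; he then reports the result to Alice."""
    return ideal_ot(alice_messages(x, n), y)
```

The cost grows linearly with N. That is why the next step compares general real numbers bit by bit rather than enumerating every possible value.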
If [formula] are general real numbers, [formula] can be represented in the binary form [formula], e.g., [formula]; that is, [formula] has [formula] integer bits and [formula] fractional bits. [formula] can likewise be expressed in the form [formula], i.e., [formula]. The two are then compared starting from the most significant bit:
1. If [formula], then [formula].
2. If there exists [formula] such that [formula] holds when [formula] and [formula] holds when [formula], then [formula].
3. If there exists [formula] such that [formula] holds when [formula] and [formula] holds when [formula], then [formula].
We use [formula] to denote this comparison procedure.
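Rules 1-3 amount to a simple scan: start at the most significant bit, and the first position where the bits differ decides the order.

The sketch assumes non-negative values in an unsigned fixed-point format with int_bits integer bits and frac_bits fractional bits; these parameter names are hypothetical. Bits are compared in the clear here, where the protocol would compare them obliviously. Sign handling is not covered.

```python
# Sketch of the bitwise comparison (assumptions: non-negative values, unsigned
# fixed-point with int_bits integer and frac_bits fractional bits, rounded to nearest).

def to_bits(value, int_bits, frac_bits):
    """Fixed-point binary digits of a non-negative real, most significant bit first."""
    scaled = int(round(value * (1 << frac_bits)))
    if not 0 <= scaled < (1 << (int_bits + frac_bits)):
        raise ValueError("value does not fit in the chosen format")
    return [(scaled >> s) & 1 for s in range(int_bits + frac_bits - 1, -1, -1)]

def compare_bits(xb, yb):
    """Rules 1-3: the first differing bit from the MSB decides; 0 means all bits equal."""
    for xi, yi in zip(xb, yb):
        if xi != yi:
            return 1 if xi > yi else -1
    return 0
```

For example, 2.25 with 4 integer bits and 2 fractional bits is stored as 9, i.e., the bits 001001.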
We now describe the PCA procedure. Let the dataset be [formula], where [formula] is the total number of samples, [formula] is the number of attributes, and each [formula] is one data record.
If the attributes have inconsistent scales (units), the data must be normalized. We use mean normalization, [formula], where [formula]. The denominator is the range of the data. We denote this process [formula].
Let the centered dataset be [formula]. The covariance matrix of the dataset is [formula]. Performing SVD on [formula] yields a set of eigenvalues in descending order, [formula]. The eigenvectors corresponding to the [formula] largest eigenvalues, [formula], are taken, and the dimension-reduced data are [formula].
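For reference, the non-private pipeline just described fits in a few lines of standard-library Python: mean normalization, covariance, top-k eigenvectors, projection.

This is a sketch under two assumptions:

- the covariance is the unscaled X^T X; a 1/N factor would not change the eigenvectors;
- power iteration with deflation stands in for a full SVD.

Since X^T X is positive semidefinite, its largest-magnitude eigenvalues are also its largest ones. The function names are hypothetical.

```python
import math
import random

# Sketch of centralized PCA (assumptions: unscaled covariance X^T X, power
# iteration with deflation instead of a library SVD). Names are hypothetical.

def mean_normalize(data):
    """x' = (x - mean) / (max - min), column by column."""
    cols = list(zip(*data))
    mean = [sum(c) / len(c) for c in cols]
    spread = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, mean, spread)] for row in data]

def pca(data, k, iters=1000, seed=1):
    x = mean_normalize(data)
    n = len(x[0])
    cov = [[sum(r[a] * r[b] for r in x) for b in range(n)] for a in range(n)]  # X^T X
    rng = random.Random(seed)
    vecs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(c * vi for c, vi in zip(row, v)) for row in cov]
            norm = math.sqrt(sum(t * t for t in w)) or 1.0
            v = [t / norm for t in w]
        lam = sum(vi * sum(c * vj for c, vj in zip(row, v)) for vi, row in zip(v, cov))
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(n)] for a in range(n)]  # deflate
        vecs.append(v)
    reduced = [[sum(a * b for a, b in zip(r, v)) for v in vecs] for r in x]  # Y = X V_k
    return vecs, reduced
```

On points lying on the line y = 2x, both columns normalize to identical values. The first principal direction is therefore (1, 1)/sqrt(2), up to sign.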
The invention uses secret sharing and OT to perform mean normalization on the client data in a privacy-preserving manner, and in the process it effectively protects the maximum and minimum values of users' private data. The invention also uses secret sharing and differential privacy to protect the local covariance matrices and the overall covariance matrix. It finally carries out PCA, achieving data dimension reduction. In addition, a key exchange algorithm is used, and the secret-shared values transmitted between the clients and the servers are encrypted with the exchanged keys, which prevents eavesdropping attacks. Our PCA algorithm can speed up training in federated learning.
It should be understood that although the steps are described in a certain order, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be performed in other orders. Moreover, some steps of this embodiment may comprise multiple sub-steps or stages. These need not be performed at the same time but may be performed at different times. Nor need they be performed sequentially: they may be performed in turn or alternately with other steps, or with at least part of the sub-steps or stages of other steps.
In an embodiment, the invention further provides a privacy-preserving PCA system to which the privacy-preserving PCA method is applied.
In one embodiment, referring to Fig. 3, an embodiment of the present invention further provides a terminal comprising a processor 701, a communication interface 702, a memory 703, and a communication bus 704. The processor 701, the communication interface 702, and the memory 703 communicate with one another through the communication bus 704.
A memory 703 for storing a computer program;
The processor 701 is configured to perform the privacy-preserving PCA method when executing the computer program stored in the memory 703; by executing the instructions, the processor implements the steps of the method embodiments described above.
The communication bus of the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include random access memory (RAM) or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP). It may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The terminal includes user equipment and network equipment. The user equipment includes, but is not limited to, computers, smartphones, and PDAs. The network equipment includes, but is not limited to, a single network server, a server group of multiple network servers, or a cloud of many computers or network servers based on cloud computing. Cloud computing is a form of distributed computing: a super virtual computer made up of a group of loosely coupled computers. The terminal can implement the invention by operating independently, or by accessing a network and interacting with other terminals in that network. The network in which the terminal is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, and a VPN.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
In one embodiment of the invention there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above described embodiment methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the above described embodiment methods. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory.
It should be understood that, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein includes any and all possible combinations of one or more of the associated listed items. The numbering of the foregoing embodiments of the present invention is for description only and does not indicate the relative merits of the embodiments.
Those of ordinary skill in the art will appreciate that: the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to such examples; combinations of features of the above embodiments or in different embodiments are also possible within the idea of an embodiment of the invention, and many other variations of the different aspects of the embodiments of the invention as described above exist, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present invention.

Claims (12)

1. A privacy-preserving PCA method, the method comprising:
each client splitting the sum of its local data points, the maximum and minimum values of the data points, and the number of data points, and sending the split information to servers A and B, respectively, wherein servers A and B do not collude with each other;
server [formula] obtaining the number and mean of the overall data points from the received sums of local data points, maxima and minima of the data points, and numbers of data points; combining the results of server [formula] to obtain the number and mean of the overall data points, and finding the range of the overall data points using the OT technique, the clients normalizing the data points using the mean and the range;
each client computing a local covariance matrix from the normalized data, splitting it into two parts by additive secret sharing, and sending the two parts to servers A and B, respectively;
server [formula] computing the secret-shared values of the overall covariance matrix, server [formula] adding noise to its computed result, server [formula] transmitting the noised result to server [formula], and server [formula] obtaining the noised overall covariance matrix;
server [formula] performing singular value decomposition on the noised overall covariance matrix to obtain the eigenvectors corresponding to the [formula] eigenvalues;
server [formula] sending the eigenvectors to the clients, and each client reducing the dimension of its local data;
wherein finding the range of the overall data points using the OT technique specifically comprises the following steps:
a. server [formula] computes the value of [formula], server [formula] computes [formula], and server [formula] then sends [formula] to server [formula];
b. server [formula] makes a preliminary comparison of [formula] and [formula];
c. server [formula] uses the method [formula] to compare [formula] and [formula], thereby determining the relative size of [formula] and [formula];
d. using the secret-shared values of the maximum at the current maximum index together with [formula], computing according to steps a-c to obtain the index of the maximum of [formula], and so on until [formula], thereby obtaining the index of the maximum value in [formula], denoted [formula];
e. obtaining, by a method similar to steps a-d, the index of the minimum value in [formula], denoted [formula];
f. server [formula] computes [formula], server [formula] computes [formula], and server [formula] then sends [formula] to server [formula];
g. server [formula] computes the range of the first coordinate of the data points, i.e., [formula];
i. following steps a-g, obtaining the range of each coordinate of the data points in turn;
wherein server [formula] makes the preliminary comparison of [formula] and [formula] as follows:
if [formula] and [formula] are both greater than or equal to 0, obtaining [formula] and going to step d;
if [formula] and [formula] are both less than or equal to 0, obtaining [formula] and going to step d;
otherwise, going to step c.
2. The privacy-preserving PCA method according to claim 1, wherein the numbers of data points of the clients are [formula], respectively; denote [formula]; the data of the [formula]-th client are [formula], where [formula], T denotes the transpose, and the dimension of all data is [formula].
3. The privacy-preserving PCA method according to claim 1, further comprising, before each client splits the sum of its local data points, the maximum and minimum values of the data points, and the number of data points and sends the split information to servers A and B, respectively:
servers [formula] each performing key exchange with each client through the DH protocol to establish the keys used for data transmission;
wherein the key between server [formula] and the [formula]-th client is [formula], and the key between server [formula] and the [formula]-th client is [formula].
4. The privacy-preserving PCA method of claim 2, wherein the steps in which the servers [Figure QLYQS_68] obtain the number and mean of all data points from the received sums of local data points, maximum and minimum values of the data points, and numbers of data points; the servers [Figure QLYQS_69] jointly obtain the number and mean of all data points and, with the help of the OT technique, their range; and the clients normalize their data points using the mean and the range, comprise:
each client splits its local data-point sum [Figure QLYQS_70], maximum [Figure QLYQS_71], minimum [Figure QLYQS_72], and number of data points [Figure QLYQS_73] into two parts by additive secret sharing, encrypts the two parts [Figure QLYQS_74] with the respective keys, and sends them to the servers [Figure QLYQS_75];
the servers [Figure QLYQS_76] decrypt the encrypted data sent by the clients and sum the decrypted shares of the data-point sums and of the data-point counts, obtaining secret shares of the sum of all data points and of the total number of data points;
server [Figure QLYQS_77] sends its secret shares of the sum of all data points [Figure QLYQS_80] and of the number of data points [Figure QLYQS_82] to server [Figure QLYQS_78]; server [Figure QLYQS_81] then computes the sum of all data points [Figure QLYQS_83] and the number of data points [Figure QLYQS_84], from which it determines the mean of the data points [Figure QLYQS_79], and finally sends the mean and the number of data points to each client;
server [Figure QLYQS_85] obtains the range of all data points using 1-out-of-N OT and sends the range to each client;
after receiving the mean, number, and range of all data points, each client normalizes its data and then uniformly divides the data-point coordinates by [Figure QLYQS_86], obtaining its normalized data; that is, let the processed data of the [Figure QLYQS_87]-th client be [Figure QLYQS_88], where [Figure QLYQS_89].
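The mean computation in claim 4 can be illustrated with a minimal Python sketch of two-server additive secret sharing. Assumptions, none of which come from the patent: shares live in Z_MOD for a hypothetical modulus `MOD = 2**61 - 1`, real values are fixed-point encoded with a hypothetical `SCALE = 10**6`, the encryption of shares is omitted, the range is computed in the clear instead of via 1-out-of-N OT, and normalization is assumed to be (x - mean) / (max - min). The further division by the quantity in [Figure QLYQS_86] is not reproduced.

```python
import secrets

import numpy as np

MOD = 2**61 - 1   # hypothetical public modulus for the shares
SCALE = 10**6     # hypothetical fixed-point scale for real-valued sums


def encode(x):
    """Fixed-point encode a real number as an element of Z_MOD."""
    return round(x * SCALE) % MOD


def decode(v):
    """Inverse of encode; residues above MOD // 2 stand for negative numbers."""
    v %= MOD
    if v > MOD // 2:
        v -= MOD
    return v / SCALE


def share(v):
    """Split v into two additive shares with v = a + b (mod MOD)."""
    a = secrets.randbelow(MOD)
    return a, (v - a) % MOD


def aggregate_mean(client_data):
    """Return (global mean, total count) from per-client (k_i, d) arrays.

    Each server only adds the shares it receives; at the end the second
    server forwards its aggregated shares to the first, which reconstructs.
    """
    d = client_data[0].shape[1]
    acc_s1 = [0] * (d + 1)   # first server: d coordinate sums + 1 count
    acc_s2 = [0] * (d + 1)   # second server
    for X in client_data:
        local = [float(v) for v in X.sum(axis=0)] + [float(X.shape[0])]
        for j, value in enumerate(local):
            a, b = share(encode(value))
            acc_s1[j] = (acc_s1[j] + a) % MOD
            acc_s2[j] = (acc_s2[j] + b) % MOD
    totals = [decode(x + y) for x, y in zip(acc_s1, acc_s2)]
    n = round(totals[-1])
    return np.array(totals[:d]) / n, n


def normalize(X, mean, lo, hi):
    """ASSUMED normalization: centre on the global mean, divide by the range."""
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against a zero range
    return (X - mean) / span


rng = np.random.default_rng(0)
clients = [rng.normal(size=(k, 3)) for k in (5, 8, 4)]
mean, n = aggregate_mean(clients)
# The range would come from the OT-based step; computed in the clear here.
all_points = np.vstack(clients)
lo, hi = all_points.min(axis=0), all_points.max(axis=0)
normalized = [normalize(X, mean, lo, hi) for X in clients]
```

Neither server's accumulator alone reveals any client's sum or count; only the combination of both yields the totals.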
5. The privacy-preserving PCA method of claim 1, wherein server [Figure QLYQS_90] compares [Figure QLYQS_92] and [Figure QLYQS_93] by means of [Figure QLYQS_91] to determine the relative size of [Figure QLYQS_94] and [Figure QLYQS_95], as follows:
if [Figure QLYQS_96], then [Figure QLYQS_97] is obtained and the method proceeds to step d;
if [Figure QLYQS_98], then [Figure QLYQS_99] is obtained and the method proceeds to step d.
6. The privacy-preserving PCA method of claim 1, wherein the step in which each client uses its normalized data to compute a local covariance matrix, splits the matrix into two parts by additive secret sharing, and sends the two parts to the servers [Figure QLYQS_100] respectively specifically comprises:
each client computes its local covariance matrix, decomposes it into the sum of two matrices by additive secret sharing, encrypts each matrix with the corresponding key, and sends the encrypted matrices to the servers [Figure QLYQS_101].
7. The privacy-preserving PCA method of claim 4, wherein the local covariance matrix is computed according to [Figure QLYQS_103], where k denotes the number of data points held by each client; the matrix [Figure QLYQS_106] is then decomposed into [Figure QLYQS_107]; key [Figure QLYQS_104] is used to encrypt [Figure QLYQS_105], which is sent to server [Figure QLYQS_109]; and key [Figure QLYQS_110] is used to encrypt [Figure QLYQS_102], which is sent to server [Figure QLYQS_108].
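Claims 6 and 7 describe each client computing a local matrix and splitting it into two additive shares. The exact formula of claim 7 survives only as an image (QLYQS_103), so this minimal Python sketch assumes it is the sum of outer products over the client's k normalized rows, X_i^T X_i, with any global scaling already applied during normalization. The helpers `local_covariance` and `split_matrix` are hypothetical, and the real-valued random mask is a simplification of sharing over a finite ring; encryption is omitted.

```python
import numpy as np


def local_covariance(X):
    """Sum of outer products over a client's k normalized rows: X^T X.

    ASSUMPTION: global scaling was applied during normalization, so summing
    these matrices over all clients yields the overall matrix.
    """
    return X.T @ X


def split_matrix(C, rng, scale=10.0):
    """Split C into two additive shares with C = A + B.

    Simplification: real-valued random masking rather than sharing over a
    finite ring, so the mask hides C only statistically.
    """
    A = rng.normal(scale=scale, size=C.shape)
    return A, C - A


rng = np.random.default_rng(42)
X = rng.normal(size=(6, 3))        # one client's normalized data, k = 6
C = local_covariance(X)
A, B = split_matrix(C, rng)        # A for one server, B for the other
```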
8. The privacy-preserving PCA method of claim 7, wherein the servers [Figure QLYQS_111] each decrypt the secret shares of the local covariance matrices sent by the clients and sum them to obtain a secret share of the overall covariance matrix.
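The aggregation in claim 8 works because additive sharing is linear: each server sums the shares it holds, and the two sums are themselves shares of the overall matrix. A minimal Python sketch, assuming the local matrix is X_i^T X_i and using real-valued masks (both assumptions, not the patent's stated formulas); `aggregate_shares` is a hypothetical helper and decryption is omitted.

```python
import numpy as np


def aggregate_shares(shares):
    """One server's work: add up the (decrypted) matrix shares from all clients."""
    return np.sum(np.stack(shares), axis=0)


rng = np.random.default_rng(7)
client_data = [rng.normal(size=(k, 4)) for k in (3, 5, 2)]

shares_s1, shares_s2 = [], []
for X in client_data:
    C = X.T @ X                                  # assumed local matrix X_i^T X_i
    A = rng.normal(scale=10.0, size=C.shape)     # random mask (real-valued sharing)
    shares_s1.append(A)                          # share for the first server
    shares_s2.append(C - A)                      # share for the second server

agg_s1 = aggregate_shares(shares_s1)   # first server's share of the overall matrix
agg_s2 = aggregate_shares(shares_s2)   # second server's share of the overall matrix
overall = agg_s1 + agg_s2              # only formed when the two shares are combined
```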
9. The privacy-preserving PCA method of claim 8, wherein server [Figure QLYQS_112] adds a symmetric noise matrix satisfying the Gaussian mechanism to its secret share of the overall covariance matrix and sends the result to server [Figure QLYQS_113].
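Claim 9 adds symmetric Gaussian noise to one server's share. The minimal Python sketch below assumes the classic Gaussian-mechanism calibration sigma = sqrt(2 ln(1.25/delta)) * sensitivity / epsilon (stated for 0 < epsilon < 1), which the patent does not spell out in the recoverable text. The epsilon, delta, and sensitivity values, and the helper names, are hypothetical; sensitivity 1.0 would hold if every normalized row had norm at most 1. Symmetry is obtained by sampling the upper triangle including the diagonal and mirroring it.

```python
import math

import numpy as np


def gaussian_sigma(epsilon, delta, sensitivity):
    """Classic Gaussian-mechanism noise scale (stated for 0 < epsilon < 1)."""
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon


def symmetric_noise(d, sigma, rng):
    """d x d symmetric noise: i.i.d. N(0, sigma^2) on/above the diagonal, mirrored below."""
    upper = np.triu(rng.normal(scale=sigma, size=(d, d)))
    return upper + np.triu(upper, k=1).T


rng = np.random.default_rng(0)
# Hypothetical parameters; sensitivity 1.0 assumes every normalized row has norm <= 1.
sigma = gaussian_sigma(epsilon=0.5, delta=1e-5, sensitivity=1.0)
share_held = rng.normal(size=(4, 4))     # stand-in for this server's covariance share
noisy_share = share_held + symmetric_noise(4, sigma, rng)   # forwarded to the other server
```

Adding the noise to a single share is enough, since the reconstructed matrix is the sum of both shares and therefore carries the noise exactly once.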
10. The privacy-preserving PCA method of claim 9, wherein server [Figure QLYQS_114] adds its own secret share of the covariance matrix to the noise-added secret share of the covariance matrix received from server [Figure QLYQS_119], obtaining the overall noise-added covariance matrix; performs SVD on it to obtain a set of eigenvalues in descending order [Figure QLYQS_122], where [Figure QLYQS_115] denotes the diagonal matrix and [Figure QLYQS_117] denote the individual eigenvalues; takes the eigenvectors [Figure QLYQS_123] corresponding to the [Figure QLYQS_121] largest eigenvalues, where [Figure QLYQS_116] is the eigenvector corresponding to [Figure QLYQS_118]; and sends [Figure QLYQS_120] to each client.
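The final server-side step of claim 10 (SVD, then keeping the eigenvectors of the largest eigenvalues) can be sketched in Python with NumPy. This is a minimal illustration, not the patent's implementation: `top_k_components` is a hypothetical helper, and the random PSD matrix merely stands in for the noise-added covariance.

```python
import numpy as np


def top_k_components(C_noisy, k):
    """Return the k largest singular values and matching left singular vectors.

    np.linalg.svd sorts singular values in descending order. For a symmetric
    positive semi-definite matrix these are its eigenvalues and the columns of
    U are matching eigenvectors. If the added noise makes some eigenvalues
    negative, an eigen-decomposition (np.linalg.eigh) is the safer choice.
    """
    U, s, _ = np.linalg.svd(C_noisy)
    return s[:k], U[:, :k]


rng = np.random.default_rng(1)
B = rng.normal(size=(5, 5))
C_noisy = B @ B.T            # stand-in symmetric PSD "noisy covariance"
eigvals, W = top_k_components(C_noisy, k=2)   # W is what each client receives
```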
11. The privacy-preserving PCA method of claim 10, wherein the data dimension reduction uses the formula [Figure QLYQS_124].
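The reduction formula of claim 11 is only available as an image (QLYQS_124). Assuming it is the usual PCA projection Y_i = X_i W, where W holds the selected eigenvectors as columns, a client-side sketch in Python is a single matrix product; `reduce_dimension` and the toy data are hypothetical.

```python
import numpy as np


def reduce_dimension(X, W):
    """Project a client's (n, d) normalized data onto the (d, k) principal axes W."""
    return X @ W


X_local = np.arange(6.0).reshape(2, 3)   # toy client data, d = 3
W = np.eye(3)[:, :2]                      # toy axes: keep the first two coordinates
Y_local = reduce_dimension(X_local, W)    # shape (2, 2)
```

Because each client only needs W and its own rows, this step requires no further communication.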
12. A privacy-preserving PCA system applying the privacy-preserving PCA method of any one of claims 1-11.
CN202211473530.9A 2022-11-23 2022-11-23 PCA method and system for privacy protection Active CN115510502B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211473530.9A CN115510502B (en) 2022-11-23 2022-11-23 PCA method and system for privacy protection
PCT/CN2023/110312 WO2024109149A1 (en) 2022-11-23 2023-07-31 Principal component analysis method and system for privacy protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211473530.9A CN115510502B (en) 2022-11-23 2022-11-23 PCA method and system for privacy protection

Publications (2)

Publication Number Publication Date
CN115510502A CN115510502A (en) 2022-12-23
CN115510502B true CN115510502B (en) 2023-05-26

Family

ID=84514083

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211473530.9A Active CN115510502B (en) 2022-11-23 2022-11-23 PCA method and system for privacy protection

Country Status (2)

Country Link
CN (1) CN115510502B (en)
WO (1) WO2024109149A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117439731B (en) * 2023-12-21 2024-03-12 山东大学 Privacy protection big data principal component analysis method and system based on homomorphic encryption

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN113300828B (en) * 2021-05-27 2022-07-05 南开大学 Distributed differential privacy aggregation method
CN113949501A (en) * 2021-09-08 2022-01-18 天翼电子商务有限公司 Semi-homomorphic encryption-based transversely distributed PCA dimension reduction method
CN113904874B (en) * 2021-11-30 2022-03-04 北京中超伟业信息安全技术股份有限公司 Unmanned aerial vehicle data secure transmission method

Also Published As

Publication number Publication date
WO2024109149A1 (en) 2024-05-30
CN115510502A (en) 2022-12-23

Similar Documents

Publication Publication Date Title
WO2022237450A1 (en) Secure multi-party computation method and apparatus, and device and storage medium
CN112182649A (en) Data privacy protection system based on safe two-party calculation linear regression algorithm
WO2018184407A1 (en) K-means clustering method and system having privacy protection
Liu et al. Intelligent and secure content-based image retrieval for mobile users
Liu et al. Secure multi-label data classification in cloud by additionally homomorphic encryption
Erkin et al. Privacy-preserving distributed clustering
CN115510502B (en) PCA method and system for privacy protection
CN115842627A (en) Decision tree evaluation method, device, equipment and medium based on secure multi-party computation
CN111259440B (en) Privacy protection decision tree classification method for cloud outsourcing data
Liu et al. Privacy preserving pca for multiparty modeling
Zheng et al. Towards secure and practical machine learning via secret sharing and random permutation
Zhao et al. SGBoost: An efficient and privacy-preserving vertical federated tree boosting framework
CN114564730A (en) Symmetric encryption-based federal packet statistic calculation method, device and medium
Zhao et al. VFLR: An efficient and privacy-preserving vertical federated framework for logistic regression
CN117353912A (en) Three-party privacy set intersection base number calculation method and system based on bilinear mapping
Wang et al. Face detection for privacy protected images
CN116094686B (en) Homomorphic encryption method, homomorphic encryption system, homomorphic encryption equipment and homomorphic encryption terminal for quantum convolution calculation
CN116743376A (en) Multiparty secret sharing data privacy comparison method based on efficient ciphertext confusion technology
CN116681141A (en) Federal learning method, terminal and storage medium for privacy protection
CN116170142A (en) Distributed collaborative decryption method, device and storage medium
Zhou et al. Toward scalable and privacy-preserving deep neural network via algorithmic-cryptographic co-design
CN115150060A (en) Data privacy protection method based on secure multi-party clustering method
CN115564447A (en) Credit card transaction risk detection method and device
CN115333789A (en) Privacy protection intersection calculation method and device based on large-scale data set in asymmetric mode
CN114358323A (en) Third-party-based efficient Pearson coefficient calculation method in federated learning environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant