CN115510502B - PCA method and system for privacy protection - Google Patents
- Publication number
- CN115510502B (application CN202211473530.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0816—Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
- H04L9/085—Secret sharing or secret splitting, e.g. threshold schemes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0861—Generation of secret information including derivation or calculation of cryptographic keys or passwords
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- Health & Medical Sciences (AREA)
- Computer Networks & Wireless Communication (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Storage Device Security (AREA)
Abstract
The invention relates to the field of information technology, and in particular to a privacy-preserving PCA method and system. The method comprises the following steps: each client splits the sum of its local data points, the maximum and minimum values of the data points, and the number of data points, and sends the resulting shares to the two servers respectively; the two servers each compute a secret share of the overall covariance matrix, one server adds noise to its result and sends the noisy result to the other server, and the other server obtains the noise-added overall covariance matrix; that server performs singular value decomposition on the noisy covariance matrix and obtains the eigenvectors corresponding to the eigenvalues with the largest absolute values; the eigenvectors are sent to the clients, and each client reduces the dimension of its local data. The invention reduces the dimension of the data in federated learning and can effectively improve the training speed of federated learning.
Description
Technical Field
The present invention relates to the field of information technologies, and in particular, to a privacy preserving PCA method, system, terminal, and storage medium.
Background
With the development of artificial intelligence, machine learning has been widely applied in fields such as recommendation systems, spam filtering, and face recognition. In recent years people have paid increasing attention to personal privacy, and the conventional machine learning approach is prone to leaking personal private information. In addition, for reasons of business confidentiality, enterprises often insist on keeping data in their own hands, which inevitably creates data islands. Federated learning, a machine learning technology based on cloud computing, enables multi-party joint learning while protecting users' private information, and is widely used in government systems, medical analysis, financial risk control, digital advertising, logistics management, and other fields.
In federated learning, data may originate from a wide variety of terminal devices whose computing power, network bandwidth, and available participation time may differ greatly. This device heterogeneity can seriously affect the efficiency of federated learning. To deal with it, researchers have proposed many solutions, such as node (client) selection, data dimension reduction, gradient compression, model segmentation, and asynchronous and semi-synchronous federated learning. If most clients in the network have weak computing power, data dimension reduction is the more suitable approach.
Principal component analysis (PCA) is one of the most widely used data dimension reduction techniques. It uses an orthogonal transformation to recombine a set of correlated vectors into a new set of mutually uncorrelated vectors, and then retains a few principal components to achieve dimension reduction. When PCA is used in federated learning, the users' private information must also be protected. One existing approach uses homomorphic encryption to compute the covariance matrix and a garbled circuit to perform the eigendecomposition of the covariance matrix, which results in low efficiency. Another approach implements secure PCA by combining homomorphic encryption with differential privacy; because homomorphic encryption is used for the matrix operations, its efficiency is also low. A third approach uses a noise-sharing scheme in which shared noise is added to the local covariance matrices to protect the users' private information. To limit the distortion of the principal components, the total noise must be kept within a certain range, but when the number of clients is large the noise allocated to each client becomes very small, which risks disclosing user privacy. Moreover, the normalization method adopted there is not a standard normalization method and can scale different data points inconsistently, which can distort the PCA result.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a privacy-preserving PCA method, terminal, and storage medium. Local covariance matrix information is sent by secret sharing to two servers that do not collude with each other; noise is added to the calculation result of one server, and the noisy result is sent to the other server. The other server can then compute the noise-added overall covariance matrix, so that user privacy is protected while only a small amount of noise is added. In addition, for data normalization, differential privacy and the OT protocol are combined to obtain the mean and the range of the data points secretly, and the data are then normalized using the mean and the range.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
In a first aspect, one embodiment of the present invention provides a privacy-preserving PCA method comprising the following steps:
each client splits the sum of its local data points, the maximum and minimum values of the data points, and the number of data points, and sends the resulting shares to the two servers respectively;
the two servers obtain the number and the mean of the overall data points from the received shares of the local sums, maximum and minimum values, and counts; after the number and the mean of the overall data points are obtained, the servers determine the range of the overall data points using the oblivious transfer (OT) technique, and each client normalizes its data points using the mean and the range;
each client computes a local covariance matrix from the normalized data, splits the covariance matrix into two parts using additive secret sharing, and sends the two parts to the two servers respectively;
the two servers each compute a secret share of the overall covariance matrix; one server adds noise to its result and sends the noisy result to the other server, which obtains the noise-added overall covariance matrix;
that server performs singular value decomposition on the noisy covariance matrix and obtains the eigenvectors corresponding to the eigenvalues with the largest absolute values;
the eigenvectors are sent to the clients, and each client reduces the dimension of its local data.
As a further aspect of the present invention, the two servers do not collude with each other.
As a further aspect of the invention, each client holds its own number of data points, the total number of data points is the sum of these counts over all clients, and the data of the i-th client forms that client's local data set.
As a further aspect of the invention, before each client splits the sum of its local data points, the maximum and minimum values of the data points, and the number of data points and sends the shares to the two servers, the method further comprises:
the two servers each perform a key exchange with every client through the Diffie-Hellman (DH) protocol to establish the keys used for data transmission;
each client therefore shares one key with the first server and a separate key with the second server.
As a further aspect of the present invention, the step in which the servers obtain the number and the mean of the overall data points from the received shares, determine the range of the overall data points using the OT technique, and the clients normalize the data points using the mean and the range comprises the following steps:
each client splits the sum of its local data points, the maximum value and the minimum value of the data points, and the number of data points into two parts each according to additive secret sharing, encrypts the two parts with the respective keys, and sends them to the two servers respectively;
the two servers decrypt the encrypted data sent by the clients and sum the decrypted shares of the data point sums and of the data point counts to obtain their secret shares of the sum of all data points and of the total number of data points;
one server sends its secret shares of the sum of all data points and of the number of data points to the other server, which reconstructs the sum of all data points and the total number of data points, computes the mean of the data points from them, and finally sends the mean and the number of data points to each client;
the servers determine the range of all data points using 1-out-of-N OT and then send the range to each client;
after receiving the mean, the number, and the range of all data points, each client normalizes its data and then uniformly divides the coordinates of the data points by a common factor.
As a further aspect of the present invention, determining the range of all data points using 1-out-of-N OT comprises the following steps:
a. each of the two servers computes a value from its own secret shares, and one server then sends information about its value to the other server;
c. the server compares the two values by means of OT to determine the relative size of the two quantities being compared;
d. using the secret shares of the value at the current maximum index and the shares of the next value, steps a to c are repeated to obtain the index of the larger of the two, and so on until all values have been compared, which yields the index of the maximum value;
e. the index of the minimum value is obtained by a method analogous to steps a to d;
i. following steps a to g, the range of each coordinate of the data points is obtained.
As a further aspect of the present invention, the server first makes a preliminary judgment of the relative size of the two quantities from the signs of the computed values; when the signs alone do not decide the comparison, the process goes to step c.
As a further aspect of the present invention, the server compares the two computed values by means of OT to determine the relative size of the two quantities being compared.
As a further aspect of the invention, the step in which the client computes a local covariance matrix from the normalized data, splits it into two parts using additive secret sharing, and sends the two parts to the two servers respectively specifically comprises:
each client computes a local covariance matrix, decomposes the covariance matrix into the sum of two matrices using additive secret sharing, encrypts the two matrices with the corresponding keys, and sends them to the two servers respectively.
As a further aspect of the invention, the local covariance matrix is computed from the client's normalized data, where k denotes the number of data points held by each client; the matrix is then decomposed into the sum of two share matrices, and each share is encrypted with the key shared with the corresponding server and sent to that server.
As a further aspect of the present invention, the two servers each decrypt the secret shares of the local covariance matrices sent by the clients and sum them to obtain their secret shares of the overall covariance matrix.
As a further aspect of the present invention, one server adds a symmetric noise matrix satisfying the Gaussian mechanism to its secret share of the overall covariance matrix and sends the result to the other server.
As a further aspect of the present invention, the other server adds its own secret share of the covariance matrix to the noise-added secret share received from the first server, which yields the noise-added overall covariance matrix; SVD is then performed to obtain a diagonal matrix whose diagonal entries are the eigenvalues arranged in descending order; the eigenvectors corresponding to the largest eigenvalues are taken and sent to each client.
In a second aspect, in yet another embodiment of the present invention, a privacy preserving PCA system is provided for applying to the above-mentioned privacy preserving PCA method.
In a third aspect, in yet another embodiment provided by the present invention, a terminal is provided, comprising a memory storing a computer program and a processor implementing steps of a PCA method of privacy protection when loading and executing the computer program.
In a fourth aspect, in yet another embodiment provided by the present invention, there is provided a storage medium storing a computer program which, when loaded and executed by a processor, performs the steps of the PCA method of privacy protection.
The technical scheme provided by the invention has the following beneficial effects:
The privacy-preserving PCA method, system, terminal, and storage medium provided by the invention use mean normalization for data preprocessing, which effectively eliminates the influence of differing dimensions (units). In this process, secret sharing and the OT technique are combined so that the mean and the range of the data points are computed secretly, which effectively protects the users' private data. Secret sharing and differential privacy are used to split the local covariance matrices and to add noise to the secret share of the overall covariance matrix, which protects the privacy of both the local and the overall covariance matrices. In addition, the secret shares exchanged between the clients and the servers are transmitted under encryption, which prevents an eavesdropper from obtaining the information. The invention reduces the dimension of the data in federated learning and can effectively improve the training speed of federated learning.
These and other aspects of the invention will be more readily apparent from the following description of the embodiments. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention and that other embodiments may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a PCA method of privacy protection in accordance with one embodiment of the present invention;
FIG. 2 is a flowchart showing a step S20 in the PCA method of privacy preserving in accordance with one embodiment of the present invention;
fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
In the figure: processor-701, communication interface-702, memory-703, communication bus-704.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The flow diagrams depicted in the figures are merely illustrative and not necessarily all of the elements and operations/steps are included or performed in the order described. For example, some operations/steps may be further divided, combined, or partially combined, so that the order of actual execution may be changed according to actual situations.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In particular, embodiments of the present invention are further described below with reference to the accompanying drawings.
Notation: set notation denotes a collection of elements. $a \equiv b \pmod{n}$, read as "a is congruent to b modulo n", means that a divided by n and b divided by n leave the same remainder. $\mathbb{R}$ denotes the real numbers and $\mathbb{R}^d$ denotes d-dimensional Euclidean space. Lowercase letters generally denote scalars, vectors are written in vector notation, and capital letters denote matrices. $\|x\|_2$ denotes the two-norm of a vector: if $x = (x_1, \dots, x_d)$, then $\|x\|_2 = \sqrt{x_1^2 + \dots + x_d^2}$. $\|A\|_2$ denotes the spectral norm of a matrix, i.e. $\|A\|_2 = \sqrt{\lambda_{\max}(A^{T}A)}$, where $\lambda_{\max}(A^{T}A)$ is the maximum eigenvalue of $A^{T}A$. $a \oplus b$ denotes the bitwise exclusive OR of a and b. $\mathrm{sgn}(\cdot)$ denotes the sign function.
Referring to fig. 1, fig. 1 is a flowchart of a privacy-preserving PCA method according to an embodiment of the present invention. As shown in fig. 1, the method includes steps S10 to S60 and is applied to a system consisting of a plurality of clients and two servers.
S10, each client splits the sum of its local data points, the maximum and minimum values of the data points, and the number of data points, and sends the resulting shares to the two servers respectively;
Each client holds its own number of data points, the total number of data points is the sum of these counts over all clients, and the data of the i-th client forms that client's local data set.
In the embodiment of the present invention, before each client splits and sends its local statistics in step S10, the method further includes:
the two servers each perform a key exchange with every client through the Diffie-Hellman (DH) protocol to establish the keys used for data transfer.
Each client therefore shares one key with the first server and a separate key with the second server.
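The key establishment described above can be sketched as follows. This is a minimal illustration of Diffie-Hellman key agreement, not the patent's implementation: the toy prime `P`, the base `G`, the SHA-256 key derivation, and all function names are assumptions introduced here, and a real deployment would use a standardized group or elliptic curve from a vetted cryptographic library.

```python
import hashlib
import secrets

# Toy group parameters (assumption): a Mersenne prime and a small base.
P = 2**127 - 1
G = 3


def dh_keypair():
    """Return (private exponent, public value G^priv mod P)."""
    priv = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    return priv, pow(G, priv, P)


def dh_shared_key(own_priv, peer_pub):
    """Derive a 32-byte symmetric key from the shared DH secret."""
    shared = pow(peer_pub, own_priv, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


# One client runs a separate exchange with each of the two servers,
# so it ends up with two independent keys.
client_priv_a, client_pub_a = dh_keypair()
client_priv_b, client_pub_b = dh_keypair()
server_a_priv, server_a_pub = dh_keypair()
server_b_priv, server_b_pub = dh_keypair()

key_client_a = dh_shared_key(client_priv_a, server_a_pub)
key_server_a = dh_shared_key(server_a_priv, client_pub_a)
key_client_b = dh_shared_key(client_priv_b, server_b_pub)
key_server_b = dh_shared_key(server_b_priv, client_pub_b)
```

Both sides of each exchange derive the same key without ever transmitting it, which is what allows the later shares to be encrypted per server.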
S20, the two servers obtain the number and the mean of the overall data points from the received shares of the local data point sums, the maximum and minimum values of the data points, and the data point counts; after the number and the mean of the overall data points are obtained, the servers determine the range of the overall data points using the OT technique, and each client normalizes its data points using the mean and the range.
In the embodiment of the present invention, referring to fig. 2, step S20 includes the following steps:
S201, each client computes the sum of its local data points, the maximum value and the minimum value of the data points, and the number of data points, splits each of them into two parts according to additive secret sharing, encrypts the two parts with the respective keys, and sends them to the two servers respectively;
Taking the i-th client as an example, the client first computes the sum of its local data points and the maximum and minimum values of the data points, and splits each of them into two additive shares.
The number of data points is split in the same way.
Finally, the share intended for each server is encrypted with the key shared with that server and sent to it.
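Step S201 can be illustrated with the sketch below. It is a hedged example under stated assumptions: the share ring `MODULUS`, the fixed-point `SCALE`, and every function name are hypothetical choices made here (the source text does not preserve the patent's formulas), and the encryption of each share with the per-server key is omitted for brevity.

```python
import secrets

MODULUS = 2**64  # share ring (assumption, not from the source)
SCALE = 10**6    # fixed-point scale for real values (assumption)


def encode(x):
    """Map a real value to a fixed-point ring element."""
    return round(x * SCALE) % MODULUS


def decode(v):
    """Map a ring element back to a signed real value."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def share(x):
    """Split x into two additive shares with s_a + s_b = encode(x) mod MODULUS."""
    s_a = secrets.randbelow(MODULUS)
    s_b = (encode(x) - s_a) % MODULUS
    return s_a, s_b


def reconstruct(s_a, s_b):
    return decode((s_a + s_b) % MODULUS)


def client_statistics(points):
    """Per-coordinate sum, max, and min, plus the point count, for one client."""
    dims = range(len(points[0]))
    return {
        "sum": [sum(p[j] for p in points) for j in dims],
        "max": [max(p[j] for p in points) for j in dims],
        "min": [min(p[j] for p in points) for j in dims],
        "count": len(points),
    }


def share_statistics(stats):
    """Return the two messages (before encryption) for the two servers."""
    to_a, to_b = {}, {}
    for name, value in stats.items():
        values = value if isinstance(value, list) else [value]
        pairs = [share(v) for v in values]
        to_a[name] = [a for a, _ in pairs]
        to_b[name] = [b for _, b in pairs]
    return to_a, to_b


points = [[1.0, -2.5], [3.0, 4.0], [0.5, 0.0]]
stats = client_statistics(points)
to_a, to_b = share_statistics(stats)
```

Each message on its own is uniformly random in the ring, so neither server learns the client's statistics from its share alone.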
S202, servers A and B decrypt the encrypted data sent by the clients and sum the decrypted shares of the data point sums and of the data point counts to obtain their secret shares of the sum of all data points and of the total number of data points.
After a server receives the data sent by the i-th client, it decrypts the data with the corresponding key to obtain its shares, and then sums the shares from all clients to obtain its secret shares of the sum of all data points and of the number of data points.
Similarly, the other server obtains the other secret shares of the sum of all data points and of the number of data points.
S203, one server sends its secret shares of the sum of all data points and of the number of data points to the other server; that server reconstructs the sum of all data points and the total number of data points, computes the mean of the data points from them, and finally sends the mean and the number of data points to each client.
After receiving these shares, the receiving server adds them to its own shares to obtain the sum of all data points and the total number of data points.
It then computes the mean of the data as the total sum divided by the total number of data points.
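Steps S202 and S203 can be sketched for a single coordinate as follows. The ring size, the fixed-point scale, and the example values are assumptions made for illustration only; decryption of the incoming shares is omitted.

```python
import secrets

MODULUS = 2**64  # share ring (assumption)
SCALE = 10**6    # fixed-point scale (assumption)


def share(value):
    """Split a value into two additive shares modulo MODULUS."""
    fixed = round(value * SCALE) % MODULUS
    s_a = secrets.randbelow(MODULUS)
    return s_a, (fixed - s_a) % MODULUS


def decode(v):
    """Map a ring element back to a signed real value."""
    return (v - MODULUS if v >= MODULUS // 2 else v) / SCALE


# Hypothetical (local sum, local count) pairs for three clients, one coordinate.
client_stats = [(4.5, 3), (-1.5, 2), (10.0, 5)]

# Each server accumulates only the shares addressed to it (S202).
sum_a = sum_b = cnt_a = cnt_b = 0
for local_sum, local_count in client_stats:
    s_a, s_b = share(local_sum)
    c_a, c_b = share(local_count)
    sum_a = (sum_a + s_a) % MODULUS
    cnt_a = (cnt_a + c_a) % MODULUS
    sum_b = (sum_b + s_b) % MODULUS
    cnt_b = (cnt_b + c_b) % MODULUS

# One server forwards its two aggregate shares; the other reconstructs (S203).
total_sum = decode((sum_a + sum_b) % MODULUS)
total_count = decode((cnt_a + cnt_b) % MODULUS)
mean = total_sum / total_count
```

Only the aggregate sum and count are reconstructed; the per-client values remain hidden behind the individual shares.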
S204, the servers determine the range of all data points using 1-out-of-N OT and then send it to each client.
Determining the range of all data points using 1-out-of-N OT comprises the following steps:
a. Each of the two servers computes a value from its own secret shares, and one server then sends information about its value to the other server.
b. The receiving server makes a preliminary judgment of the relative size of the two quantities being compared. The cases are as follows:
I. If both computed values are greater than or equal to 0, the comparison is decided directly and the process goes to step d;
III. Otherwise, the process goes to step c.
c. The server compares the two computed values by means of OT to determine the relative size of the two quantities being compared.
d. Using the secret shares of the value at the current maximum index (i.e., the position of the largest value found so far) and the shares of the next value, steps a to c are repeated to obtain the index of the larger of the two, and so on until all values have been compared, which yields the index of the maximum value.
e. The index of the minimum value is obtained by a method analogous to steps a to d.
i. Following steps a to e, the range of each coordinate of the data points is obtained.
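The control flow of steps a to e can be sketched in plaintext as below. This is one plausible reading, not the patent's protocol: the formulas are lost from the source, so the use of share differences, integer shares without modular wrap-around, and all names are assumptions. In particular, `insecure_magnitude_compare` is only a placeholder for the OT-based comparison of step c; a real protocol would not reveal either value to the other server.

```python
import random


def split(value, rng):
    """Additive shares over the integers, without modular wrap (assumption)."""
    a = rng.randint(-10**6, 10**6)
    return a, value - a


def insecure_magnitude_compare(x, y):
    """Placeholder for the OT-based comparison of |x| and |y| (step c)."""
    return abs(x) >= abs(y)


def first_is_larger(shares_a, shares_b, i, j):
    """Decide whether value i >= value j from the two servers' shares."""
    diff_a = shares_a[i] - shares_a[j]  # computed by one server (step a)
    diff_b = shares_b[i] - shares_b[j]  # computed by the other server (step a)
    if diff_a >= 0 and diff_b >= 0:     # signs agree: decided directly (step b)
        return True
    if diff_a < 0 and diff_b < 0:
        return False
    # Mixed signs: the non-negative difference must outweigh the negative one.
    if diff_a >= 0:
        return insecure_magnitude_compare(diff_a, diff_b)
    return insecure_magnitude_compare(diff_b, diff_a)


def argmax_index(shares_a, shares_b):
    """Scan all values, keeping the index of the largest so far (step d)."""
    best = 0
    for k in range(1, len(shares_a)):
        if not first_is_larger(shares_a, shares_b, best, k):
            best = k
    return best


def argmin_index(shares_a, shares_b):
    """Same scan with the comparison reversed (step e)."""
    best = 0
    for k in range(1, len(shares_a)):
        if first_is_larger(shares_a, shares_b, best, k):
            best = k
    return best


rng = random.Random(0)
client_maxima = [3, 7, 5]    # hypothetical per-client maxima of one coordinate
client_minima = [-2, 0, -4]  # hypothetical per-client minima of one coordinate
max_a, max_b = zip(*(split(v, rng) for v in client_maxima))
min_a, min_b = zip(*(split(v, rng) for v in client_minima))

i_max = argmax_index(max_a, max_b)
i_min = argmin_index(min_a, min_b)
data_range = (max_a[i_max] + max_b[i_max]) - (min_a[i_min] + min_b[i_min])
```

Only the shares at the winning indices are combined at the end, so the range is obtained without reconstructing every client's maximum and minimum.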
S205, after receiving the mean, the number, and the range of all data points, each client normalizes its data by mean normalization, i.e., it subtracts the mean from each coordinate and divides the result by the range of that coordinate. The coordinates of the data points are then uniformly divided by a common factor to obtain the normalized data of each client.
The common factor is needed because the Gaussian mechanism requires the two-norm of each row of data to be less than 1. After this processing, the data of the i-th client is the normalized data set used in the following steps.
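A minimal sketch of step S205 follows. The divisor is not preserved in the source text; the square root of the dimension is used here as an assumption, chosen because each mean-normalized coordinate lies in [-1, 1], so dividing by sqrt(d) bounds every row's two-norm by 1. The function name and sample values are hypothetical.

```python
import math


def normalize(points, mean, value_range):
    """Mean-normalize each coordinate, then scale so every row has 2-norm <= 1."""
    d = len(mean)
    scale = math.sqrt(d)  # assumption: stands in for the divisor lost from the text
    return [
        [(x[j] - mean[j]) / value_range[j] / scale for j in range(d)]
        for x in points
    ]


points = [[1.0, 10.0], [3.0, 30.0], [5.0, 20.0]]
mean = [3.0, 20.0]          # global per-coordinate mean received from the server
value_range = [4.0, 20.0]   # global per-coordinate range (max minus min)

normalized = normalize(points, mean, value_range)
row_norms = [math.sqrt(sum(c * c for c in row)) for row in normalized]
```

Because the mean and the range are global, all clients apply the same scaling, which avoids the inconsistent scaling criticized in the background section.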
S30, each client computes a local covariance matrix from the normalized data, splits the covariance matrix into two parts using additive secret sharing, and sends the two parts to the two servers respectively;
Specifically, step S30 comprises:
each client computes a local covariance matrix, decomposes the covariance matrix into the sum of two matrices using additive secret sharing, encrypts the two matrices with the corresponding keys, and sends them to the two servers respectively.
The local covariance matrix is computed from the client's normalized data, where k denotes the number of data points held by each client, k = j. The matrix is then decomposed into the sum of two share matrices, and each share is encrypted with the key shared with the corresponding server and sent to that server.
S40: Each server computes its secret share of the overall covariance matrix. One server adds noise to its result and transmits the noisy result to the other server, which then obtains the overall covariance matrix with noise added.
In an embodiment of the present invention, each server decrypts the secret shares of the local covariance matrices sent by the clients and sums them to obtain its secret share of the overall covariance matrix.
One server uses its keys to decrypt the shares it received and sums them; the other server likewise uses its keys to decrypt the shares it received and then computes their sum.
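The aggregation of shares in S30 and S40 can be illustrated with a minimal sketch. It assumes the local covariance matrix is formed as the product of the transposed data matrix with itself (the exact formula is not legible in this text), uses floating-point random masks purely for illustration, and omits the encryption step; the names `split_additively`, `client_data`, `server_a_total` and `server_b_total` are hypothetical.

```python
import numpy as np

def split_additively(matrix, rng):
    """Split a matrix into two shares whose sum equals the matrix."""
    share_a = rng.normal(scale=1e3, size=matrix.shape)  # random mask (illustrative only)
    share_b = matrix - share_a
    return share_a, share_b

rng = np.random.default_rng(0)

# Three hypothetical clients, each holding a small block of normalized data.
client_data = [rng.uniform(-0.5, 0.5, size=(n, 4)) for n in (5, 7, 6)]

# Assumed form of the local covariance matrix: X^T X.
local_covs = [x.T @ x for x in client_data]

# Each client sends one share to each server.
shares = [split_additively(c, rng) for c in local_covs]

# Each server sums only the shares it received.
server_a_total = sum(s[0] for s in shares)
server_b_total = sum(s[1] for s in shares)

# Only the combination of both totals reveals the overall matrix.
reconstructed = server_a_total + server_b_total
expected = sum(local_covs)
print(np.allclose(reconstructed, expected))
```

Neither server total on its own equals the overall matrix; a production scheme would typically draw masks over a finite ring rather than from a floating-point distribution.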
In an embodiment of the invention, one server adds a symmetric noise matrix satisfying the Gaussian mechanism to its secret share of the overall covariance matrix and then transmits the result to the other server.
Given the differential privacy requirement, the noise scale is determined, and noise is drawn from the corresponding Gaussian distribution to generate the upper-triangular elements of a matrix, which are then mirrored to produce a symmetric noise matrix. The server adds this noise matrix to its secret share and sends the result to the other server.
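A minimal sketch of building such a symmetric noise matrix is given here. The function name `symmetric_gaussian_noise` and the parameter `sigma` are hypothetical; `sigma` is assumed to have been chosen beforehand from the differential privacy requirement.

```python
import numpy as np

def symmetric_gaussian_noise(d, sigma, rng):
    """Draw the diagonal and upper triangle i.i.d. from N(0, sigma^2), then mirror."""
    upper = np.triu(rng.normal(0.0, sigma, size=(d, d)))  # keep diagonal and above
    return upper + np.triu(upper, k=1).T  # copy strict upper triangle below the diagonal

rng = np.random.default_rng(1)
noise = symmetric_gaussian_noise(4, sigma=0.5, rng=rng)
print(np.allclose(noise, noise.T))
```

Only d(d+1)/2 independent values are drawn, so the lower triangle carries no additional randomness.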
S50: The server performs singular value decomposition (SVD) on the covariance matrix containing noise and obtains the eigenvectors corresponding to the eigenvalues with the largest absolute values.
In an embodiment of the present invention, the server sums its own secret share of the covariance matrix with the noise-added secret share received from the other server to obtain the overall covariance matrix with noise added. SVD is then performed to obtain a set of eigenvalues arranged in descending order, collected in a diagonal matrix; the eigenvectors corresponding to the largest eigenvalues are taken and sent to each client.
The server thus obtains the noisy overall covariance matrix. Because the covariance matrix is symmetric and the noise matrix is also symmetric, their sum is symmetric as well.
S60: The eigenvectors are sent to the clients, and each client reduces the dimension of its local data.
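Steps S50 and S60 can be sketched as follows, under stated assumptions. Because the noisy matrix is symmetric, the sketch uses a symmetric eigendecomposition (`numpy.linalg.eigh`) in place of a general SVD; for a symmetric matrix the singular values are the absolute values of the eigenvalues. The helper `top_k_eigenvectors`, the value of `k`, and the toy data are hypothetical.

```python
import numpy as np

def top_k_eigenvectors(sym_matrix, k):
    """Return the k eigenpairs of a symmetric matrix with the largest |eigenvalue|."""
    eigvals, eigvecs = np.linalg.eigh(sym_matrix)
    order = np.argsort(-np.abs(eigvals))[:k]  # descending by absolute value
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(2)
x = rng.uniform(-0.5, 0.5, size=(50, 5))  # stand-in for one client's normalized data
cov = x.T @ x

# Small symmetric perturbation standing in for the noise added by the server.
e = rng.normal(scale=0.01, size=(5, 5))
noisy_cov = cov + (e + e.T) / 2

vals, vecs = top_k_eigenvectors(noisy_cov, k=2)

# Client side: project the local data onto the received eigenvectors.
reduced = x @ vecs
print(reduced.shape)
```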
Secret sharing is an important technique in secure multiparty computation and is widely used in the field of privacy protection because it is relatively simple to use. The present invention uses only two-party additive secret sharing, which is briefly described below.
In additive secret sharing, a value $x$ is randomly split into two values $x_1$ and $x_2$ such that $x = x_1 + x_2$, and $x_1$ and $x_2$ are stored by two participants respectively. The value $x$ can be recovered only by combining the data of both participants.
Without loss of generality, suppose one participant holds data $x$ and the other holds data $y$. Additive secret sharing of their sum may be implemented as follows.
Each participant then computes the sum of the shares it holds, and only these two partial sums need to be sent to the party requesting the result, which adds them to obtain $x + y$. No information about $x$ or $y$ is leaked in this process, so data privacy is well protected.
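As a minimal sketch of the two-party addition described above, the following assumes shares are taken modulo 2^64 (the modulus is an assumption, as are the names `share` and `reconstruct`); each party splits its own value, exchanges one share, and publishes only a partial sum.

```python
import secrets

MODULUS = 2**64  # assumed share space for this illustration

def share(value):
    """Split a value into two additive shares modulo MODULUS."""
    first = secrets.randbelow(MODULUS)
    second = (value - first) % MODULUS
    return first, second

def reconstruct(first, second):
    """Recombine two additive shares."""
    return (first + second) % MODULUS

x, y = 1234, 5678          # private inputs of party 1 and party 2
x1, x2 = share(x)          # party 1 keeps x1 and sends x2 to party 2
y1, y2 = share(y)          # party 2 keeps y2 and sends y1 to party 1

partial_1 = (x1 + y1) % MODULUS  # computed by party 1
partial_2 = (x2 + y2) % MODULUS  # computed by party 2

total = reconstruct(partial_1, partial_2)
print(total == x + y)
```

Each partial sum is uniformly distributed on its own, so neither reveals `x` or `y` to the party that receives only one of them.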
In addition, if data is eavesdropped during transmission, data privacy is still at risk of leakage. The sender can therefore encrypt the data before sending it; for simplicity of computation, a conventional symmetric encryption algorithm can be used. For key establishment, we use the Diffie-Hellman (hereinafter DH) key exchange method, which is briefly described next.
Alice and Bob want to share a key for symmetric encryption, but the communication channel between them is not secure: everything that passes through it is seen by the adversary Eve. How can they exchange information so that Eve does not learn the key?
The security of the DH algorithm depends on the difficulty of computing discrete logarithms. The scheme below relies on the concept of a primitive root, which we define first.
Definition 1: Let $a$ be coprime to $n$. If the least positive exponent $m$ for which $a^m \equiv 1 \pmod{n}$ holds satisfies $m = \varphi(n)$, then $a$ is called a primitive root of $n$, where $\varphi$ is the Euler function.
Thus, for any integer $b$ not divisible by a prime $p$ and any primitive root $g$ of $p$, there is a unique exponent $i$ with $0 \le i \le p-2$ such that $b \equiv g^i \pmod{p}$. The discrete logarithm problem states that computing $i$ from $b$, $g$, and $p$ is hard.
The DH protocol proceeds as follows:
1. Alice and Bob first agree on $p$ and $g$, where $p$ is a large prime and $g$ is a primitive root of $p$, and make $p$ and $g$ public. Eve also knows their values.
2. Alice chooses a private integer $a$, known to no one else, and sends Bob the result $A = g^a \bmod p$. Eve also sees the value of $A$.
3. Similarly, Bob chooses a private integer $b$ and sends Alice the result $B = g^b \bmod p$. Eve also sees the value of $B$.
4. Alice and Bob now share a common key $K = B^a \bmod p = A^b \bmod p = g^{ab} \bmod p$. Although Eve sees $A$ and $B$, the difficulty of computing discrete logarithms prevents her from learning the specific values of $a$ and $b$, so Eve does not know the key $K$.
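The exchange can be traced with a toy sketch. The parameters below (prime 23 with primitive root 5) are deliberately tiny and insecure, chosen only so the arithmetic is easy to follow; the variable names are illustrative, and a real deployment would use a standardized large group.

```python
import secrets

p = 23  # toy prime modulus (insecure, for illustration only)
g = 5   # a primitive root modulo 23

a = secrets.randbelow(p - 2) + 1  # Alice's private exponent in [1, p-2]
b = secrets.randbelow(p - 2) + 1  # Bob's private exponent in [1, p-2]

A = pow(g, a, p)  # sent from Alice to Bob; visible to Eve
B = pow(g, b, p)  # sent from Bob to Alice; visible to Eve

key_alice = pow(B, a, p)  # Alice combines Bob's public value with her secret
key_bob = pow(A, b, p)    # Bob combines Alice's public value with his secret

print(key_alice == key_bob)
```

Both sides arrive at the same value because raising `g` to `a` and then `b` gives the same result as raising it to `b` and then `a`.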
Differential privacy is an effective means of defending against differential attacks: by adding an appropriate amount of noise to a statistical result, it ensures that the result does not change significantly when a single record in the dataset is modified (including added or deleted).
Definition 2: $(\varepsilon, \delta)$-differential privacy. Let datasets $D$ and $D'$ differ in at most one record (neighboring datasets). Given an algorithm $\mathcal{M}$, if for all $S \subseteq \mathrm{Range}(\mathcal{M})$ we have $\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{M}(D') \in S] + \delta$,
then the algorithm $\mathcal{M}$ is said to satisfy $(\varepsilon, \delta)$-differential privacy. The parameter $\varepsilon$ is the privacy budget; the smaller $\varepsilon$ is, the higher the level of privacy protection. When $\delta = 0$, $\mathcal{M}$ is said to satisfy $\varepsilon$-differential privacy.
For a function $f$ that maps datasets to real vectors, the quantity $\Delta_2 f = \max_{D, D'} \lVert f(D) - f(D') \rVert_2$, taken over all pairs of neighboring datasets $D, D'$, is called the $\ell_2$-sensitivity of $f$.
We use $X_i$ to denote the $i$-th row of a matrix $X$ and require $\lVert X_i \rVert_2 \le 1$, i.e. the 2-norm of each row is at most 1. The matrices considered below are those satisfying this condition.
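The Gaussian mechanism is a privacy protection mechanism commonly used in differential privacy. The following theorem concerns it.
Theorem 1: Let $f$ be a vector-valued function with $\ell_2$-sensitivity $\Delta_2 f$, let $\varepsilon \in (0, 1)$, and let $\sigma = \Delta_2 f \sqrt{2 \ln(1.25/\delta)} / \varepsilon$. The Gaussian mechanism, which draws noise from the distribution $\mathcal{N}(0, \sigma^2)$ and adds it to each output coordinate of $f$, guarantees $(\varepsilon, \delta)$-differential privacy.
The function of interest here is the covariance computation, which can be regarded as a vector-valued function whose output is the matrix flattened into a vector. Because the 2-norm of each row of the data is at most 1, the sensitivity of this function can be shown to be at most 1, so the noise scale can be chosen directly with sensitivity 1.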
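For concreteness, the noise scale can be computed as in the sketch below. It assumes the commonly cited calibration of the Gaussian mechanism, sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon with epsilon in (0, 1); the function name `gaussian_sigma` and the example values of epsilon and delta are hypothetical and are not taken from the patent.

```python
import math

def gaussian_sigma(epsilon, delta, sensitivity=1.0):
    """Noise standard deviation for the (assumed) classical Gaussian mechanism."""
    if not (0 < epsilon < 1):
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    if not (0 < delta < 1):
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon

sigma = gaussian_sigma(epsilon=0.5, delta=1e-5)  # sensitivity 1, as argued above
print(sigma)
```

Halving epsilon doubles the noise scale, which is the usual trade-off between privacy and accuracy of the recovered principal components.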
Oblivious transfer (OT) is one of the most basic protocols in secure multiparty computation; schemes such as garbled circuits and zero-knowledge proof protocols can be constructed from it. We introduce the 1-out-of-N OT protocol directly. It solves the following problem:
Alice holds N values and Bob wants to learn one of them. By executing the OT protocol, Bob obtains the value he chose but cannot obtain any of the other values, while Alice does not learn which value Bob obtained, i.e. she does not learn Bob's choice index.
The 1-out-of-N OT can be implemented as follows:
1. Preparation stage. The protocol operates in a group of large prime order (that is, all results of operations in this protocol are taken in the modular sense), and a generator (primitive root) of the group is selected. A random oracle function (e.g., SHA-1) is also selected. These parameters are shared by Alice and Bob.
2. Initialization stage. Alice selects a set of random numbers, then selects a further random number, computes the corresponding power of the generator, and sends these values to Bob. Alice also precomputes the values she will need later. (Because the discrete logarithm is hard to compute, Bob cannot obtain the discrete logarithm of the values he receives.)
3. Online computation stage:
b. Alice computes the required values. She then selects a random string (chosen long enough that the hash values corresponding to two different inputs differ), encrypts each of her values, and sends the encryption results together with the random string to Bob.
Two numbers can then be securely compared using this protocol.
Suppose Alice holds one value and Bob holds another, both within the agreed range. Their sizes can be compared by the following steps.
Bob then sends the result of the comparison to Alice.
If the values are general real numbers, each can be represented in binary form with a fixed number of integer bits and a fixed number of fractional bits; it can also be represented in another base. The two values are then compared starting from the most significant bit.
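The fixed-point representation and the most-significant-bit-first comparison can be sketched in the clear as follows. This shows only the plain comparison logic; in the protocol each step would be carried out obliviously, which is not attempted here. The function names, the bit widths, and the restriction to non-negative values are assumptions made for the illustration.

```python
def to_fixed_point_bits(value, int_bits, frac_bits):
    """Encode a non-negative real as bits (most significant first)."""
    scaled = int(round(value * (1 << frac_bits)))
    total_bits = int_bits + frac_bits
    if not 0 <= scaled < (1 << total_bits):
        raise ValueError("value does not fit in the chosen bit widths")
    return [(scaled >> shift) & 1 for shift in range(total_bits - 1, -1, -1)]

def compare_msb_first(bits_x, bits_y):
    """Return 1 if x > y, -1 if x < y, 0 if equal, scanning from the top bit."""
    for bit_x, bit_y in zip(bits_x, bits_y):
        if bit_x != bit_y:
            return 1 if bit_x > bit_y else -1  # the first differing bit decides
    return 0

bits_a = to_fixed_point_bits(5.25, int_bits=4, frac_bits=2)
bits_b = to_fixed_point_bits(5.5, int_bits=4, frac_bits=2)
print(compare_msb_first(bits_a, bits_b))
```

With 4 integer bits and 2 fractional bits, 5.25 is stored as the integer 21 and 5.5 as 22, so the two encodings first differ at the second-lowest bit.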
We now describe the PCA procedure. Let the dataset be a matrix with one row per sample and one column per attribute, so that each row is one data record.
If the attributes of the data are on different scales, the data must be normalized. We use mean normalization, $x'_{ij} = (x_{ij} - \bar{x}_j) / (\max_i x_{ij} - \min_i x_{ij})$,
where $\bar{x}_j$ is the mean of the $j$-th attribute over all samples. The denominator is the range of the data. This process is referred to as normalization in what follows.
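A minimal sketch of mean normalization follows. The additional division by the square root of the number of attributes is an assumption: it is one scaling that guarantees every row has 2-norm at most 1, as the Gaussian mechanism requires, but the exact divisor is not legible in this text. The function name `mean_normalize` and the toy data are hypothetical.

```python
import numpy as np

def mean_normalize(data):
    """Subtract each column's mean and divide by that column's range."""
    mean = data.mean(axis=0)
    value_range = data.max(axis=0) - data.min(axis=0)
    value_range = np.where(value_range == 0, 1.0, value_range)  # guard constant columns
    return (data - mean) / value_range

data = np.array([[1.0, 200.0],
                 [2.0, 400.0],
                 [3.0, 900.0]])
normalized = mean_normalize(data)

# Assumed scaling: every entry lies in [-1, 1], so dividing by sqrt(d)
# bounds each row's 2-norm by 1.
scaled = normalized / np.sqrt(data.shape[1])
print(np.linalg.norm(scaled, axis=1))
```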
SVD is performed on the covariance matrix of the normalized data to obtain a set of eigenvalues arranged in descending order. The eigenvectors corresponding to the largest eigenvalues are taken, and the dimension-reduced data is obtained by projecting the normalized data onto these eigenvectors.
The invention performs mean normalization on the client data in a secret manner by means of secret sharing and the OT technique, and in this process effectively protects the maximum and minimum values of each user's private data. The invention also uses secret sharing and differential privacy to protect the local covariance matrices and the overall covariance matrix, and finally realizes PCA, thereby achieving data dimension reduction. In addition, a key exchange algorithm is used, and the secret shares transmitted between the clients and the servers are encrypted with the exchanged keys, which prevents eavesdropping attacks by third parties. The PCA algorithm can speed up training in federated learning.
It should be understood that although the steps are described in a certain order, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the described order and may be executed in other orders. Moreover, some steps of the present embodiment may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages need not be performed sequentially and may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In an embodiment, the invention further provides a privacy-preserving PCA system to which the privacy-preserving PCA method is applied.
In one embodiment, referring to fig. 3, a terminal is further provided in an embodiment of the present invention, where the terminal includes a processor 701, a communication interface 702, a memory 703, and a communication bus 704, where the processor 701, the communication interface 702, and the memory 703 complete communication with each other through the communication bus 704.
A memory 703 for storing a computer program;
the processor 701 is configured to execute the PCA method of privacy protection when executing the computer program stored in the memory 703, and the processor executes the instructions to implement the steps in the method embodiments described above.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include random access memory (Random Access Memory, RAM) or non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The terminal comprises user equipment and network equipment. The user equipment includes, but is not limited to, a computer, a smartphone, a PDA, and the like. The network equipment includes, but is not limited to, a single network server, a server group consisting of multiple network servers, or a cloud based on cloud computing that consists of a large number of computers or network servers, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers. The terminal can operate independently to realize the invention, or it can access a network and realize the invention through interaction with other terminals in the network. The network where the terminal is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN, and the like.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
In one embodiment of the invention there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above described embodiment methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the above described embodiment methods. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory.
It should be understood that, as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. The serial numbers of the foregoing embodiments of the present invention are for description only and do not represent the merits of the embodiments.
Those of ordinary skill in the art will appreciate that: the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to such examples; combinations of features of the above embodiments or in different embodiments are also possible within the idea of an embodiment of the invention, and many other variations of the different aspects of the embodiments of the invention as described above exist, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present invention.
Claims (12)
1. A privacy-preserving PCA method, the method comprising:
each client splits the sum of its local data points, the maximum and minimum values of the data points, and the number of data points, and sends the split information to servers A and B respectively, wherein servers A and B do not collude;
the servers obtain the number and mean of the overall data points from the received sums of local data points, maximum and minimum values of the data points, and numbers of data points; the servers jointly obtain the number and mean of the overall data points and determine the range of the overall data points using the OT technique, and the clients normalize the data points using the mean and the range;
each client computes a local covariance matrix from the normalized data, splits it into two parts using additive secret sharing, and sends the two parts to the two servers respectively;
each server computes its secret share of the overall covariance matrix, one server adds noise to its result and transmits the noisy result to the other server, and the other server obtains the overall covariance matrix containing noise;
that server performs singular value decomposition on the overall covariance matrix containing noise to obtain the eigenvectors corresponding to the largest eigenvalues;
that server sends the eigenvectors to the clients, and each client performs dimension reduction on its local data;
wherein determining the range of the overall data points using the OT technique specifically comprises:
a. each of the two servers computes its own value, and one server then sends its result to the other server;
d. using the secret shares of the maximum corresponding to the current maximum index together with the next value, steps a-c are repeated to obtain the index of the larger of the two, and so on until all values have been processed, which yields the index of the overall maximum;
e. the index of the minimum value is obtained by a method similar to steps a-d;
g. a server computes the range of the first coordinate of the data points;
i. the range of each coordinate of the data points is obtained respectively according to steps a-g;
otherwise, go to step c.
2. The privacy-preserving PCA method of claim 1, wherein the number of data points held by each client is denoted separately, their total is recorded, and the data of each client is denoted as a matrix of that client's data points.
3. The privacy-preserving PCA method of claim 1, wherein, before each client splits the sum of its local data points, the maximum and minimum values of the data points, and the number of data points and sends the split information to the servers respectively, the method further comprises:
the servers each perform key exchange with every client through the DH protocol to establish the keys used for data transmission;
4. The privacy-preserving PCA method of claim 2, wherein the servers obtaining the number and mean of the overall data points from the received sums of local data points, maximum and minimum values of the data points, and numbers of data points, jointly determining the range of the overall data points using the OT technique, and the clients normalizing the data points using the mean and the range, comprises:
each client splits the sum of its local data points, the maximum and minimum values of the data points, and the number of data points into two parts each according to additive secret sharing, encrypts the two parts with the respective keys, and sends them to the two servers;
each server decrypts the encrypted data sent by the clients and sums the decrypted secret shares of the data point sums and data point counts to obtain secret shares of the sum and the number of all data points;
one server sends its computed secret shares of the sum and the number of all data points to the other server, which then computes the sum and the number of all data points, determines the mean of the data points from them, and finally sends the mean and the number of data points to each client;
the servers obtain the range of all data points using 1-out-of-N OT and then send the range to each client;
after receiving the mean, the number, and the range of all data points, each client normalizes its data and then uniformly divides the coordinates of the data points by a common scaling factor to obtain the normalized data of each client.
6. The privacy-preserving PCA method of claim 1, wherein the client computing a local covariance matrix from the normalized data, splitting it into two parts using additive secret sharing, and sending the two parts to the two servers respectively specifically comprises:
7. The privacy-preserving PCA method of claim 4, wherein the local covariance matrix is computed from the client's normalized data, where k denotes the number of data points of each client; the matrix is then decomposed into two additive shares, and each share is encrypted with the corresponding key and sent to the corresponding server.
10. The privacy-preserving PCA method of claim 9, wherein the server sums its own secret share of the covariance matrix with the noise-added secret share of the covariance matrix received from the other server to obtain the overall covariance matrix with noise added; SVD is performed to obtain a set of eigenvalues arranged in descending order, collected in a diagonal matrix; the eigenvectors corresponding to the largest eigenvalues are taken and sent to each client.
12. A privacy preserving PCA system applying the privacy preserving PCA method of any of claims 1-11.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211473530.9A CN115510502B (en) | 2022-11-23 | 2022-11-23 | PCA method and system for privacy protection |
PCT/CN2023/110312 WO2024109149A1 (en) | 2022-11-23 | 2023-07-31 | Principal component analysis method and system for privacy protection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211473530.9A CN115510502B (en) | 2022-11-23 | 2022-11-23 | PCA method and system for privacy protection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115510502A CN115510502A (en) | 2022-12-23 |
CN115510502B true CN115510502B (en) | 2023-05-26 |
Family
ID=84514083
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211473530.9A Active CN115510502B (en) | 2022-11-23 | 2022-11-23 | PCA method and system for privacy protection |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115510502B (en) |
WO (1) | WO2024109149A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117439731B (en) * | 2023-12-21 | 2024-03-12 | 山东大学 | Privacy protection big data principal component analysis method and system based on homomorphic encryption |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113300828B (en) * | 2021-05-27 | 2022-07-05 | 南开大学 | Distributed differential privacy aggregation method |
CN113949501A (en) * | 2021-09-08 | 2022-01-18 | 天翼电子商务有限公司 | Semi-homomorphic encryption-based transversely distributed PCA dimension reduction method |
CN113904874B (en) * | 2021-11-30 | 2022-03-04 | 北京中超伟业信息安全技术股份有限公司 | Unmanned aerial vehicle data secure transmission method |
-
2022
- 2022-11-23 CN CN202211473530.9A patent/CN115510502B/en active Active
-
2023
- 2023-07-31 WO PCT/CN2023/110312 patent/WO2024109149A1/en unknown
Also Published As
Publication number | Publication date |
---|---|
WO2024109149A1 (en) | 2024-05-30 |
CN115510502A (en) | 2022-12-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022237450A1 (en) | Secure multi-party computation method and apparatus, and device and storage medium | |
CN112182649A (en) | Data privacy protection system based on safe two-party calculation linear regression algorithm | |
WO2018184407A1 (en) | K-means clustering method and system having privacy protection | |
Liu et al. | Intelligent and secure content-based image retrieval for mobile users | |
Liu et al. | Secure multi-label data classification in cloud by additionally homomorphic encryption | |
Erkin et al. | Privacy-preserving distributed clustering | |
CN115510502B (en) | PCA method and system for privacy protection | |
CN115842627A (en) | Decision tree evaluation method, device, equipment and medium based on secure multi-party computation | |
CN111259440B (en) | Privacy protection decision tree classification method for cloud outsourcing data | |
Liu et al. | Privacy preserving pca for multiparty modeling | |
Zheng et al. | Towards secure and practical machine learning via secret sharing and random permutation | |
Zhao et al. | SGBoost: An efficient and privacy-preserving vertical federated tree boosting framework | |
CN114564730A (en) | Symmetric encryption-based federal packet statistic calculation method, device and medium | |
Zhao et al. | VFLR: An efficient and privacy-preserving vertical federated framework for logistic regression | |
CN117353912A (en) | Three-party privacy set intersection base number calculation method and system based on bilinear mapping | |
Wang et al. | Face detection for privacy protected images | |
CN116094686B (en) | Homomorphic encryption method, homomorphic encryption system, homomorphic encryption equipment and homomorphic encryption terminal for quantum convolution calculation | |
CN116743376A (en) | Multiparty secret sharing data privacy comparison method based on efficient ciphertext confusion technology | |
CN116681141A (en) | Federal learning method, terminal and storage medium for privacy protection | |
CN116170142A (en) | Distributed collaborative decryption method, device and storage medium | |
Zhou et al. | Toward scalable and privacy-preserving deep neural network via algorithmic-cryptographic co-design | |
CN115150060A (en) | Data privacy protection method based on secure multi-party clustering method | |
CN115564447A (en) | Credit card transaction risk detection method and device | |
CN115333789A (en) | Privacy protection intersection calculation method and device based on large-scale data set in asymmetric mode | |
CN114358323A (en) | Third-party-based efficient Pearson coefficient calculation method in federated learning environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||