CN114091043A - Correlation coefficient calculation method, device, equipment and computer storage medium - Google Patents


Info

Publication number
CN114091043A
Authority
CN
China
Prior art keywords
correlation coefficient
characteristic variable
public key
equipment
target
Legal status
Pending
Application number
CN202010770749.XA
Other languages
Chinese (zh)
Inventor
游正朋
唐小勇
朱磊
罗柯
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Chengdu ICT Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Chengdu ICT Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Chengdu ICT Co Ltd
Priority to CN202010770749.XA
Publication of CN114091043A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H04L63/0442 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload wherein the sending and receiving network entities apply asymmetric encryption, i.e. different keys for encryption and decryption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861 Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • H04L9/0869 Generation of secret information including derivation or calculation of cryptographic keys or passwords involving random numbers or seeds

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Bioethics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention provide a correlation coefficient calculation method, apparatus, device and computer storage medium. The method is applied to a first device and comprises: acquiring a public key based on a homomorphic encryption algorithm; homomorphically encrypting a first characteristic variable with the public key to obtain a second characteristic variable; sending the second characteristic variable to a second device, so that the second device obtains a first target correlation coefficient according to the second characteristic variable; receiving the first target correlation coefficient sent by the second device; and decrypting the first target correlation coefficient with a private key corresponding to the public key to obtain a second target correlation coefficient. The embodiments of the invention can improve calculation efficiency and reduce calculation time, and are applicable to the calculation of various types of correlation coefficient.

Description

Correlation coefficient calculation method, device, equipment and computer storage medium
Technical Field
The invention belongs to the field of information security, and in particular relates to a correlation coefficient calculation method, apparatus, device and computer storage medium.
Background
Society has entered a big-data era of pervasive information interconnection, and the privacy and security of data in fields such as healthcare, finance and education are attracting increasing attention.
To analyze and learn from data in each field while meeting security requirements, data in these fields are often analyzed and learned through federated learning.
To protect each party's data privacy, the parties generally do not transmit raw data directly; instead, they process the data through complex operations such as encryption and normalization, and build a virtual common model without violating data privacy regulations. However, current calculation methods involve a large amount of computation and have low calculation efficiency.
Disclosure of Invention
Embodiments of the present invention provide a correlation coefficient calculation method, apparatus, device and computer storage medium, which can improve calculation efficiency, reduce calculation time, and are applicable to the calculation of various types of correlation coefficient.
In a first aspect, an embodiment of the present invention provides a correlation coefficient calculation method applied to a first device, the method comprising: acquiring a public key based on a homomorphic encryption algorithm;
homomorphically encrypting a first characteristic variable with the public key to obtain a second characteristic variable;
sending the second characteristic variable to a second device, so that the second device obtains a first target correlation coefficient according to the second characteristic variable;
receiving the first target correlation coefficient sent by the second device;
and decrypting the first target correlation coefficient with a private key corresponding to the public key to obtain a second target correlation coefficient.
In some implementations of the first aspect, homomorphically encrypting the first characteristic variable with the public key to obtain the second characteristic variable comprises:
performing stretch transformation on the first characteristic variable to obtain a processed first characteristic variable;
and homomorphically encrypting the processed first characteristic variable with the public key to obtain the second characteristic variable.
In some implementations of the first aspect, performing stretch transformation on the first characteristic variable to obtain the processed first characteristic variable comprises:
randomly generating a random positive real number and a random real number;
and multiplying the first characteristic variable by the random positive real number and adding the random real number to the product to obtain the processed first characteristic variable.
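The stretch transformation described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the helper name `stretch_transform` and the sampling ranges for the random positive real number `a` and the random real number `b` are assumptions, since the text only requires a > 0 and b to be real.

```python
import random


def stretch_transform(values, rng=None):
    """Map each x to a * x + b with a random a > 0 and a random real b.

    Hypothetical helper: the ranges used for a and b are illustrative
    assumptions; the method only requires a > 0 and b to be any real number.
    """
    rng = rng or random.Random()
    a = rng.uniform(0.5, 10.0)        # random positive real number
    b = rng.uniform(-1000.0, 1000.0)  # random real number
    return [a * x + b for x in values], a, b


# Column X1 of Table 1; a and b are returned only so the example can be checked.
x1 = [23, 55, 22]
stretched, a, b = stretch_transform(x1, random.Random(42))
```

Because `a` is strictly positive, the transformation preserves the order of the values, which matters for the rank-based coefficients discussed later.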
In some implementations of the first aspect, the method further comprises:
sending the public key to the second device, so that the second device homomorphically encrypts a third characteristic variable with the public key to obtain a fourth characteristic variable.
In some implementations of the first aspect, the method further comprises:
sending the data identifier of the first characteristic variable to the second device, so that the second device combines the second characteristic variable and the fourth characteristic variable according to the data identifier.
In some implementations of the first aspect, the homomorphic encryption algorithm comprises a fully homomorphic encryption algorithm or a semi-homomorphic encryption algorithm.
In some implementations of the first aspect, the first target correlation coefficient comprises at least one of: a Pearson correlation coefficient, a Spearman correlation coefficient, and a Kendall correlation coefficient.
In a second aspect, an embodiment of the present invention provides a correlation coefficient calculation method applied to a second device, the method comprising: receiving a second characteristic variable, a public key and a data identifier of a first characteristic variable sent by a first device;
homomorphically encrypting a third characteristic variable in the second device with the public key to obtain a fourth characteristic variable;
combining the second characteristic variable and the fourth characteristic variable according to the data identifier;
calculating a first target correlation coefficient according to the combined second characteristic variable and fourth characteristic variable;
and sending the first target correlation coefficient to the first device, so that the first device obtains a second target correlation coefficient according to the first target correlation coefficient.
In some implementations of the second aspect, the homomorphic encryption is fully homomorphic encryption or semi-homomorphic encryption.
In some implementations of the second aspect, the first target correlation coefficient comprises at least one of: a Pearson correlation coefficient, a Spearman correlation coefficient, and a Kendall correlation coefficient.
In a third aspect, an embodiment of the present invention provides a correlation coefficient calculation apparatus, comprising:
an acquisition module, configured to acquire a public key based on a homomorphic encryption algorithm;
an encryption module, configured to homomorphically encrypt a first characteristic variable with the public key to obtain a second characteristic variable;
a sending module, configured to send the second characteristic variable to a second device, so that the second device obtains a first target correlation coefficient according to the second characteristic variable;
a receiving module, configured to receive the first target correlation coefficient sent by the second device;
and a decryption module, configured to decrypt the first target correlation coefficient with a private key corresponding to the public key to obtain a second target correlation coefficient.
In a fourth aspect, an embodiment of the present invention provides a correlation coefficient calculation apparatus, comprising:
a receiving module, configured to receive a second characteristic variable, a public key and a data identifier of a first characteristic variable sent by a first device;
an encryption module, configured to homomorphically encrypt a third characteristic variable in the second device with the public key to obtain a fourth characteristic variable;
a combination module, configured to combine the second characteristic variable and the fourth characteristic variable according to the data identifier;
a calculation module, configured to calculate a first target correlation coefficient according to the combined second characteristic variable and fourth characteristic variable;
and a sending module, configured to send the first target correlation coefficient to the first device, so that the first device obtains a second target correlation coefficient according to the first target correlation coefficient.
In a fifth aspect, the present invention provides a correlation coefficient calculation device, comprising a processor and a memory storing computer program instructions; when executing the computer program instructions, the processor implements the correlation coefficient calculation method described in the first aspect or any implementation of the first aspect, or in the second aspect or any implementation of the second aspect.
In a sixth aspect, the present invention provides a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the correlation coefficient calculation method described in the first aspect or any implementation of the first aspect, or in the second aspect or any implementation of the second aspect.
The embodiments of the invention provide a correlation coefficient calculation method in which a public key is obtained based on a homomorphic encryption algorithm, a first characteristic variable in a first device is homomorphically encrypted with the public key to obtain a second characteristic variable, and the second characteristic variable is sent to a second device. No plaintext is therefore exchanged between the first device and the second device, which ensures the security of the exchanged data. The second device also receives the public key from the first device, encrypts its own local data with that public key, and calculates the target correlation coefficient together with the received second characteristic variable to obtain a first target correlation coefficient. The first target correlation coefficient calculated by the second device is consistent with the correlation coefficient calculated from plaintext, which ensures the accuracy of the target correlation coefficient. After calculating the first target correlation coefficient, the second device sends it to the first device, and the first device obtains the actual correlation coefficient simply by decrypting the first target correlation coefficient with the private key corresponding to the public key. Throughout the information exchange, each participating device knows only its own plaintext data, so data security is guaranteed while calculation efficiency is improved and calculation time is reduced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a correlation coefficient calculation method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of another correlation coefficient calculation method according to an embodiment of the present invention;
Fig. 3 is a schematic flow chart of another correlation coefficient calculation method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a correlation coefficient calculation apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of another correlation coefficient calculation apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a correlation coefficient calculation device according to an embodiment of the present invention.
Detailed Description
Features and exemplary embodiments of various aspects of the present invention will be described in detail below, and in order to make objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It will be apparent to one skilled in the art that the present invention may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present invention by illustrating examples of the present invention.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B both exist, or B exists alone.
Society has entered a big-data era of pervasive information interconnection, and the privacy and security of data in fields such as healthcare, finance and education are attracting increasing attention.
To analyze and learn from data in each field while meeting security requirements, data in these fields are often analyzed and learned through federated learning.
In vertical (longitudinal) federated learning, the correlation between features is generally examined before model training, and only one feature from each group of highly correlated features is retained, which reduces model complexity as well as communication and calculation overhead.
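One simple way to realize the rule of retaining only one feature from each highly correlated group is a greedy filter over already computed coefficients. The sketch below is an assumption for illustration: the function name, the dictionary layout and the 0.9 threshold are hypothetical, and the text does not specify how the retained feature is chosen.

```python
def drop_highly_correlated(features, corr, threshold=0.9):
    """Greedily keep features whose |correlation| with every kept feature is below threshold.

    features: feature names in priority order (earlier names are kept first).
    corr: dict mapping frozenset({f1, f2}) to the correlation coefficient of f1 and f2.
    threshold: hypothetical cut-off; the source does not give a value.
    """
    kept = []
    for f in features:
        if all(abs(corr[frozenset((f, k))]) < threshold for k in kept):
            kept.append(f)
    return kept


# Illustrative coefficients, not data from the patent.
corr = {
    frozenset(("X1", "X2")): 0.95,
    frozenset(("X1", "X3")): 0.10,
    frozenset(("X2", "X3")): 0.20,
}
print(drop_highly_correlated(["X1", "X2", "X3"], corr))  # ['X1', 'X3']
```

The result depends on the priority order of `features`; other selection rules (for example, keeping the feature most correlated with the target) are equally possible.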
To protect each party's data privacy, the parties generally do not transmit raw data directly; instead, they process the data through complex operations such as encryption and normalization, and build a virtual common model without violating data privacy regulations. However, current calculation methods involve a large amount of computation and have low calculation efficiency.
In addition, in a current technical solution, party A sends its own data to party B, and party B calculates the correlation coefficient after normalizing the feature data of both parties. Although this aims to reduce model complexity as well as communication and calculation overhead, the existing solution is effective only for the Pearson correlation coefficient and cannot be applied well to the calculation of other correlation coefficients.
In view of the above, embodiments of the present invention provide a correlation coefficient calculation method, apparatus, device and computer storage medium. A first characteristic variable of a first device is homomorphically encrypted to obtain a second characteristic variable, so that no plaintext is exchanged between the first device and a second device, which ensures the security of the exchanged data. The first target correlation coefficient calculated by the second device is then fed back to the first device, so the second device can calculate the first target correlation coefficient without knowing the plaintext of the first device.
The following first describes a correlation coefficient calculation method provided by an embodiment of the present invention with reference to the accompanying drawings. Fig. 1 is a schematic flow chart illustrating a correlation coefficient calculation method according to an embodiment of the present invention. As shown in fig. 1, the method is applicable to a first device comprising first characteristic data, and the method may comprise the steps of:
S110, acquiring a public key based on a homomorphic encryption algorithm.
The first device, as one of the parties providing feature data, may generate a public/private key pair based on the homomorphic encryption algorithm and then perform S120.
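The text does not name a specific homomorphic scheme. As one concrete, well-known example of a semi-homomorphic (additively homomorphic) scheme, the Paillier cryptosystem is sketched below to show what generating the key pair, encrypting and decrypting involve. The function names are hypothetical, and the hard-coded primes are far too small to be secure; a real deployment would use a vetted library and much larger keys.

```python
import math
import random


def paillier_keygen(p, q):
    """Toy Paillier key generation from two distinct primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, mu is simply lam^-1 mod n
    return (n, n + 1), (lam, mu)  # (public key, private key)


def paillier_encrypt(pub, m, rng=random):
    """Encrypt an integer 0 <= m < n as c = g^m * r^n mod n^2 with random r."""
    n, g = pub
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def paillier_decrypt(pub, priv, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def paillier_add(pub, c1, c2):
    """Multiplying two ciphertexts yields an encryption of the sum of the plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


pub, priv = paillier_keygen(293, 433)  # toy primes, insecure
c1 = paillier_encrypt(pub, 23)
c2 = paillier_encrypt(pub, 55)
print(paillier_decrypt(pub, priv, paillier_add(pub, c1, c2)))  # 78
```

Paillier supports only addition of ciphertexts and multiplication of a ciphertext by a plaintext constant; other operations on ciphertexts require a fully homomorphic scheme or an additional protocol.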
S120, homomorphically encrypting the first characteristic variable with the public key to obtain a second characteristic variable.
As a participant, the first device homomorphically encrypts the first characteristic variable held in the device to obtain the second characteristic variable. The first characteristic variable is the plaintext of the first device's feature data, so the encryption ensures that the second device, although it participates in the calculation, cannot obtain that plaintext.
For example, the first characteristic variable in the first device may be as shown in table 1.
TABLE 1
UUID X1 X2 X3 Z
U1 23 8000 2 10
U2 55 9000 3 1
U3 22 6555 5 9
The Universally Unique Identifier (UUID) may be used as the identifier of the feature data. X1, X2 and X3 are features of the data in the first device, and Z is the target variable.
In some embodiments, to raise the security level of the first device's data, homomorphically encrypting the first characteristic variable with the public key to obtain the second characteristic variable may include the following steps: performing stretch transformation on the first characteristic variable to obtain a processed first characteristic variable; and homomorphically encrypting the processed first characteristic variable with the public key to obtain the second characteristic variable.
In some embodiments, the homomorphic encryption algorithm may be a fully homomorphic encryption algorithm or a semi-homomorphic encryption algorithm.
As a specific embodiment, to prevent data leakage, the data may be stretched with two random numbers so that the first characteristic variable loses its original meaning, and homomorphic encryption is then performed to ensure data security. Performing stretch transformation on the first characteristic variable to obtain the processed first characteristic variable may include: randomly generating a random positive real number and a random real number; and multiplying the first characteristic variable by the random positive real number and adding the random real number to the product to obtain the processed first characteristic variable.
Illustratively, the first device holds feature data (X1, X2); after stretch transformation, (X1·a + b, X2·c + d) is obtained, where a > 0, c > 0, and b and d are arbitrary real numbers. Homomorphic encryption is then performed on the processed first characteristic variable to obtain ([[X1·a + b]], [[X2·c + d]]), where "[[ ]]" denotes homomorphic encryption.
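The stretched values X1·a + b are real numbers, whereas integer-plaintext schemes such as Paillier encrypt integers modulo a public modulus n. If such a scheme is used, the values need an integer encoding before encryption. The fixed-point encoding below is a common approach shown as an assumption, not a step stated in the text; the scale factor, the stand-in modulus and the helper names are hypothetical.

```python
def encode_fixed_point(x, n, scale=10**6):
    """Encode a real x as an integer modulo n, keeping a fixed number of decimal digits.

    Negative values wrap around to the upper half of [0, n).
    """
    v = round(x * scale)
    if abs(v) >= n // 2:
        raise OverflowError("value too large for modulus")
    return v % n


def decode_fixed_point(m, n, scale=10**6):
    """Invert encode_fixed_point: residues in the upper half are read as negatives."""
    if m > n // 2:
        m -= n
    return m / scale


n = 2**61 - 1        # stand-in modulus; with Paillier this would be the public n
a, b = 1.75, -40.5   # example stretch parameters (a > 0)
encoded = [encode_fixed_point(x * a + b, n) for x in [23, 55, 22]]  # X1 from Table 1
decoded = [decode_fixed_point(m, n) for m in encoded]
```

Each encoded integer would then be encrypted to form [[X1·a + b]]; precision is limited to the chosen number of decimal digits.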
The first device may perform S130 after obtaining the second characteristic variable.
S130, sending the second characteristic variable to the second device.
In some examples, the first device also sends the public key to the second device, so that the second device homomorphically encrypts a third characteristic variable with the public key to obtain a fourth characteristic variable.
In some examples, the first device sends the data identifier of the first characteristic variable to the second device, so that the second device combines the second characteristic variable and the fourth characteristic variable according to the data identifier. The data identifier of the first characteristic variable does not need to be encrypted.
After receiving the second characteristic variable, the public key and the data identifier of the first characteristic variable sent by the first device, the second device may calculate a first target correlation coefficient and send it to the first device.
The first target correlation coefficient includes at least one of: the Pearson correlation coefficient (Pearson), the Spearman correlation coefficient (Spearman), and the Kendall correlation coefficient (Kendall tau).
In some embodiments, the first device may perform S140 and S150.
S140, receiving the first target correlation coefficient sent by the second device.
When calculating the correlation coefficients between characteristic variables, the second device uses the second characteristic variable and the fourth characteristic variable, both homomorphically encrypted with the public key, so the first target correlation coefficient it calculates is also encrypted data. Therefore, after receiving the first target correlation coefficient, the first device performs S150.
S150, decrypting the first target correlation coefficient by using a private key corresponding to the public key to obtain a second target correlation coefficient.
The Pearson, Spearman and Kendall correlation coefficients are calculated from ciphertexts obtained by homomorphic encryption, and these coefficients are unchanged by the stretch transformation. It is therefore theoretically ensured that the correlation coefficient calculated from the stretched and homomorphically encrypted data is the same as the one calculated directly from the plaintext. In other words, the second target correlation coefficient that the first device obtains by decrypting the first target correlation coefficient with the private key can be regarded as the result of a calculation on the participants' plaintext. The data privacy and data security of the participants are thereby effectively protected.
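The invariance this argument relies on can be checked directly. For the stretch transformation X' = aX + b and Y' = cY + d with a > 0 and c > 0 (the Y side is included only to show the general case):

```latex
\begin{aligned}
\operatorname{cov}(X', Y') &= ac\,\operatorname{cov}(X, Y), \qquad
\sigma_{X'} = |a|\,\sigma_X = a\,\sigma_X, \qquad
\sigma_{Y'} = c\,\sigma_Y, \\
\rho(X', Y') &= \frac{ac\,\operatorname{cov}(X, Y)}{a\sigma_X \cdot c\sigma_Y} = \rho(X, Y).
\end{aligned}
```

Because a > 0, x_i < x_j holds exactly when a x_i + b < a x_j + b, so the ranks, the rank differences d_i and the counts of concordant and discordant pairs are all unchanged; hence S and τ are unchanged as well. A negative scaling factor would reverse the sign of the coefficients, which is why the random multiplier must be positive.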
In some embodiments, after the second characteristic variable, the public key and the data identification of the first characteristic variable are sent by the first device, as shown in fig. 2, the second device may perform the following steps:
S210, receiving the second characteristic variable, the public key and the data identifier of the first characteristic variable sent by the first device.
The second device, as one of the parties to the correlation coefficient calculation, holds a third characteristic variable.
For example, the third characteristic variable in the second device may be as shown in table 2.
TABLE 2
UUID X4 X5
U1 2 12000
U2 5 45000
U3 9 65000
X4 and X5 are features of the data in the second device.
The second device performs S220 after receiving the data for calculating the target correlation coefficient transmitted by the first device.
S220, homomorphically encrypting the third characteristic variable in the second device with the public key to obtain a fourth characteristic variable.
Illustratively, the second device encrypts the third characteristic variable with the received public key; for example, it homomorphically encrypts the feature data (X4, X5) to obtain ([[X4]], [[X5]]), where "[[ ]]" denotes homomorphic encryption. The data identifiers of the feature data do not need to be encrypted.
In some embodiments, the homomorphic encryption algorithm may be a fully homomorphic encryption algorithm or a semi-homomorphic encryption algorithm.
The second device executes S230 after obtaining the fourth characteristic variable.
S230, combining the second characteristic variable and the fourth characteristic variable according to the data identifier.
Because neither the data identifiers of the first characteristic variable in the first device nor those of the third characteristic variable in the second device are encrypted, the second device can combine the homomorphically encrypted second and fourth characteristic variables according to the data identifiers to obtain richer feature data. This effectively enriches the participants' characteristic variables, makes the vertical federated learning model more comprehensive and accurate, and allows a more reliable target correlation coefficient to be calculated.
For example, the combined second characteristic variable and fourth characteristic variable may be as shown in Table 3 (values are shown in plaintext for readability).
TABLE 3
UUID X1 X2 X3 X4 X5 Z
U1 23 8000 2 2 12000 10
U2 55 9000 3 5 45000 1
U3 22 6555 5 9 65000 9
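The combination in S230 amounts to joining the two parties' records on the unencrypted UUID. The Python sketch below reproduces Table 3 from Tables 1 and 2. Plaintext values are used for readability (in the method the feature cells would hold ciphertexts and only the identifiers are compared), and keeping only identifiers present on both sides (an inner join) is an assumption, as the text does not say how unmatched records are handled. The function name is hypothetical.

```python
def combine_by_id(left, right):
    """Inner-join two {uuid: {feature: value}} tables on their shared identifiers."""
    combined = {}
    for uid, row in left.items():
        if uid in right:
            combined[uid] = {**row, **right[uid]}  # merge the two parties' features
    return combined


device1 = {  # Table 1 (first device)
    "U1": {"X1": 23, "X2": 8000, "X3": 2, "Z": 10},
    "U2": {"X1": 55, "X2": 9000, "X3": 3, "Z": 1},
    "U3": {"X1": 22, "X2": 6555, "X3": 5, "Z": 9},
}
device2 = {  # Table 2 (second device)
    "U1": {"X4": 2, "X5": 12000},
    "U2": {"X4": 5, "X5": 45000},
    "U3": {"X4": 9, "X5": 65000},
}
table3 = combine_by_id(device1, device2)
```

Because the join key is the only field compared, the same code works unchanged when the feature values are ciphertexts.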
S240, calculating a first target correlation coefficient according to the combined second characteristic variable and fourth characteristic variable.
The first target correlation coefficient includes at least one of: the Pearson correlation coefficient, the Spearman correlation coefficient, and the Kendall correlation coefficient.
The Pearson correlation coefficient can be obtained according to equation (1):
$$\rho = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} \qquad (1)$$
where ρ is the Pearson correlation coefficient, X and Y are the feature data of the two features being compared, cov(X, Y) is the covariance of X and Y, σ_X is the standard deviation of X, and σ_Y is the standard deviation of Y.
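For reference, equation (1) can be evaluated on plaintext data with the short sketch below. It is not the ciphertext computation performed by the second device; it only shows the value that the decrypted coefficient is meant to equal. The function name is hypothetical, and population (divide-by-n) moments are used, whose normalization cancels in the ratio.

```python
import math

def pearson(x, y):
    """Equation (1): rho = cov(X, Y) / (sigma_X * sigma_Y), computed on plaintext."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (sigma_x * sigma_y)

# Usage on the X1 and X2 columns of Table 3.
rho_x1_x2 = pearson([23, 55, 22], [8000, 9000, 6555])
```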
The Spearman correlation coefficient can be obtained according to equation (2):
$$S = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} \qquad (2)$$
where S is the Spearman correlation coefficient and d_i = x_i − y_i is the difference between the ranks x_i and y_i of the i-th pair of observations of the two variables. Each raw value is replaced by its rank, i.e. its position when the data are sorted in descending order, with tied values assigned the average of their positions; n is the number of observations.
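A plaintext sketch of equation (2), including the average-rank rule for ties described above, is shown below. The function names are hypothetical. Note that the closed form in equation (2) is exact only when there are no ties; with ties it is a common approximation.

```python
def average_ranks(values):
    """Rank values in descending order (1 = largest); tied values share their average position."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_position = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average_position
        start = end + 1
    return ranks

def spearman(x, y):
    """Equation (2): S = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank(x_i) - rank(y_i)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n * n - 1))
```

Ranking in descending rather than ascending order does not change d_i², as long as both variables are ranked the same way.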
The Kendall correlation coefficient can be obtained according to equation (3):
$$\tau = \frac{n_c - n_d}{\frac{1}{2} n (n - 1)} \qquad (3)$$
where n_c is the number of concordant pairs, i.e. pairs of statistical objects that both variables order in the same way;
n_d is the number of discordant pairs, i.e. pairs that the two variables order in opposite ways; and n is the number of statistical objects.
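Equation (3) can be sketched on plaintext by checking every pair of objects, as below. The function name is hypothetical. This is the variant often called tau-a: pairs tied in either variable count as neither concordant nor discordant, whereas tie-corrected variants such as tau-b use a different denominator.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Equation (3): tau = (n_c - n_d) / (n * (n - 1) / 2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:    # both variables order the pair the same way
            concordant += 1
        elif sign < 0:  # the variables order the pair in opposite ways
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Usage on the X1 and X2 columns of Table 3: all three pairs are concordant.
tau_x1_x2 = kendall_tau([23, 55, 22], [8000, 9000, 6555])
```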
S250, send the first target correlation coefficient to the first device.
In some embodiments, the second device may calculate a plurality of first target correlation coefficients and send all of them to the first device.
When the second device calculates a plurality of first target correlation coefficients, it may package them into data groups and send the data groups to the first device. Two kinds of data group may be used.
A data group holding the three correlation coefficients between two feature variables comprises five fields, in order: the plaintext name of one feature variable (Xi), the plaintext name of the other feature variable (Xj), the ciphertext of the Pearson correlation coefficient, the ciphertext of the Spearman correlation coefficient, and the ciphertext of the Kendall correlation coefficient.
A data group holding the three correlation coefficients between a feature variable and the target variable likewise comprises five fields, in order: the plaintext name of the feature variable (X), the plaintext name of the target variable (Z), the ciphertext of the Pearson correlation coefficient, the ciphertext of the Spearman correlation coefficient, and the ciphertext of the Kendall correlation coefficient.
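The five-field data group can be pictured as a small record type. The sketch below is an assumption throughout: the class, field and function names are hypothetical, ciphertexts are modelled as Python integers (as they would be for a Paillier-style scheme), and the integers in the usage line are placeholders rather than real ciphertexts.

```python
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class CorrelationDataGroup:
    """One five-field data group sent from the second device to the first device."""
    first_name: str           # plaintext name: feature Xi, or feature X
    second_name: str          # plaintext name: feature Xj, or target variable Z
    pearson_ciphertext: int   # encrypted Pearson correlation coefficient
    spearman_ciphertext: int  # encrypted Spearman correlation coefficient
    kendall_ciphertext: int   # encrypted Kendall correlation coefficient

def assemble(groups):
    """Package several data groups into a list of 5-tuples for transmission."""
    return [astuple(group) for group in groups]

# Placeholder integers stand in for real ciphertexts.
payload = assemble([
    CorrelationDataGroup("X1", "X4", 111, 222, 333),  # feature-feature group
    CorrelationDataGroup("X1", "Z", 444, 555, 666),   # feature-target group
])
```

Only the variable names travel in plaintext; the first device needs them to know which pair each decrypted coefficient belongs to.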
Fig. 3 is a schematic flowchart of another correlation coefficient calculation method according to an embodiment of the present invention. The method is further described below with reference to Fig. 3.
First, S301-S305 are performed by the first device.
S301, obtain a public key based on a homomorphic encryption algorithm.
S302, homomorphically encrypt the first characteristic variable with the public key to obtain a second characteristic variable.
S303, send the second characteristic variable to the second device.
S304, send the public key to the second device.
S305, send the data identifiers of the first characteristic variable to the second device.
Next, S306-S307 are performed by the second device.
S306, homomorphically encrypt the third characteristic variable in the second device with the public key to obtain a fourth characteristic variable.
S307, combine the second characteristic variable and the fourth characteristic variable according to the data identifiers.
S308, calculate a first target correlation coefficient from the combined second and fourth characteristic variables.
S309, send the first target correlation coefficient to the first device.
Finally, after receiving the first target correlation coefficient, the first device performs S310:
S310, decrypting the first target correlation coefficient by using a private key corresponding to the public key to obtain a second target correlation coefficient.
According to the correlation coefficient calculation method provided by the embodiment of the present invention, after the public key is obtained based on a homomorphic encryption algorithm, the first characteristic variable in the first device is homomorphically encrypted with the public key to obtain the second characteristic variable, which is then sent to the second device. No plaintext is therefore exchanged between the first device and the second device, which ensures the security of the interaction data. The second device also receives the public key from the first device, encrypts its own local data with it, and calculates the first target correlation coefficient from that data together with the received second characteristic variable. The first target correlation coefficient calculated in this way is consistent with the coefficient that would be calculated from plaintext, which ensures the accuracy of the target correlation coefficient. After the second device sends the first target correlation coefficient to the first device, the first device obtains the actual correlation coefficient simply by decrypting it with the private key corresponding to the public key. Throughout the interaction, no device participating in the calculation learns any plaintext data other than its own, so data security is guaranteed while calculation efficiency is improved and calculation time is reduced.
Fig. 4 is a schematic structural diagram of a correlation coefficient calculation apparatus according to an embodiment of the present invention, and as shown in fig. 4, the correlation coefficient calculation apparatus 400 may include: an obtaining module 410, an encrypting module 420, a sending module 430, a receiving module 440, and a decrypting module 450.
An obtaining module 410, configured to obtain a public key based on a homomorphic encryption algorithm.
The encryption module 420 is configured to perform homomorphic encryption on the first feature variable through the public key to obtain a second feature variable.
A sending module 430, configured to send the second characteristic variable to the second device, so that the second device obtains a first target correlation coefficient according to the second characteristic variable.
A receiving module 440, configured to receive the first target correlation coefficient sent by the second device.
The decryption module 450 is configured to decrypt the first target correlation coefficient by using a private key corresponding to the public key to obtain a second target correlation coefficient.
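One way to picture apparatus 400 is as an interface with one method per module. The sketch below is hypothetical: the method names, parameter types and the use of Python's `abc` module are assumptions, and transport and key handling are left to concrete subclasses.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence

class FirstDeviceApparatus(ABC):
    """Hypothetical interface mirroring modules 410 to 450 of apparatus 400."""

    @abstractmethod
    def obtain_public_key(self) -> Any:
        """Obtaining module 410: public key of a homomorphic encryption scheme."""

    @abstractmethod
    def encrypt_features(self, first_variable: Sequence[float], public_key: Any) -> List[Any]:
        """Encryption module 420: first characteristic variable -> second characteristic variable."""

    @abstractmethod
    def send_features(self, second_variable: List[Any]) -> None:
        """Sending module 430: send the second characteristic variable to the second device."""

    @abstractmethod
    def receive_coefficient(self) -> Any:
        """Receiving module 440: first target correlation coefficient (ciphertext)."""

    @abstractmethod
    def decrypt_coefficient(self, first_coefficient: Any) -> float:
        """Decryption module 450: decrypt with the private key -> second target correlation coefficient."""
```

A concrete subclass must implement all five methods before it can be instantiated, which mirrors the requirement that the apparatus include all five modules.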
In some embodiments, the encryption module 420 is further configured to perform stretching transformation on the first characteristic variable to obtain a processed first characteristic variable, and to homomorphically encrypt the processed first characteristic variable with the public key to obtain the second characteristic variable.
In some embodiments, the encryption module 420 is further configured to randomly generate a positive real number and a real number, multiply the first characteristic variable by the random positive real number, and add the random real number to the product to obtain the processed first characteristic variable.
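One way to see why this stretching transformation x' = a·x + b (with a > 0) does not disturb the result: the Pearson coefficient is unchanged by a positive scale and a shift, and a strictly increasing map preserves ranks, so the Spearman and Kendall coefficients are unchanged as well. The sketch below checks this numerically on the X1 and X2 columns of Table 3. The function names and the sampling ranges for a and b are hypothetical; the patent does not specify how the random numbers are drawn.

```python
import math
import random

def stretch(values, a, b):
    """Stretching transformation x' = a * x + b, where a is a positive real number."""
    if a <= 0:
        raise ValueError("a must be a positive real number")
    return [a * v + b for v in values]

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    sigma_x = math.sqrt(sum((u - mean_x) ** 2 for u in x))
    sigma_y = math.sqrt(sum((v - mean_y) ** 2 for v in y))
    return cov / (sigma_x * sigma_y)

def rank_order(values):
    """Indices sorted by descending value; Spearman and Kendall depend only on this order."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)

# Hypothetical sampling ranges for the random numbers.
a = random.uniform(0.1, 100.0)   # random positive real number
b = random.uniform(-1e4, 1e4)    # random real number
x = [23, 55, 22]                 # X1 from Table 3
y = [8000, 9000, 6555]           # X2 from Table 3
x_stretched = stretch(x, a, b)
```

A negative or zero multiplier would flip or collapse the ordering, which is presumably why the multiplier must be positive.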
In some embodiments, the sending module 430 is further configured to send the public key to the second device, so that the second device performs homomorphic encryption on the third feature variable by using the public key to obtain a fourth feature variable.
In some embodiments, the sending module 430 is further configured to send the data identifier of the first characteristic variable to the second device, so that the second device combines the second characteristic variable and the fourth characteristic variable according to the data identifier.
In some embodiments, the homomorphic encryption algorithm comprises a fully homomorphic encryption algorithm or a semi-homomorphic encryption algorithm.
In some embodiments, the first target correlation coefficient includes at least one of the following: the Pearson correlation coefficient, the Spearman correlation coefficient, and the Kendall correlation coefficient.
It is understood that the correlation coefficient calculation apparatus 400 according to the embodiment of the present invention may correspond to an execution subject of the correlation coefficient calculation method described in the embodiment of the present invention, and specific details of operations and/or functions of each module/unit of the correlation coefficient calculation apparatus 400 may refer to the descriptions of corresponding parts in the correlation coefficient calculation method described in the embodiment of the present invention, and are not described herein again for brevity.
With the correlation coefficient calculation apparatus 400 according to the embodiment of the present invention, after the public key is obtained based on a homomorphic encryption algorithm, the first characteristic variable in the first device is homomorphically encrypted with the public key to obtain the second characteristic variable, which is then sent to the second device. No plaintext is therefore exchanged between the first device and the second device, which ensures the security of the interaction data. The second device also receives the public key from the first device, encrypts its own local data with it, and calculates the first target correlation coefficient from that data together with the received second characteristic variable. The first target correlation coefficient calculated in this way is consistent with the coefficient that would be calculated from plaintext, which ensures the accuracy of the target correlation coefficient. After the second device sends the first target correlation coefficient to the first device, the first device obtains the actual correlation coefficient simply by decrypting it with the private key corresponding to the public key. Throughout the interaction, no device participating in the calculation learns any plaintext data other than its own, so data security is guaranteed while calculation efficiency is improved and calculation time is reduced.
Fig. 5 is a schematic structural diagram of a correlation coefficient calculation apparatus according to an embodiment of the present invention, and as shown in fig. 5, the correlation coefficient calculation apparatus 500 may include: a receiving module 510, an encryption module 520, a combining module 530, a calculating module 540, and a transmitting module 550.
A receiving module 510, configured to receive a second characteristic variable, a public key, and a data identifier of the first characteristic variable sent by the first device;
the encryption module 520 is configured to perform homomorphic encryption on the third feature variable in the second device through the public key to obtain a fourth feature variable;
a combining module 530, configured to combine the second characteristic variable and the fourth characteristic variable according to the data identifier;
a calculating module 540, configured to calculate a first target correlation coefficient according to the combined second characteristic variable and the fourth characteristic variable;
a sending module 550, configured to send the first target correlation coefficient to the first device, so that the first device obtains a second target correlation coefficient according to the first target correlation coefficient.
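Apparatus 500 can be pictured the same way, as an interface with one method per module. As before, the method names, parameter types and the use of Python's `abc` module are hypothetical assumptions, not part of the patent.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple

class SecondDeviceApparatus(ABC):
    """Hypothetical interface mirroring modules 510 to 550 of apparatus 500."""

    @abstractmethod
    def receive_from_first_device(self) -> Tuple[List[Any], Any, List[str]]:
        """Receiving module 510: (second characteristic variable, public key, data identifiers)."""

    @abstractmethod
    def encrypt_features(self, third_variable: List[float], public_key: Any) -> List[Any]:
        """Encryption module 520: third characteristic variable -> fourth characteristic variable."""

    @abstractmethod
    def combine(self, second_variable: Dict[str, Any], fourth_variable: Dict[str, Any]) -> Dict[str, Any]:
        """Combining module 530: join the two variables on their data identifiers."""

    @abstractmethod
    def compute_coefficient(self, combined: Dict[str, Any]) -> Any:
        """Calculating module 540: first target correlation coefficient (ciphertext)."""

    @abstractmethod
    def send_coefficient(self, first_coefficient: Any) -> None:
        """Sending module 550: send the first target correlation coefficient to the first device."""
```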
In some embodiments, the homomorphic encryption is fully homomorphic encryption or semi-homomorphic encryption.
In some embodiments, the first target correlation coefficient includes at least one of the following: the Pearson correlation coefficient, the Spearman correlation coefficient, and the Kendall correlation coefficient.
It is understood that the correlation coefficient calculation apparatus 500 according to the embodiment of the present invention may correspond to an execution subject of the correlation coefficient calculation method described in the embodiment of the present invention, and specific details of operations and/or functions of each module/unit of the correlation coefficient calculation apparatus 500 may refer to the descriptions of corresponding parts in the correlation coefficient calculation method described in the embodiment of the present invention, and are not described herein again for brevity.
With the correlation coefficient calculation apparatus 500 of the embodiment of the present invention, after the public key is obtained based on a homomorphic encryption algorithm, the first characteristic variable in the first device is homomorphically encrypted with the public key to obtain the second characteristic variable, which is then sent to the second device. No plaintext is therefore exchanged between the first device and the second device, which ensures the security of the interaction data. The second device also receives the public key from the first device, encrypts its own local data with it, and calculates the first target correlation coefficient from that data together with the received second characteristic variable. The first target correlation coefficient calculated in this way is consistent with the coefficient that would be calculated from plaintext, which ensures the accuracy of the target correlation coefficient. After the second device sends the first target correlation coefficient to the first device, the first device obtains the actual correlation coefficient simply by decrypting it with the private key corresponding to the public key. Throughout the interaction, no device participating in the calculation learns any plaintext data other than its own, so data security is guaranteed while calculation efficiency is improved and calculation time is reduced.
Fig. 6 is a schematic hardware structure diagram of a correlation coefficient calculation device according to an embodiment of the present invention.
As shown in fig. 6, the correlation coefficient calculation device 600 in the present embodiment includes an input device 601, an input interface 602, a central processor 603, a memory 604, an output interface 605, and an output device 606. The input interface 602, the central processing unit 603, the memory 604, and the output interface 605 are connected to each other via a bus 610, and the input device 601 and the output device 606 are connected to the bus 610 via the input interface 602 and the output interface 605, respectively, and further connected to other components of the correlation coefficient calculation device 600.
Specifically, the input device 601 receives input information from the outside, and transmits the input information to the central processor 603 through the input interface 602; the central processor 603 processes input information based on computer-executable instructions stored in the memory 604 to generate output information, stores the output information temporarily or permanently in the memory 604, and then transmits the output information to the output device 606 through the output interface 605; the output device 606 outputs the output information to the outside of the correlation coefficient calculation device 600 for use by the user.
That is, the correlation coefficient calculation device shown in fig. 6 may also be implemented to include: a memory storing computer-executable instructions; and a processor which, when executing computer-executable instructions, may implement the correlation coefficient calculation method described in connection with the embodiment examples of the present invention.
In one embodiment, the correlation coefficient calculation apparatus 600 shown in fig. 6 includes: a memory 604 for storing programs; the processor 603 is configured to execute a program stored in the memory to perform the correlation coefficient calculation method according to the embodiment of the present invention.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium has computer program instructions stored thereon; the computer program instructions, when executed by a processor, implement the correlation coefficient calculation method provided by the embodiments of the present invention.
It is to be understood that the invention is not limited to the specific arrangements and instrumentality described above and shown in the drawings. A detailed description of known methods is omitted herein for the sake of brevity. In the above embodiments, several specific steps are described and shown as examples. However, the method processes of the present invention are not limited to the specific steps described and illustrated, and those skilled in the art can make various changes, modifications and additions or change the order between the steps after comprehending the spirit of the present invention.
The functional blocks shown in the above structural block diagrams may be implemented as hardware, software, firmware, or a combination thereof. When implemented in hardware, a block may be, for example, an electronic circuit, an application specific integrated circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the invention are the programs or code segments used to perform the required tasks. The programs or code segments may be stored in a machine-readable medium or transmitted over a transmission medium or communication link by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuits, semiconductor memory devices, read-only memories (ROMs), flash memories, erasable ROMs (EROMs), floppy disks, CD-ROMs, optical disks, hard disks, fiber optic media, radio frequency (RF) links, and so forth. The code segments may be downloaded via computer networks such as the Internet or an intranet.
It should also be noted that the exemplary embodiments mentioned in this patent describe some methods or systems based on a series of steps or devices. However, the present invention is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such a processor may be, but is not limited to, a general purpose processor, a special purpose processor, an application specific processor, or a field programmable logic circuit. It will also be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware for performing the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above are only specific embodiments of the present invention. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, modules, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here. The scope of protection of the present invention is not limited thereto: any equivalent modification or substitution that a person skilled in the art can readily conceive within the technical scope of the present invention shall fall within the scope of protection of the present invention.

Claims (14)

1. A correlation coefficient calculation method applied to a first device, the method comprising:
acquiring a public key based on a homomorphic encryption algorithm;
homomorphically encrypting the first characteristic variable through the public key to obtain a second characteristic variable;
sending the second characteristic variable to the second device, so that the second device obtains a first target correlation coefficient according to the second characteristic variable;
receiving the first target correlation coefficient sent by the second device;
and decrypting the first target correlation coefficient by using a private key corresponding to the public key to obtain a second target correlation coefficient.
2. The method according to claim 1, wherein said homomorphically encrypting the first feature variable by the public key to obtain a second feature variable comprises:
performing stretching transformation on the first characteristic variable to obtain a processed first characteristic variable;
and homomorphically encrypting the processed first characteristic variable through the public key to obtain the second characteristic variable.
3. The method according to claim 2, wherein performing the stretching transformation on the first characteristic variable to obtain a processed first characteristic variable comprises:
randomly generating a random positive real number and a random real number;
and multiplying the first characteristic variable by the random positive real number and adding the random real number to the product to obtain the processed first characteristic variable.
4. The method of claim 1, further comprising:
sending the public key to the second device, so that the second device homomorphically encrypts a third characteristic variable with the public key to obtain a fourth characteristic variable.
5. The method of claim 4, further comprising:
sending a data identifier of the first characteristic variable to the second device, so that the second device combines the second characteristic variable and the fourth characteristic variable according to the data identifier.
6. The method of claim 1, wherein the homomorphic encryption algorithm comprises a fully homomorphic encryption algorithm or a semi-homomorphic encryption algorithm.
7. The method of claim 1, wherein the first target correlation coefficient comprises at least one of: a Pearson correlation coefficient, a Spearman correlation coefficient, and a Kendall correlation coefficient.
8. A correlation coefficient calculation method applied to a second device, the method comprising:
receiving a second characteristic variable, a public key, and a data identifier of a first characteristic variable sent by a first device;
homomorphically encrypting a third characteristic variable in the second device through the public key to obtain a fourth characteristic variable;
combining the second characteristic variable and the fourth characteristic variable according to the data identifier;
calculating a first target correlation coefficient from the combined second characteristic variable and fourth characteristic variable;
and sending the first target correlation coefficient to the first device, so that the first device obtains a second target correlation coefficient from the first target correlation coefficient.
9. The method of claim 8, wherein the homomorphic encryption is fully homomorphic encryption or semi-homomorphic encryption.
10. The method of claim 8, wherein the first target correlation coefficient comprises at least one of: a Pearson correlation coefficient, a Spearman correlation coefficient, and a Kendall correlation coefficient.
11. A correlation coefficient calculation apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a public key based on a homomorphic encryption algorithm;
the encryption module is used for homomorphically encrypting the first characteristic variable through the public key to obtain a second characteristic variable;
a sending module, configured to send the second characteristic variable to the second device, so that the second device obtains a first target correlation coefficient according to the second characteristic variable;
a receiving module, configured to receive the first target correlation coefficient sent by the second device;
and the decryption module is used for decrypting the first target correlation coefficient by using a private key corresponding to the public key to obtain a second target correlation coefficient.
12. A correlation coefficient calculation apparatus, characterized in that the apparatus comprises:
the receiving module is used for receiving the second characteristic variable, the public key, and the data identifier of the first characteristic variable sent by the first device;
the encryption module is used for homomorphically encrypting the third characteristic variable in the second device through the public key to obtain a fourth characteristic variable;
the combination module is used for combining the second characteristic variable and the fourth characteristic variable according to the data identifier;
the calculation module is used for calculating a first target correlation coefficient from the combined second characteristic variable and fourth characteristic variable;
a sending module, configured to send the first target correlation coefficient to the first device, so that the first device obtains a second target correlation coefficient according to the first target correlation coefficient.
13. A correlation coefficient calculation device, characterized in that the device comprises: a processor, and a memory storing computer program instructions;
the processor reads and executes the computer program instructions to implement the correlation coefficient calculation method of any one of claims 1 to 10.
14. A computer storage medium having computer program instructions stored thereon, which when executed by a processor implement the correlation coefficient calculation method of any one of claims 1 to 10.
CN202010770749.XA 2020-08-04 2020-08-04 Correlation coefficient calculation method, device, equipment and computer storage medium Pending CN114091043A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010770749.XA CN114091043A (en) 2020-08-04 2020-08-04 Correlation coefficient calculation method, device, equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010770749.XA CN114091043A (en) 2020-08-04 2020-08-04 Correlation coefficient calculation method, device, equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN114091043A true CN114091043A (en) 2022-02-25

Family

ID=80295159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010770749.XA Pending CN114091043A (en) 2020-08-04 2020-08-04 Correlation coefficient calculation method, device, equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN114091043A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115640509A (en) * 2022-12-26 2023-01-24 北京融数联智科技有限公司 Data correlation calculation method and system in federated privacy calculation

Similar Documents

Publication Publication Date Title
US10129029B2 (en) Proofs of plaintext knowledge and group signatures incorporating same
US8903090B2 (en) Securely classifying data
CN111510281B (en) Homomorphic encryption method and device
Bárász et al. Passive attack against the M2AP mutual authentication protocol for RFID tags
US20240163084A1 (en) Method of data transmission, and electronic devic
US20130114805A1 (en) Encryption system using discrete chaos function
CN117118617A (en) Distributed threshold encryption and decryption method based on mode component homomorphism
CN114091043A (en) Correlation coefficient calculation method, device, equipment and computer storage medium
US20060129812A1 (en) Authentication for admitting parties into a network
US20170053566A1 (en) Cryptographic system and computer readable medium
CN114465708B (en) Privacy data processing method, device, system, electronic equipment and storage medium
CN114221753B (en) Key data processing method and electronic equipment
Liu et al. An Integratable Verifiable Secret Sharing Mechanism.
CN116681141A (en) Federal learning method, terminal and storage medium for privacy protection
CN113807537B (en) Data processing method and device for multi-source data, electronic equipment and storage medium
CN115361196A (en) Service interaction method based on block chain network
Prihandoko et al. Stream-keys generation based on graph labeling for strengthening Vigenere encryption
EP3924811B1 (en) Distributed randomness generation via multi-party computation
Paar et al. Introduction to cryptography and data security
CN113761570A (en) Privacy intersection-oriented data interaction method
CN110874479B (en) Method, system, data terminal and processing terminal for safely processing decision tree model
CN115344882A (en) Multi-party computing method, device and storage medium based on trusted computing environment
Caballero-Gil et al. Strong solutions to the identification problem
Hayashi et al. Secure modulo zero-sum randomness as cryptographic resource
Shikata Design and analysis of information-theoretically secure authentication codes with non-uniformly random keys

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination