CN114614974A - Privacy set intersection method, system and device for power grid data cross-industry sharing - Google Patents


Info

Publication number
CN114614974A
Authority
CN
China
Prior art keywords
value
bloom filter
intersection
probability
column
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210313113.1A
Other languages
Chinese (zh)
Other versions
CN114614974B (en)
Inventor
毛正雄
李辉
黄祖源
杨传旭
常荣
时燕
金彦旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Center of Yunnan Power Grid Co Ltd
Original Assignee
Information Center of Yunnan Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Center of Yunnan Power Grid Co Ltd
Priority to CN202210313113.1A
Publication of CN114614974A
Application granted
Publication of CN114614974B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0643: Hash functions, e.g. MD5, SHA, HMAC or f9 MAC
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06: Energy or water supply
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/04: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0407: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/04: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428: Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Water Supply & Treatment (AREA)
  • Strategic Management (AREA)
  • Power Engineering (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a privacy set intersection method, system and device for power grid data cross-industry sharing. The invention maps local data with a Bloom filter, which greatly reduces the computation cost of set intersection. Replacing homomorphic encryption with randomized response effectively reduces the communication cost, so that the large data volumes of practical scenarios can be handled; at the same time, randomized response randomly flips the bits of the Bloom filter to achieve local differential privacy. The protocol parameters used in each run are agreed on in advance by the users and are not shared with the server, which avoids malicious attacks by the server to a certain extent. The perturbed Bloom filter that each user shares with the server has undergone randomized response processing, which further protects the local private data.

Description

Privacy set intersection method, system and device for power grid data cross-industry sharing
Technical Field
The invention belongs to the field of data security, and relates to a privacy set intersection method, system and device for power grid data cross-industry sharing.
Background
Electric power data contains important information about the production and daily life of enterprises, institutions and households, and can objectively reflect the operating state of society. However, while power grid data is valuable, it also contains a large amount of sensitive information: for example, an enterprise's electricity payment records can reflect its production status, and a household's electricity consumption can reflect the daily routines of its members, so openly sharing this information carries great privacy risk. Privacy computing enables joint mining of data value while the data never leaves its local environment, and ensures that no party's data privacy is exposed during joint mining; it can therefore play an important role in the open sharing of power big data. Private set intersection (PSI), referred to here as encrypted sample intersection, is a cornerstone technique of privacy computing: it computes the intersection of the parties' data IDs, and thereby aligns the data, without the data leaving its local environment, which is of great significance for cross-industry sharing and enablement of power grid data. Most existing PSI schemes compute the intersection among participants directly, based on techniques such as oblivious transfer and public-key encryption. Their disadvantage is that, to guarantee security, the encryption techniques they adopt usually need very long keys to reach the required security strength, which greatly reduces computational efficiency, so they cannot be applied to the massive-data scenarios of a power grid. Therefore, an efficient, low-overhead privacy set intersection method suited to power grid data cross-industry sharing is of great practical significance.
Disclosure of Invention
The invention aims to solve the problems in the prior art and provides a privacy set intersection method, system and device for power grid data cross-industry sharing, which ensure that the server cannot obtain users' private data and that no user can obtain any private data of other users outside the intersection. The method and device protect users' private data with a Bloom filter and local differential privacy, while reducing computation and communication overhead.
To achieve this purpose, the invention adopts the following technical scheme:
A privacy set intersection method for power grid data cross-industry sharing comprises the following steps:
Step 1: set the initial protocol parameters: the privacy budget ε, the hash function h(·), and the Bloom filter length L; every position of the initial Bloom filter is 0;
Step 2: compute the Bloom filter of the local data: for each element x in the local data set $D_i$, compute the hash value h(x) and set the bit at the corresponding index of the Bloom filter to 1, obtaining $BF(D_i, L)$;
Step 3: apply randomized response to the local Bloom filter: randomly flip each component $BF(D_i, L)[j]$ according to the randomized response rule, obtaining the perturbed Bloom filter $\widetilde{BF}(D_i, L)$;
Step 4: collect all perturbed Bloom filters and construct the perturbation matrix $\widetilde{V}$, an N × L matrix whose i-th row is $\widetilde{BF}(D_i, L)$, where N is the number of users;
Step 5: compute the proportion $\rho'_j$ of elements equal to 1 in column j of the perturbation matrix $\widetilde{V}$, and use $\rho'_j$ to estimate the proportion $\hat{\rho}_j$ of elements equal to 1 in column j of the original matrix V; after every component j ∈ [1, L] has been processed, the estimated probabilities $\hat{\rho} = (\hat{\rho}_1, \ldots, \hat{\rho}_L)$ that the elements of each column of the original matrix V equal 1 are obtained;
Step 6: compute the probability threshold: analyze the variance var of $\hat{\rho}$ to obtain the probability threshold [formula image BDA0003569087580000029];
Step 7: obtain the intersection Bloom filter by threshold comparison: initialize an intersection Bloom filter of length L; compare each estimated proportion in $\hat{\rho}$ with the probability threshold, and if the comparison condition [formula image BDA00035690875800000212] holds, set the bit at the corresponding index of the intersection Bloom filter to 1, finally obtaining the intersection Bloom filter $BF_{\cap}(L)$;
Step 8: for each element x in the local data set $D_i$, compute its hash value h(x); if $BF_{\cap}(L)[h(x)] = 1$, x is an intersection element; otherwise it is not.
The invention is further improved in that:
Steps 1, 2, 3 and 8 are performed by the user; steps 4, 5, 6 and 7 are performed by the server.
Step 3 further comprises: the user sends the perturbed Bloom filter $\widetilde{BF}(D_i, L)$ to the server. Step 7 further comprises: the server sends the intersection Bloom filter $BF_{\cap}(L)$ to the users. Step 8 further comprises: the user receives the intersection Bloom filter $BF_{\cap}(L)$.
Step 3 specifically comprises the following:
For $BF(D_i, L)$, the user randomly flips each component $BF(D_i, L)[j]$ according to the randomized response rule of equation (1), obtaining $\widetilde{BF}(D_i, L)[j]$:
[formula image BDA0003569087580000033: equation (1), the randomized response rule]
where ε is the given privacy budget.
The method further comprises the transformation of the original matrix V into the perturbation matrix $\widetilde{V}$, specifically:
If an element of a given column of the original matrix V equals 1 with probability ρ, and each element is flipped with probability t, then the probability ρ′ that an element of the corresponding column of the perturbation matrix equals 1 is given by equation (2):
ρ′=(1-t)ρ+t(1-ρ) (2)
Conversely, if the proportion of elements equal to 1 in a given column of the perturbation matrix $\widetilde{V}$ is ρ′ and the flip probability is t, then the proportion of elements equal to 1 in the corresponding column of the original matrix V is approximately
$$\rho = \frac{\rho' - t}{1 - 2t}$$
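As a worked instance of equation (2) and its inverse, consider the illustrative values t = 0.1 and ρ = 0.8; these numbers are assumptions chosen for the example and are not taken from the patent.

```latex
\rho' = (1-t)\rho + t(1-\rho) = 0.9 \times 0.8 + 0.1 \times 0.2 = 0.74,
\qquad
\rho = \frac{\rho' - t}{1 - 2t} = \frac{0.74 - 0.1}{1 - 0.2} = 0.8 .
```

The inverse is undefined at t = 0.5, where each bit is replaced by a fair coin flip and the perturbed column carries no information about ρ.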
In step 5, the server computes the proportion $\rho'_j$ of elements equal to 1 in column j of the perturbation matrix and uses $\rho'_j$ to estimate the proportion $\hat{\rho}_j$ of elements equal to 1 in column j of the original matrix V.
Suppose that in column j of the perturbation matrix $\widetilde{V}$, n elements equal 1 and N − n elements equal 0, and let $\rho_j$ denote the proportion of elements originally equal to 1 in column j of the original matrix V. The derivation for step 5 comprises the following steps:
Step 5.1: for any perturbed Bloom filter $\widetilde{BF}(D_i, L)$, compute the probabilities that its j-th element equals 0 and 1. From the given conditions it follows that:
[formula images BDA0003569087580000311, BDA0003569087580000041, BDA0003569087580000042]
For the perturbed Bloom filter $\widetilde{BF}(D_i, L)$, the probability that the j-th element equals 1 is, by equation (2) with flip probability t:
$$P\big(\widetilde{BF}(D_i, L)[j] = 1\big) = (1-t)\rho_j + t(1-\rho_j)$$
Correspondingly, the probability that the j-th element equals 0 is:
$$P\big(\widetilde{BF}(D_i, L)[j] = 0\big) = (1-t)(1-\rho_j) + t\rho_j$$
Step 5.2: compute the maximum likelihood estimate of $\rho_j$.
Construct the likelihood function of $\rho_j$:
$$L(\rho_j) = \big[(1-t)\rho_j + t(1-\rho_j)\big]^{n}\,\big[(1-t)(1-\rho_j) + t\rho_j\big]^{N-n}$$
Taking the logarithm of the likelihood function $L(\rho_j)$:
$$\log L(\rho_j) = n\log\big[(1-t)\rho_j + t(1-\rho_j)\big] + (N-n)\log\big[(1-t)(1-\rho_j) + t\rho_j\big]$$
and
$$\frac{\partial \log L}{\partial \rho_j} = \frac{n(1-2t)}{(1-t)\rho_j + t(1-\rho_j)} - \frac{(N-n)(1-2t)}{(1-t)(1-\rho_j) + t\rho_j}$$
$$\frac{\partial^2 \log L}{\partial \rho_j^2} = -\frac{n(1-2t)^2}{\big[(1-t)\rho_j + t(1-\rho_j)\big]^2} - \frac{(N-n)(1-2t)^2}{\big[(1-t)(1-\rho_j) + t\rho_j\big]^2}$$
Since
$$\frac{\partial^2 \log L}{\partial \rho_j^2} < 0,$$
log(L) takes its maximum value when
$$\frac{\partial \log L}{\partial \rho_j} = 0.$$
At this point the estimate $\hat{\rho}_j$ of $\rho_j$ is
$$\hat{\rho}_j = \frac{n/N - t}{1 - 2t}$$
After the server has performed these operations for every component j ∈ [1, L], the estimated probabilities $\hat{\rho} = (\hat{\rho}_1, \ldots, \hat{\rho}_L)$ that the elements of each column of the original matrix V equal 1 are obtained.
In step 6, the variance var of $\hat{\rho}$ is analyzed to obtain the probability threshold [formula image BDA0003569087580000052].
The derivation for step 6 comprises the following steps:
Step 6.1: compute the expectation of the estimated proportion $\hat{\rho}_j$. Since each element $\widetilde{BF}(D_i, L)[j]$ of a perturbed Bloom filter can only be 0 or 1:
$$E\big(\widetilde{BF}(D_i, L)[j]\big) = (1-t)\rho_j + t(1-\rho_j), \qquad var\big(\widetilde{BF}(D_i, L)[j]\big) = \big[(1-t)\rho_j + t(1-\rho_j)\big]\big[(1-t)(1-\rho_j) + t\rho_j\big]$$
Let n be the number of elements equal to 1 in column j of the perturbation matrix $\widetilde{V}$; then n is a sum of independent, identically distributed random variables:
$$E(n) = N\,E\big(\widetilde{BF}(D_i, L)[j]\big)$$
$$var(n) = N\,var\big(\widetilde{BF}(D_i, L)[j]\big) \qquad (12)$$
$$E(\hat{\rho}_j) = \frac{E(n)/N - t}{1 - 2t} = \rho_j$$
Step 6.2: compute the variance of the estimated proportion $\hat{\rho}_j$:
$$var(\hat{\rho}_j) = \frac{var(n)}{N^2(1-2t)^2} = \frac{\big[(1-t)\rho_j + t(1-\rho_j)\big]\big[(1-t)(1-\rho_j) + t\rho_j\big]}{N(1-2t)^2}$$
A privacy set intersection system for power grid data cross-industry sharing comprises:
an initialization module, configured to set the initial protocol parameters: the privacy budget ε, the hash function h(·), and the Bloom filter length L, where every position of the initial Bloom filter is 0;
a filter acquisition module, configured to compute the Bloom filter of the local data: for each element x in the local data set $D_i$, compute the hash value h(x) and set the bit at the corresponding index of the Bloom filter to 1, obtaining $BF(D_i, L)$;
a randomized response processing module, configured to apply randomized response to the local Bloom filter: each component $BF(D_i, L)[j]$ is randomly flipped according to the randomized response rule, obtaining the perturbed Bloom filter $\widetilde{BF}(D_i, L)$;
a perturbation matrix acquisition module, configured to collect all perturbed Bloom filters and construct the perturbation matrix $\widetilde{V}$, an N × L matrix whose i-th row is $\widetilde{BF}(D_i, L)$, where N is the number of users;
a probability acquisition module, configured to compute the proportion $\rho'_j$ of elements equal to 1 in column j of the perturbation matrix $\widetilde{V}$ and to use $\rho'_j$ to estimate the proportion $\hat{\rho}_j$ of elements equal to 1 in column j of the original matrix V; after every component j ∈ [1, L] has been processed, the estimated probabilities $\hat{\rho} = (\hat{\rho}_1, \ldots, \hat{\rho}_L)$ that the elements of each column of the original matrix V equal 1 are obtained;
a computing module, configured to analyze the variance var of $\hat{\rho}$ to obtain the probability threshold [formula image BDA00035690875800000610];
a comparison module, configured to obtain the intersection Bloom filter by threshold comparison: initialize an intersection Bloom filter of length L; compare each estimated proportion in $\hat{\rho}$ with the probability threshold, and if the comparison condition [formula image BDA00035690875800000613] holds, set the bit at the corresponding index of the intersection Bloom filter to 1, finally obtaining the intersection Bloom filter $BF_{\cap}(L)$;
a judging module, configured to compute, for each element x in the local data set $D_i$, its hash value h(x); if $BF_{\cap}(L)[h(x)] = 1$, x is an intersection element; otherwise it is not.
A terminal device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, said processor implementing the steps of the above method when executing said computer program.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
Compared with the prior art, the invention has the following beneficial effects:
In each run of the privacy set intersection, a different Bloom filter is used and the hash mapping function is not shared with the server, which avoids malicious attacks by the server to a certain extent. Using the Bloom filter as the intermediate carrier for subsequent computation and communication significantly reduces the communication cost and computational overhead of the privacy set intersection process.
Furthermore, a randomized response mechanism providing local differential privacy is introduced, which ensures that the server cannot obtain users' private data and that users cannot obtain any private data of other users outside the intersection. No user obtains exact intersection information, which further protects data security and meets the users' security requirements.
Furthermore, differential privacy replaces homomorphic encryption or public-key encryption as the privacy protection technique. Homomorphic encryption requires encrypting and decrypting plaintext, a time-consuming process involving large-prime arithmetic, whereas differential privacy needs only simple arithmetic operations and no encryption or decryption, which greatly reduces the computation cost and communication overhead of the model.
Drawings
In order to more clearly explain the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a schematic diagram of the logic architecture of the privacy set intersection method for power grid data cross-industry sharing;
FIG. 2 is a schematic diagram of the application flow of an embodiment of the present invention in a power grid scenario;
FIG. 3 is a schematic diagram of the performance test of the present invention;
FIG. 4 is a block diagram of the privacy set intersection system for power grid data cross-industry sharing according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the embodiments of the present invention, it should be noted that if the terms "upper", "lower", "horizontal", "inner", etc. are used for indicating the orientation or positional relationship based on the orientation or positional relationship shown in the drawings or the orientation or positional relationship which is usually arranged when the product of the present invention is used, the description is merely for convenience and simplicity, and the indication or suggestion that the referred device or element must have a specific orientation, be constructed and operated in a specific orientation, and thus, cannot be understood as limiting the present invention. Furthermore, the terms "first," "second," and the like are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
Furthermore, the term "horizontal", if present, does not mean that the component is required to be absolutely horizontal, but may be slightly inclined. For example, "horizontal" merely means that the direction is more horizontal than "vertical" and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.
In the description of the embodiments of the present invention, it should be further noted that unless otherwise explicitly stated or limited, the terms "disposed," "mounted," "connected," and "connected" should be interpreted broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the description of the present invention, it is to be understood that: "data" refers to records with unique identity tags; "user" refers to an enterprise or organization that provides data, including the power grid company providing power data as well as other enterprises providing associated data; "server" refers to a third-party device, agreed on in advance by the users, that exchanges information with each participant; "privacy set intersection" refers to computing the data in the intersection without disclosing any participant's local data set and without exposing non-intersection data; "Bloom filter" refers to a data structure that stores data under different tags; "differential privacy" refers to obtaining computed results with similar probabilities after performing a specified computation on two data sets that differ in their data. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
The invention is described in further detail below with reference to the accompanying drawings:
Referring to FIG. 1, the security of the protocol is ensured, whether the users and the server are honest or dishonest, in the following two respects:
1) When the users and the server are honest, that is, they execute the protocol faithfully, no party can obtain any original data of any other party outside the intersection, by the security of the privacy set intersection technique.
2) When a user or the server is dishonest, it may attempt to infer the original data of other participants from the intersection it obtains. Because the other users and the server hold only an intersection produced by randomized response, in which each element is a true intersection element only with the agreed probability, they cannot obtain any original data beyond the intersection elements from it. Even if individual users collude with the server, the original data of other users likewise cannot be inferred.
The invention discloses a privacy set intersection method for power grid data cross-industry sharing, which comprises the following steps:
Step 1: the users agree on the initial protocol parameters: all users negotiate and determine the privacy budget ε, the hash function h(·), and the Bloom filter length L. Every position of the initial Bloom filter is 0;
Step 2: each user computes the Bloom filter of its local data: for each element x in the local data set $D_i$, compute the hash value h(x) and set the bit at the corresponding index of the Bloom filter to 1, obtaining $BF(D_i, L)$; the mapping between each data item and its hash value is stored.
Each user maps its local data with a Bloom filter, so that the computational cost of set intersection is always O(N). Bloom filters typically use multiple hash functions, but this scheme considers only Bloom filters with a single hash function.
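Step 2 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the patent does not name a concrete hash function, so SHA-256 reduced modulo L is an assumption, and the names `bloom_index`, `build_bloom_filter` and the `salt` parameter are hypothetical.

```python
import hashlib


def bloom_index(item, length, salt=b""):
    """Map an item to one Bloom filter position in [0, length).

    Assumption: h(.) is SHA-256 (optionally salted) reduced modulo L.
    """
    digest = hashlib.sha256(salt + item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % length


def build_bloom_filter(dataset, length, salt=b""):
    """Return BF(D_i, L): a list of L bits with bit h(x) set for every x."""
    bf = [0] * length              # Step 1: every position starts at 0
    for x in dataset:
        bf[bloom_index(x, length, salt)] = 1
    return bf


if __name__ == "__main__":
    ids = ["meter-001", "meter-002", "meter-003"]
    print(build_bloom_filter(ids, 32))
```

With a single hash function, two different identifiers can map to the same position, so L should be chosen large relative to the data set size to keep collisions rare.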
Step 3: the user applies randomized response to the local Bloom filter: each component $BF(D_i, L)[j]$ is randomly flipped according to the randomized response rule, obtaining the perturbed Bloom filter $\widetilde{BF}(D_i, L)$. The user sends the perturbed Bloom filter $\widetilde{BF}(D_i, L)$ to the server.
For $BF(D_i, L)$, the user randomly flips each component $BF(D_i, L)[j]$ according to the randomized response rule of equation (1), obtaining $\widetilde{BF}(D_i, L)[j]$:
[formula image BDA0003569087580000104: equation (1), the randomized response rule]
where ε is the given privacy budget.
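The bit-flipping of step 3 might look like the sketch below. Equation (1) is not legible in this text, so the flip probability used here, t = 1/(1 + e^ε), is an assumption borrowed from the classic binary randomized response mechanism and may differ from the patent's rule; `flip_probability` and `perturb` are hypothetical names.

```python
import math
import random


def flip_probability(epsilon):
    """Assumed rule: keep a bit with probability e^eps / (1 + e^eps),
    flip it with probability t = 1 / (1 + e^eps)."""
    return 1.0 / (1.0 + math.exp(epsilon))


def perturb(bloom_filter, epsilon, rng=None):
    """Return a perturbed copy of the filter, flipping each bit
    independently with probability t."""
    rng = rng or random.Random()
    t = flip_probability(epsilon)
    return [1 - bit if rng.random() < t else bit for bit in bloom_filter]


if __name__ == "__main__":
    original = [0, 1, 0, 0, 1, 1, 0, 0]
    print(perturb(original, epsilon=1.0, rng=random.Random(42)))
```

Under this assumed rule a smaller ε pushes t toward 0.5 (more noise, stronger privacy), while a larger ε pushes t toward 0 (less noise, weaker privacy).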
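Step 4: the server collects all perturbed Bloom filters and constructs the perturbation matrix $\widetilde{V}$, an N × L matrix whose i-th row is $\widetilde{BF}(D_i, L)$:
$$\widetilde{V} = \begin{bmatrix} \widetilde{BF}(D_1, L) \\ \vdots \\ \widetilde{BF}(D_N, L) \end{bmatrix}$$
where N is the number of users;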
Step 5: the server computes the proportion $\rho'_j$ of elements equal to 1 in column j of the perturbation matrix $\widetilde{V}$ and uses $\rho'_j$ to estimate the proportion $\hat{\rho}_j$ of elements equal to 1 in column j of the original matrix V; after every component j ∈ [1, L] has been processed, the estimated probabilities $\hat{\rho} = (\hat{\rho}_1, \ldots, \hat{\rho}_L)$ that the elements of each column of the original matrix V equal 1 are obtained.
If an element of a given column of the original matrix V equals 1 with probability ρ, and each element is flipped with probability t, then the probability ρ′ that an element of the corresponding column of the perturbation matrix equals 1 is given by equation (2):
ρ′=(1-t)ρ+t(1-ρ) (2)
Conversely, if the proportion of elements equal to 1 in a given column of the perturbation matrix $\widetilde{V}$ is ρ′ and the flip probability is t, then the proportion of elements equal to 1 in the corresponding column of the original matrix V is approximately
$$\rho = \frac{\rho' - t}{1 - 2t}$$
Suppose that in column j of the perturbation matrix $\widetilde{V}$, n elements equal 1 and N − n elements equal 0, and let $\rho_j$ denote the proportion of elements originally equal to 1 in column j of the original matrix V. The derivation for step 5 comprises the following steps:
Step 5.1: for any perturbed Bloom filter $\widetilde{BF}(D_i, L)$, compute the probabilities that its j-th element equals 0 and 1. From the given conditions it follows that:
[formula image BDA0003569087580000114]
For the perturbed Bloom filter $\widetilde{BF}(D_i, L)$, the probability that the j-th element equals 1 is, by equation (2) with flip probability t:
$$P\big(\widetilde{BF}(D_i, L)[j] = 1\big) = (1-t)\rho_j + t(1-\rho_j)$$
Correspondingly, the probability that the j-th element equals 0 is:
$$P\big(\widetilde{BF}(D_i, L)[j] = 0\big) = (1-t)(1-\rho_j) + t\rho_j$$
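Step 5.2: compute the maximum likelihood estimate of $\rho_j$.
Construct the likelihood function of $\rho_j$:
$$L(\rho_j) = \big[(1-t)\rho_j + t(1-\rho_j)\big]^{n}\,\big[(1-t)(1-\rho_j) + t\rho_j\big]^{N-n}$$
Taking the logarithm of the likelihood function $L(\rho_j)$:
$$\log L(\rho_j) = n\log\big[(1-t)\rho_j + t(1-\rho_j)\big] + (N-n)\log\big[(1-t)(1-\rho_j) + t\rho_j\big]$$
and
$$\frac{\partial \log L}{\partial \rho_j} = \frac{n(1-2t)}{(1-t)\rho_j + t(1-\rho_j)} - \frac{(N-n)(1-2t)}{(1-t)(1-\rho_j) + t\rho_j}$$
$$\frac{\partial^2 \log L}{\partial \rho_j^2} = -\frac{n(1-2t)^2}{\big[(1-t)\rho_j + t(1-\rho_j)\big]^2} - \frac{(N-n)(1-2t)^2}{\big[(1-t)(1-\rho_j) + t\rho_j\big]^2}$$
Since
$$\frac{\partial^2 \log L}{\partial \rho_j^2} < 0,$$
log(L) takes its maximum value when
$$\frac{\partial \log L}{\partial \rho_j} = 0.$$
At this point the estimate $\hat{\rho}_j$ of $\rho_j$ is
$$\hat{\rho}_j = \frac{n/N - t}{1 - 2t}$$
After the server has performed these operations for every component j ∈ [1, L], the estimated probabilities $\hat{\rho} = (\hat{\rho}_1, \ldots, \hat{\rho}_L)$ that the elements of each column of the original matrix V equal 1 are obtained.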
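The server-side estimation of steps 4 and 5 can be sketched as below. The estimator is the one obtained by solving equation (2) for ρ, namely (ρ′ − t)/(1 − 2t); the function names and the list-of-lists matrix representation are hypothetical choices for illustration.

```python
def column_proportions(perturbed_matrix):
    """Proportion of 1s in each column of an N x L matrix of 0/1 rows."""
    n_users = len(perturbed_matrix)
    return [sum(column) / n_users for column in zip(*perturbed_matrix)]


def estimate_original_proportion(rho_prime, t):
    """Invert rho' = (1 - t) * rho + t * (1 - rho) for rho."""
    if abs(1.0 - 2.0 * t) < 1e-12:
        raise ValueError("t = 0.5 leaves no information to invert")
    return (rho_prime - t) / (1.0 - 2.0 * t)


def estimate_columns(perturbed_matrix, t):
    """Estimated proportion of 1s in every column of the original matrix."""
    return [estimate_original_proportion(p, t)
            for p in column_proportions(perturbed_matrix)]


if __name__ == "__main__":
    # Four users, three Bloom filter positions (illustrative data only).
    matrix = [[1, 0, 1],
              [1, 1, 0],
              [1, 0, 0],
              [1, 0, 1]]
    print(estimate_columns(matrix, t=0.25))
```

Because of sampling noise the raw estimate can fall outside [0, 1]; whether to clip it before the threshold comparison is a design choice that the text does not settle.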
Step 6: the server calculates the probability threshold. The variance var of
Figure BDA0003569087580000128
is analyzed and calculated to obtain the probability threshold
Figure BDA0003569087580000129
Step 6.1: calculate the expectation of the estimated proportion
Figure BDA00035690875800001210
Since each element of the perturbed bloom filter,
Figure BDA00035690875800001211
can only be 0 or 1, it follows that:
Figure BDA00035690875800001212
Let n be the number of elements taking the value 1 in the j-th column of the perturbation matrix
Figure BDA00035690875800001213
Then n is the sum of independent and identically distributed random variables:
Figure BDA00035690875800001214
var(n)=Nvar(BF(Di,L)[j]) (12)
Figure BDA0003569087580000131
Step 6.2: calculate the variance of the estimated proportion
Figure BDA0003569087580000132
as follows:
Figure BDA0003569087580000133
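Since equations other than (12) appear here only as figures, the following is a hedged sketch of how the expectation and variance in step 6 can be worked out. It assumes, as the text states, that the perturbed elements in column j are independent and identically distributed with values 0 or 1, and it writes p_j = (1 − t)ρ_j + t(1 − ρ_j) for the probability that such an element equals 1 (p_j is a symbol introduced here). The estimator form (n/N − t)/(1 − 2t) is the one obtained by solving formula (2) for ρ; the figures may present these results differently.

```latex
\begin{align*}
E(n) &= N p_j, \qquad \operatorname{var}(n) = N p_j (1 - p_j), \\
E(\hat{\rho}_j) &= \frac{E(n)/N - t}{1-2t} = \frac{p_j - t}{1-2t} = \rho_j, \\
\operatorname{var}(\hat{\rho}_j) &= \frac{\operatorname{var}(n)}{N^{2}(1-2t)^{2}}
  = \frac{p_j(1-p_j)}{N(1-2t)^{2}}.
\end{align*}
```

Under these assumptions the estimate is unbiased, and its variance shrinks as the number of users N grows and grows as t approaches 1/2.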
step 7, the server obtains an intersection bloom filter through probability threshold comparison, and initializes the intersection bloom filter with the length of L; will be provided with
Figure BDA0003569087580000134
Each estimate of the ratio and the probability threshold of
Figure BDA0003569087580000135
Make a comparison if
Figure BDA0003569087580000136
Setting the value of the corresponding subscript in the intersection bloom filter as 1 to finally obtain the BF of the intersection bloom filter(L); server BF intersection bloom filter(L) sending to the user.
Step 8: the user receives the intersection bloom filter BF(L). For each element x in the local data set Di, the user calculates the hash value h(x); if BF(L)[h(x)] = 1, then x is an intersection element, otherwise it is not.
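The eight steps can be sketched end to end as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the hash function is a hypothetical SHA-256-based stand-in for h(·); the flip probability t = 1/(1 + e^ε) is the usual randomized-response choice and is assumed here because formula (1) is not reproduced in this text; the threshold is passed in as a plain parameter instead of being derived from the variance; and the comparison `>=` is likewise an assumption.

```python
import hashlib
import math
import random


def h(x: str, length: int) -> int:
    """Hypothetical stand-in for the hash function h(.): element -> index in [0, length)."""
    digest = hashlib.sha256(x.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % length


def bloom_filter(dataset, length):
    """Step 2: set position h(x) to 1 for every local element x."""
    bf = [0] * length
    for x in dataset:
        bf[h(x, length)] = 1
    return bf


def random_response(bf, t, rng):
    """Step 3: flip each bit independently with probability t."""
    return [1 - b if rng.random() < t else b for b in bf]


def estimate_proportions(matrix, t):
    """Step 5: per column, debias the observed proportion of 1s (requires t != 0.5)."""
    n_users = len(matrix)
    estimates = []
    for j in range(len(matrix[0])):
        ones = sum(row[j] for row in matrix)
        estimates.append((ones / n_users - t) / (1 - 2 * t))
    return estimates


def intersection_filter(estimates, threshold):
    """Step 7: set a position to 1 when its estimate reaches the (assumed) threshold."""
    return [1 if est >= threshold else 0 for est in estimates]


def local_intersection(dataset, bf_inter):
    """Step 8: keep the local elements whose hashed position is set."""
    return {x for x in dataset if bf_inter[h(x, len(bf_inter))] == 1}


# Illustrative run: 200 users share two identifiers and each holds one private one.
rng = random.Random(0)
epsilon, length = 3.0, 64
t = 1 / (1 + math.exp(epsilon))  # assumed randomized-response flip probability
datasets = [{"id-001", "id-002", f"only-{i}"} for i in range(200)]
matrix = [random_response(bloom_filter(d, length), t, rng) for d in datasets]  # steps 3-4
bf_inter = intersection_filter(estimate_proportions(matrix, t), threshold=0.9)
print(local_intersection(datasets[0], bf_inter))
```

Because a single hash function is used, two different elements can map to the same position, so a non-intersection element may occasionally be reported; a longer filter reduces this effect.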
Referring to Fig. 2, the process of applying the method to the power grid scenario is as follows:
Step 1: the service system of non-power-grid user A initiates a cooperation request, and power grid user B agrees;
Step 2: user B sends a data matching request to user A, the request including the identity information format of the data;
Step 3: user A and user B agree on the initial protocol parameters and the server information;
Step 4: user A and user B each process the identity information of their respective data sets locally;
Step 5: the server receives the bloom filters and calculates the possible intersection;
Step 6: the server returns the calculated intersection to user A and user B;
Step 7: user A and user B each obtain the intersection part of their local data sets.
Referring to Fig. 3, the harmonic mean of recall and precision is used as the evaluation index of the algorithm, calculated as follows:
Figure BDA0003569087580000141
Figure BDA0003569087580000142
Figure BDA0003569087580000143
where intersection represents the true intersection and estimation represents the intersection estimated according to the present invention.
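The evaluation index can be sketched as below. The exact formulas appear here only as figures, so this sketch assumes the standard definitions of precision and recall over sets and their harmonic mean; the function name `precision_recall_f1` and the sample sets are hypothetical.

```python
def precision_recall_f1(true_intersection: set, estimated_intersection: set):
    """Return (precision, recall, harmonic mean) of an estimated intersection."""
    hits = len(true_intersection & estimated_intersection)
    precision = hits / len(estimated_intersection) if estimated_intersection else 0.0
    recall = hits / len(true_intersection) if true_intersection else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Illustrative values only: two of three estimated elements are truly in the intersection.
p, r, f1 = precision_recall_f1({1, 2, 3, 4}, {3, 4, 5})
print(p, r, f1)
```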
To test the performance of the present invention, the complete data set was divided into 30, 60, 100, 200 and 400 subsets respectively, and it was assumed that each user held one and only one subset. Tests were performed with different numbers of users and privacy budgets, and the results are shown in Fig. 3. As can be seen from Fig. 3, the performance of the present invention improves greatly as the privacy budget increases.
Referring to fig. 4, the invention discloses a privacy set intersection system for power grid data cross-industry sharing, which comprises:
an initial module, used for setting the initial protocol parameters, namely the privacy budget ε, the hash function h(·) and the length L of the bloom filter, wherein all positions of the initial bloom filter are 0;
a filter acquisition module, used for computing the bloom filter of local data: for the local data set Di, it calculates the hash value of each element x in Di based on the hash function h(·) and sets the value at the corresponding subscript of the bloom filter to 1 according to the obtained hash value, obtaining BF(Di,L);
a random response processing module, used for performing random response processing on the local bloom filter: for BF(Di,L), each component BF(Di,L)[j] is randomly flipped according to the random response rule to obtain the perturbed bloom filter
Figure BDA0003569087580000144
a perturbation matrix acquisition module, used for integrating all perturbed bloom filters and constructing the perturbation matrix
Figure BDA0003569087580000151
Wherein N is the number of users;
a probability acquisition module, used for calculating, for the perturbation matrix
Figure BDA0003569087580000152
the proportion of elements taking the value 1 in the j-th column,
Figure BDA0003569087580000153
and, according to
Figure BDA0003569087580000154
estimating the proportion of elements taking the value 1 in the j-th column of the original matrix V,
Figure BDA0003569087580000155
After this processing is performed for each component j ∈ [1, L], the estimated probability that the elements in each column of the original matrix V take the value 1 is obtained:
Figure BDA0003569087580000156
a computing module, used for analyzing and calculating the variance var of
Figure BDA0003569087580000157
to obtain the probability threshold
Figure BDA0003569087580000158
a comparison module, used for obtaining the intersection bloom filter through comparison against the probability threshold: it initializes an intersection bloom filter of length L and compares each estimated proportion in
Figure BDA0003569087580000159
with the probability threshold
Figure BDA00035690875800001510
If
Figure BDA00035690875800001511
holds, the value at the corresponding subscript of the intersection bloom filter is set to 1, finally yielding the intersection bloom filter BF(L);
a judging module, used for calculating the hash value h(x) of each element x in the local data set Di; if BF(L)[h(x)] = 1, then x is an intersection element, otherwise it is not.
An embodiment of the invention provides a terminal device. The terminal device of this embodiment includes: a processor, a memory, and a computer program stored in the memory and executable on the processor. The processor implements the steps of the above method embodiments when executing the computer program. Alternatively, the processor implements the functions of the modules/units in the above device embodiments when executing the computer program.
The computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention.
The terminal device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The terminal device may include, but is not limited to, a processor, a memory.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc.
The memory may be used for storing the computer programs and/or modules, and the processor implements the various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory.
If the modules/units integrated in the terminal device are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of the method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A privacy set intersection method for power grid data cross-industry sharing is characterized by comprising the following steps:
Step 1: setting the initial protocol parameters, namely a privacy budget ε, a hash function h(·) and the length L of the bloom filter, wherein all positions of the initial bloom filter are 0;
Step 2: calculating the bloom filter of the local data: for the local data set Di, calculating the hash value of each element x in Di based on the hash function h(·), and setting the value at the corresponding subscript of the bloom filter to 1 according to the obtained hash value, to obtain BF(Di,L);
Step 3: performing random response processing on the local bloom filter: for BF(Di,L), randomly flipping each component BF(Di,L)[j] according to the random response rule to obtain the perturbed bloom filter
Figure 1
Step 4: integrating all perturbed bloom filters and constructing the perturbation matrix
Figure FDA0003569087570000012
Wherein N is the number of users;
Step 5: calculating, for the perturbation matrix
Figure FDA0003569087570000013
the proportion of elements taking the value 1 in the j-th column,
Figure FDA0003569087570000014
and, according to
Figure FDA0003569087570000015
estimating the proportion of elements taking the value 1 in the j-th column of the original matrix V,
Figure FDA0003569087570000016
after this processing is performed for each component j ∈ [1, L], the estimated probability that the elements in each column of the original matrix V take the value 1 is obtained:
Figure FDA0003569087570000017
Step 6: calculating the probability threshold: the variance var of
Figure FDA0003569087570000018
is analyzed and calculated to obtain the probability threshold
Figure FDA0003569087570000019
Step 7: obtaining the intersection bloom filter through comparison against the probability threshold: initializing an intersection bloom filter of length L, and comparing each estimated proportion in
Figure FDA00035690875700000110
with the probability threshold
Figure FDA00035690875700000111
if
Figure FDA00035690875700000112
holds, setting the value at the corresponding subscript of the intersection bloom filter to 1, finally obtaining the intersection bloom filter BF(L);
Step 8: for each element x in the local data set Di, calculating the hash value h(x); if BF(L)[h(x)] = 1, then x is an intersection element, otherwise it is not.
2. The privacy set intersection method for power grid data cross-industry sharing according to claim 1, wherein steps 1, 2, 3 and 8 are performed by the user, and steps 4, 5, 6 and 7 are performed by the server;
step 3 further comprises: the user sends the perturbed bloom filter
Figure FDA00035690875700000113
to the server; step 7 further comprises: the server sends the intersection bloom filter BF(L) to the user; step 8 further comprises: the user receives the intersection bloom filter BF(L).
3. The privacy set intersection method for power grid data cross-industry sharing according to claim 2, wherein the step 3 specifically comprises:
for BF(Di,L), the user randomly flips each component BF(Di,L)[j] according to the random response rule shown in formula (1) to obtain
Figure FDA0003569087570000021
Figure FDA0003569087570000022
where ε is the given privacy budget.
4. The privacy set intersection method for power grid data cross-industry sharing according to claim 2, further comprising converting the original matrix V into the perturbation matrix
Figure FDA0003569087570000023
with the following specific steps:
the probability that an element in a given column of the original matrix V takes the value 1 is ρ; if each element is flipped with probability t, the probability that an element in the corresponding column of the perturbation matrix takes the value 1 is given by formula (2):
ρ′=(1-t)ρ+t(1-ρ) (2)
if, in the perturbation matrix
Figure FDA0003569087570000024
the proportion of elements taking the value 1 in a given column is ρ′ and the flip probability is t, then the proportion of elements taking the value 1 in the corresponding column of the original matrix V is approximately
Figure FDA0003569087570000025
5. The privacy set intersection method for power grid data cross-industry sharing according to claim 2, wherein in step 5 the server calculates, for the perturbation matrix, the proportion of elements taking the value 1 in the j-th column,
Figure FDA0003569087570000026
and, according to
Figure FDA0003569087570000027
estimates the proportion of elements taking the value 1 in the j-th column of the original matrix V,
Figure FDA0003569087570000028
it is assumed that in the perturbation matrix
Figure FDA0003569087570000029
the number of elements in the j-th column taking the value 1 is n and the number taking the value 0 is N − n; ρj denotes the proportion of elements in the j-th column of the original matrix V whose original value is 1; the derivation and calculation of step 5 comprise the following steps:
Step 5.1: for any perturbed bloom filter BF(Di,L), calculating the probability that the j-th element takes the values 0 and 1; from the given conditions, it follows that:
Figure FDA00035690875700000210
Figure FDA0003569087570000031
Figure FDA0003569087570000032
Figure FDA0003569087570000033
for the perturbed bloom filter BF(Di,L), the probability that the j-th element takes the value 1 is:
Figure FDA0003569087570000034
correspondingly, the probability that the jth element takes a value of 0 is as follows:
Figure FDA0003569087570000035
Step 5.2: calculating the maximum likelihood estimate of ρj;
constructing the likelihood function of ρj:
Figure FDA0003569087570000036
taking the logarithm of the likelihood function L(ρj):
Figure FDA0003569087570000037
moreover:
Figure FDA0003569087570000038
Figure FDA0003569087570000039
since
Figure FDA00035690875700000310
it follows that when
Figure FDA00035690875700000311
holds, log(L) takes its maximum value; at this point the estimate of ρj, denoted
Figure FDA00035690875700000312
is
Figure FDA00035690875700000313
after the server performs this operation for each component j ∈ [1, L], it obtains the estimated probability that the elements in each column of the original matrix V take the value 1:
Figure FDA0003569087570000041
6. The privacy set intersection method for power grid data cross-industry sharing according to claim 2, wherein in step 6 the variance var of
Figure FDA0003569087570000042
is analyzed and calculated to obtain the probability threshold
Figure FDA0003569087570000043
and the derivation and calculation of step 6 comprise the following steps:
Step 6.1: calculating the expectation of the estimated proportion
Figure FDA0003569087570000044
since each element of the perturbed bloom filter,
Figure FDA0003569087570000045
can only be 0 or 1, it follows that:
Figure FDA0003569087570000046
let n be the number of elements taking the value 1 in the j-th column of the perturbation matrix
Figure FDA0003569087570000047
then n is the sum of independent and identically distributed random variables:
Figure FDA0003569087570000048
var(n)=Nvar(BF(Di,L)[j]) (12)
Figure FDA0003569087570000049
Step 6.2: calculating the variance of the estimated proportion
Figure FDA00035690875700000410
as follows:
Figure FDA00035690875700000411
Figure FDA0003569087570000051
7. A privacy set intersection system for power grid data cross-industry sharing, comprising:
an initial module, used for setting the initial protocol parameters, namely the privacy budget ε, the hash function h(·) and the length L of the bloom filter, wherein all positions of the initial bloom filter are 0;
a filter acquisition module, used for computing the bloom filter of local data: for the local data set Di, it calculates the hash value of each element x in Di based on the hash function h(·) and sets the value at the corresponding subscript of the bloom filter to 1 according to the obtained hash value, obtaining BF(Di,L);
a random response processing module, used for performing random response processing on the local bloom filter: for BF(Di,L), each component BF(Di,L)[j] is randomly flipped according to the random response rule to obtain the perturbed bloom filter
Figure FDA0003569087570000052
a perturbation matrix acquisition module, used for integrating all perturbed bloom filters and constructing the perturbation matrix
Figure FDA0003569087570000053
Wherein N is the number of users;
a probability acquisition module, used for calculating, for the perturbation matrix
Figure FDA0003569087570000054
the proportion of elements taking the value 1 in the j-th column,
Figure FDA0003569087570000055
and, according to
Figure FDA0003569087570000056
estimating the proportion of elements taking the value 1 in the j-th column of the original matrix V,
Figure FDA0003569087570000057
after this processing is performed for each component j ∈ [1, L], the estimated probability that the elements in each column of the original matrix V take the value 1 is obtained:
Figure FDA0003569087570000058
a computing module, used for analyzing and calculating the variance var of
Figure FDA0003569087570000059
to obtain the probability threshold
Figure FDA00035690875700000510
a comparison module, used for obtaining the intersection bloom filter through comparison against the probability threshold: it initializes an intersection bloom filter of length L and compares each estimated proportion in
Figure FDA00035690875700000511
with the probability threshold
Figure FDA00035690875700000512
if
Figure FDA00035690875700000513
holds, the value at the corresponding subscript of the intersection bloom filter is set to 1, finally yielding the intersection bloom filter BF(L);
a judging module, used for calculating the hash value h(x) of each element x in the local data set Di; if BF(L)[h(x)] = 1, then x is an intersection element, otherwise it is not.
8. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor realizes the steps of the method according to any of claims 1-6 when executing the computer program.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202210313113.1A 2022-03-28 2022-03-28 Privacy set intersection method, system and device for power grid data cross-industry sharing Active CN114614974B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210313113.1A CN114614974B (en) 2022-03-28 2022-03-28 Privacy set intersection method, system and device for power grid data cross-industry sharing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210313113.1A CN114614974B (en) 2022-03-28 2022-03-28 Privacy set intersection method, system and device for power grid data cross-industry sharing

Publications (2)

Publication Number Publication Date
CN114614974A true CN114614974A (en) 2022-06-10
CN114614974B CN114614974B (en) 2023-01-03

Family

ID=81867272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210313113.1A Active CN114614974B (en) 2022-03-28 2022-03-28 Privacy set intersection method, system and device for power grid data cross-industry sharing

Country Status (1)

Country Link
CN (1) CN114614974B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115242371A (en) * 2022-06-15 2022-10-25 华中科技大学 Method, device and system for calculating set intersection and cardinality of differential privacy protection
CN115396144A (en) * 2022-07-20 2022-11-25 北京冲量在线科技有限公司 Multi-party privacy intersection scheme based on trusted execution environment and distributed data intersection algorithm
CN117201187A (en) * 2023-11-01 2023-12-08 国网湖北省电力有限公司武汉供电公司 Power data secure sharing method, system and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543842A (en) * 2018-11-02 2019-03-29 西安交通大学 The Distribution estimation method of higher-dimension intelligent perception data with local secret protection
US20190272388A1 (en) * 2018-03-01 2019-09-05 Etron Technology, Inc. Data collection and analysis method and related device thereof
CN110866263A (en) * 2019-11-14 2020-03-06 中国科学院信息工程研究所 User privacy information protection method and system capable of resisting longitudinal attack
CN113094746A (en) * 2021-03-31 2021-07-09 北京邮电大学 High-dimensional data publishing method based on localized differential privacy and related equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190272388A1 (en) * 2018-03-01 2019-09-05 Etron Technology, Inc. Data collection and analysis method and related device thereof
CN109543842A (en) * 2018-11-02 2019-03-29 西安交通大学 The Distribution estimation method of higher-dimension intelligent perception data with local secret protection
CN110866263A (en) * 2019-11-14 2020-03-06 中国科学院信息工程研究所 User privacy information protection method and system capable of resisting longitudinal attack
CN113094746A (en) * 2021-03-31 2021-07-09 北京邮电大学 High-dimensional data publishing method based on localized differential privacy and related equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
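张佳程 et al.: "Method for collecting graph information with local differential privacy in a big data environment", 《信息网络安全》 *
张啸剑 et al.: "Spatial range query method based on local differential privacy", 《计算机研究与发展》 *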

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115242371A (en) * 2022-06-15 2022-10-25 华中科技大学 Method, device and system for calculating set intersection and cardinality of differential privacy protection
CN115242371B (en) * 2022-06-15 2024-04-19 华中科技大学 Differential privacy-protected set intersection and base number calculation method, device and system thereof
CN115396144A (en) * 2022-07-20 2022-11-25 北京冲量在线科技有限公司 Multi-party privacy intersection scheme based on trusted execution environment and distributed data intersection algorithm
CN115396144B (en) * 2022-07-20 2023-12-05 北京冲量在线科技有限公司 Multiparty privacy intersection scheme based on trusted execution environment and distributed data intersection algorithm
CN117201187A (en) * 2023-11-01 2023-12-08 国网湖北省电力有限公司武汉供电公司 Power data secure sharing method, system and storage medium
CN117201187B (en) * 2023-11-01 2024-01-05 国网湖北省电力有限公司武汉供电公司 Power data secure sharing method, system and storage medium

Also Published As

Publication number Publication date
CN114614974B (en) 2023-01-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant