CN114614974A - Privacy set intersection method, system and device for power grid data cross-industry sharing - Google Patents
Privacy set intersection method, system and device for power grid data cross-industry sharing
- Publication number
- CN114614974A (application number CN202210313113.1A)
- Authority
- CN
- China
- Prior art keywords
- value
- bloom filter
- intersection
- probability
- column
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/06—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
- H04L9/0643—Hash functions, e.g. MD5, SHA, HMAC or f9 MAC
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0407—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
Abstract
The invention discloses a privacy set intersection method, system and device for cross-industry sharing of power grid data. The invention uses Bloom filters to map local data, which greatly reduces the computational cost of set intersection. Replacing homomorphic encryption with randomized response effectively reduces the communication cost and meets the need to process large volumes of data in practical scenarios; at the same time, randomized response randomly flips the bits of the Bloom filter to achieve local differential privacy. The protocol parameters used in each run are agreed on by the users in advance and are not shared with the server, so malicious attacks by the server can be avoided to a certain extent. The perturbed Bloom filter that a user shares with the server has undergone randomized response processing, which further protects the local private data.
Description
Technical Field
The invention belongs to the field of data security, and relates to a privacy set intersection method, system and device for power grid data cross-industry sharing.
Background
Electric power data contains important information about the production and daily life of enterprises, institutions and households, and can objectively reflect the operating state of society. However, while power grid data is valuable, it also contains a large amount of sensitive information. For example, an enterprise's electricity payment records can reflect its production status, and a household's electricity consumption can reveal the daily routines of its members, so openly sharing this information carries substantial privacy risks. Privacy-preserving computation enables joint mining of data value while the data never leaves its owner's premises, and ensures that no party's data privacy is exposed during the joint mining process. It can therefore play an important role in the open sharing of big power data. Privacy set intersection (PSI) is a cornerstone technique of privacy-preserving computation: it computes the intersection of the parties' data IDs and aligns the data while the data stays local, which is of great significance for sharing power grid data across industries and putting it to use. Most existing PSI schemes compute the intersection among participants directly, based on techniques such as oblivious transfer and public-key encryption. Their disadvantage is that, to guarantee security, the encryption techniques involved usually need very long keys to reach the required security strength, which greatly reduces computational efficiency, so they cannot be applied to the massive-data scenarios of a power grid. An efficient, low-overhead privacy set intersection method suitable for cross-industry sharing of power grid data therefore has important practical significance.
Disclosure of Invention
The invention aims to solve the problems in the prior art and provides a privacy set intersection method, system and device for cross-industry sharing of power grid data, which ensure that a user's private data cannot be obtained by the server and that no user can obtain any private data of other users beyond the intersection. The invention protects user private data with Bloom filters and local differential privacy while reducing computation and communication overhead.
To achieve this purpose, the invention adopts the following technical solution:
A privacy set intersection method for cross-industry sharing of power grid data comprises the following steps:
Step 1: set the initial protocol parameters: the privacy budget $\epsilon$, the hash function $h(\cdot)$ and the Bloom filter length $L$; all positions of the initial Bloom filter are 0;
Step 2: compute the Bloom filter of the local data: for each element $x$ in the local data set $D_i$, compute its hash value with $h(\cdot)$ and set the position with the corresponding index in the Bloom filter to 1, obtaining $BF(D_i, L)$;
Step 3: apply randomized response to the local Bloom filter: for $BF(D_i, L)$, randomly flip each component $BF(D_i, L)[j]$ according to the randomized response rule, obtaining the perturbed Bloom filter $\widetilde{BF}(D_i, L)$;
Step 4: collect all perturbed Bloom filters and construct the perturbation matrix $\widetilde{V}$, whose $i$-th row is $\widetilde{BF}(D_i, L)$, where $N$ is the number of users;
Step 5: compute the proportion $\tilde{\rho}_j$ of elements equal to 1 in the $j$-th column of the perturbation matrix $\widetilde{V}$, and use $\tilde{\rho}_j$ to estimate the proportion $\hat{\rho}_j$ of elements equal to 1 in the $j$-th column of the original matrix $V$; after every component $j \in [1, L]$ has been processed, the estimated probability that the elements in each column of $V$ equal 1 is obtained;
Step 6: compute the probability threshold: analyze and compute the variance var of the estimate $\hat{\rho}_j$ to obtain the probability threshold;
Step 7: obtain the intersection Bloom filter by comparison with the probability threshold: initialize an intersection Bloom filter of length $L$; compare each estimate $\hat{\rho}_j$ with the probability threshold and, if the estimate meets the threshold, set the position with the corresponding index in the intersection Bloom filter to 1, finally obtaining the intersection Bloom filter $BF_\cap(L)$;
Step 8: for each element $x$ in the local data set $D_i$, compute its hash value $h(x)$; if $BF_\cap(L)[h(x)] = 1$, then $x$ is an intersection element; otherwise it is not.
The invention is further improved in that:
Step 3 further comprises: the user sends the perturbed Bloom filter $\widetilde{BF}(D_i, L)$ to the server. Step 7 further comprises: the server sends the intersection Bloom filter $BF_\cap(L)$ to the users. Step 8 further comprises: the user receives the intersection Bloom filter $BF_\cap(L)$.
Step 3 specifically comprises the following:
For $BF(D_i, L)$, the user randomly flips each component $BF(D_i, L)[j]$ according to the randomized response rule given in equation (1), obtaining $\widetilde{BF}(D_i, L)[j]$,
where $\epsilon$ is the given privacy budget.
The method further comprises relating the original matrix $V$ to the perturbation matrix $\widetilde{V}$, specifically:
If the probability that an element in a given column of the original matrix $V$ equals 1 is $\rho$, and each bit is flipped with probability $t$, then the probability $\rho'$ that an element in the corresponding column of the perturbation matrix equals 1 is given by equation (2):
$\rho' = (1-t)\rho + t(1-\rho)$ (2)
Conversely, if the proportion of elements equal to 1 in a given column of the perturbation matrix $\widetilde{V}$ is $\rho'$ and the flip probability is $t$, then the proportion of elements equal to 1 in the corresponding column of the original matrix $V$ is approximately
$\rho = \dfrac{\rho' - t}{1 - 2t}$
In step 5, the server computes the proportion $\tilde{\rho}_j$ of elements equal to 1 in the $j$-th column of the perturbation matrix and uses it to estimate the proportion $\hat{\rho}_j$ of elements equal to 1 in the $j$-th column of the original matrix $V$. Suppose that in the $j$-th column of the perturbation matrix $\widetilde{V}$ the number of elements equal to 1 is $n$ and the number of elements equal to 0 is $N-n$, and let $\rho_j$ denote the proportion of elements originally equal to 1 in the $j$-th column of $V$. The derivation for step 5 comprises the following steps:
Step 5.1: for any perturbed Bloom filter $\widetilde{BF}(D_i, L)$, compute the probabilities that its $j$-th element equals 0 and 1. From the given conditions:
$P(BF(D_i, L)[j] = 1) = \rho_j, \quad P(BF(D_i, L)[j] = 0) = 1 - \rho_j$
For the perturbed Bloom filter $\widetilde{BF}(D_i, L)$ with flip probability $t$, the probability that the $j$-th element equals 1 is:
$P(\widetilde{BF}(D_i, L)[j] = 1) = (1-t)\rho_j + t(1-\rho_j)$
Correspondingly, the probability that the $j$-th element equals 0 is:
$P(\widetilde{BF}(D_i, L)[j] = 0) = t\rho_j + (1-t)(1-\rho_j)$
Step 5.2: compute the maximum likelihood estimate of $\rho_j$.
Construct the likelihood function of $\rho_j$:
$L(\rho_j) = \left[(1-t)\rho_j + t(1-\rho_j)\right]^{n} \left[t\rho_j + (1-t)(1-\rho_j)\right]^{N-n}$
Take the logarithm of the likelihood function $L(\rho_j)$:
$\log L(\rho_j) = n \log\left[(1-t)\rho_j + t(1-\rho_j)\right] + (N-n) \log\left[t\rho_j + (1-t)(1-\rho_j)\right]$
and differentiate:
$\dfrac{\partial \log L}{\partial \rho_j} = \dfrac{n(1-2t)}{(1-t)\rho_j + t(1-\rho_j)} - \dfrac{(N-n)(1-2t)}{t\rho_j + (1-t)(1-\rho_j)}$
Since $\partial^2 \log L / \partial \rho_j^2 < 0$, $\log L$ attains its maximum when $\partial \log L / \partial \rho_j = 0$; the estimate of $\rho_j$ is then
$\hat{\rho}_j = \dfrac{n/N - t}{1 - 2t}$
After the server performs this operation for each component $j \in [1, L]$, it obtains the estimated probability $\hat{\rho}_j$ that the elements in each column of the original matrix $V$ equal 1.
In step 6, the variance var of $\hat{\rho}_j$ is analyzed and computed to obtain the probability threshold. The derivation for step 6 comprises the following steps:
Step 6.1: compute the expectation of the estimated ratio $\hat{\rho}_j$. Since each bit $\widetilde{BF}(D_i, L)[j]$ of a perturbed Bloom filter can only be 0 or 1:
$E(\widetilde{BF}(D_i, L)[j]) = P(\widetilde{BF}(D_i, L)[j] = 1) = (1-t)\rho_j + t(1-\rho_j)$
Let $n$ be the number of elements equal to 1 in the $j$-th column of the perturbation matrix $\widetilde{V}$; then $n$ is a sum of independent, identically distributed random variables, so
$\mathrm{var}(n) = N\,\mathrm{var}(\widetilde{BF}(D_i, L)[j])$ (12)
A privacy set intersection system for cross-industry sharing of power grid data, comprising:
an initial module, configured to set the initial protocol parameters: the privacy budget $\epsilon$, the hash function $h(\cdot)$ and the Bloom filter length $L$, where all positions of the initial Bloom filter are 0;
a filter acquisition module, configured to compute the Bloom filter of the local data: for each element $x$ in the local data set $D_i$, compute its hash value with the hash function $h(\cdot)$ and set the position with the corresponding index in the Bloom filter to 1, obtaining $BF(D_i, L)$;
a random response processing module, configured to apply randomized response to the local Bloom filter: for $BF(D_i, L)$, randomly flip each component $BF(D_i, L)[j]$ according to the randomized response rule, obtaining the perturbed Bloom filter $\widetilde{BF}(D_i, L)$;
a perturbation matrix acquisition module, configured to collect all perturbed Bloom filters and construct the perturbation matrix $\widetilde{V}$, where $N$ is the number of users;
a probability acquisition module, configured to compute the proportion $\tilde{\rho}_j$ of elements equal to 1 in the $j$-th column of the perturbation matrix $\widetilde{V}$ and to use $\tilde{\rho}_j$ to estimate the proportion $\hat{\rho}_j$ of elements equal to 1 in the $j$-th column of the original matrix $V$; after every component $j \in [1, L]$ has been processed, the estimated probability that the elements in each column of $V$ equal 1 is obtained;
a computing module, configured to analyze and compute the variance var of $\hat{\rho}_j$ to obtain the probability threshold;
a comparison module, configured to obtain the intersection Bloom filter by comparison with the probability threshold: initialize an intersection Bloom filter of length $L$; compare each estimate $\hat{\rho}_j$ with the probability threshold and, if the estimate meets the threshold, set the position with the corresponding index in the intersection Bloom filter to 1, finally obtaining the intersection Bloom filter $BF_\cap(L)$;
a judging module, configured to compute, for each element $x$ in the local data set $D_i$, the hash value $h(x)$; if $BF_\cap(L)[h(x)] = 1$, then $x$ is an intersection element; otherwise it is not.
A terminal device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, said processor implementing the steps of the above method when executing said computer program.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
Compared with the prior art, the invention has the following beneficial effects:
The invention uses a different Bloom filter in each run of the set intersection and does not share the hash mapping function with the server, so malicious attacks by the server can be avoided to a certain extent. Using the Bloom filter as the intermediate carrier for subsequent computation and communication also significantly reduces the communication cost and computational overhead of privacy set intersection.
Furthermore, introducing local differential privacy through a randomized response mechanism ensures that the users' private data cannot be obtained by the server and that users cannot obtain any private data of other users beyond the intersection. Moreover, no user obtains exact intersection information, which further guarantees data security and meets the users' security requirements.
Furthermore, differential privacy is used as the privacy protection technique instead of homomorphic encryption or public-key encryption. Homomorphic encryption involves encrypting and decrypting plaintext, which requires time-consuming operations on large primes, whereas differential privacy needs only simple arithmetic and no encryption or decryption, so the computation cost and communication overhead of the model are greatly reduced.
Drawings
In order to more clearly explain the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a schematic diagram of the logic architecture of the privacy set intersection method for cross-industry sharing of power grid data;
FIG. 2 is a schematic diagram of the application flow of an embodiment of the present invention in a power grid scenario;
FIG. 3 is a schematic diagram of the performance test of the present invention;
FIG. 4 is a block diagram of a privacy set intersection system for cross-industry sharing of power grid data according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the embodiments of the present invention, it should be noted that if terms such as "upper", "lower", "horizontal" and "inner" are used to indicate an orientation or positional relationship, they are based on the orientation or positional relationship shown in the drawings or on the orientation in which the product of the present invention is usually placed in use. They serve only to simplify the description and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and therefore cannot be understood as limiting the present invention. Furthermore, the terms "first", "second" and the like are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Furthermore, the term "horizontal", if present, does not mean that the component is required to be absolutely horizontal; it may be slightly inclined. "Horizontal" merely means that the direction is closer to horizontal than to "vertical".
In the description of the embodiments of the present invention, it should be further noted that unless otherwise explicitly stated or limited, the terms "disposed", "mounted" and "connected" should be interpreted broadly. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal connection between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
In the description of the present invention, it is to be understood that: "data" refers to records with non-repeating identity tags; "user" refers to an enterprise or organization that provides data, including a power grid company providing power data as well as other enterprises providing associated data; "server" refers to a third-party device, agreed on in advance by the users, that exchanges information with each participant; "privacy set intersection" refers to computing the data in the intersection without disclosing any participant's local data set and without exposing non-intersection data; "bloom filter" refers to a data structure that records which tagged data items are present by setting the positions they hash to; and "differential privacy" refers to the property that a specified computation on two data sets that differ in their data yields a given result with similar probabilities. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
The invention is described in further detail below with reference to the accompanying drawings:
Referring to fig. 1, the security of the protocol, for both honest and dishonest users or servers, is achieved in the following two ways:
1) When the users and the server are honest, that is, they execute the protocol honestly, the security of privacy set intersection guarantees that no party can obtain any original data of another party beyond the intersection.
2) When a user or the server is dishonest, it may try to infer the original data of other participants from the intersection it obtains. However, the other users and the server only hold an intersection produced by randomized response, whose elements are true intersection elements only with a specified probability, so they cannot obtain any original data beyond the intersection elements from it. Even if individual users collude with the server, the original data of the other users still cannot be inferred.
The invention discloses a privacy set intersection method for power grid data cross-industry sharing, which comprises the following steps:
Step 1: the users agree on the initial protocol parameters: all users negotiate and determine the privacy budget $\epsilon$, the hash function $h(\cdot)$ and the Bloom filter length $L$. All positions of the initial Bloom filter are 0;
Step 2: each user maps its local data into a Bloom filter, so that the computational cost of set intersection is always O(N): for each element $x$ in the local data set $D_i$, the user computes the hash value $h(x)$ and sets the position with the corresponding index in the Bloom filter to 1, obtaining $BF(D_i, L)$. A Bloom filter typically uses multiple hash functions, but this method considers only Bloom filters with a single hash function.
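The single-hash Bloom filter described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the choice of SHA-256, the `salt` parameter (standing in for the hash function that the users agree on and keep from the server) and the names `bloom_index` and `build_bloom_filter` are all assumptions made for the example.

```python
import hashlib

def bloom_index(x: str, length: int, salt: bytes = b"") -> int:
    """Map element x to one position in [0, length) using a single hash.

    SHA-256 and the salt are illustrative assumptions; the source only
    requires a hash function h(.) agreed on by the users.
    """
    digest = hashlib.sha256(salt + x.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % length

def build_bloom_filter(data, length: int, salt: bytes = b"") -> list:
    """Return the local Bloom filter as a 0/1 list of the given length."""
    bf = [0] * length  # every position starts at 0
    for x in data:
        bf[bloom_index(x, length, salt)] = 1  # set the hashed position
    return bf

# Hypothetical record IDs held by one user.
bf_a = build_bloom_filter(["id-001", "id-002", "id-003"], length=64)
```

Because only one hash function is used, each element sets exactly one position, so a filter built from k elements has at most k positions set.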
Step 3: the user applies randomized response to the local Bloom filter: for $BF(D_i, L)$, each component $BF(D_i, L)[j]$ is randomly flipped according to the randomized response rule, giving the perturbed Bloom filter $\widetilde{BF}(D_i, L)$. The user sends the perturbed Bloom filter $\widetilde{BF}(D_i, L)$ to the server.
Specifically, for $BF(D_i, L)$, the user randomly flips each component $BF(D_i, L)[j]$ according to the randomized response rule given in equation (1), obtaining $\widetilde{BF}(D_i, L)[j]$,
where $\epsilon$ is the given privacy budget.
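The bit-flipping step can be sketched as below. The text of equation (1) is not available here, so the sketch assumes the standard binary randomized response rule, in which each bit is flipped with probability t = 1 / (1 + e^ε) and kept otherwise; the patent's actual rule may differ. The function names are hypothetical.

```python
import math
import random

def flip_probability(epsilon: float) -> float:
    """Assumed rule: flip probability t = 1 / (1 + e^epsilon)."""
    return 1.0 / (1.0 + math.exp(epsilon))

def randomized_response(bf, epsilon: float, rng: random.Random) -> list:
    """Flip each bit of the Bloom filter independently with probability t."""
    t = flip_probability(epsilon)
    return [1 - bit if rng.random() < t else bit for bit in bf]

rng = random.Random(2022)  # seeded only to make the example repeatable
perturbed = randomized_response([1, 0, 0, 1, 0, 1, 0, 0], epsilon=1.0, rng=rng)
```

Under this assumed rule a bit is kept with probability e^ε / (1 + e^ε), so the ratio between the probabilities of reporting the true bit and the flipped bit is e^ε. A smaller ε therefore means more flips and stronger privacy, and ε = 0 gives t = 1/2, which makes the output independent of the input.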
Step 4: the server collects all perturbed Bloom filters and constructs the perturbation matrix $\widetilde{V}$, whose $i$-th row is $\widetilde{BF}(D_i, L)$, where $N$ is the number of users;
Step 5: the server computes the proportion $\tilde{\rho}_j$ of elements equal to 1 in the $j$-th column of the perturbation matrix $\widetilde{V}$ and uses $\tilde{\rho}_j$ to estimate the proportion $\hat{\rho}_j$ of elements equal to 1 in the $j$-th column of the original matrix $V$; after every component $j \in [1, L]$ has been processed, the estimated probability that the elements in each column of $V$ equal 1 is obtained.
If the probability that an element in a given column of the original matrix $V$ equals 1 is $\rho$, and each bit is flipped with probability $t$, then the probability $\rho'$ that an element in the corresponding column of the perturbation matrix equals 1 is given by equation (2):
$\rho' = (1-t)\rho + t(1-\rho)$ (2)
Conversely, if the proportion of elements equal to 1 in a given column of the perturbation matrix $\widetilde{V}$ is $\rho'$ and the flip probability is $t$, then the proportion of elements equal to 1 in the corresponding column of the original matrix $V$ is approximately
$\rho = \dfrac{\rho' - t}{1 - 2t}$
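A short worked example may help check equation (2) and its inverse. The numbers are chosen purely for illustration: a flip probability t = 0.25 and a column in which 80% of the original bits are 1.

```latex
\begin{aligned}
\rho' &= (1-t)\rho + t(1-\rho) = 0.75 \times 0.8 + 0.25 \times 0.2 = 0.65,\\
\rho  &= \frac{\rho' - t}{1 - 2t} = \frac{0.65 - 0.25}{1 - 0.5} = 0.8 .
\end{aligned}
```

The inverse recovers the original ratio exactly only when the observed proportion equals its expectation; with a finite number of users the observed proportion fluctuates around 0.65, so the recovered value is an estimate. The inverse is undefined at t = 1/2, where the perturbed bits carry no information about the original bits.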
Suppose that in the $j$-th column of the perturbation matrix $\widetilde{V}$ the number of elements equal to 1 is $n$ and the number of elements equal to 0 is $N-n$, and let $\rho_j$ denote the proportion of elements originally equal to 1 in the $j$-th column of the original matrix $V$. The derivation for step 5 comprises the following steps:
Step 5.1: for any perturbed Bloom filter $\widetilde{BF}(D_i, L)$, compute the probabilities that its $j$-th element equals 0 and 1. From the given conditions:
$P(BF(D_i, L)[j] = 1) = \rho_j, \quad P(BF(D_i, L)[j] = 0) = 1 - \rho_j$
For the perturbed Bloom filter $\widetilde{BF}(D_i, L)$ with flip probability $t$, the probability that the $j$-th element equals 1 is:
$P(\widetilde{BF}(D_i, L)[j] = 1) = (1-t)\rho_j + t(1-\rho_j)$
Correspondingly, the probability that the $j$-th element equals 0 is:
$P(\widetilde{BF}(D_i, L)[j] = 0) = t\rho_j + (1-t)(1-\rho_j)$
Step 5.2: compute the maximum likelihood estimate of $\rho_j$.
Construct the likelihood function of $\rho_j$:
$L(\rho_j) = \left[(1-t)\rho_j + t(1-\rho_j)\right]^{n} \left[t\rho_j + (1-t)(1-\rho_j)\right]^{N-n}$
Take the logarithm of the likelihood function $L(\rho_j)$:
$\log L(\rho_j) = n \log\left[(1-t)\rho_j + t(1-\rho_j)\right] + (N-n) \log\left[t\rho_j + (1-t)(1-\rho_j)\right]$
and differentiate:
$\dfrac{\partial \log L}{\partial \rho_j} = \dfrac{n(1-2t)}{(1-t)\rho_j + t(1-\rho_j)} - \dfrac{(N-n)(1-2t)}{t\rho_j + (1-t)(1-\rho_j)}$
Since $\partial^2 \log L / \partial \rho_j^2 < 0$, $\log L$ attains its maximum when $\partial \log L / \partial \rho_j = 0$; the estimate of $\rho_j$ is then
$\hat{\rho}_j = \dfrac{n/N - t}{1 - 2t}$
After the server performs this operation for each component $j \in [1, L]$, it obtains the estimated probability $\hat{\rho}_j$ that the elements in each column of the original matrix $V$ equal 1.
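The server-side estimate can be sketched as follows, assuming the maximum likelihood estimate takes the closed form (n/N - t) / (1 - 2t), which is what inverting equation (2) at the observed proportion n/N gives. The matrix layout (one row per user, one column per filter position) and the function name are assumptions for illustration.

```python
def estimate_column_ratios(perturbed_matrix, t: float) -> list:
    """Estimate, per column, the share of 1s in the unperturbed matrix.

    perturbed_matrix: one perturbed Bloom filter (0/1 list) per user.
    t: flip probability used by every user; must differ from 0.5.
    """
    n_users = len(perturbed_matrix)
    length = len(perturbed_matrix[0])
    estimates = []
    for j in range(length):
        n_ones = sum(row[j] for row in perturbed_matrix)  # n for column j
        estimates.append((n_ones / n_users - t) / (1.0 - 2.0 * t))
    return estimates

# Four users, two filter positions, flip probability 0.25.
matrix = [[1, 0], [1, 1], [1, 0], [0, 0]]
ratios = estimate_column_ratios(matrix, t=0.25)
```

Because the observed proportion is noisy, an estimate can fall below 0 or above 1; the sketch leaves such values unclipped so that a later threshold comparison sees the raw estimate.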
Step 6: the server computes the probability threshold: the variance var of $\hat{\rho}_j$ is analyzed and computed to obtain the probability threshold.
Step 6.1: compute the expectation of the estimated ratio $\hat{\rho}_j$. Since each bit $\widetilde{BF}(D_i, L)[j]$ of a perturbed Bloom filter can only be 0 or 1:
$E(\widetilde{BF}(D_i, L)[j]) = P(\widetilde{BF}(D_i, L)[j] = 1) = (1-t)\rho_j + t(1-\rho_j)$
Let $n$ be the number of elements equal to 1 in the $j$-th column of the perturbation matrix $\widetilde{V}$; then $n$ is a sum of independent, identically distributed random variables, so
$\mathrm{var}(n) = N\,\mathrm{var}(\widetilde{BF}(D_i, L)[j])$ (12)
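Equation (12) can be carried one step further to the variance of the estimate itself. The lines below are a standard consequence of treating the perturbed bits in a column as independent Bernoulli variables, and assume the estimate has the form (n/N - t) / (1 - 2t); they are offered as a sketch and are not necessarily the remainder of the patent's own derivation, whose threshold formula is not reproduced here.

```latex
\begin{aligned}
\rho'_j &= (1-t)\rho_j + t(1-\rho_j), \\
\mathrm{var}(n) &= N\,\rho'_j\,(1-\rho'_j), \\
E(\hat{\rho}_j) &= \frac{E(n)/N - t}{1-2t} = \frac{\rho'_j - t}{1-2t} = \rho_j, \\
\mathrm{var}(\hat{\rho}_j) &= \frac{\mathrm{var}(n)}{N^2 (1-2t)^2}
  = \frac{\rho'_j\,(1-\rho'_j)}{N\,(1-2t)^2}.
\end{aligned}
```

Under these assumptions the estimate is unbiased, and its variance shrinks as the number of users N grows and grows as the flip probability t approaches 1/2, that is, as the privacy budget shrinks.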
Step 7: the server obtains the intersection Bloom filter by comparison with the probability threshold: it initializes an intersection Bloom filter of length $L$, compares each estimate $\hat{\rho}_j$ with the probability threshold and, if the estimate meets the threshold, sets the position with the corresponding index in the intersection Bloom filter to 1, finally obtaining the intersection Bloom filter $BF_\cap(L)$. The server sends the intersection Bloom filter $BF_\cap(L)$ to the users.
Step 8: the user receives the intersection Bloom filter $BF_\cap(L)$; for each element $x$ in the local data set $D_i$, the user computes the hash value $h(x)$; if $BF_\cap(L)[h(x)] = 1$, then $x$ is an intersection element; otherwise it is not.
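Steps 7 and 8 can be sketched as below. The patent's threshold formula is not available in this text, so the sketch substitutes a hypothetical one: a position is kept when its estimate lies within `k` standard deviations of 1, using the standard deviation the estimate would have if every user had that bit set and bits were flipped with probability `t`. The parameter `k`, the `>=` comparison and both function names are assumptions for illustration only.

```python
import math

def intersection_filter(estimates, n_users: int, t: float, k: float = 2.0) -> list:
    """Build the intersection Bloom filter from per-column estimates.

    Hypothetical threshold: 1 - k * std, where std is the standard
    deviation of the estimate when the true ratio is 1.
    """
    std = math.sqrt(t * (1.0 - t) / n_users) / (1.0 - 2.0 * t)
    threshold = 1.0 - k * std
    return [1 if est >= threshold else 0 for est in estimates]

def local_intersection(data, bf_cap, index_of) -> list:
    """Keep the local elements whose hashed position is set in bf_cap."""
    return [x for x in data if bf_cap[index_of(x)] == 1]

bf_cap = intersection_filter([1.02, 0.40, 0.97], n_users=100, t=0.1)
# A toy index function standing in for the agreed hash h(.).
toy_index = {"a": 0, "b": 1, "c": 2}.__getitem__
kept = local_intersection(["a", "b", "c"], bf_cap, toy_index)
```

Because the server never learns the hash function, it only sees which positions pass the threshold; mapping positions back to record IDs happens on the user side in `local_intersection`. As with any Bloom filter, hash collisions and the randomized perturbation can produce false positives and false negatives.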
Referring to fig. 2, the process of applying the method to the power grid scene is as follows:
step 3, the user A and the user B appoint initial protocol parameters and server information;
step 4, the user A and the user B respectively process the identity information of the respective data sets locally;
step 5, the server receives the bloom filter and calculates possible intersection;
step 6, the server returns the intersection obtained by calculation to the user A and the user B;
step 7, user A and user B each obtain the intersection part of their local data sets.
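Referring to fig. 3, the harmonic mean of recall and precision (the F1 score) is used as the evaluation metric of the algorithm, computed as follows:
$\mathrm{precision} = \dfrac{|\mathrm{Intersection} \cap \mathrm{Estimation}|}{|\mathrm{Estimation}|}, \quad \mathrm{recall} = \dfrac{|\mathrm{Intersection} \cap \mathrm{Estimation}|}{|\mathrm{Intersection}|}, \quad F_1 = \dfrac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$
where Intersection denotes the true intersection and Estimation denotes the intersection estimated by the present invention.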
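The evaluation metric can be computed as in the following sketch, which treats the true and estimated intersections as Python sets. The function name and the convention of returning 0 when the two sets share no element are assumptions made for the example.

```python
def f1_score(true_intersection: set, estimated_intersection: set) -> float:
    """Harmonic mean of precision and recall of the estimated intersection."""
    hits = len(true_intersection & estimated_intersection)
    if hits == 0:
        return 0.0  # convention: no overlap scores zero
    precision = hits / len(estimated_intersection)
    recall = hits / len(true_intersection)
    return 2 * precision * recall / (precision + recall)

# Two of three estimated IDs are correct; two of four true IDs are found.
score = f1_score({"a", "b", "c", "d"}, {"a", "b", "e"})
```

In this example precision is 2/3 and recall is 1/2, so the score is 4/7.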
To test the performance of the present invention, the complete data set was divided into 30, 60, 100, 200 and 400 subsets respectively, and each user was assumed to hold exactly one subset. Tests were performed with different numbers of users and privacy budgets, and the results are shown in fig. 3. As can be seen from fig. 3, the performance of the present invention improves greatly as the privacy budget increases.
Referring to fig. 4, the invention discloses a privacy set intersection system for power grid data cross-industry sharing, which comprises:
an initial module for setting the initial protocol parameters: the privacy budget ε, the hash function h(·) and the bloom filter length L, with all positions of the initial bloom filter set to 0;
a filter acquisition module for calculating the bloom filter of the local data: for each element x in the local data set $D_i$, the hash value is calculated based on the hash function h(·), and the value at the corresponding subscript of the bloom filter is set to 1 according to the obtained hash value, giving $BF(D_i, L)$;
a random response processing module for performing random response processing on the local bloom filter: each component $BF(D_i, L)[j]$ of $BF(D_i, L)$ is randomly flipped according to the random response rule, giving the perturbed bloom filter $BF'(D_i, L)$;
a perturbation matrix acquisition module for integrating all perturbed bloom filters and constructing the perturbation matrix $V'$, where N is the number of users;
a probability acquisition module for calculating the proportion $\rho'_j$ of elements with value 1 in the j-th column of the perturbation matrix $V'$ and, from $\rho'_j$, estimating the proportion $\hat{\rho}_j$ of elements with value 1 in the j-th column of the original matrix V; after every component j ∈ [1, L] has been processed, the estimated probabilities $\hat{\rho}_1, \ldots, \hat{\rho}_L$ that the elements in each column of the original matrix V take the value 1 are obtained;
a computing module for analyzing and calculating the variance var of $\hat{\rho}_j$ to obtain the probability threshold;
a comparison module for obtaining the intersection bloom filter by probability threshold comparison: an intersection bloom filter of length L is initialized, each estimated ratio $\hat{\rho}_j$ is compared with the probability threshold and, if the estimate reaches the threshold, the value at the corresponding subscript of the intersection bloom filter is set to 1, finally giving the intersection bloom filter $BF_\cap(L)$;
a judging module for calculating, for each element x in the local data set $D_i$, the hash value h(x); if $BF_\cap(L)[h(x)] = 1$, then x is an intersection element; otherwise it is not.
An embodiment of the invention provides a terminal device. The terminal device of this embodiment includes: a processor, a memory, and a computer program stored in the memory and executable on the processor. The processor implements the steps of the above method embodiments when executing the computer program. Alternatively, the processor implements the functions of the modules/units in the above device embodiments when executing the computer program.
The computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention.
The terminal device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server or another computing device. The terminal device may include, but is not limited to, a processor and a memory.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
The memory may be used to store the computer programs and/or modules, and the processor implements the various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory and calling the data stored in the memory.
If the modules/units integrated in the terminal device are implemented in the form of software functional units and sold or used as separate products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the method embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the content of the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (9)
1. A privacy set intersection method for power grid data cross-industry sharing is characterized by comprising the following steps:
step 1: setting the initial protocol parameters: the privacy budget ε, the hash function h(·) and the bloom filter length L, with all positions of the initial bloom filter set to 0;
step 2: calculating the bloom filter of the local data: for each element x in the local data set $D_i$, calculating the hash value based on the hash function h(·), and setting the value at the corresponding subscript of the bloom filter to 1 according to the obtained hash value, to obtain $BF(D_i, L)$;
step 3: performing random response processing on the local bloom filter: randomly flipping each component $BF(D_i, L)[j]$ of $BF(D_i, L)$ according to the random response rule, to obtain the perturbed bloom filter $BF'(D_i, L)$;
step 4: integrating all perturbed bloom filters and constructing the perturbation matrix $V'$, where N is the number of users;
step 5: calculating the proportion $\rho'_j$ of elements with value 1 in the j-th column of the perturbation matrix $V'$ and, from $\rho'_j$, estimating the proportion $\hat{\rho}_j$ of elements with value 1 in the j-th column of the original matrix V; after every component j ∈ [1, L] has been processed, the estimated probabilities $\hat{\rho}_1, \ldots, \hat{\rho}_L$ that the elements in each column of the original matrix V take the value 1 are obtained;
step 6: calculating the probability threshold: analyzing and calculating the variance var of $\hat{\rho}_j$ to obtain the probability threshold;
step 7: obtaining the intersection bloom filter by probability threshold comparison: initializing an intersection bloom filter of length L, comparing each estimated ratio $\hat{\rho}_j$ with the probability threshold and, if the estimate reaches the threshold, setting the value at the corresponding subscript of the intersection bloom filter to 1, to finally obtain the intersection bloom filter $BF_\cap(L)$;
step 8: for each element x in the local data set $D_i$, calculating the hash value h(x); if $BF_\cap(L)[h(x)] = 1$, then x is an intersection element; otherwise it is not.
2. The privacy set intersection method for power grid data cross-industry sharing according to claim 1, wherein steps 1, 2, 3 and 8 are performed by the user, and steps 4, 5, 6 and 7 are performed by the server.
3. The privacy set intersection method for power grid data cross-industry sharing according to claim 2, wherein step 3 specifically comprises:
for $BF(D_i, L)$, the user randomly flips each component $BF(D_i, L)[j]$ according to the random response rule shown in equation (1), to obtain the perturbed bloom filter $BF'(D_i, L)$,
where ε is the given privacy budget.
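The random flipping of step 3 can be sketched as follows. Equation (1) is not reproduced in this text, so the flip probability used here, t = 1 / (1 + e^ε), is an assumption borrowed from the standard binary randomized response mechanism and may differ from the rule in the patent; the function names are likewise hypothetical.

```python
import math
import random
from typing import List, Optional


def flip_probability(epsilon: float) -> float:
    """Assumed flip probability t = 1 / (1 + e^epsilon); equation (1) may differ."""
    return 1.0 / (1.0 + math.exp(epsilon))


def perturb(bf: List[int], epsilon: float,
            rng: Optional[random.Random] = None) -> List[int]:
    """Flip each bit independently with probability t and keep it otherwise."""
    rng = rng or random.Random()
    t = flip_probability(epsilon)
    return [1 - bit if rng.random() < t else bit for bit in bf]


# Illustrative use: a small budget gives a flip probability close to 0.5.
noisy = perturb([1, 0, 0, 1, 0, 1, 1, 0], epsilon=1.0, rng=random.Random(7))
```

Under this assumed rule, a larger privacy budget ε gives a smaller flip probability, so the perturbed filter stays closer to the original and the later estimates become more accurate at the cost of weaker privacy.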
4. The privacy set intersection method for power grid data cross-industry sharing according to claim 2, further comprising converting the original matrix V into the perturbation matrix $V'$, specifically:
if the probability that an element in a given column of the original matrix V takes the value 1 is ρ, and each element is flipped with probability t, then the probability that an element in the corresponding column of the perturbation matrix takes the value 1 is as shown in formula (2):
ρ′=(1-t)ρ+t(1-ρ) (2)
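Formula (2) is linear in ρ, so it can be inverted to recover ρ from ρ′ whenever t is not 1/2. The worked example below uses illustrative values (t = 0.2, ρ = 0.9) that do not come from the patent.

```latex
\[
\rho' = (1-t)\rho + t(1-\rho) = t + (1-2t)\rho
\quad\Longrightarrow\quad
\rho = \frac{\rho' - t}{1-2t}, \qquad t \neq \tfrac{1}{2}.
\]
% Illustrative values: t = 0.2, \rho = 0.9
\[
\rho' = 0.8 \times 0.9 + 0.2 \times 0.1 = 0.74,
\qquad
\frac{0.74 - 0.2}{1 - 0.4} = 0.9 .
\]
```

At t = 1/2 the perturbed proportion is 1/2 regardless of ρ, so nothing about the original column can be recovered.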
5. The privacy set intersection method for power grid data cross-industry sharing according to claim 2, wherein in step 5 the server calculates the proportion $\rho'_j$ of elements with value 1 in the j-th column of the perturbation matrix $V'$ and, from $\rho'_j$, estimates the proportion $\hat{\rho}_j$ of elements with value 1 in the j-th column of the original matrix V; it is assumed that, in the j-th column of the perturbation matrix $V'$, the number of elements with value 1 is n and the number of elements with value 0 is N-n; $\rho_j$ denotes the true proportion of elements with value 1 in the j-th column of the original matrix V; the derivation and calculation of step 5 comprise the following steps:
step 5.1: for any perturbed bloom filter $BF'(D_i, L)$, calculating the probabilities that the j-th element takes the values 0 and 1; from the given conditions and formula (2), with t the flip probability, it follows that:
for the perturbed bloom filter $BF'(D_i, L)$, the probability that the j-th element takes the value 1 is:
$$P(BF'(D_i, L)[j] = 1) = (1-t)\rho_j + t(1-\rho_j)$$
correspondingly, the probability that the j-th element takes the value 0 is:
$$P(BF'(D_i, L)[j] = 0) = t\rho_j + (1-t)(1-\rho_j)$$
step 5.2: calculating the maximum likelihood estimate of $\rho_j$;
constructing the likelihood function of $\rho_j$:
$$L(\rho_j) = \left[(1-t)\rho_j + t(1-\rho_j)\right]^{n} \left[t\rho_j + (1-t)(1-\rho_j)\right]^{N-n}$$
taking the logarithm of the likelihood function $L(\rho_j)$:
$$\log L(\rho_j) = n \log\left[(1-t)\rho_j + t(1-\rho_j)\right] + (N-n) \log\left[t\rho_j + (1-t)(1-\rho_j)\right]$$
and differentiating with respect to $\rho_j$:
$$\frac{\partial \log L}{\partial \rho_j} = \frac{n(1-2t)}{(1-t)\rho_j + t(1-\rho_j)} - \frac{(N-n)(1-2t)}{t\rho_j + (1-t)(1-\rho_j)}$$
since $\frac{\partial^2 \log L}{\partial \rho_j^2} < 0$, log(L) takes its maximum value when $\frac{\partial \log L}{\partial \rho_j} = 0$; at this point the estimate $\hat{\rho}_j$ of $\rho_j$ is $\hat{\rho}_j = \frac{n/N - t}{1-2t} = \frac{\rho'_j - t}{1-2t}$; after the server has performed the above operation for each component j ∈ [1, L], the estimated probabilities $\hat{\rho}_1, \ldots, \hat{\rho}_L$ that the elements in each column of the original matrix V take the value 1 are obtained.
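The server-side estimation of step 5 can be sketched as a per-column computation. The sketch assumes that maximizing the likelihood amounts to inverting formula (2), i.e. that the estimate equals (n/N - t) / (1 - 2t) with t the flip probability; the function name and the list-of-rows matrix layout (one row per user) are illustrative choices, not taken from the patent.

```python
from typing import List


def estimate_ratios(perturbed: List[List[int]], t: float) -> List[float]:
    """Estimate, for each column, the share of 1s in the original matrix.

    perturbed holds one row per user (N rows, L columns); t is the flip
    probability and must differ from 0.5.
    """
    n_users = len(perturbed)
    length = len(perturbed[0])
    estimates = []
    for j in range(length):
        ones = sum(row[j] for row in perturbed)      # n: count of 1s in column j
        rho_prime = ones / n_users                   # observed share of 1s
        rho_hat = (rho_prime - t) / (1.0 - 2.0 * t)  # invert formula (2)
        estimates.append(rho_hat)
    return estimates


# Made-up perturbed matrix for 4 users and a filter of length 2.
matrix = [[1, 0], [1, 1], [1, 0], [0, 0]]
rho_hats = estimate_ratios(matrix, t=0.25)
```

With few users the raw estimate can fall outside [0, 1]; whether and how to clip it is a design choice the sketch leaves open.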
6. The privacy set intersection method for power grid data cross-industry sharing according to claim 2, wherein in step 6 the variance var of $\hat{\rho}_j$ is analyzed and calculated to obtain the probability threshold; the derivation and calculation of step 6 comprise the following steps:
step 6.1: calculating the expectation of the estimated ratio $\hat{\rho}_j$; since each component $BF'(D_i, L)[j]$ of a perturbed bloom filter can only be 0 or 1, then:
$$E(BF'(D_i, L)[j]) = P(BF'(D_i, L)[j] = 1) = (1-t)\rho_j + t(1-\rho_j)$$
let n be the number of elements with value 1 in the j-th column of the perturbation matrix $V'$, which is the sum of independent and identically distributed random variables;
$$var(n) = N \, var(BF'(D_i, L)[j]) \quad (12)$$
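Formula (12) can be carried one step further to the variance of the estimated ratio, which is the quantity step 6 analyzes. The sketch below assumes that each perturbed component is a 0/1 variable with success probability given by formula (2) and that the estimate is (n/N - t) / (1 - 2t); both follow from the formulas above, but the function name is hypothetical, and the threshold formula itself is not reproduced in this text and is not attempted here.

```python
def estimator_variance(rho: float, t: float, n_users: int) -> float:
    """Variance of the estimated ratio for one column.

    rho is the true share of 1s in the original column, t the flip
    probability (not 0.5) and n_users the number of users N.
    """
    p = (1 - t) * rho + t * (1 - rho)  # P(perturbed bit = 1), formula (2)
    var_bit = p * (1 - p)              # variance of one 0/1 component
    var_n = n_users * var_bit          # formula (12): var(n) = N * var(bit)
    # The estimate is (n/N - t) / (1 - 2t), so its variance scales var(n)
    # by 1 / (N^2 * (1 - 2t)^2).
    return var_n / (n_users ** 2 * (1 - 2 * t) ** 2)


# Illustrative values: a column shared by every user, t = 0.25, N = 100.
variance = estimator_variance(rho=1.0, t=0.25, n_users=100)
```

The variance shrinks in proportion to 1/N and grows as t approaches 1/2, which matches the reported behaviour that results improve with more users and a larger privacy budget.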
7. A privacy set intersection system for power grid data cross-industry sharing, characterized by comprising:
an initial module for setting the initial protocol parameters: the privacy budget ε, the hash function h(·) and the bloom filter length L, with all positions of the initial bloom filter set to 0;
a filter acquisition module for calculating the bloom filter of the local data: for each element x in the local data set $D_i$, the hash value is calculated based on the hash function h(·), and the value at the corresponding subscript of the bloom filter is set to 1 according to the obtained hash value, giving $BF(D_i, L)$;
a random response processing module for performing random response processing on the local bloom filter: each component $BF(D_i, L)[j]$ of $BF(D_i, L)$ is randomly flipped according to the random response rule, giving the perturbed bloom filter $BF'(D_i, L)$;
a perturbation matrix acquisition module for integrating all perturbed bloom filters and constructing the perturbation matrix $V'$, where N is the number of users;
a probability acquisition module for calculating the proportion $\rho'_j$ of elements with value 1 in the j-th column of the perturbation matrix $V'$ and, from $\rho'_j$, estimating the proportion $\hat{\rho}_j$ of elements with value 1 in the j-th column of the original matrix V; after every component j ∈ [1, L] has been processed, the estimated probabilities $\hat{\rho}_1, \ldots, \hat{\rho}_L$ that the elements in each column of the original matrix V take the value 1 are obtained;
a computing module for analyzing and calculating the variance var of $\hat{\rho}_j$ to obtain the probability threshold;
a comparison module for obtaining the intersection bloom filter by probability threshold comparison: an intersection bloom filter of length L is initialized, each estimated ratio $\hat{\rho}_j$ is compared with the probability threshold and, if the estimate reaches the threshold, the value at the corresponding subscript of the intersection bloom filter is set to 1, finally giving the intersection bloom filter $BF_\cap(L)$;
a judging module for calculating, for each element x in the local data set $D_i$, the hash value h(x); if $BF_\cap(L)[h(x)] = 1$, then x is an intersection element; otherwise it is not.
8. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor realizes the steps of the method according to any of claims 1-6 when executing the computer program.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210313113.1A CN114614974B (en) | 2022-03-28 | 2022-03-28 | Privacy set intersection method, system and device for power grid data cross-industry sharing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210313113.1A CN114614974B (en) | 2022-03-28 | 2022-03-28 | Privacy set intersection method, system and device for power grid data cross-industry sharing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114614974A | 2022-06-10 |
CN114614974B | 2023-01-03 |
Family
ID=81867272
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210313113.1A Active CN114614974B (en) | 2022-03-28 | 2022-03-28 | Privacy set intersection method, system and device for power grid data cross-industry sharing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114614974B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543842A (en) * | 2018-11-02 | 2019-03-29 | 西安交通大学 | The Distribution estimation method of higher-dimension intelligent perception data with local secret protection |
US20190272388A1 (en) * | 2018-03-01 | 2019-09-05 | Etron Technology, Inc. | Data collection and analysis method and related device thereof |
CN110866263A (en) * | 2019-11-14 | 2020-03-06 | 中国科学院信息工程研究所 | User privacy information protection method and system capable of resisting longitudinal attack |
CN113094746A (en) * | 2021-03-31 | 2021-07-09 | 北京邮电大学 | High-dimensional data publishing method based on localized differential privacy and related equipment |
Non-Patent Citations (2)
Title |
---|
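Zhang Jiacheng et al.: "Local differential privacy graph information collection method in a big data environment", 《信息网络安全》 (Netinfo Security) * |
Zhang Xiaojian et al.: "Spatial range query method based on local differential privacy", 《计算机研究与发展》 (Journal of Computer Research and Development) * |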
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115242371A (en) * | 2022-06-15 | 2022-10-25 | 华中科技大学 | Method, device and system for calculating set intersection and cardinality of differential privacy protection |
CN115242371B (en) * | 2022-06-15 | 2024-04-19 | 华中科技大学 | Differential privacy-protected set intersection and base number calculation method, device and system thereof |
CN115396144A (en) * | 2022-07-20 | 2022-11-25 | 北京冲量在线科技有限公司 | Multi-party privacy intersection scheme based on trusted execution environment and distributed data intersection algorithm |
CN115396144B (en) * | 2022-07-20 | 2023-12-05 | 北京冲量在线科技有限公司 | Multiparty privacy intersection scheme based on trusted execution environment and distributed data intersection algorithm |
CN117201187A (en) * | 2023-11-01 | 2023-12-08 | 国网湖北省电力有限公司武汉供电公司 | Power data secure sharing method, system and storage medium |
CN117201187B (en) * | 2023-11-01 | 2024-01-05 | 国网湖北省电力有限公司武汉供电公司 | Power data secure sharing method, system and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114614974B (en) | 2023-01-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||