CN110866277A - Privacy protection method for data integration of DaaS application - Google Patents
- Publication number: CN110866277A (application CN201911107523.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- attribute
- cloud service
- service providers
- daas
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2107—File encryption
Abstract
The invention discloses a privacy protection method for data integration in DaaS applications, which comprises the following steps: step one, under the condition that data anonymity is satisfied, through multi-round cooperation among tenants, each round refines the data set using the attribute with the largest information gain; step two, setting the reputation levels of the cloud service providers, and dividing the cloud service providers according to reputation level; step three, for cloud service providers below the preset reputation level, hiding the association relations among the data by adopting a segmentation-based privacy protection mechanism, ensuring a balanced distribution of attribute value ranges through group balancing, and preventing the cloud service providers from revealing the tenants' data privacy; and for cloud service providers above the preset reputation level, verifying the correctness and integrity of the data returned by the cloud service providers by adopting a classification index tree data structure. Through the classification index tree data structure, the cloud tenant can verify the correctness and integrity of the result set returned by the cloud service provider.
Description
Technical Field
The invention belongs to the technical field of privacy protection, and particularly relates to a privacy protection method for data integration of DaaS application.
Background
In the current business environment, data sharing among different departments within an enterprise or government organization, and even among different enterprises and functional organizations, has become a basic requirement for making decisions and providing high-quality services to users; multiple data owners must cooperate and integrate each other's data to realize data sharing. Two issues need to be addressed in this process: (1) the storage, maintenance, and statistical analysis of the fused data may exceed the load of existing equipment; (2) the fused data contains richer knowledge, from which an attacker may deduce private data. Therefore, under multi-source data fusion, each data provider anonymizes its data. Cloud computing, as a novel data operation mode, provides a powerful software and hardware platform for data sharing. Unlike the traditional computing mode centered on large-scale servers, cloud computing is centered on the Internet and internal private networks, constructs large-scale data centers using virtualization technology, and offers cloud tenants a novel service mode of ubiquitous network information sharing, on-demand resource renting, and pay-per-use charging. For cloud tenants, cloud computing relieves the one-time overhead of purchasing software and hardware and the pressure of data storage, management, and maintenance.
Since data encryption limits what can be done with the data, researchers have proposed preventing privacy leakage by anonymizing sensitive data kept in plaintext. The k-anonymity principle proposed by Sweeney et al. requires that each record in the published data table be indistinguishable from at least k−1 other records. Later work strengthened this by bounding, within each equivalence class, the proportion of records associated with any single sensitive attribute value: l-diversity ensures that the sensitive attribute of each equivalence class takes at least l different values, and t-closeness builds on l-diversity by considering the distribution of the sensitive attribute, requiring that the distribution of sensitive attribute values within every equivalence class be as close as possible to the global distribution of the attribute.
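The anonymity principles above can be checked mechanically. The following is a minimal illustrative sketch (not part of the patent) that verifies k-anonymity and l-diversity over a toy table; the attribute names and records are hypothetical.

```python
from collections import Counter, defaultdict

def check_k_anonymity(records, quasi_ids, k):
    """True if every combination of quasi-identifier values occurs >= k times."""
    counts = Counter(tuple(r[a] for a in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())

def check_l_diversity(records, quasi_ids, sensitive, l):
    """True if every equivalence class has >= l distinct sensitive values."""
    classes = defaultdict(set)
    for r in records:
        classes[tuple(r[a] for a in quasi_ids)].add(r[sensitive])
    return all(len(v) >= l for v in classes.values())

# hypothetical generalized table: two equivalence classes of size 2
table = [
    {"age": "20-30", "zip": "210*", "disease": "flu"},
    {"age": "20-30", "zip": "210*", "disease": "cold"},
    {"age": "30-40", "zip": "211*", "disease": "flu"},
    {"age": "30-40", "zip": "211*", "disease": "hiv"},
]
print(check_k_anonymity(table, ["age", "zip"], 2))               # True
print(check_l_diversity(table, ["age", "zip"], "disease", 2))    # True
```

Note that this table satisfies 2-anonymity and 2-diversity but not 3-anonymity, which the same checks would report.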
In the field of secure multi-party computation, Clifton et al. proposed a distributed k-anonymization algorithm which assumes that, in a vertically partitioned data environment, the same record carries a unique global identifier and each party to the data fusion holds only part of the attributes; original information is hidden during communication using commutative encryption, and a complete anonymization table is then constructed to judge whether the anonymity threshold is met, realizing data privacy protection. However, the time cost of the algorithm is high, and their secure multi-party data fusion tool targets only 4 typical relational operations: count, union, intersection, and Cartesian product. Mohammed et al. realized data privacy protection for the parties to a data fusion using a data generalization technique based on a classification tree structure, but the information loss of the fused data is high, the specific degree depending on the data set. An accountable computing framework has also been presented that enables mutual verification among the parties to a data fusion. However, the computational cost of these methods is too high.
For the privacy of cloud data, one line of work designs an attribute blocking tree structure from a complete lattice, where each solid-line box in the tree represents a valid state of attribute segmentation; the data set is divided and data privacy is protected through group anonymity by defining confidentiality constraints and attribute visibility, but the attribute constraint rule set must be established in advance by application-domain experts. Another privacy protection mechanism vertically partitions the data by defining privacy constraints over attribute sets, so that the attributes within each data block cannot leak the privacy of data combinations, and introduces a 3-level combination equalization concept to keep the occurrence probabilities of the various data slices in each block's physical storage as even as possible, protecting DaaS data privacy; however, establishing the attribute privacy constraint set requires the guidance of domain experts, and the generation, identification, and reconstruction of the obfuscated data must be completed with the cooperation of a trusted third party.
Disclosure of Invention
The invention aims to address the defects of the prior art by providing a privacy protection method for data integration in DaaS applications: the attribute set is divided by constructing attribute identification sets so that no attribute combination within any data block leaks privacy, and through the classification index tree data structure the cloud tenant gains the ability to verify the correctness and integrity of the result set returned by the cloud service provider.
In order to achieve the purpose, the invention adopts the following technical scheme:
a privacy protection method for data integration of DaaS application comprises the following steps:
step one, under the condition that data anonymity is satisfied, through multi-round cooperation among tenants, each round refines the data set using the attribute with the maximum information gain;
step two, setting the reputation levels of the cloud service providers, and dividing the cloud service providers according to reputation level;
step three, for cloud service providers below the preset reputation level, hiding the association relations among the data by adopting a segmentation-based privacy protection mechanism, ensuring a balanced distribution of attribute value ranges through group balancing, and preventing the cloud service providers from revealing the tenants' data privacy; and for cloud service providers above the preset reputation level, verifying the correctness and integrity of the data returned by the cloud service providers by adopting a classification index tree data structure.
As an improvement of the privacy protection method for data integration of DaaS application, in the first step, the data set refinement includes:
and finely dividing the data set by turns, wherein each turn selects the attribute with the maximum current global information gain to divide the data until the fused data set is irrevocable.
As an improvement of the privacy protection method for data integration of DaaS application, the method further includes:
uploading the fused data set to the cloud, and handing final control over the data to the cloud service provider.
As an improvement of the privacy protection method for data integration of DaaS applications, in the first step, the step of refining the data set includes:
in the local data, the information entropy of each attribute is calculated and the maximum entropy value is published for comparison; the attribute with the largest global entropy in the current round is selected, and its owner refines the previous round's data partition. If the partition result satisfies the data anonymity constraint, it is published; otherwise the process moves directly to the next round, until no attribute can satisfy the anonymity constraint.
As an improvement of the data integration privacy protection method for DaaS applications, in the third step, the partitioned privacy protection mechanism includes:
and (4) according to different importance of the attributes to the information decision, segmenting the data set by using an attribute hypergraph resolution method.
As an improvement of the privacy protection method for data integration of DaaS application, the attribute hypergraph solution includes:
when quasi-identifiers are extracted, the attribute set in the largest common sub-edge of the hypergraph is selected as a candidate set each time, and all hyperedges containing attributes of the candidate set are deleted, until the hypergraph contains no hyperedges; finally the Cartesian product of all candidate sets is taken.
As an improvement of the privacy protection method for data integration of DaaS applications, in the third step, the packet balancing method includes:
grouping equalization is performed on the partitioned data sets through generating fake data, so that a cloud service provider cannot deduce more knowledge through data distribution statistical attacks.
As an improvement of the privacy protection method for data integration of DaaS applications described in the present invention, in the third step, the classification index tree data structure includes iteratively dividing data by using an unnecessary attribute set, an important attribute set, and a core attribute set in sequence.
The method has the following advantages. Step one: under the condition that data anonymity is satisfied, through multi-round cooperation among tenants, each round refines the data set using the attribute with the largest information gain. Step two: the reputation levels of the cloud service providers are set, and the cloud service providers are divided according to reputation level. Step three: for cloud service providers below the preset reputation level, a segmentation-based privacy protection mechanism hides the association relations among the data, group balancing ensures a balanced distribution of attribute value ranges, and the cloud service providers are prevented from revealing the tenants' data privacy; for cloud service providers above the preset reputation level, a classification index tree data structure verifies the correctness and integrity of the returned data. The invention divides the attribute set by constructing attribute identification sets, so that no attribute combination within any data partition can cause privacy disclosure, and through the classification index tree data structure the cloud tenant can verify the correctness and integrity of the result set returned by the cloud service provider.
Drawings
Fig. 1 is a multi-tenant outsourcing data fusion architecture of the present invention.
FIG. 2 is a diagram of a classification index tree according to the present invention.
Detailed Description
As used in the specification and in the claims, certain terms are used to refer to particular components. As one skilled in the art will appreciate, manufacturers may refer to a component by different names; this specification and the claims do not distinguish between components that differ in name but not in function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion and should thus be interpreted to mean "including, but not limited to". "Substantially" means within an acceptable error range in which a person skilled in the art can solve the technical problem and substantially achieve the technical effect.
In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "horizontal", and the like indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and simplifying the description; they do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted", "connected", "secured", and the like are to be construed broadly: for example, fixedly connected, detachably connected, or integrally connected; mechanically or electrically connected; directly connected, or indirectly connected through intervening media, or communicating between the interiors of two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
The present invention will be described in further detail below with reference to the accompanying drawings, but the present invention is not limited thereto.
As shown in fig. 1, a privacy protection method for data integration of DaaS application includes the following steps:
step one, under the condition that data anonymity is satisfied, through multi-round cooperation among tenants, each round refines the data set using the attribute with the maximum information gain;
step two, setting the reputation levels of the cloud service providers, and dividing the cloud service providers according to reputation level;
step three, for cloud service providers below the preset reputation level, hiding the association relations among the data by adopting a segmentation-based privacy protection mechanism, ensuring a balanced distribution of attribute value ranges through group balancing, and preventing the cloud service providers from revealing the tenants' data privacy; and for cloud service providers above the preset reputation level, verifying the correctness and integrity of the data returned by the cloud service providers by adopting a classification index tree data structure.
It should be noted that, for multi-tenant distributed data fusion, a multi-round refinement anonymization data protection strategy is provided: under the condition that data anonymity is satisfied, through multi-round cooperation among tenants, each round refines the data set using the attribute with the largest information gain, so that the fused data retains as much information as possible while data privacy protection is accomplished. For untrusted cloud service providers, according to the reputation levels set by the tenants, a two-stage privacy protection mechanism oriented to DaaS applications is provided: for a semi-trusted cloud service provider, an application-independent segmentation-based privacy protection mechanism hides the association relations among the data and ensures a balanced distribution of attribute value ranges through group balancing, preventing the cloud service provider from revealing the tenants' data privacy; for a completely untrusted cloud service provider, a classification index tree data structure is provided to verify the correctness and integrity of the data returned by the cloud service provider.
Multi-party data fusion enables a decision maker to formulate strategies on a more complete data set than before, providing higher-quality services to users; data owners holding different information attributes fuse their respective data collaboratively. The data set owned by a cloud tenant is first formally defined as a four-tuple T(U, A, F, Class), where U is the set of data objects, U = {x_1, x_2, ..., x_n}, each x_i being called an object; A is the attribute set A = {a_1, a_2, ..., a_m}; F is the set of relations between U and A, F = {f_k : U → V_k}, where V_k is the value range of a_k; and Class is the decision attribute. To simplify the model, take the fusion of data from 2 cloud tenants, T_1(U_1, A_1, F_1, Class_1) and T_2(U_2, A_2, F_2, Class_2), as an example, assuming T_1 and T_2 have the same record set and disjoint record attribute sets, i.e., U_1 = U_2, Class_1 = Class_2, and A_1 ∩ A_2 = ∅.
Let P = {P_1, P_2, ..., P_n} be the set of cloud tenants participating in data fusion, T_i the data table owned by cloud tenant P_i, and A_i = {a_1, a_2, ..., a_k} the attribute set of table T_i, with A_i ∩ A_j = ∅ for i ≠ j. Let T be the data table formed after the data of the n cloud tenants are fused, where A = A_1 ∪ A_2 ∪ ... ∪ A_n. Secure data outsourcing fusion must satisfy the following 3 conditions: 1) data anonymity is met, i.e., each record in the fused data table cannot be distinguished from k−1 other records; 2) no cloud tenant P_i participating in data fusion can learn more knowledge during the data fusion interaction than the final fused data table T reveals; 3) the cloud service provider cannot derive private information or statistical knowledge from the fused data table T.
That is, to safely and effectively prevent the above threat model from revealing data privacy, the data privacy protection policy should simultaneously satisfy the following three requirements. Zero knowledge: the cloud service provider cannot deduce more knowledge than the fused data set T through data statistics, data background attacks, and the like. Data correctness and completeness: the privacy protection strategy gives the cloud tenant the ability to verify the correctness and integrity of a result set returned by the cloud service provider. High efficiency: within the framework of the data privacy protection policy, the cloud server should complete the tenants' query requests with comparable time complexity.
Preferably, in the first step, the data set refinement includes:
and finely dividing the data set by turns, wherein each turn selects the attribute with the maximum current global information gain to divide the data until the fused data set is irrevocable.
Preferably, the method further comprises the following steps:
and uploading the fused data set to a cloud end, and handing the final control right of the data to a cloud service provider.
Preferably, in the first step, the refinement of the data set includes:
in the local data, the information entropy of each attribute is calculated and the maximum entropy value is published for comparison; the attribute with the largest global entropy in the current round is selected, and its owner refines the previous round's data partition. If the partition result satisfies the data anonymity constraint, it is published; otherwise the process moves directly to the next round, until no attribute can satisfy the anonymity constraint.
A quasi-identifier is a combination of m attributes that can uniquely identify sensitive information or private records in a table, while no proper subset of it can. Let QID be the set of quasi-identifiers in data table T, and num(QID_i) the number of records in T sharing the same values on the attributes of the i-th quasi-identifier. k-anonymity requires that num(QID_i) ≥ k for every QID_i, where k is the anonymity threshold agreed by the tenants.
TABLE 1 symbols and their meanings
As shown in Table 1, the shared data sets are S_1 and S_2, where ID is the record identifier and Class is the decision/class attribute (sensitive attribute) of the record; the information attributes a1, a2, a3, and a4 represent age, spectacle prescription, tear production, and astigmatism, respectively, where a1 and a2 are local attributes of S_1, a3 and a4 are local attributes of S_2, and the data sets of S_1 and S_2 each satisfy 2-anonymity.
Equivalence class: on T(U, A, F, Class), for B ⊆ A, write R_B = {(x_i, x_j) | f_k(x_i) = f_k(x_j) for all a_k ∈ B}; R_B is an equivalence relation on U.
Refinement: on T(U, A, F, Class), for B, C ⊆ A, let R_B and R_C be equivalence relations on U. If every block of the partition of U induced by R_B is contained in some block of the partition induced by R_C, R_B is called a refinement of R_C.
The multi-round refinement anonymization algorithm works as follows: each party to the data fusion computes the information entropy of each attribute from its local data and publishes its maximum entropy value for comparison, and the parties select the attribute with the largest global entropy in the current round. The owner of that attribute subdivides the data based on the previous round's partition result; if the subdivision does not violate the data anonymity constraint, it is published, otherwise the process moves directly to the next round, until no attribute can further refine the data while satisfying the anonymity constraint.
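The round structure above can be outlined in code. This is a hedged, centralized single-party sketch (the patent's algorithm runs distributed across tenants, each computing entropy only on its local attributes); function names and the tie-breaking behavior are the sketch's own assumptions.

```python
import math
from collections import Counter, defaultdict

def entropy(values):
    """Shannon entropy of a value sequence."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def refine(partition, records, attr):
    """Split each block of the current partition by the values of `attr`."""
    new_blocks = []
    for block in partition:
        groups = defaultdict(list)
        for idx in block:
            groups[records[idx][attr]].append(idx)
        new_blocks.extend(groups.values())
    return new_blocks

def multi_round_anonymize(records, attrs, k):
    """Each round, split on the attribute with the highest entropy; keep the
    split only if every resulting block still satisfies k-anonymity."""
    partition = [list(range(len(records)))]
    remaining = list(attrs)
    used = []
    while remaining:
        # attribute with maximum global entropy this round
        best = max(remaining, key=lambda a: entropy([r[a] for r in records]))
        candidate = refine(partition, records, best)
        remaining.remove(best)
        if all(len(b) >= k for b in candidate):  # anonymity constraint holds
            partition = candidate
            used.append(best)
        # otherwise skip this attribute and move to the next round
    return partition, used
```

In a distributed setting, `entropy` would be evaluated locally by each tenant, with only the maximum entropy value published for comparison, as the text describes.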
Preferably, in step three, the partitioned privacy protection mechanism includes:
and (4) according to different importance of the attributes to the information decision, segmenting the data set by using an attribute hypergraph resolution method.
Preferably, the attribute hypergraph solution comprises:
when quasi-identifiers are extracted, the attribute set in the largest common sub-edge of the hypergraph is selected as a candidate set each time, and all hyperedges containing attributes of the candidate set are deleted, until the hypergraph contains no hyperedges; finally the Cartesian product of all candidate sets is taken.
First, a related concept of data segmentation is given.
Quasi-identifier: on T(U, A, F, Class), if an attribute set B ⊆ A satisfies R_B = R_A while no proper subset of B does, B is called a quasi-identifier of T.
Attribute identification set: T(U, A, F, Class) is an information system. Write U/R_A = {[x_i]_A | x_i ∈ U} and D([x_i]_A, [x_j]_A) = {a_k ∈ A | f_k(x_i) ≠ f_k(x_j)}; D([x_i]_A, [x_j]_A) is called the attribute identification set of [x_i]_A and [x_j]_A, and D = (D([x_i]_A, [x_j]_A) | [x_i]_A, [x_j]_A ∈ U/R_A) is the attribute identification matrix, i.e., the collection of all identification sets. The elements of an identification set are the attributes that distinguish the corresponding equivalence classes.
Attribute hypergraph: the attribute hypergraph can be defined as a binary group (V, HE), where V is a set of all attributes in the fused data table T, HE is a set of hyper-edges, and each hyper-edge represents an item of the attribute identification matrix D.
A quasi-identifier B such that R_B = R_A can be sought through the identification matrix, but finding a minimal such B in the identification matrix is an NP-hard problem, so the attribute hypergraph resolution method is adopted: when extracting quasi-identifiers, the attribute set in the largest common sub-edge of the hypergraph is selected as a candidate set each time, all hyperedges containing attributes of the candidate set are deleted, and this iterates until the hypergraph contains no hyperedges; finally the Cartesian product of all candidate sets is taken.
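The following sketch illustrates the hypergraph resolution heuristic, under the simplifying assumption (the sketch's, not the patent's) that "the largest common sub-edge" can be approximated greedily by the single attribute hitting the most remaining hyperedges; all names are illustrative.

```python
from itertools import product

def discernibility_hyperedges(records, attrs):
    """Each hyperedge is the set of attributes on which two records differ
    (one entry of the attribute identification matrix D)."""
    edges = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            e = frozenset(a for a in attrs if records[i][a] != records[j][a])
            if e:
                edges.append(e)
    return edges

def extract_quasi_identifiers(edges):
    """Greedy resolution: repeatedly take the attribute covering the most
    remaining hyperedges as a candidate, delete the edges it covers, then
    form quasi-identifier candidates as the Cartesian product."""
    candidates = []
    remaining = list(edges)
    while remaining:
        counts = {}
        for e in remaining:
            for a in e:
                counts[a] = counts.get(a, 0) + 1
        # sorted() gives a deterministic tie-break among equally frequent attrs
        best = max(sorted(counts), key=counts.get)
        candidates.append([best])
        remaining = [e for e in remaining if best not in e]
    return [set(combo) for combo in product(*candidates)]
```

For a toy table where two records differ only on attribute b and others differ on a and c, the greedy pass yields {a, b}, which indeed distinguishes every pair of records.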
Attribute division: t (U, A, F, Class) is information system, Bk(k is less than or equal to r) is an attribute minimal set, r is the total number of minimal sets, and the sum is recordedWherein C is a core attribute set, K is an important attribute set, and I is an unnecessary attribute set.
The data partitioning strategy of the invention is analyzed to meet the data privacy protection requirement.
First, prove the proposition "if |B| ≥ 2, then for each a ∈ B, {a} alone is not a quasi-identifier" — this is obvious. Second, prove the proposition "if a is a core attribute, then there exist x_i, x_j ∈ U such that D([x_i]_A, [x_j]_A) = {a}". By contradiction: suppose that for all x_i, x_j ∈ U with a ∈ D([x_i]_A, [x_j]_A) we have |D([x_i]_A, [x_j]_A)| ≥ 2. Then every identification set containing a also contains some other attribute, so there exists B ⊆ A − {a} such that R_B = R_A; hence there is a quasi-identifier not containing a, so a is not a core attribute. This contradicts the hypothesis, and the proposition is proved. Third, prove the proposition "if a ∈ B is a core attribute of quasi-identifier B, then R_{B−{a}} ≠ R_B": by the second proposition there exist x_i, x_j ∈ U such that f_a(x_i) ≠ f_a(x_j) and f_b(x_i) = f_b(x_j) for every b ≠ a; thus (x_i, x_j) ∈ R_{B−{a}} while (x_i, x_j) ∉ R_A = R_B, i.e., R_{B−{a}} ≠ R_B. In summary, B is a quasi-identifier, and neither B − {a} nor {a} alone constitutes a quasi-identifier.
Preferably, in step three, the packet equalization method includes:
grouping equalization is carried out on the partitioned data sets by generating fake data, so that a cloud service provider cannot deduce more knowledge through data distribution statistical attack.
The data partitioning strategy splits the association relations among the data, but a cloud service provider can still compromise tenant data privacy by counting the distribution of attribute values across the different data partitions, as in Table 1. Using the pair (v(a_i), n) to denote a value of attribute a_i and the number of times it occurs in the fused data set T, the cloud service provider can count the attribute value distributions of a3, a4, and Class (abbreviated d for conciseness):
a3:{{normal,2},{reduced,4},{more,2}};
a4:{{yes,2},{no,6}};
d:{{hard,3},{none,3},{soft,2}}。
According to the maximum coverage principle, the cloud service provider can derive the following three rules:
a3:{reduced,4}→{d:{hard,3}|d:{none,3}},
a3:{reduced,4}→a4:{no,6},
d:{none,3}→a4:{no,6}。
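The distribution statistics above, which are the starting point of this attack, can be reproduced with a simple tally. The records below are hypothetical stand-ins chosen to match the counts quoted from Table 1:

```python
# Illustration only: how a cloud service provider could tally attribute-value
# distributions over an outsourced data block. The records are invented
# placeholders that reproduce the (value, count) pairs given in the text.
from collections import Counter

records = [
    {"a3": "reduced", "a4": "no",  "d": "none"},
    {"a3": "reduced", "a4": "no",  "d": "none"},
    {"a3": "reduced", "a4": "no",  "d": "none"},
    {"a3": "reduced", "a4": "no",  "d": "hard"},
    {"a3": "normal",  "a4": "yes", "d": "hard"},
    {"a3": "normal",  "a4": "no",  "d": "soft"},
    {"a3": "more",    "a4": "yes", "d": "hard"},
    {"a3": "more",    "a4": "no",  "d": "soft"},
]

# distribution of each attribute, mirroring the (v(a_i), n) pairs in the text
for attr in ("a3", "a4", "d"):
    print(attr, dict(Counter(r[attr] for r in records)))
```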
Since attributes a_4 and d lie in the same data block, the cloud service provider can obtain business secrets from the tenant data. Therefore, an (α, k)-group equalization strategy is proposed, so that attribute value domains are uniformly distributed within each data block, preventing the cloud service provider from compromising the privacy of the tenant data.
Probability distribution function: t (U, A, F, Class) is an information system, noteU/RB={[xi]B|xi∈U},U/Rd={[xi]d|xiE.g., U }, U/R for convenience of expressiond={d1,d2,...,drIs xi∈U,Probability distribution function muB(xi)=(D(d1/[xi]B),...,D(dr/[xi]B))。
(α, k)-group balancing: assume that T(U, A, F, Class) satisfies k-anonymity and that all non-empty subsets of the attribute set constitute M groups. Let E_j denote the expected value of group G_j over its value range. If, for every group G_j and every value v in its range, the frequency of v deviates from E_j by at most α while T remains k-anonymous, T is said to satisfy (α, k)-group balance.
To achieve data equalization, counterfeit data are inserted so that, on the premise that every data value satisfies k-anonymity, the distribution of attribute values in each data block of the decision-attribute-based data partition satisfies the preset distribution threshold α.
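A minimal sketch of such fake-data insertion follows (the stopping rule and all names are assumptions of this illustration, not the exact (α, k)-group equalization algorithm): records are padded until every decision value's count lies within α·|block| of the uniform expectation.

```python
# Sketch: equalize a data block by inserting counterfeit records until every
# value of the decision attribute is close enough to the uniform expectation.
from collections import Counter

def equalize(block, attr, domain, alpha):
    """Pad `block` with fake records until, for every value v in `domain`,
    the count of v is within alpha * |block| of the uniform expectation."""
    counts = Counter(r[attr] for r in block)
    out = list(block)
    while True:
        total = len(out)
        expected = total / len(domain)
        # the value currently farthest below the uniform expectation
        worst = max(domain, key=lambda v: expected - counts[v])
        if expected - counts[worst] <= alpha * total:
            return out
        out.append({attr: worst, "fake": True})  # counterfeit record
        counts[worst] += 1

# decision-value counts from the example: hard=3, none=3, soft=2
block = [{"d": "hard"}] * 3 + [{"d": "none"}] * 3 + [{"d": "soft"}] * 2
balanced = equalize(block, "d", {"hard", "none", "soft"}, alpha=0.05)
print(len(balanced), dict(Counter(r["d"] for r in balanced)))  # 9 records, 3 of each
```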
It is now shown that the data partitioning strategy of the invention satisfies zero knowledge with respect to cloud service providers.
Proof: let the data anonymity of the fusion data T be k (k > 1). The cloud service provider satisfies zero knowledge with respect to the fusion data if and only if the data anonymity after executing the data segmentation strategy is not less than k. The fusion data T is divided into 3 parts, none of which contains a QID, so the probability that the cloud service provider recombines one record is 1/k^3. The tenant performs grouping equalization on the partitioned data set T' by generating fake data, so that the cloud service provider cannot deduce additional knowledge through data distribution statistical attacks, and the probability that the cloud service provider recombines one record is less than 1/k^3 (1/k^3 < 1/k). Hence the data partitioning strategy of the invention satisfies zero knowledge for cloud service providers.
Preferably, in step three, the classification index tree data structure includes iteratively dividing the data using the unnecessary attribute set, the important attribute set and the core attribute set in sequence.
For a completely untrusted cloud, a cloud service provider may, driven by economic benefit, operate on only a subset of the data uploaded by cloud tenants. A strategy that merely splits the association relations of the data by attribute cannot meet this requirement, so for this threat a data verification structure, the classification index tree, is adopted.
Classification index tree: the classification index tree is a data verification tree structure of depth 3 (the root is the 0th layer). The root contains the total data set, which is refined layer by layer from the root to the leaf nodes using the attribute sets I, K and C of the data set, in that order, as the classification conditions. Each node can be regarded as a triple (B, ⟨B_i, Index⟩, Count), where B_i is the set of classification attributes of the layer on which the node resides; ⟨B_i, Index⟩ = {⟨b_1, Index_1⟩, ..., ⟨b_n, Index_n⟩}, in which b_i ∈ B_i and Index_i is a pointer for attribute b_i to the nodes in the same layer that share the same value of b_i; B = A − B_i, where A is the total attribute set; and Count is the number of data records contained in the node.
The time complexity of the algorithm for constructing the classification index tree is O(n). The tenant constructs the classification index tree locally before uploading the fusion data. Because the cloud service provider has absolute control over the data in the cloud, the tenant cannot prevent the cloud service provider from violating the SLA; through the classification index tree, however, the tenant can verify the correctness and completeness of the data returned by the cloud service provider. The root node of the classification index tree is the generalization of all records; in the layers below, the data are iteratively divided using the unnecessary attribute set, the important attribute set and the core attribute set in sequence, forming a coarse-to-fine classification tree structure. Leaf nodes contain the IDs of all records that satisfy the constraints on the route from the root to the leaf node. Through the classification index tree, the tenant obtains the number of matching records and their IDs, thereby verifying the correctness and integrity of the data returned by the cloud service provider.
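The layer-by-layer construction can be illustrated as follows (the dictionary-based node layout and all names are assumptions of this sketch, not the invention's exact triple structure): the record set is refined with the I, K and C attribute sets in turn, and each node keeps the record IDs and the Count needed for verification.

```python
# Sketch of a depth-3 classification index tree: refine the record set layer
# by layer with the unnecessary (I), important (K), and core (C) attribute
# sets, keeping per-node record IDs and counts for result verification.

def build_index_tree(records, I, K, C):
    def key(rec, attrs):
        return tuple(rec[a] for a in sorted(attrs))

    root = {"ids": [r["id"] for r in records], "count": len(records), "children": {}}

    def split(node, recs, attrs, rest):
        groups = {}
        for r in recs:
            groups.setdefault(key(r, attrs), []).append(r)
        for k, grp in groups.items():
            child = {"ids": [r["id"] for r in grp], "count": len(grp), "children": {}}
            node["children"][k] = child
            if rest:
                split(child, grp, rest[0], rest[1:])

    split(root, records, I, [K, C])
    return root

recs = [
    {"id": 1, "a1": "x", "a2": "p", "a3": "u"},
    {"id": 2, "a1": "x", "a2": "p", "a3": "v"},
    {"id": 3, "a1": "y", "a2": "q", "a3": "u"},
]
tree = build_index_tree(recs, I={"a1"}, K={"a2"}, C={"a3"})
print(tree["count"])          # 3 records under the root
print(len(tree["children"]))  # 2 first-layer nodes (a1 = x vs a1 = y)
```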
Variations and modifications to the above-described embodiments may also occur to those skilled in the art, which fall within the scope of the invention as disclosed and taught herein. Therefore, the present invention is not limited to the above-mentioned embodiments, and any obvious improvement, replacement or modification made by those skilled in the art based on the present invention is within the protection scope of the present invention. Furthermore, although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
Claims (8)
1. A privacy protection method for data integration of DaaS application is characterized by comprising the following steps:
step one, under the condition of satisfying data anonymity, refining the data set through multiple rounds of cooperation among the tenants, each round refining the data set with the attribute having the maximum information gain;
step two, setting the reputation level of the cloud service providers, and dividing the cloud service providers according to the reputation level;
step three, for the cloud service providers below the preset reputation level, hiding the association relations among the data by adopting a segmentation-based privacy protection mechanism, ensuring a balanced distribution of the attribute value ranges through grouping equalization, and preventing the cloud service providers from compromising the data privacy of the tenants; and for the cloud service providers with reputation levels higher than the preset reputation level, verifying the correctness and the integrity of the data returned by the cloud service providers by adopting a classification index tree data structure.
2. The privacy protection method for data integration of DaaS applications as claimed in claim 1, wherein in the first step, the data set refinement includes:
refining the data set over multiple rounds, wherein each round selects the attribute with the maximum current global information gain to divide the data, until the fused data set can no longer be divided.
3. The privacy protection method for data integration of DaaS applications as claimed in claim 2, further comprising:
uploading the fused data set to a cloud end, and handing the final control right of the data to the cloud service provider.
4. The privacy protection method for data integration of DaaS applications as claimed in claim 1, wherein in the first step, the step of refining the data set includes:
in the local data, calculating the information entropy of each attribute and publishing the maximum entropy value for comparison; selecting the attribute with the maximum global entropy value in the current round, whose owner refines the data division result of the previous round; publishing the division result if it satisfies the data anonymity constraint, and otherwise proceeding directly to the next round, until no attribute can satisfy the anonymity constraint.
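One refinement round of this kind can be sketched as follows (a simplified stand-in: the anonymity check and the multi-party publication step are omitted, and all names are illustrative): each attribute's information entropy is computed, and the attribute with the maximum entropy is chosen to split the data.

```python
# Sketch of attribute selection for one refinement round: pick the attribute
# whose value distribution has the maximum information entropy.
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def pick_split_attribute(records, attrs):
    # in the protocol, the owner of the winning attribute performs the split
    return max(attrs, key=lambda a: entropy([r[a] for r in records]))

recs = [
    {"a1": "x", "a2": "p"},
    {"a1": "x", "a2": "q"},
    {"a1": "x", "a2": "q"},
    {"a1": "y", "a2": "p"},
]
print(pick_split_attribute(recs, ["a1", "a2"]))  # a2 (entropy 1.0 > 0.811)
```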
5. The privacy protection method for data integration of DaaS applications as claimed in claim 1, wherein in step three, the segmentation-based privacy protection mechanism comprises:
according to the different importance of the attributes to the information decision, segmenting the data set by using an attribute hypergraph resolution method.
6. The privacy protection method for data integration of DaaS applications as claimed in claim 5, wherein the attribute hypergraph resolution method comprises:
when the quasi-identifiers are extracted, selecting the attribute set in the largest common sub-edge of the hypergraph as a candidate set each time, deleting all the hyperedges containing attributes of that candidate set until the hypergraph contains no hyperedges, and finally taking the Cartesian product of all the candidate sets.
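A greedy reading of this resolution step can be sketched as follows (an approximation: the "largest common sub-edge" is stood in for by the single most frequent attribute, and all names are illustrative):

```python
# Greedy sketch of attribute-hypergraph resolution: repeatedly pick a
# candidate attribute set, delete every hyperedge it intersects, and finally
# combine all candidate sets by Cartesian product.
from itertools import product

def resolve(hyperedges):
    edges = [frozenset(e) for e in hyperedges]
    candidates = []
    while edges:
        # approximate the largest common sub-edge by the most frequent attribute
        freq = {}
        for e in edges:
            for a in e:
                freq[a] = freq.get(a, 0) + 1
        cand = {max(freq, key=freq.get)}
        candidates.append(cand)
        edges = [e for e in edges if not (e & cand)]
    # Cartesian product of the candidate sets yields the quasi-identifiers
    return [set(t) for t in product(*candidates)]

qids = resolve([{"a1", "a2"}, {"a1", "a3"}, {"a4"}])
print(qids)  # one candidate combination, e.g. a set containing a1 and a4
```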
7. The privacy protection method for data integration of DaaS applications as claimed in claim 1, wherein in step three, the grouping equalization manner comprises:
grouping and equalizing the partitioned data sets by generating fake data, so that the cloud service provider cannot deduce more knowledge through data distribution statistical attacks.
8. The privacy protection method for data integration of DaaS applications as claimed in claim 1, wherein in step three, the classification index tree data structure comprises iteratively dividing the data using the unnecessary attribute set, the important attribute set and the core attribute set in sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911107523.5A CN110866277A (en) | 2019-11-13 | 2019-11-13 | Privacy protection method for data integration of DaaS application |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110866277A true CN110866277A (en) | 2020-03-06 |
Family
ID=69653803
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911107523.5A Pending CN110866277A (en) | 2019-11-13 | 2019-11-13 | Privacy protection method for data integration of DaaS application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110866277A (en) |
Non-Patent Citations (1)
Title |
---|
ZHOU Zhigang et al.: "Research on the privacy protection mechanism of data integration for DaaS applications" (面向DaaS应用的数据集成隐私保护机制研究), 《通信学报》 (Journal on Communications) * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112613068A (en) * | 2020-12-15 | 2021-04-06 | 国家超级计算深圳中心(深圳云计算中心) | Multiple data confusion privacy protection method and system and storage medium |
CN112613068B (en) * | 2020-12-15 | 2024-03-08 | 国家超级计算深圳中心(深圳云计算中心) | Multiple data confusion privacy protection method and system and storage medium |
CN112765653A (en) * | 2021-01-06 | 2021-05-07 | 山财高新科技(山西)有限公司 | Multi-source data fusion privacy protection method based on multi-privacy policy combination optimization |
CN112765653B (en) * | 2021-01-06 | 2022-11-25 | 山财高新科技(山西)有限公司 | Multi-source data fusion privacy protection method based on multi-privacy policy combination optimization |
CN114329588A (en) * | 2021-12-27 | 2022-04-12 | 电子科技大学广东电子信息工程研究院 | Multi-source fusion data privacy protection method in cloud environment |
CN114297714A (en) * | 2021-12-30 | 2022-04-08 | 电子科技大学广东电子信息工程研究院 | Method for data privacy protection and safe search in cloud environment |
CN116257657A (en) * | 2022-12-30 | 2023-06-13 | 北京瑞莱智慧科技有限公司 | Data processing method, data query method, related device and storage medium |
CN116257657B (en) * | 2022-12-30 | 2024-02-06 | 北京瑞莱智慧科技有限公司 | Data processing method, data query method, related device and storage medium |
CN117313135A (en) * | 2023-08-02 | 2023-12-29 | 东莞理工学院 | Efficient reconfiguration personal privacy protection method based on attribute division |
CN117313135B (en) * | 2023-08-02 | 2024-04-16 | 东莞理工学院 | Efficient reconfiguration personal privacy protection method based on attribute division |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20200306 |