CN110866277A - Privacy protection method for data integration of DaaS application - Google Patents

Privacy protection method for data integration of DaaS application

Info

Publication number
CN110866277A
CN110866277A (application CN201911107523.5A)
Authority
CN
China
Prior art keywords
data
attribute
cloud service
service providers
daas
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911107523.5A
Other languages
Chinese (zh)
Inventor
张宏莉
周志刚
张羽
高阳
王星
于海宁
方滨兴
刘妙玲
孙燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Institute Of Electronic And Information Engineering University Of Electronic Science And Technology Of China
Harbin Institute of Technology
Original Assignee
Guangdong Institute Of Electronic And Information Engineering University Of Electronic Science And Technology Of China
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Institute Of Electronic And Information Engineering University Of Electronic Science And Technology Of China, Harbin Institute of Technology filed Critical Guangdong Institute Of Electronic And Information Engineering University Of Electronic Science And Technology Of China
Priority to CN201911107523.5A priority Critical patent/CN110866277A/en
Publication of CN110866277A publication Critical patent/CN110866277A/en
Pending legal-status Critical Current

Classifications

    All under G (Physics) › G06 (Computing; calculating or counting) › G06F (Electric digital data processing) › G06F 21/00 (Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity):
    • G06F 21/6245 — Protecting personal data, e.g. for financial or medical purposes (under G06F 21/62, protecting access to data via a platform, e.g. using keys or access control rules)
    • G06F 21/6227 — Protecting access to data via a platform, to a system of files or objects, where protection concerns the structure of data, e.g. records, types, queries
    • G06F 21/64 — Protecting data integrity, e.g. using checksums, certificates or signatures
    • G06F 2221/2107 — File encryption (indexing scheme relating to G06F 21/00)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a privacy protection method for data integration in DaaS applications, comprising the following steps. Step one: under the condition that data anonymity is satisfied, the tenants cooperate over multiple rounds, each round refining the data set with the attribute that yields the largest information gain. Step two: reputation levels are set for the cloud service providers, and the providers are divided according to reputation level. Step three: for cloud service providers below the preset reputation level, a segmentation-based privacy protection mechanism hides the association relations among the data, and a group-balancing approach keeps the value ranges of attributes evenly distributed, preventing the provider from leaking tenants' data privacy; for cloud service providers above the preset reputation level, a classification index tree data structure is used to verify the correctness and integrity of the data the provider returns. Through the classification index tree data structure, the cloud tenant can verify the correctness and integrity of the result set returned by the cloud service provider.

Description

Privacy protection method for data integration of DaaS application
Technical Field
The invention belongs to the technical field of privacy protection, and particularly relates to a privacy protection method for data integration of DaaS application.
Background
In the current business environment, data sharing among different departments within an enterprise or government organization, and even among different enterprises and different functional organizations, has become a basic requirement for making decisions and providing high-quality services to users; multiple data owners must cooperate to integrate each other's data in order to realize data sharing. Two issues need to be addressed in this process: (1) the storage, maintenance and statistical-analysis operations on the fused data may exceed the load of existing equipment; (2) the fused data contains richer knowledge, from which an attacker may deduce private data. Therefore, in multi-source data fusion, each data provider anonymizes its data. Cloud computing, as a novel data operation mode, provides a powerful software and hardware platform for data sharing. Unlike the traditional computing mode centered on large-scale servers, cloud computing takes the Internet and internal private networks as its core, constructs large-scale data centers using virtualization technology, and offers cloud tenants a novel service mode of ubiquitous network information sharing, on-demand resource renting and pay-per-use charging. For cloud tenants, cloud computing relieves the one-time overhead of purchasing software/hardware and the pressure of data storage management and maintenance.
In view of the inflexibility of data encryption for privacy protection, researchers have proposed preventing privacy leakage by anonymizing sensitive data while keeping the data in plaintext. The k-anonymity principle proposed by Sweeney et al. requires that each record in the published data table be indistinguishable from at least k-1 other records. Later work improves on this by bounding the proportion of records associated with any one sensitive attribute value within each equivalence class: l-diversity ensures that the sensitive attribute of each equivalence class takes at least l different values, and t-closeness builds on l-diversity by considering the distribution of the sensitive attribute, requiring that the distribution of sensitive values within every equivalence class be as close as possible to the attribute's global distribution.
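The anonymity principles described above can be checked mechanically. The following sketch (with a hypothetical toy table, not taken from the patent) tests k-anonymity and l-diversity for a small record set:

```python
from collections import Counter, defaultdict

def is_k_anonymous(records, quasi_ids, k):
    """k-anonymity: every combination of quasi-identifier values
    must be shared by at least k records."""
    counts = Counter(tuple(r[a] for a in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())

def is_l_diverse(records, quasi_ids, sensitive, l):
    """l-diversity: each equivalence class (records agreeing on the
    quasi-identifiers) must contain at least l distinct sensitive values."""
    groups = defaultdict(set)
    for r in records:
        groups[tuple(r[a] for a in quasi_ids)].add(r[sensitive])
    return all(len(v) >= l for v in groups.values())

# Hypothetical toy table: 'age' and 'zip' are quasi-identifiers,
# 'disease' is the sensitive attribute.
table = [
    {"age": "20-30", "zip": "100*", "disease": "flu"},
    {"age": "20-30", "zip": "100*", "disease": "cold"},
    {"age": "30-40", "zip": "200*", "disease": "flu"},
    {"age": "30-40", "zip": "200*", "disease": "flu"},
]
print(is_k_anonymous(table, ["age", "zip"], 2))           # True
print(is_l_diverse(table, ["age", "zip"], "disease", 2))  # False: one class has a single value
```

The second check fails precisely because the second equivalence class carries only the value "flu", illustrating why k-anonymity alone is insufficient.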
For the field of secure multi-party computation, Clifton et al. propose a distributed k-anonymization algorithm which assumes that, in a vertically partitioned data environment, the same record has a unique global identifier and each party participating in the fusion holds data with only some of the attributes; original information is hidden in the communication process using commutative encryption, and a complete anonymization table is then constructed to judge whether the anonymity threshold is met, realizing data privacy protection. However, the time cost of the algorithm is large, and their secure multi-party data fusion tools target 4 typical relational operations: count, union, intersection and Cartesian product. Mohammed et al. realize data privacy protection for the parties of the fusion using a data generalization technique based on a classification tree structure, but the information loss of the fused data is high, the specific degree of loss depending on the data set. An accountable computing framework has also been presented that enables mutual authentication of the parties to the data fusion. However, these methods are computationally too expensive.
For cloud data privacy, an attribute blocking tree structure has been designed via a complete lattice, in which each solid-line box represents a reasonable state of attribute segmentation; the data set is divided and data privacy is protected through group anonymity by defining confidentiality constraints and attribute visibility, but the attribute constraint rule set must be established in advance by application-domain experts. Another privacy protection mechanism vertically divides the data by defining privacy constraints on attribute sets, so that the attribute combination within each data block cannot leak privacy, and introduces a 3-level combination equalization concept to keep the occurrence probabilities of the various data slices in the physical storage of each block as even as possible, protecting DaaS data privacy; however, establishing the attribute privacy constraint set requires the guidance of domain experts, and the generation, identification and reconstruction of the obfuscated data must be completed with the cooperation of a trusted third party.
Disclosure of Invention
The invention aims to: aiming at the defects of the prior art, the privacy protection method for data integration of the DaaS application is provided, the attribute set is divided by constructing the attribute identification set, so that the privacy is not leaked due to the attribute combination in each data block, and the cloud tenant has the capability of verifying the correctness and the integrity of the result set returned by the cloud service provider through the classification index tree data structure.
In order to achieve the purpose, the invention adopts the following technical scheme:
a privacy protection method for data integration of DaaS application comprises the following steps:
step one, under the condition that data anonymity is satisfied, through multi-round cooperation among tenants, each round refines the data set with the attribute having the maximum information gain;
step two, setting the credit rating of the cloud service providers, and dividing the cloud service providers according to the credit rating;
thirdly, for the cloud service providers lower than the preset credit rating, hiding the incidence relation among the data by adopting a privacy protection mechanism based on segmentation, ensuring the value range balanced distribution of the attributes in a grouping balancing mode, and preventing the cloud service providers from revealing the data privacy of the tenants; and for the cloud service providers with the reputation levels higher than the preset reputation level, verifying the correctness and the integrity of data returned by the cloud service providers by adopting a classification index tree data structure.
As an improvement of the privacy protection method for data integration of DaaS application, in the first step, the data set refinement includes:
the data set is divided more finely round by round; each round selects the attribute with the largest current global information gain to divide the data, until the fused data set can no longer be refined.
As an improvement of the privacy protection method for data integration of DaaS application, the method further includes:
uploading the fused data set to a cloud end, and handing the final control right of the data to the cloud service provider.
As an improvement of the privacy protection method for data integration of DaaS applications, in the first step, the step of detailing the data set includes:
in the local data, each party computes the information entropy of each of its attributes and publishes its maximum entropy value for comparison; the attribute with the largest global entropy in the current round is selected, and its owner refines the previous round's data partition. If the partition result satisfies the data anonymity constraint, it is published; otherwise the next round proceeds directly, until no attribute can satisfy the anonymity constraint.
As an improvement of the data integration privacy protection method for DaaS applications, in the third step, the partitioned privacy protection mechanism includes:
and (4) according to different importance of the attributes to the information decision, segmenting the data set by using an attribute hypergraph resolution method.
As an improvement of the privacy protection method for data integration of DaaS application, the attribute hypergraph solution includes:
when the quasi-identifiers are extracted, the attribute set in the largest public sub-edge in the hypergraph is selected as a candidate set every time, all the hypergraph edges containing the attributes of the candidate set are deleted until the hypergraph does not contain the hyperedges, and finally all the candidate sets are subjected to Cartesian product.
As an improvement of the privacy protection method for data integration of DaaS applications, in the third step, the packet balancing method includes:
grouping equalization is performed on the partitioned data sets through generating fake data, so that a cloud service provider cannot deduce more knowledge through data distribution statistical attacks.
As an improvement of the privacy protection method for data integration of DaaS applications described in the present invention, in the third step, the classification index tree data structure includes iteratively dividing data by using an unnecessary attribute set, an important attribute set, and a core attribute set in sequence.
The invention has the following beneficial effects. The method comprises the following steps: step one, under the condition that data anonymity is satisfied, through multi-round cooperation among tenants, each round refines the data set with the attribute having the largest information gain; step two, setting the reputation levels of the cloud service providers, and dividing the cloud service providers according to reputation level; step three, for the cloud service providers below the preset reputation level, hiding the association relations among the data by adopting a segmentation-based privacy protection mechanism, ensuring the balanced distribution of attribute value ranges by a group-balancing approach, and preventing the cloud service providers from leaking the tenants' data privacy; and for the cloud service providers with reputation levels above the preset reputation level, verifying the correctness and integrity of the data returned by the cloud service providers by adopting a classification index tree data structure. The invention divides the attribute set by constructing the attribute identification set, so that the attribute combination in each data partition cannot cause privacy disclosure, and the cloud tenant can verify the correctness and integrity of the result set returned by the cloud service provider via the classification index tree data structure.
Drawings
Fig. 1 is a multi-tenant outsourcing data fusion architecture of the present invention.
FIG. 2 is a diagram of a classification index tree according to the present invention.
Detailed Description
As used in the specification and in the claims, certain terms are used to refer to particular components. As one skilled in the art will appreciate, manufacturers may refer to a component by different names. This specification and claims do not intend to distinguish between components that differ in name but not function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion, and thus should be interpreted to mean "including, but not limited to". "Substantially" means within an acceptable error range; a person skilled in the art can solve the technical problem within a certain error range to substantially achieve the technical effect.
In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "horizontal", and the like indicate orientations or positional relationships based on those shown in the drawings, are only for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the referred device or element must have a specific orientation or be constructed and operated in a specific orientation; thus, they should not be construed as limiting the present invention.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
The present invention will be described in further detail below with reference to the accompanying drawings, but the present invention is not limited thereto.
As shown in fig. 1, a privacy protection method for data integration of DaaS application includes the following steps:
step one, under the condition that data anonymity is satisfied, through multi-round cooperation among tenants, each round refines the data set with the attribute having the maximum information gain;
step two, setting the credit rating of the cloud service providers, and dividing the cloud service providers according to the credit rating;
thirdly, for cloud service providers lower than the preset credit rating, hiding the incidence relation among the data by adopting a privacy protection mechanism based on segmentation, ensuring the value range balanced distribution of the attributes in a grouping balancing mode, and preventing the cloud service providers from revealing the data privacy of tenants; and for the cloud service providers with the reputation levels higher than the preset reputation level, verifying the correctness and the integrity of the data returned by the cloud service providers by adopting a classification index tree data structure.
It should be noted that: for multi-tenant distributed data fusion, a multi-round refinement anonymous data protection strategy is provided; under the condition that data anonymity is satisfied, the tenants cooperate over multiple rounds, each round refining the data set with the attribute of largest information gain, so that the fused data contains as much information as possible on the premise of completing data privacy protection. For untrusted cloud service providers, a two-stage DaaS-oriented privacy protection mechanism is provided according to the reputation level the tenant sets for the provider: for a semi-trusted cloud service provider, an application-independent block-based privacy protection mechanism is adopted, hiding the association relations among the data, and a group-balancing approach ensures the balanced distribution of attribute value ranges, preventing the provider from leaking the tenants' data privacy; for a completely untrusted cloud service provider, a classification index tree data structure is provided to verify the correctness and integrity of the data the provider returns.
Multi-party data fusion enables a decision maker to build strategies on a more complete data set than before and provide higher-quality services to users; data owners holding different information attributes fuse their respective data collaboratively. First, the data set owned by a cloud tenant is formally defined as a four-tuple T(U, A, F, Class), where U is the set of data objects, U = {x_1, x_2, ..., x_n}, each x_i being called an object; A is the attribute set, A = {a_1, a_2, ..., a_m}; F is the set of relationships between U and A, F = {f_k : U → V_k}, where V_k is the value range of a_k; Class is the decision attribute. To simplify the model, take the fusion of the data of 2 cloud tenants, T_1(U_1, A_1, F_1, Class_1) and T_2(U_2, A_2, F_2, Class_2), as an example, assuming T_1 and T_2 hold the same record set but disjoint attribute sets, i.e., U_1 = U_2, Class_1 = Class_2, A_1 ∩ A_2 = ∅.
Let P = {P_1, P_2, ..., P_n} be the set of cloud tenants participating in data fusion, T_i the data table owned by tenant P_i, and A_i = {a_1, a_2, ..., a_k} the attribute set contained in table T_i, with A_i ∩ A_j = ∅ for i ≠ j. T is the data table formed after the n cloud tenants' data are fused, where A = A_1 ∪ A_2 ∪ ... ∪ A_n.
The secure data outsourcing fusion must satisfy the following 3 conditions: 1) the data anonymity requirement is met, i.e., each record in the fused data table cannot be distinguished from at least k-1 other records; 2) any cloud tenant P_i participating in the data fusion cannot learn more knowledge than the final fused data table T during the interaction process of the fusion; 3) the cloud service provider cannot derive privacy information or statistical knowledge from the fused data table T.
That is, to safely and effectively prevent the above threat model from leaking data privacy, the data privacy protection policy should simultaneously satisfy the following three requirements. Zero knowledge: the cloud service provider cannot deduce more knowledge than the fused data set T through data statistics, data background attacks and the like. Data correctness and completeness: the privacy protection strategy gives the cloud tenant the ability to verify the correctness and integrity of a result set returned by the cloud service provider. High efficiency: within the framework of the data privacy protection policy, the cloud server should complete the tenant's query requests with comparable time complexity.
Preferably, in the first step, the data set refinement includes:
the data set is divided more finely round by round; each round selects the attribute with the largest current global information gain to divide the data, until the fused data set can no longer be refined.
Preferably, the method further comprises the following steps:
and uploading the fused data set to a cloud end, and handing the final control right of the data to a cloud service provider.
Preferably, in the first step, the refinement data set includes:
in the local data, each party computes the information entropy of each of its attributes and publishes its maximum entropy value for comparison; the attribute with the largest global entropy in the current round is selected, and its owner refines the previous round's data partition. If the partition result satisfies the data anonymity constraint, it is published; otherwise the next round proceeds directly, until no attribute can satisfy the anonymity constraint.
A quasi-identifier is a combination of m attributes that can uniquely identify sensitive information or privacy records in a table, while no proper subset of it can. Let QID be the set of quasi-identifiers in data table T, and num(QID_i) the number of records in T sharing the same values on the attributes contained in the i-th quasi-identifier; k-anonymity requires that for every QID_i ∈ QID, num(QID_i) ≥ k, where k is the anonymity threshold agreed by the tenants.
TABLE 1 Symbols and their meanings (table rendered as an image in the original; contents not recoverable)
As shown in Table 1, the shared data set is held by S_1 and S_2, where ID is the identifier of a record and Class is the decision/class attribute (sensitive attribute) of the record; attributes a_1, a_2, a_3 and a_4 are the information attributes, representing age, spectacle prescription, tear production and astigmatism respectively, where a_1 and a_2 are local attributes of S_1, a_3 and a_4 are local attributes of S_2, and the respective data sets of S_1 and S_2 satisfy 2-anonymity.
Equivalence classes: on T(U, A, F, Class), for any B ⊆ A, define R_B = {(x_i, x_j) | f_k(x_i) = f_k(x_j) for all a_k ∈ B}; R_B is an equivalence relation on U.
Refinement: on T(U, A, F, Class), for B, C ⊆ A, let R_B and R_C be equivalence relations on U. If R_B ⊆ R_C, i.e., each block of the partition of U induced by R_B is contained in some block of the partition induced by R_C, then R_B is called a refinement of R_C.
The multi-round refinement anonymization algorithm is as follows: each party to the data fusion computes the information entropy of each attribute from the local data it owns and publishes its maximum entropy value for comparison, and the parties select the attribute with the largest global entropy in the current round. The owner of that attribute subdivides the previous round's data partition on it; if the partition result does not violate the data anonymity constraint, the result is published, otherwise the next round proceeds directly, until no attribute can contribute to further subdivision of the data while the anonymity constraint is met.
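A minimal single-machine sketch of the multi-round loop: the real algorithm is distributed across tenants, but here one process plays all parties; the data set and attribute names are hypothetical:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy of an attribute's value distribution."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def multiround_refine(records, attrs, k):
    """Repeatedly pick the unused attribute with the highest entropy and
    refine the partition with it; an attribute whose refinement would
    break k-anonymity is skipped, and the loop ends when no attribute
    remains. Returns the attributes actually used for refinement."""
    chosen, remaining = [], list(attrs)
    while remaining:
        best = max(remaining, key=lambda a: entropy([r[a] for r in records]))
        remaining.remove(best)
        groups = Counter(tuple(r[a] for a in chosen + [best]) for r in records)
        if all(c >= k for c in groups.values()):  # anonymity constraint holds
            chosen.append(best)
    return chosen

data = [
    {"age": "young", "ast": "yes"}, {"age": "young", "ast": "yes"},
    {"age": "young", "ast": "no"},  {"age": "young", "ast": "no"},
    {"age": "old",   "ast": "yes"}, {"age": "old",   "ast": "no"},
]
print(multiround_refine(data, ["age", "ast"], 2))  # ['ast']
```

Here "ast" wins the entropy comparison and keeps 2-anonymity, while adding "age" would create singleton groups, so the loop stops with only one refining attribute.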
Preferably, in step three, the partitioned privacy protection mechanism includes:
and (4) according to different importance of the attributes to the information decision, segmenting the data set by using an attribute hypergraph resolution method.
Preferably, the attribute hypergraph solution comprises:
when the quasi-identifiers are extracted, the attribute set in the largest public sub-edge in the hypergraph is selected as a candidate set every time, all the hypergraph edges containing the attributes of the candidate set are deleted until the hypergraph does not contain the hyperedges, and finally all the candidate sets are subjected to Cartesian product.
First, a related concept of data segmentation is given.
Quasi-identifier: on T(U, A, F, Class), if an attribute set B ⊆ A satisfies R_B = R_A and no proper subset of B satisfies this equality, then B is called a quasi-identifier of T.
Attribute identification set: T(U, A, F, Class) is an information system. Write U/R_A = {[x_i]_A | x_i ∈ U} and D([x_i]_A, [x_j]_A) = {a_k ∈ A | f_k(x_i) ≠ f_k(x_j)}; D([x_i]_A, [x_j]_A) is called the attribute identification set of [x_i]_A and [x_j]_A, and D = {D([x_i]_A, [x_j]_A) | [x_i]_A, [x_j]_A ∈ U/R_A} is called the attribute identification matrix, i.e., the collection of all identification sets. The elements of an identification set are the attributes that distinguish the two equivalence classes.
Attribute hypergraph: the attribute hypergraph can be defined as a binary group (V, HE), where V is a set of all attributes in the fused data table T, HE is a set of hyper-edges, and each hyper-edge represents an item of the attribute identification matrix D.
Finding a quasi-identifier B such that R_B = R_A by means of the identification matrix is an NP-hard problem, so the attribute hypergraph resolution method is adopted: when extracting quasi-identifiers, the attribute set in the largest common sub-edge of the hypergraph is selected as a candidate set each time, and all hyperedges containing attributes of the candidate set are deleted; this is iterated until the hypergraph contains no hyperedges, and finally the Cartesian product of all candidate sets is taken.
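The greedy extraction over the hypergraph can be sketched as follows; picking a single most frequent attribute rather than the full "largest common sub-edge" is a simplifying assumption, and the hyperedges below are hypothetical identification-matrix entries:

```python
from collections import Counter

def greedy_candidates(hyperedges):
    """Greedy sketch of the hypergraph resolution: repeatedly take the
    attribute covering the most remaining hyperedges as a candidate,
    drop every hyperedge containing it, and stop when no hyperedges
    remain. Ties are broken alphabetically for determinism."""
    edges = [set(e) for e in hyperedges]
    candidates = []
    while edges:
        freq = Counter(a for e in edges for a in e)
        best = max(sorted(freq), key=freq.get)  # most frequent attribute
        candidates.append(best)
        edges = [e for e in edges if best not in e]
    return candidates

# Hypothetical identification-matrix entries (hyperedges):
D = [{"a1", "a2"}, {"a1", "a3"}, {"a2", "a3"}, {"a1"}]
print(greedy_candidates(D))  # ['a1', 'a2']
```

Each returned candidate hits every hyperedge it removed, so together the candidates form a hitting set of the identification matrix, which is the role the quasi-identifier plays in the text.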
Attribute division: t (U, A, F, Class) is information system, Bk(k is less than or equal to r) is an attribute minimal set, r is the total number of minimal sets, and the sum is recorded
Figure BDA0002271764270000101
Wherein C is a core attribute set, K is an important attribute set, and I is an unnecessary attribute set.
The data partitioning strategy of the invention is analyzed to meet the data privacy protection requirement.
First, prove the proposition "if |B| ≥ 2, then for any a ∈ B, R_B ⊆ R_{B−{a}}", which is obvious. Second, prove the proposition "if a is a core attribute, then there exist x_i, x_j ∈ U such that D([x_i]_A, [x_j]_A) = {a}". By contradiction: assume that for all x_i, x_j ∈ U, D([x_i]_A, [x_j]_A) ≠ {a}; then whenever a ∈ D([x_i]_A, [x_j]_A) we have |D([x_i]_A, [x_j]_A)| ≥ 2, so there exists b ≠ a with b ∈ D([x_i]_A, [x_j]_A). Therefore, for B = A − {a}, R_B = R_A, so there exists a quasi-identifier C ⊆ A − {a} with a ∉ C. This contradicts the assumption that a is a core attribute (an attribute belonging to every quasi-identifier), and the original proposition is proved. Third, prove the proposition "if a is a core attribute, then R_{B−{a}} ≠ R_B". By the second proposition there exist x_i, x_j ∈ U such that f_a(x_i) ≠ f_a(x_j) and f_b(x_i) = f_b(x_j) for every b ≠ a; thus (x_i, x_j) ∈ R_{A−{a}}, while (x_i, x_j) ∉ R_A, i.e., R_{A−{a}} ≠ R_A. Since R_B = R_A and B − {a} ⊆ A − {a} implies R_{A−{a}} ⊆ R_{B−{a}}, it follows that R_{B−{a}} ≠ R_B. In summary, B is a quasi-identifier, while neither B − {a} nor {a} alone constitutes a quasi-identifier.
Preferably, in step three, the packet equalization method includes:
grouping equalization is carried out on the partitioned data sets by generating fake data, so that a cloud service provider cannot deduce more knowledge through data distribution statistical attack.
The data partitioning strategy splits the association relations among the data, but a cloud service provider can still compromise tenant data privacy by counting the distribution of attribute values across the different data partitions. Take Table 1 as an example, and let the pair (v(a_i), n) denote that value v of attribute a_i occurs n times in the fused data set T; the cloud service provider can compute the value distributions of attributes a_3, a_4 and Class (written d for brevity of notation):
a3:{{normal,2},{reduced,4},{more,2}};
a4:{{yes,2},{no,6}};
d:{{hard,3},{none,3},{soft,2}}。
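The counting step of this statistical attack is a single pass over the fused data set. The sketch below is illustrative only: the 8-record block is hypothetical, constructed purely so its counts match the distributions quoted above.

```python
from collections import Counter

def value_distribution(records, attr):
    """Count how often each value of attr occurs in the fused data set T."""
    return dict(Counter(rec[attr] for rec in records))

# hypothetical 8-record block, constructed only to reproduce the quoted counts
T = (
    [{"a3": "reduced", "a4": "no",  "d": "none"}] * 3
    + [{"a3": "reduced", "a4": "no",  "d": "hard"}]
    + [{"a3": "normal",  "a4": "no",  "d": "hard"}] * 2
    + [{"a3": "more",    "a4": "yes", "d": "soft"}] * 2
)
print(value_distribution(T, "a3"))  # → {'reduced': 4, 'normal': 2, 'more': 2}
print(value_distribution(T, "a4"))  # → {'no': 6, 'yes': 2}
```

From such distributions the provider can then look for values whose counts cover each other, which is what the maximum coverage rules below exploit.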
According to the maximum coverage principle, the cloud service provider can derive the following three rules:
a3:{reduced,4}→{d:{hard,3}|d:{none,3}},
a3:{reduced,4}→a4:{no,6},
d:{none,3}→a4:{no,6}。
Since attributes a_4 and d reside in the same data block, the cloud service provider can obtain business secrets from the tenant data. An (α, k)-group equalization strategy is therefore proposed so that attribute value domains are uniformly distributed in each data block, preventing the cloud service provider from inferring the privacy of tenant data.
Probability distribution function: t (U, A, F, Class) is an information system, note
Figure BDA0002271764270000111
U/RB={[xi]B|xi∈U},U/Rd={[xi]d|xiE.g., U }, U/R for convenience of expressiond={d1,d2,...,drIs xi∈U,
Figure BDA0002271764270000112
Probability distribution function muB(xi)=(D(d1/[xi]B),...,D(dr/[xi]B))。
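A small sketch of the probability distribution function defined above, reading D(d_j/[x_i]_B) as the relative frequency of decision class d_j inside the equivalence class [x_i]_B. The record layout and attribute names are hypothetical.

```python
def mu_B(records, B, i, class_attr="Class"):
    """mu_B(x_i): relative frequency of each decision class within [x_i]_B."""
    xi = records[i]
    # [x_i]_B: all records agreeing with x_i on every attribute of B
    block = [r for r in records if all(r[a] == xi[a] for a in B)]
    classes = sorted({r[class_attr] for r in records})  # U/R_d = {d_1, ..., d_r}
    return tuple(sum(r[class_attr] == d for r in block) / len(block) for d in classes)

# hypothetical records: three 'reduced' rows form one equivalence class under B = {a3}
records = [
    {"a3": "reduced", "Class": "none"},
    {"a3": "reduced", "Class": "none"},
    {"a3": "reduced", "Class": "hard"},
    {"a3": "normal",  "Class": "hard"},
]
print(mu_B(records, {"a3"}, 0))  # distribution over (hard, none)
```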
(α, k)-group balance: assume that T(U, A, F, Class) satisfies k-anonymity and that all non-empty subsets of the attribute set constitute M groups g_1, ..., g_M. Let E(g_m) denote the expected value of a count in the value range of group g_m. If for any group g_m and any value v in its range the actual count n(v) satisfies |n(v) − E(g_m)| ≤ α, then T is said to satisfy (α, k)-group balance.
To achieve data balance, fake data are inserted so that, on the premise that each data value satisfies k-anonymity, the distribution of attribute values in each data block of the decision-attribute-based data partition satisfies the preset distribution threshold α.
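As a hedged illustration of the fake-data insertion step (simplified to balancing a single attribute's value counts rather than every attribute group, with a hypothetical fake-record layout), scarce values are padded until every count is within α of the most frequent one:

```python
from collections import Counter

def balance_attribute(records, attr, alpha, fake_marker="__fake__"):
    """Pad scarce values of attr with fake records until every value count is
    within alpha of the most frequent one (a simplified, single-attribute
    reading of the (alpha, k)-group balance condition)."""
    counts = Counter(r[attr] for r in records)
    target = max(counts.values())
    fakes = []
    for value, n in counts.items():
        for _ in range(max(0, target - n - alpha)):  # tolerate a gap of alpha
            fakes.append({attr: value, "id": fake_marker})
    return records + fakes

recs = [{"a4": "no"}] * 6 + [{"a4": "yes"}] * 2
balanced = balance_attribute(recs, "a4", alpha=1)
print(Counter(r["a4"] for r in balanced))  # value counts now differ by at most alpha
```

The fake records only need to be distinguishable by the tenant (here via a marker field) so they can be filtered out of query results locally.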
The data partitioning strategy of the invention is analyzed below to show that it satisfies zero knowledge with respect to cloud service providers.
Proof: let the data anonymity of the fused data T be k (k > 1). The cloud service provider satisfies zero knowledge with respect to the fused data if and only if the anonymity of the data after the partitioning strategy is executed is not less than k. The fused data T is divided into 3 parts, none of which contains a QID, so the probability that the cloud service provider recombines one record is 1/k³. The tenant performs group equalization on the partitioned data set T′ by generating fake data, so that the cloud service provider cannot deduce more knowledge through data-distribution statistical attacks, and the probability of recombining one record becomes less than 1/k³ (and 1/k³ < 1/k). Hence the data partitioning strategy of the invention satisfies zero knowledge with respect to cloud service providers.
Preferably, in step three, the classification index tree data structure comprises iteratively partitioning the data using the unnecessary attribute set, the important attribute set and the core attribute set in sequence.
For a completely untrusted cloud, a cloud service provider may, driven by economic benefit, operate on only a subset of the data uploaded by cloud tenants. A policy that merely splits the association relations of the data by attributes cannot counter this threat; for this threat, a data verification data structure, the classification index tree, is adopted.
Classification index tree: the classification index tree is a data verification tree structure of depth 3 (the root is taken as layer 0). The root covers the total data set, and from the root to the leaf nodes the data set is refined layer by layer, using the I, K and C attribute sets of the data set in turn as the classification condition. Each node can be regarded as a triple (B, ⟨B_i, Index⟩, Count), where B_i is the set of classification attributes of the layer on which the node resides; ⟨B_i, Index⟩ = {⟨b_1, Index_1⟩, ..., ⟨b_n, Index_n⟩}, in which b_i ∈ B_i and Index_i is a pointer of attribute value b_i to the nodes in the same layer having the same attribute value; B = A − B_i, where A is the total attribute set; and Count is the number of data records contained in the node.
The time complexity of the algorithm for constructing the classification index tree is O(n). The tenant constructs the classification index tree locally before uploading the fused data. Because the cloud service provider has absolute control over the data in the cloud, the tenant cannot prevent the cloud service provider from violating the SLA; through the classification index tree, however, the tenant can verify the correctness and completeness of the data returned by the cloud service provider. The root node of the classification index tree is the generalization of all records; the subsequent layers iteratively divide the data using the unnecessary attribute set, the important attribute set and the core attribute set in sequence, forming a coarse-to-fine classification tree structure. Leaf nodes contain the IDs of all records satisfying the constraints on the path from the root to the leaf. The tenant obtains the number of matched records and their IDs through the classification index tree, thereby verifying the correctness and integrity of the data returned by the cloud service provider.
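A hedged sketch of this verification structure, omitting the same-layer sibling pointers (Index) and passing the I, K, C attribute sets as parameters; record layout and layer contents are hypothetical:

```python
def build_index_tree(records, levels):
    """Depth-3 verification tree: root (layer 0) covers all record IDs; each
    layer refines by one attribute set (e.g. the I, K, C sets in turn); leaves
    keep the IDs of the records matching the root-to-leaf path."""
    def split(ids, attr_sets):
        node = {"count": len(ids), "children": {}}
        if not attr_sets:
            node["ids"] = sorted(ids)
            return node
        head, rest = attr_sets[0], attr_sets[1:]
        groups = {}
        for i in ids:
            groups.setdefault(tuple(records[i][a] for a in head), []).append(i)
        for key, sub in groups.items():
            node["children"][key] = split(sub, rest)
        return node
    return split(list(range(len(records))), levels)

def verify(tree, path, returned_ids):
    """Check correctness (same IDs) and completeness (same count) of a reply."""
    node = tree
    for key in path:
        node = node["children"][key]
    return set(returned_ids) == set(node["ids"]) and len(returned_ids) == node["count"]

# hypothetical records with single-attribute I/K/C layers
records = [
    {"i": "x", "k": "p", "c": "u"},
    {"i": "x", "k": "p", "c": "v"},
    {"i": "y", "k": "q", "c": "u"},
]
tree = build_index_tree(records, [["i"], ["k"], ["c"]])
print(verify(tree, [("x",), ("p",), ("u",)], [0]))  # → True
```

Building the tree touches each record once per layer, consistent with the O(n) construction cost stated above.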
Variations and modifications to the above-described embodiments may also occur to those skilled in the art, which fall within the scope of the invention as disclosed and taught herein. Therefore, the present invention is not limited to the above-mentioned embodiments, and any obvious improvement, replacement or modification made by those skilled in the art based on the present invention is within the protection scope of the present invention. Furthermore, although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims (8)

1. A privacy protection method for data integration of DaaS application is characterized by comprising the following steps:
step one, under the condition of meeting data anonymity, refining the data set through multi-round cooperation among tenants, each round using the attribute with the maximum information gain;
step two, setting the reputation level of the cloud service providers, and dividing the cloud service providers according to the reputation level;
step three, for the cloud service providers below the preset reputation level, hiding the association relations among the data by adopting a segmentation-based privacy protection mechanism, ensuring a balanced distribution of the attribute value ranges through group balancing, and preventing the cloud service providers from revealing the data privacy of the tenants; and for the cloud service providers with reputation levels above the preset reputation level, verifying the correctness and the integrity of the data returned by the cloud service providers by adopting a classification index tree data structure.
2. The privacy protection method for data integration of DaaS applications as claimed in claim 1, wherein in the first step, the data set refinement includes:
refining the data set round by round, wherein each round selects the attribute with the largest current global information gain to divide the data, until the fused data set can no longer be subdivided.
3. The privacy protection method for data integration of DaaS applications as claimed in claim 2, further comprising:
uploading the fused data set to a cloud end, and handing the final control right of the data to the cloud service provider.
4. The privacy protection method for data integration of DaaS applications as claimed in claim 1, wherein in the first step, the step of refining the data set includes:
in the local data, calculating the information entropy of each attribute and publishing the maximum entropy value for comparison; selecting the attribute with the largest global entropy value in the current round, whose owner refines the data division result of the previous round; publishing the division result if it satisfies the data anonymity constraint, and otherwise proceeding directly to the next round, until no attribute can satisfy the anonymity constraint.
5. The method for privacy protection of data integration for DaaS applications as claimed in claim 1, wherein in step three, the partitioned privacy protection mechanism comprises:
segmenting the data set by using an attribute hypergraph resolution method, according to the different importance of the attributes to the information decision.
6. The privacy preserving method for data integration of DaaS applications of claim 5, wherein the attribute hypergraph solution comprises:
when the quasi-identifiers are extracted, the attribute set in the largest public sub-edge in the hypergraph is selected as a candidate set every time, all the hypergraph edges containing the attributes of the candidate set are deleted until the hypergraph does not contain the hyperedges, and finally all the candidate sets are subjected to Cartesian product.
7. The privacy protection method for data integration of DaaS applications as claimed in claim 1, wherein in step three, the group balancing manner comprises:
performing group balancing on the partitioned data sets by generating fake data, so that the cloud service provider cannot deduce more knowledge through data-distribution statistical attacks.
8. The privacy protection method for data integration of DaaS applications as claimed in claim 1, wherein in step three, the classification index tree data structure comprises iteratively dividing the data by using an unnecessary attribute set, an important attribute set and a core attribute set in sequence.
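The attribute hypergraph resolution of claims 5 and 6 might be sketched as follows. This is a hedged, simplified reading: the "largest common sub-edge" is approximated by the largest pairwise intersection of the remaining hyperedges, and the attribute names are hypothetical.

```python
from itertools import product

def extract_quasi_identifiers(hyperedges):
    """Greedy hypergraph resolution: repeatedly take the largest pairwise
    intersection of remaining hyperedges (approximating the 'largest common
    sub-edge') as a candidate set, drop every hyperedge containing any of its
    attributes, and finally combine the candidate sets by Cartesian product."""
    edges = [frozenset(e) for e in hyperedges]
    candidates = []
    while edges:
        common = [a & b for a in edges for b in edges if a is not b and a & b]
        best = max(common, key=len) if common else max(edges, key=len)
        candidates.append(sorted(best))
        # delete all hyperedges containing attributes of the candidate set
        edges = [e for e in edges if not (e & best)]
    return [set(combo) for combo in product(*candidates)]

print(extract_quasi_identifiers([{"a1", "a2"}, {"a2", "a3"}, {"a4"}]))
```

Each Cartesian-product combination picks one attribute per candidate set, yielding the quasi-identifier candidates that the partitioning step must split apart.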
CN201911107523.5A 2019-11-13 2019-11-13 Privacy protection method for data integration of DaaS application Pending CN110866277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911107523.5A CN110866277A (en) 2019-11-13 2019-11-13 Privacy protection method for data integration of DaaS application


Publications (1)

Publication Number Publication Date
CN110866277A true CN110866277A (en) 2020-03-06

Family

ID=69653803

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911107523.5A Pending CN110866277A (en) 2019-11-13 2019-11-13 Privacy protection method for data integration of DaaS application

Country Status (1)

Country Link
CN (1) CN110866277A (en)


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHOU Zhigang et al., "Research on the privacy protection mechanism of data integration for DaaS applications", Journal on Communications (《通信学报》) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613068A (en) * 2020-12-15 2021-04-06 国家超级计算深圳中心(深圳云计算中心) Multiple data confusion privacy protection method and system and storage medium
CN112613068B (en) * 2020-12-15 2024-03-08 国家超级计算深圳中心(深圳云计算中心) Multiple data confusion privacy protection method and system and storage medium
CN112765653A (en) * 2021-01-06 2021-05-07 山财高新科技(山西)有限公司 Multi-source data fusion privacy protection method based on multi-privacy policy combination optimization
CN112765653B (en) * 2021-01-06 2022-11-25 山财高新科技(山西)有限公司 Multi-source data fusion privacy protection method based on multi-privacy policy combination optimization
CN114329588A (en) * 2021-12-27 2022-04-12 电子科技大学广东电子信息工程研究院 Multi-source fusion data privacy protection method in cloud environment
CN114297714A (en) * 2021-12-30 2022-04-08 电子科技大学广东电子信息工程研究院 Method for data privacy protection and safe search in cloud environment
CN116257657A (en) * 2022-12-30 2023-06-13 北京瑞莱智慧科技有限公司 Data processing method, data query method, related device and storage medium
CN116257657B (en) * 2022-12-30 2024-02-06 北京瑞莱智慧科技有限公司 Data processing method, data query method, related device and storage medium
CN117313135A (en) * 2023-08-02 2023-12-29 东莞理工学院 Efficient reconfiguration personal privacy protection method based on attribute division
CN117313135B (en) * 2023-08-02 2024-04-16 东莞理工学院 Efficient reconfiguration personal privacy protection method based on attribute division


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200306