CN110866277A - Privacy protection method for data integration of DaaS application - Google Patents
- Publication number: CN110866277A (application CN201911107523.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- attribute
- cloud service
- service providers
- daas
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/64—Protecting data integrity, e.g. using checksums, certificates or signatures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2107—File encryption
Abstract
The invention discloses a privacy protection method for data integration in DaaS applications, which comprises the following steps: step one, under the condition that data anonymity is satisfied, through multi-round cooperation among tenants, each round refines the data set using the attribute with the largest information gain; step two, setting the reputation levels of the cloud service providers, and dividing the cloud service providers according to reputation level; step three, for cloud service providers below the preset reputation level, hiding the association relations among the data by adopting a segmentation-based privacy protection mechanism, ensuring a balanced distribution of attribute value ranges through group balancing, and preventing the cloud service providers from revealing the tenants' data privacy; and for cloud service providers above the preset reputation level, verifying the correctness and integrity of the data returned by the cloud service providers by adopting a classification index tree data structure. Through the classification index tree data structure, the cloud tenant can verify the correctness and integrity of the result set returned by the cloud service provider.
Description
Technical Field
The invention belongs to the technical field of privacy protection, and particularly relates to a privacy protection method for data integration of DaaS application.
Background
In the current business environment, data sharing among different departments within an enterprise or government organization, and even among different enterprises and functional organizations, has become a basic requirement for making decisions and providing high-quality services to users; multiple data owners must cooperate and integrate each other's data to realize data sharing. Two issues need to be addressed in this process: (1) the storage, maintenance, and statistical analysis of the fused data may exceed the load of existing equipment; (2) the fused data contains richer knowledge, from which an attacker may deduce private data. Therefore, under multi-source data fusion, each data provider anonymizes its data. Cloud computing, as a novel data operation mode, provides a powerful software and hardware platform for data sharing. Unlike the traditional computing mode centered on large-scale servers, cloud computing is centered on the Internet and internal private networks, constructs large-scale data centers using virtualization technology, and offers cloud tenants a novel service mode of ubiquitous network information sharing, on-demand resource renting, and pay-per-use charging. For cloud tenants, cloud computing relieves the one-time overhead of purchasing software and hardware and the pressure of data storage, management, and maintenance.
Since data encryption limits what can be done with the data, researchers have proposed preventing privacy leakage by anonymizing sensitive data kept in plaintext. The k-anonymity principle proposed by Sweeney et al. requires that each record in the published data table be indistinguishable from at least k−1 other records. Later work strengthened this by bounding, within each equivalence class, the proportion of records associated with any single sensitive attribute value: l-diversity ensures that the sensitive attribute of each equivalence class takes at least l different values, and t-closeness builds on l-diversity by considering the distribution of the sensitive attribute, requiring that the distribution of sensitive attribute values within every equivalence class be as close as possible to the global distribution of the attribute.
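The anonymity principles above can be checked mechanically. The following is a minimal illustrative sketch (not part of the patent) that verifies k-anonymity and l-diversity over a toy table; the attribute names and records are hypothetical.

```python
from collections import Counter, defaultdict

def check_k_anonymity(records, quasi_ids, k):
    """True if every combination of quasi-identifier values occurs >= k times."""
    counts = Counter(tuple(r[a] for a in quasi_ids) for r in records)
    return all(c >= k for c in counts.values())

def check_l_diversity(records, quasi_ids, sensitive, l):
    """True if every equivalence class has >= l distinct sensitive values."""
    classes = defaultdict(set)
    for r in records:
        classes[tuple(r[a] for a in quasi_ids)].add(r[sensitive])
    return all(len(v) >= l for v in classes.values())

# hypothetical generalized table: two equivalence classes of size 2
table = [
    {"age": "20-30", "zip": "210*", "disease": "flu"},
    {"age": "20-30", "zip": "210*", "disease": "cold"},
    {"age": "30-40", "zip": "211*", "disease": "flu"},
    {"age": "30-40", "zip": "211*", "disease": "hiv"},
]
print(check_k_anonymity(table, ["age", "zip"], 2))               # True
print(check_l_diversity(table, ["age", "zip"], "disease", 2))    # True
```

Note that this table satisfies 2-anonymity and 2-diversity but not 3-anonymity, which the same checks would report.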
In the field of secure multi-party computation, Clifton et al. proposed a distributed k-anonymization algorithm which assumes that, in a vertically partitioned data environment, the same record carries a unique global identifier and each party to the data fusion holds only part of the attributes; original information is hidden during communication using commutative encryption, and a complete anonymization table is then constructed to judge whether the anonymity threshold is met, realizing data privacy protection. However, the time cost of the algorithm is high, and their secure multi-party data fusion tool targets only 4 typical relational operations: count, union, intersection, and Cartesian product. Mohammed et al. realized data privacy protection for the parties to a data fusion using a data generalization technique based on a classification tree structure, but the information loss of the fused data is high, the specific degree depending on the data set. An accountable computing framework has also been presented that enables mutual verification among the parties to a data fusion. However, the computational cost of these methods is too high.
For the privacy of cloud data, one line of work designs an attribute blocking tree structure from a complete lattice, where each solid-line box in the tree represents a valid state of attribute segmentation; the data set is divided and data privacy is protected through group anonymity by defining confidentiality constraints and attribute visibility, but the attribute constraint rule set must be established in advance by application-domain experts. Another privacy protection mechanism vertically partitions the data by defining privacy constraints over attribute sets, so that the attributes within each data block cannot leak the privacy of data combinations, and introduces a 3-level combination equalization concept to keep the occurrence probabilities of the various data slices in each block's physical storage as even as possible, protecting DaaS data privacy; however, establishing the attribute privacy constraint set requires the guidance of domain experts, and the generation, identification, and reconstruction of the obfuscated data must be completed with the cooperation of a trusted third party.
Disclosure of Invention
The invention aims to address the defects of the prior art by providing a privacy protection method for data integration in DaaS applications: the attribute set is divided by constructing attribute identification sets so that no attribute combination within any data block leaks privacy, and through the classification index tree data structure the cloud tenant gains the ability to verify the correctness and integrity of the result set returned by the cloud service provider.
In order to achieve the purpose, the invention adopts the following technical scheme:
a privacy protection method for data integration of DaaS application comprises the following steps:
step one, under the condition that data anonymity is satisfied, through multi-round cooperation among tenants, each round refines the data set using the attribute with the maximum information gain;
step two, setting the reputation levels of the cloud service providers, and dividing the cloud service providers according to reputation level;
step three, for cloud service providers below the preset reputation level, hiding the association relations among the data by adopting a segmentation-based privacy protection mechanism, ensuring a balanced distribution of attribute value ranges through group balancing, and preventing the cloud service providers from revealing the tenants' data privacy; and for cloud service providers above the preset reputation level, verifying the correctness and integrity of the data returned by the cloud service providers by adopting a classification index tree data structure.
As an improvement of the privacy protection method for data integration of DaaS application, in the first step, the data set refinement includes:
and finely dividing the data set by turns, wherein each turn selects the attribute with the maximum current global information gain to divide the data until the fused data set is irrevocable.
As an improvement of the privacy protection method for data integration of DaaS application, the method further includes:
uploading the fused data set to the cloud, and handing final control over the data to the cloud service provider.
As an improvement of the privacy protection method for data integration of DaaS applications, in the first step, the step of refining the data set includes:
in the local data, the information entropy of each attribute is calculated and the maximum entropy value is published for comparison; the attribute with the largest global entropy in the current round is selected, and its owner refines the previous round's data partition. If the partition result satisfies the data anonymity constraint, it is published; otherwise the process moves directly to the next round, until no attribute can satisfy the anonymity constraint.
As an improvement of the data integration privacy protection method for DaaS applications, in the third step, the partitioned privacy protection mechanism includes:
and (4) according to different importance of the attributes to the information decision, segmenting the data set by using an attribute hypergraph resolution method.
As an improvement of the privacy protection method for data integration of DaaS application, the attribute hypergraph solution includes:
when quasi-identifiers are extracted, the attribute set in the largest common sub-edge of the hypergraph is selected as a candidate set each time, and all hyperedges containing attributes of the candidate set are deleted, until the hypergraph contains no hyperedges; finally the Cartesian product of all candidate sets is taken.
As an improvement of the privacy protection method for data integration of DaaS applications, in the third step, the packet balancing method includes:
grouping equalization is performed on the partitioned data sets through generating fake data, so that a cloud service provider cannot deduce more knowledge through data distribution statistical attacks.
As an improvement of the privacy protection method for data integration of DaaS applications described in the present invention, in the third step, the classification index tree data structure includes iteratively dividing data by using an unnecessary attribute set, an important attribute set, and a core attribute set in sequence.
The method has the following advantages. Step one: under the condition that data anonymity is satisfied, through multi-round cooperation among tenants, each round refines the data set using the attribute with the largest information gain. Step two: the reputation levels of the cloud service providers are set, and the cloud service providers are divided according to reputation level. Step three: for cloud service providers below the preset reputation level, a segmentation-based privacy protection mechanism hides the association relations among the data, group balancing ensures a balanced distribution of attribute value ranges, and the cloud service providers are prevented from revealing the tenants' data privacy; for cloud service providers above the preset reputation level, a classification index tree data structure verifies the correctness and integrity of the returned data. The invention divides the attribute set by constructing attribute identification sets, so that no attribute combination within any data partition can cause privacy disclosure, and through the classification index tree data structure the cloud tenant can verify the correctness and integrity of the result set returned by the cloud service provider.
Drawings
Fig. 1 is a multi-tenant outsourcing data fusion architecture of the present invention.
FIG. 2 is a diagram of a classification index tree according to the present invention.
Detailed Description
As used in the specification and in the claims, certain terms are used to refer to particular components. As one skilled in the art will appreciate, manufacturers may refer to a component by different names; this specification and the claims do not distinguish between components that differ in name but not in function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion and should thus be interpreted to mean "including, but not limited to". "Substantially" means within an acceptable error range in which a person skilled in the art can solve the technical problem and substantially achieve the technical effect.
In the description of the present invention, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "horizontal", and the like indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and simplifying the description; they do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted", "connected", "secured", and the like are to be construed broadly: for example, fixedly connected, detachably connected, or integrally connected; mechanically or electrically connected; directly connected, or indirectly connected through intervening media, or communicating between the interiors of two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
The present invention will be described in further detail below with reference to the accompanying drawings, but the present invention is not limited thereto.
As shown in fig. 1, a privacy protection method for data integration of DaaS application includes the following steps:
step one, under the condition that data anonymity is satisfied, through multi-round cooperation among tenants, each round refines the data set using the attribute with the maximum information gain;
step two, setting the reputation levels of the cloud service providers, and dividing the cloud service providers according to reputation level;
step three, for cloud service providers below the preset reputation level, hiding the association relations among the data by adopting a segmentation-based privacy protection mechanism, ensuring a balanced distribution of attribute value ranges through group balancing, and preventing the cloud service providers from revealing the tenants' data privacy; and for cloud service providers above the preset reputation level, verifying the correctness and integrity of the data returned by the cloud service providers by adopting a classification index tree data structure.
It should be noted that, for multi-tenant distributed data fusion, a multi-round refinement anonymization data protection strategy is provided: under the condition that data anonymity is satisfied, through multi-round cooperation among tenants, each round refines the data set using the attribute with the largest information gain, so that the fused data retains as much information as possible while data privacy protection is accomplished. For untrusted cloud service providers, according to the reputation levels set by the tenants, a two-stage privacy protection mechanism oriented to DaaS applications is provided: for a semi-trusted cloud service provider, an application-independent segmentation-based privacy protection mechanism hides the association relations among the data and ensures a balanced distribution of attribute value ranges through group balancing, preventing the cloud service provider from revealing the tenants' data privacy; for a completely untrusted cloud service provider, a classification index tree data structure is provided to verify the correctness and integrity of the data returned by the cloud service provider.
Multi-party data fusion enables a decision maker to formulate strategies on a more complete data set than before, providing higher-quality services to users; data owners holding different information attributes fuse their respective data collaboratively. The data set owned by a cloud tenant is first formally defined as a four-tuple T(U, A, F, Class), where U is the set of data objects, U = {x_1, x_2, ..., x_n}, each x_i being called an object; A is the attribute set A = {a_1, a_2, ..., a_m}; F is the set of relations between U and A, F = {f_k : U → V_k}, where V_k is the value range of a_k; and Class is the decision attribute. To simplify the model, take the fusion of data from 2 cloud tenants, T_1(U_1, A_1, F_1, Class_1) and T_2(U_2, A_2, F_2, Class_2), as an example, assuming T_1 and T_2 have the same record set and disjoint record attribute sets, i.e., U_1 = U_2, Class_1 = Class_2, and A_1 ∩ A_2 = ∅.
Let P = {P_1, P_2, ..., P_n} be the set of cloud tenants participating in data fusion, T_i the data table owned by cloud tenant P_i, and A_i = {a_1, a_2, ..., a_k} the attribute set of table T_i, with A_i ∩ A_j = ∅ for i ≠ j. Let T be the data table formed after the data of the n cloud tenants are fused, where A = A_1 ∪ A_2 ∪ ... ∪ A_n. Secure data outsourcing fusion must satisfy the following 3 conditions: 1) data anonymity is met, i.e., each record in the fused data table cannot be distinguished from k−1 other records; 2) no cloud tenant P_i participating in data fusion can learn more knowledge during the data fusion interaction than the final fused data table T reveals; 3) the cloud service provider cannot derive private information or statistical knowledge from the fused data table T.
That is, to safely and effectively prevent the above threat model from revealing data privacy, the data privacy protection policy should simultaneously satisfy the following three requirements. Zero knowledge: the cloud service provider cannot deduce more knowledge than the fused data set T through data statistics, data background attacks, and the like. Data correctness and completeness: the privacy protection strategy gives the cloud tenant the ability to verify the correctness and integrity of a result set returned by the cloud service provider. High efficiency: within the framework of the data privacy protection policy, the cloud server should complete the tenants' query requests with comparable time complexity.
Preferably, in the first step, the data set refinement includes:
and finely dividing the data set by turns, wherein each turn selects the attribute with the maximum current global information gain to divide the data until the fused data set is irrevocable.
Preferably, the method further comprises the following steps:
and uploading the fused data set to a cloud end, and handing the final control right of the data to a cloud service provider.
Preferably, in the first step, the refinement of the data set includes:
in the local data, the information entropy of each attribute is calculated and the maximum entropy value is published for comparison; the attribute with the largest global entropy in the current round is selected, and its owner refines the previous round's data partition. If the partition result satisfies the data anonymity constraint, it is published; otherwise the process moves directly to the next round, until no attribute can satisfy the anonymity constraint.
A quasi-identifier is a combination of m attributes that can uniquely identify sensitive information or private records in a table, while no proper subset of it can. Let QID be the set of quasi-identifiers in data table T, and num(QID_i) the number of records in T sharing the same values on the attributes of the i-th quasi-identifier. k-anonymity requires that num(QID_i) ≥ k for every QID_i, where k is the anonymity threshold agreed by the tenants.
TABLE 1 symbols and their meanings
As shown in Table 1, the shared data sets are S_1 and S_2, where ID is the record identifier and Class is the decision/class attribute (sensitive attribute) of the record; the information attributes a1, a2, a3, and a4 represent age, spectacle prescription, tear production, and astigmatism, respectively, where a1 and a2 are local attributes of S_1, a3 and a4 are local attributes of S_2, and the data sets of S_1 and S_2 each satisfy 2-anonymity.
Equivalence class: on T(U, A, F, Class), for B ⊆ A, write R_B = {(x_i, x_j) | f_k(x_i) = f_k(x_j) for all a_k ∈ B}; R_B is an equivalence relation on U.
Refinement: on T(U, A, F, Class), for B, C ⊆ A, let R_B and R_C be equivalence relations on U. If every block of the partition of U induced by R_B is contained in some block of the partition induced by R_C, R_B is called a refinement of R_C.
The multi-round refinement anonymization algorithm works as follows: each party to the data fusion computes the information entropy of each attribute from its local data and publishes its maximum entropy value for comparison, and the parties select the attribute with the largest global entropy in the current round. The owner of that attribute subdivides the data based on the previous round's partition result; if the subdivision does not violate the data anonymity constraint, it is published, otherwise the process moves directly to the next round, until no attribute can further refine the data while satisfying the anonymity constraint.
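The round structure above can be outlined in code. This is a hedged, centralized single-party sketch (the patent's algorithm runs distributed across tenants, each computing entropy only on its local attributes); function names and the tie-breaking behavior are the sketch's own assumptions.

```python
import math
from collections import Counter, defaultdict

def entropy(values):
    """Shannon entropy of a value sequence."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def refine(partition, records, attr):
    """Split each block of the current partition by the values of `attr`."""
    new_blocks = []
    for block in partition:
        groups = defaultdict(list)
        for idx in block:
            groups[records[idx][attr]].append(idx)
        new_blocks.extend(groups.values())
    return new_blocks

def multi_round_anonymize(records, attrs, k):
    """Each round, split on the attribute with the highest entropy; keep the
    split only if every resulting block still satisfies k-anonymity."""
    partition = [list(range(len(records)))]
    remaining = list(attrs)
    used = []
    while remaining:
        # attribute with maximum global entropy this round
        best = max(remaining, key=lambda a: entropy([r[a] for r in records]))
        candidate = refine(partition, records, best)
        remaining.remove(best)
        if all(len(b) >= k for b in candidate):  # anonymity constraint holds
            partition = candidate
            used.append(best)
        # otherwise skip this attribute and move to the next round
    return partition, used
```

In a distributed setting, `entropy` would be evaluated locally by each tenant, with only the maximum entropy value published for comparison, as the text describes.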
Preferably, in step three, the partitioned privacy protection mechanism includes:
and (4) according to different importance of the attributes to the information decision, segmenting the data set by using an attribute hypergraph resolution method.
Preferably, the attribute hypergraph solution comprises:
when quasi-identifiers are extracted, the attribute set in the largest common sub-edge of the hypergraph is selected as a candidate set each time, and all hyperedges containing attributes of the candidate set are deleted, until the hypergraph contains no hyperedges; finally the Cartesian product of all candidate sets is taken.
First, a related concept of data segmentation is given.
Quasi-identifier: on T(U, A, F, Class), if an attribute set B ⊆ A satisfies R_B = R_A while no proper subset of B does, B is called a quasi-identifier of T.
Attribute identification set: T(U, A, F, Class) is an information system. Write U/R_A = {[x_i]_A | x_i ∈ U} and D([x_i]_A, [x_j]_A) = {a_k ∈ A | f_k(x_i) ≠ f_k(x_j)}; D([x_i]_A, [x_j]_A) is called the attribute identification set of [x_i]_A and [x_j]_A, and D = (D([x_i]_A, [x_j]_A) | [x_i]_A, [x_j]_A ∈ U/R_A) is the attribute identification matrix, i.e., the collection of all identification sets. The elements of an identification set are the attributes that distinguish the corresponding equivalence classes.
Attribute hypergraph: the attribute hypergraph can be defined as a binary group (V, HE), where V is a set of all attributes in the fused data table T, HE is a set of hyper-edges, and each hyper-edge represents an item of the attribute identification matrix D.
A quasi-identifier B such that R_B = R_A can be sought through the identification matrix, but finding a minimal such B in the identification matrix is an NP-hard problem, so the attribute hypergraph resolution method is adopted: when extracting quasi-identifiers, the attribute set in the largest common sub-edge of the hypergraph is selected as a candidate set each time, all hyperedges containing attributes of the candidate set are deleted, and this iterates until the hypergraph contains no hyperedges; finally the Cartesian product of all candidate sets is taken.
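The following sketch illustrates the hypergraph resolution heuristic, under the simplifying assumption (the sketch's, not the patent's) that "the largest common sub-edge" can be approximated greedily by the single attribute hitting the most remaining hyperedges; all names are illustrative.

```python
from itertools import product

def discernibility_hyperedges(records, attrs):
    """Each hyperedge is the set of attributes on which two records differ
    (one entry of the attribute identification matrix D)."""
    edges = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            e = frozenset(a for a in attrs if records[i][a] != records[j][a])
            if e:
                edges.append(e)
    return edges

def extract_quasi_identifiers(edges):
    """Greedy resolution: repeatedly take the attribute covering the most
    remaining hyperedges as a candidate, delete the edges it covers, then
    form quasi-identifier candidates as the Cartesian product."""
    candidates = []
    remaining = list(edges)
    while remaining:
        counts = {}
        for e in remaining:
            for a in e:
                counts[a] = counts.get(a, 0) + 1
        # sorted() gives a deterministic tie-break among equally frequent attrs
        best = max(sorted(counts), key=counts.get)
        candidates.append([best])
        remaining = [e for e in remaining if best not in e]
    return [set(combo) for combo in product(*candidates)]
```

For a toy table where two records differ only on attribute b and others differ on a and c, the greedy pass yields {a, b}, which indeed distinguishes every pair of records.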
Attribute division: t (U, A, F, Class) is information system, Bk(k is less than or equal to r) is an attribute minimal set, r is the total number of minimal sets, and the sum is recordedWherein C is a core attribute set, K is an important attribute set, and I is an unnecessary attribute set.
The data partitioning strategy of the invention is analyzed to meet the data privacy protection requirement.
First, prove the proposition "if |B| ≥ 2, then for each a ∈ B, {a} alone is not a quasi-identifier" — this is obvious. Second, prove the proposition "if a is a core attribute, then there exist x_i, x_j ∈ U such that D([x_i]_A, [x_j]_A) = {a}". By contradiction: suppose that for all x_i, x_j ∈ U with a ∈ D([x_i]_A, [x_j]_A) we have |D([x_i]_A, [x_j]_A)| ≥ 2. Then every identification set containing a also contains some other attribute, so there exists B ⊆ A − {a} such that R_B = R_A; hence there is a quasi-identifier not containing a, so a is not a core attribute. This contradicts the hypothesis, and the proposition is proved. Third, prove the proposition "if a ∈ B is a core attribute of quasi-identifier B, then R_{B−{a}} ≠ R_B": by the second proposition there exist x_i, x_j ∈ U such that f_a(x_i) ≠ f_a(x_j) and f_b(x_i) = f_b(x_j) for every b ≠ a; thus (x_i, x_j) ∈ R_{B−{a}} while (x_i, x_j) ∉ R_A = R_B, i.e., R_{B−{a}} ≠ R_B. In summary, B is a quasi-identifier, and neither B − {a} nor {a} alone constitutes a quasi-identifier.
Preferably, in step three, the packet equalization method includes:
grouping equalization is carried out on the partitioned data sets by generating fake data, so that a cloud service provider cannot deduce more knowledge through data distribution statistical attack.
The data partitioning strategy splits the association relations among the data, but a cloud service provider can still compromise tenant data privacy by counting the distribution of attribute values across the different data partitions, as in Table 1. Using the pair (v(a_i), n) to denote a value of attribute a_i and the number of times it occurs in the fused data set T, the cloud service provider can count the attribute value distributions of a3, a4, and Class (abbreviated d for conciseness):
a3:{{normal,2},{reduced,4},{more,2}};
a4:{{yes,2},{no,6}};
d:{{hard,3},{none,3},{soft,2}}。
According to the maximum coverage principle, the cloud service provider can derive the following three rules:
a3:{reduced,4}→{d:{hard,3}|d:{none,3}},
a3:{reduced,4}→a4:{no,6},
d:{none,3}→a4:{no,6}。
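The distribution statistics above, which are the starting point of this attack, can be reproduced with a simple tally. The records below are hypothetical stand-ins chosen to match the counts quoted from Table 1:

```python
# Illustration only: how a cloud service provider could tally attribute-value
# distributions over an outsourced data block. The records are invented
# placeholders that reproduce the (value, count) pairs given in the text.
from collections import Counter

records = [
    {"a3": "reduced", "a4": "no",  "d": "none"},
    {"a3": "reduced", "a4": "no",  "d": "none"},
    {"a3": "reduced", "a4": "no",  "d": "none"},
    {"a3": "reduced", "a4": "no",  "d": "hard"},
    {"a3": "normal",  "a4": "yes", "d": "hard"},
    {"a3": "normal",  "a4": "no",  "d": "soft"},
    {"a3": "more",    "a4": "yes", "d": "hard"},
    {"a3": "more",    "a4": "no",  "d": "soft"},
]

# distribution of each attribute, mirroring the (v(a_i), n) pairs in the text
for attr in ("a3", "a4", "d"):
    print(attr, dict(Counter(r[attr] for r in records)))
```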
Since attributes a_4 and d lie in the same data block, the cloud service provider can obtain business secrets from the tenant data. Therefore, an (α, k)-group equalization strategy is proposed, so that attribute value domains are uniformly distributed within each data block, preventing the cloud service provider from compromising the privacy of the tenant data.
Probability distribution function: t (U, A, F, Class) is an information system, noteU/RB={[xi]B|xi∈U},U/Rd={[xi]d|xiE.g., U }, U/R for convenience of expressiond={d1,d2,...,drIs xi∈U,Probability distribution function muB(xi)=(D(d1/[xi]B),...,D(dr/[xi]B))。
(α, k)-group balancing: assume that T(U, A, F, Class) satisfies k-anonymity and that all non-empty subsets of the attribute set constitute M groups. Let E_j denote the expected value of group G_j over its value range. If, for every group G_j and every value v in its range, the frequency of v deviates from E_j by at most α while T remains k-anonymous, T is said to satisfy (α, k)-group balance.
To achieve data equalization, counterfeit data are inserted so that, on the premise that every data value satisfies k-anonymity, the distribution of attribute values in each data block of the decision-attribute-based data partition satisfies the preset distribution threshold α.
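A minimal sketch of such fake-data insertion follows (the stopping rule and all names are assumptions of this illustration, not the exact (α, k)-group equalization algorithm): records are padded until every decision value's count lies within α·|block| of the uniform expectation.

```python
# Sketch: equalize a data block by inserting counterfeit records until every
# value of the decision attribute is close enough to the uniform expectation.
from collections import Counter

def equalize(block, attr, domain, alpha):
    """Pad `block` with fake records until, for every value v in `domain`,
    the count of v is within alpha * |block| of the uniform expectation."""
    counts = Counter(r[attr] for r in block)
    out = list(block)
    while True:
        total = len(out)
        expected = total / len(domain)
        # the value currently farthest below the uniform expectation
        worst = max(domain, key=lambda v: expected - counts[v])
        if expected - counts[worst] <= alpha * total:
            return out
        out.append({attr: worst, "fake": True})  # counterfeit record
        counts[worst] += 1

# decision-value counts from the example: hard=3, none=3, soft=2
block = [{"d": "hard"}] * 3 + [{"d": "none"}] * 3 + [{"d": "soft"}] * 2
balanced = equalize(block, "d", {"hard", "none", "soft"}, alpha=0.05)
print(len(balanced), dict(Counter(r["d"] for r in balanced)))  # 9 records, 3 of each
```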
It is now shown that the data partitioning strategy of the invention satisfies zero knowledge with respect to cloud service providers.
Proof: let the data anonymity of the fusion data T be k (k > 1). The cloud service provider satisfies zero knowledge with respect to the fusion data if and only if the data anonymity after executing the data segmentation strategy is not less than k. The fusion data T is divided into 3 parts, none of which contains a QID, so the probability that the cloud service provider recombines one record is 1/k^3. The tenant performs grouping equalization on the partitioned data set T' by generating fake data, so that the cloud service provider cannot deduce additional knowledge through data distribution statistical attacks, and the probability that the cloud service provider recombines one record is less than 1/k^3 (1/k^3 < 1/k). Hence the data partitioning strategy of the invention satisfies zero knowledge for cloud service providers.
Preferably, in step three, the classification index tree data structure includes iteratively dividing the data using the unnecessary attribute set, the important attribute set and the core attribute set in sequence.
For a completely untrusted cloud, a cloud service provider may, driven by economic benefit, operate on only a subset of the data uploaded by cloud tenants. A strategy that merely splits the association relations of the data by attribute cannot meet this requirement, so for this threat a data verification structure, the classification index tree, is adopted.
Classification index tree: the classification index tree is a data verification tree structure of depth 3 (the root is the 0th layer). The root contains the total data set, which is refined layer by layer from the root to the leaf nodes using the attribute sets I, K and C of the data set, in that order, as the classification conditions. Each node can be regarded as a triple (B, ⟨B_i, Index⟩, Count), where B_i is the set of classification attributes of the layer on which the node resides; ⟨B_i, Index⟩ = {⟨b_1, Index_1⟩, ..., ⟨b_n, Index_n⟩}, in which b_i ∈ B_i and Index_i is a pointer for attribute b_i to the nodes in the same layer that share the same value of b_i; B = A − B_i, where A is the total attribute set; and Count is the number of data records contained in the node.
The time complexity of the algorithm for constructing the classification index tree is O(n). The tenant constructs the classification index tree locally before uploading the fusion data. Because the cloud service provider has absolute control over the data in the cloud, the tenant cannot prevent the cloud service provider from violating the SLA; through the classification index tree, however, the tenant can verify the correctness and completeness of the data returned by the cloud service provider. The root node of the classification index tree is the generalization of all records; in the layers below, the data are iteratively divided using the unnecessary attribute set, the important attribute set and the core attribute set in sequence, forming a coarse-to-fine classification tree structure. Leaf nodes contain the IDs of all records that satisfy the constraints on the route from the root to the leaf node. Through the classification index tree, the tenant obtains the number of matching records and their IDs, thereby verifying the correctness and integrity of the data returned by the cloud service provider.
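The layer-by-layer construction can be illustrated as follows (the dictionary-based node layout and all names are assumptions of this sketch, not the invention's exact triple structure): the record set is refined with the I, K and C attribute sets in turn, and each node keeps the record IDs and the Count needed for verification.

```python
# Sketch of a depth-3 classification index tree: refine the record set layer
# by layer with the unnecessary (I), important (K), and core (C) attribute
# sets, keeping per-node record IDs and counts for result verification.

def build_index_tree(records, I, K, C):
    def key(rec, attrs):
        return tuple(rec[a] for a in sorted(attrs))

    root = {"ids": [r["id"] for r in records], "count": len(records), "children": {}}

    def split(node, recs, attrs, rest):
        groups = {}
        for r in recs:
            groups.setdefault(key(r, attrs), []).append(r)
        for k, grp in groups.items():
            child = {"ids": [r["id"] for r in grp], "count": len(grp), "children": {}}
            node["children"][k] = child
            if rest:
                split(child, grp, rest[0], rest[1:])

    split(root, records, I, [K, C])
    return root

recs = [
    {"id": 1, "a1": "x", "a2": "p", "a3": "u"},
    {"id": 2, "a1": "x", "a2": "p", "a3": "v"},
    {"id": 3, "a1": "y", "a2": "q", "a3": "u"},
]
tree = build_index_tree(recs, I={"a1"}, K={"a2"}, C={"a3"})
print(tree["count"])          # 3 records under the root
print(len(tree["children"]))  # 2 first-layer nodes (a1 = x vs a1 = y)
```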
Variations and modifications to the above-described embodiments may also occur to those skilled in the art, which fall within the scope of the invention as disclosed and taught herein. Therefore, the present invention is not limited to the above-mentioned embodiments, and any obvious improvement, replacement or modification made by those skilled in the art based on the present invention is within the protection scope of the present invention. Furthermore, although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
Claims (8)
1. A privacy protection method for data integration of DaaS application is characterized by comprising the following steps:
step one, under the condition of satisfying data anonymity, refining the data set through multiple rounds of cooperation among the tenants, each round refining the data set with the attribute having the maximum information gain;
step two, setting the reputation level of the cloud service providers, and dividing the cloud service providers according to the reputation level;
step three, for the cloud service providers below the preset reputation level, hiding the association relations among the data by adopting a segmentation-based privacy protection mechanism, ensuring a balanced distribution of the attribute value ranges through grouping equalization, and preventing the cloud service providers from compromising the data privacy of the tenants; and for the cloud service providers with reputation levels higher than the preset reputation level, verifying the correctness and the integrity of the data returned by the cloud service providers by adopting a classification index tree data structure.
2. The privacy protection method for data integration of DaaS applications as claimed in claim 1, wherein in the first step, the data set refinement includes:
refining the data set over multiple rounds, wherein each round selects the attribute with the maximum current global information gain to divide the data, until the fused data set can no longer be divided.
3. The privacy protection method for data integration of DaaS applications as claimed in claim 2, further comprising:
uploading the fused data set to a cloud end, and handing the final control right of the data to the cloud service provider.
4. The privacy protection method for data integration of DaaS applications as claimed in claim 1, wherein in the first step, the step of refining the data set includes:
in the local data, calculating the information entropy of each attribute and publishing the maximum entropy value for comparison; selecting the attribute with the maximum global entropy value in the current round, whose owner refines the data division result of the previous round; publishing the division result if it satisfies the data anonymity constraint, and otherwise proceeding directly to the next round, until no attribute can satisfy the anonymity constraint.
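One refinement round of this kind can be sketched as follows (a simplified stand-in: the anonymity check and the multi-party publication step are omitted, and all names are illustrative): each attribute's information entropy is computed, and the attribute with the maximum entropy is chosen to split the data.

```python
# Sketch of attribute selection for one refinement round: pick the attribute
# whose value distribution has the maximum information entropy.
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def pick_split_attribute(records, attrs):
    # in the protocol, the owner of the winning attribute performs the split
    return max(attrs, key=lambda a: entropy([r[a] for r in records]))

recs = [
    {"a1": "x", "a2": "p"},
    {"a1": "x", "a2": "q"},
    {"a1": "x", "a2": "q"},
    {"a1": "y", "a2": "p"},
]
print(pick_split_attribute(recs, ["a1", "a2"]))  # a2 (entropy 1.0 > 0.811)
```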
5. The privacy protection method for data integration of DaaS applications as claimed in claim 1, wherein in step three, the segmentation-based privacy protection mechanism comprises:
according to the different importance of the attributes to the information decision, segmenting the data set by using an attribute hypergraph resolution method.
6. The privacy protection method for data integration of DaaS applications as claimed in claim 5, wherein the attribute hypergraph resolution method comprises:
when the quasi-identifiers are extracted, selecting the attribute set in the largest common sub-edge of the hypergraph as a candidate set each time, deleting all the hyperedges containing attributes of that candidate set until the hypergraph contains no hyperedges, and finally taking the Cartesian product of all the candidate sets.
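A greedy reading of this resolution step can be sketched as follows (an approximation: the "largest common sub-edge" is stood in for by the single most frequent attribute, and all names are illustrative):

```python
# Greedy sketch of attribute-hypergraph resolution: repeatedly pick a
# candidate attribute set, delete every hyperedge it intersects, and finally
# combine all candidate sets by Cartesian product.
from itertools import product

def resolve(hyperedges):
    edges = [frozenset(e) for e in hyperedges]
    candidates = []
    while edges:
        # approximate the largest common sub-edge by the most frequent attribute
        freq = {}
        for e in edges:
            for a in e:
                freq[a] = freq.get(a, 0) + 1
        cand = {max(freq, key=freq.get)}
        candidates.append(cand)
        edges = [e for e in edges if not (e & cand)]
    # Cartesian product of the candidate sets yields the quasi-identifiers
    return [set(t) for t in product(*candidates)]

qids = resolve([{"a1", "a2"}, {"a1", "a3"}, {"a4"}])
print(qids)  # one candidate combination, e.g. a set containing a1 and a4
```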
7. The privacy protection method for data integration of DaaS applications as claimed in claim 1, wherein in step three, the grouping equalization manner comprises:
grouping and equalizing the partitioned data sets by generating fake data, so that the cloud service provider cannot deduce more knowledge through data distribution statistical attacks.
8. The privacy protection method for data integration of DaaS applications as claimed in claim 1, wherein in step three, the classification index tree data structure comprises iteratively dividing the data using the unnecessary attribute set, the important attribute set and the core attribute set in sequence.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911107523.5A CN110866277A (en) | 2019-11-13 | 2019-11-13 | Privacy protection method for data integration of DaaS application |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110866277A true CN110866277A (en) | 2020-03-06 |
Family
ID=69653803
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911107523.5A Pending CN110866277A (en) | 2019-11-13 | 2019-11-13 | Privacy protection method for data integration of DaaS application |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110866277A (en) |
Non-Patent Citations (1)
Title |
---|
ZHOU Zhigang et al.: "Research on the privacy protection mechanism of data integration for DaaS applications" (面向DaaS应用的数据集成隐私保护机制研究), 《通信学报》 (Journal on Communications) * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112613068A (en) * | 2020-12-15 | 2021-04-06 | 国家超级计算深圳中心(深圳云计算中心) | Multiple data confusion privacy protection method and system and storage medium |
CN112613068B (en) * | 2020-12-15 | 2024-03-08 | 国家超级计算深圳中心(深圳云计算中心) | Multiple data confusion privacy protection method and system and storage medium |
CN112765653A (en) * | 2021-01-06 | 2021-05-07 | 山财高新科技(山西)有限公司 | Multi-source data fusion privacy protection method based on multi-privacy policy combination optimization |
CN112765653B (en) * | 2021-01-06 | 2022-11-25 | 山财高新科技(山西)有限公司 | Multi-source data fusion privacy protection method based on multi-privacy policy combination optimization |
CN114329588A (en) * | 2021-12-27 | 2022-04-12 | 电子科技大学广东电子信息工程研究院 | Multi-source fusion data privacy protection method in cloud environment |
CN114297714A (en) * | 2021-12-30 | 2022-04-08 | 电子科技大学广东电子信息工程研究院 | Method for data privacy protection and safe search in cloud environment |
CN116257657A (en) * | 2022-12-30 | 2023-06-13 | 北京瑞莱智慧科技有限公司 | Data processing method, data query method, related device and storage medium |
CN116257657B (en) * | 2022-12-30 | 2024-02-06 | 北京瑞莱智慧科技有限公司 | Data processing method, data query method, related device and storage medium |
CN117313135A (en) * | 2023-08-02 | 2023-12-29 | 东莞理工学院 | Efficient reconfiguration personal privacy protection method based on attribute division |
CN117313135B (en) * | 2023-08-02 | 2024-04-16 | 东莞理工学院 | Efficient reconfiguration personal privacy protection method based on attribute division |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20200306 |