CN109446844A - Privacy protection method and system for big data publication - Google Patents
Privacy protection method and system for big data publication Download PDF Info
- Publication number
- CN109446844A CN109446844A CN201811356234.4A CN201811356234A CN109446844A CN 109446844 A CN109446844 A CN 109446844A CN 201811356234 A CN201811356234 A CN 201811356234A CN 109446844 A CN109446844 A CN 109446844A
- Authority
- CN
- China
- Prior art keywords
- value
- data
- scheme
- anonymization
- privacy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
Abstract
The invention discloses a privacy protection method and system for big data publication. The method first retrieves data according to the user's data requirement range; it then determines the user's security level from the user's identity and intended data use, and on that basis determines the corresponding anonymization scheme and initial privacy parameters. A privacy protection effect evaluation is then performed against the privacy protection requirement of the data provider and the data quality requirement of the user side. If the requirements are not met, the parameters are adjusted; if parameter adjustment is ineffective, the scheme itself is changed. After every adjustment, the privacy protection effect must be re-evaluated. Once the evaluation passes, the retrieved data to be published is given privacy protection processing according to the selected anonymization method and parameters, forming the final published data. With the present invention, the most suitable anonymization method and privacy parameters can be chosen, so that the processed data both achieves the privacy protection effect desired by the data provider and satisfies the user side's requirement on data availability.
Description
Technical field
The present invention relates to the field of information security technology, and in particular to a privacy protection method and system for big data publication.
Background art
In the fields of data trading and data sharing, how to provide data to those who need it without revealing privacy has become a difficult problem. To solve it, industry has proposed many privacy protection techniques for data publication. By implementation approach, the common techniques fall roughly into the following classes: data transformation methods, data anonymization methods, secure multi-party computation methods, and hybrid methods. Among these, anonymization methods have been the most widely applied because of their notable safety and effectiveness.
The best-known anonymization algorithm is k-anonymity. In 1998, Sweeney et al. first proposed the k-anonymity algorithm, which effectively prevents linking attacks, in which an attacker who holds public data learns private information by matching it against certain attributes in published records. k-anonymity requires the records in a data set to be divided into equivalence classes: within each equivalence class, the attributes that could reveal private information all take the same value, and each class contains at least k records, so the success probability of a linking attack is at most 1/k.
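The grouping requirement above can be sketched as a small check. This is an illustrative sketch only, not the patent's implementation; the record layout and attribute names are assumed for the example:

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """Check whether every equivalence class (tuples sharing the same
    quasi-identifier values) contains at least k records."""
    groups = Counter(tuple(r[a] for a in quasi_ids) for r in records)
    return all(size >= k for size in groups.values())

# Toy table: generalized ZIP code and age band act as quasi-identifiers.
table = [
    {"zip": "100*", "age": "20-30", "disease": "flu"},
    {"zip": "100*", "age": "20-30", "disease": "cold"},
    {"zip": "200*", "age": "30-40", "disease": "flu"},
    {"zip": "200*", "age": "30-40", "disease": "asthma"},
]
print(is_k_anonymous(table, ["zip", "age"], 2))  # True: each class holds 2 records
print(is_k_anonymous(table, ["zip", "age"], 3))  # False
```

With k = 2, an attacker matching on zip and age can narrow a target down only to a class of two records, so the linking-attack probability is at most 1/2.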
Machanavajjhala et al. proposed the l-diversity algorithm in 2006; it requires each equivalence class to contain at least l distinct sensitive attribute values. Later, Ninghui Li proposed the t-closeness algorithm, which requires that the difference between the distribution of sensitive attribute values within a group and their distribution in the whole data table not exceed a threshold t.
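Both constraints can be sketched similarly. The check below uses distinct-value l-diversity, and the total variation distance as a simple stand-in for the distribution difference (t-closeness is usually stated with the Earth Mover's Distance); names and data are illustrative:

```python
from collections import Counter

def is_l_diverse(equivalence_class, sensitive, l):
    """Distinct l-diversity: at least l different sensitive values per class."""
    return len({r[sensitive] for r in equivalence_class}) >= l

def distribution(rows, sensitive):
    counts = Counter(r[sensitive] for r in rows)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def closeness_distance(equivalence_class, table, sensitive):
    """Total variation distance between the in-class and global sensitive-value
    distributions (a simple stand-in for the EMD used by t-closeness)."""
    p = distribution(equivalence_class, sensitive)
    q = distribution(table, sensitive)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in set(p) | set(q))

table = [{"disease": d} for d in ["flu", "flu", "cold", "asthma"]]
group = table[:2]                                  # both records carry "flu"
print(is_l_diverse(group, "disease", 2))           # False: only one distinct value
print(closeness_distance(group, "disease" and group, "disease") if False else
      closeness_distance(group, table, "disease")) # 0.5: far from the global distribution
```

The homogeneous group fails l-diversity for l = 2, and its distance of 0.5 from the global distribution would violate any t-closeness threshold below 0.5.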
Although the above methods achieve privacy-protected data publication to some extent, each has drawbacks. k-anonymity places no constraint on sensitive attribute data: when all records in a group share the same sensitive value, the sensitive information is uniquely determined and an attacker can obtain the private information with ease. That is, k-anonymity causes relatively little information loss and change to the data, but its degree of privacy protection is lower. l-diversity guarantees at least l distinct sensitive values in each equivalence class, but when one value accounts for a very large share of a group, that value is very likely to be inferred as the sensitive information, which can still leak privacy; l-diversity thus protects privacy more strongly than k-anonymity, but its information loss is also greater. t-closeness requires the distribution of sensitive values within a group to approximate their distribution in the whole data table, solving the problem of l-diversity; however, because the t-closeness privacy requirement is stricter, the resulting data quality is harder to bring up to the user's requirement than with the other two methods, and its applicability is narrower, so data satisfying the t-closeness constraint is difficult to use for applications such as data mining and data analysis. Since it protects privacy the most strongly, it is suited to higher-risk publication scenarios.
It can be seen that the three anonymization methods above each have their own advantages and limitations, and the privacy parameter of each method also affects the privacy protection effect and the data quality. In real environments, users raise different privacy protection requirements for data depending on their purpose of use, and the sensitivity of data types also varies, so a single anonymization method can hardly meet the privacy protection needs of data put to many different uses. For different privacy protection needs, how to choose the most appropriate method scientifically and rationally, and how to guarantee the privacy protection effect of the data by automatically finding optimal parameters, remain research questions without practically applicable results.
Summary of the invention
The object of the present invention is to provide a privacy protection method and system for big data publication that can choose the most suitable anonymization method and privacy parameters, so that the processed data both achieves the privacy protection effect desired by the data provider and satisfies the availability requirement of the data user.
To achieve the above object, the present invention provides the following scheme:
A privacy protection method for big data publication, the privacy protection method comprising:
Step 101: obtain the user's identity information, data requirement range, and data use description;
Step 102: determine the user security level according to the user's identity information and data use description, and retrieve the user-required data according to the user's data requirement range;
Step 103: determine an anonymization scheme and the initial privacy parameter value corresponding to that scheme according to the user security level and a security level to anonymization scheme lookup table; the anonymization schemes comprise directly providing the retrieved data, the k-anonymity scheme, the l-diversity scheme, and the t-closeness scheme; the privacy parameter of the k-anonymity scheme is the k value, that of the l-diversity scheme is the l value, and that of the t-closeness scheme is the t value;
Step 104: determine the privacy leakage probability and the data quality value according to the anonymization scheme and its initial privacy parameter value;
Step 105: judge whether the privacy leakage probability is less than the maximum privacy leakage threshold and the data quality value is greater than the data quality threshold, obtaining a first judgment result; the maximum privacy leakage threshold is provided by the data provider, and the data quality threshold is provided by the user side;
Step 106: if the first judgment result indicates that the privacy leakage probability is less than the maximum privacy leakage threshold and the data quality value is greater than the data quality threshold, apply the anonymization scheme and privacy parameter value that satisfied both thresholds to the retrieved user-required data to perform privacy protection processing, obtain the data to be published, and publish that data to the user;
Step 107: if the first judgment result indicates that the privacy leakage probability is greater than or equal to the maximum privacy leakage threshold, or that the data quality value is less than or equal to the data quality threshold, judge whether the privacy parameter value of the anonymization scheme can still be adjusted, obtaining a second judgment result;
Step 108: if the second judgment result indicates that the privacy parameter value can be adjusted, adjust it and return to step 104, re-determining the privacy leakage probability and data quality value with the adjusted parameter;
Step 109: if the privacy parameter value cannot be adjusted, judge whether the anonymization scheme can be changed, obtaining a third judgment result;
Step 110: if the third judgment result indicates that the anonymization scheme can be changed, change it and return to step 104, re-determining the privacy leakage probability and data quality value with the changed scheme and its corresponding privacy parameter value;
Step 111: if the third judgment result indicates that the anonymization scheme cannot be changed, lower the data quality threshold and return to step 104, stopping once the judgment condition of step 105 is met.
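Steps 104 to 111 amount to a search loop over schemes and parameters. A minimal sketch, assuming the privacy leakage probability and data quality evaluations are given as functions; the names, the linear search order, and the relaxation step are illustrative, not taken from the patent:

```python
def select_publication(schemes, leak_prob, quality, max_leak, quality_min,
                       min_quality_floor=0.0, step=0.05):
    """Steps 104-111: try each scheme's parameters in ascending order; if no
    (scheme, parameter) pair satisfies both thresholds, lower the data
    quality threshold and search again."""
    while quality_min >= min_quality_floor:
        for scheme, params in schemes:            # steps 109/110: next scheme
            for p in params:                      # steps 107/108: next parameter
                if leak_prob(scheme, p) < max_leak and quality(scheme, p) > quality_min:
                    return scheme, p, quality_min  # step 105 satisfied -> step 106
        quality_min -= step                       # step 111: relax data quality
    raise ValueError("no acceptable scheme/parameter combination")

# Toy evaluation model: larger parameters leak less but cost quality.
leak = lambda s, p: 1.0 / p
qual = lambda s, p: 1.0 - 0.1 * p
scheme, p, q = select_publication([("k-anonymity", [2, 3, 5, 10])],
                                  leak, qual, max_leak=0.25, quality_min=0.6)
print(scheme, p)  # k-anonymity 5
```

No parameter meets the original quality threshold of 0.6 here, so the loop relaxes the threshold until k = 5 passes both checks, mirroring step 111.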
Optionally, retrieving the user-required data according to the user's data requirement range specifically comprises:
using an SQL query statement to extract the data meeting the user's data requirement range from the source database and storing it in the form of a data table.
Optionally, the security level to anonymization scheme lookup table contains four correspondences: when the security level is 0, the anonymization scheme is to directly provide the retrieved data; when the security level is 1, the scheme is k-anonymity; when it is 2, l-diversity; and when it is 3, t-closeness.
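The four correspondences can be transcribed literally as a lookup table; the function name and error handling are illustrative:

```python
# Security level -> anonymization scheme, a literal transcription of the
# four correspondences described above.
SCHEME_TABLE = {
    0: "direct release of retrieved data",
    1: "k-anonymity",
    2: "l-diversity",
    3: "t-closeness",
}

def scheme_for(security_level):
    """Map a user security level to its anonymization scheme."""
    try:
        return SCHEME_TABLE[security_level]
    except KeyError:
        raise ValueError(f"unknown security level: {security_level}")

print(scheme_for(2))  # l-diversity
```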
Optionally, before judging whether the privacy parameter value of the anonymization scheme can be adjusted, the big data publication privacy protection method further comprises:
determining the value range of the privacy parameter of the anonymization scheme.
Optionally, determining the value range of the privacy parameter of the anonymization scheme specifically comprises:
determining a k value set, an l value set, and a t value set; the k value set is the value range of the k parameter in the k-anonymity scheme, the l value set is the value range of the l parameter in the l-diversity scheme, and the t value set is the value range of the t parameter in the t-closeness scheme.
Optionally, determining the k value set specifically comprises:
calculating the minimum of the k value set from the maximum privacy leakage threshold;
determining the data quality threshold and calculating the maximum of the k value set from it, the data quality threshold being determined under the premise that the data table meets the maximum privacy leakage threshold, where the data table is the storage table of the retrieved data;
determining the k value set from its minimum and maximum.
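Since a linking attack succeeds with probability at most 1/k, the smallest admissible k follows directly from the maximum privacy leakage threshold, while the largest admissible k is bounded by the data quality threshold. A sketch under the assumption that data quality is non-increasing in k; the patent does not specify the quality model, so the one below is illustrative:

```python
import math

def k_value_set(max_leak, quality_of, quality_min, k_cap=100):
    """k_min: smallest k with 1/k < max_leak.
    k_max: largest k (up to k_cap) whose anonymized table still exceeds the
    data quality threshold, assuming quality_of(k) is non-increasing in k."""
    k_min = math.floor(1 / max_leak) + 1          # guarantees 1/k_min < max_leak
    k_max = k_min - 1
    for k in range(k_min, k_cap + 1):
        if quality_of(k) <= quality_min:
            break
        k_max = k
    return list(range(k_min, k_max + 1))

# Toy quality model: each unit of k costs 5% data quality.
ks = k_value_set(max_leak=0.25, quality_of=lambda k: 1.0 - 0.05 * k, quality_min=0.6)
print(ks)  # [5, 6, 7]
```

With a 25% leakage ceiling, k must be at least 5; under the toy quality model, k beyond 7 drops quality to or below the threshold.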
Optionally, determining the l value set specifically comprises:
determining the maximum of the l value set by the information entropy method;
finding, in the retrieved data table, the equivalence class with the fewest sensitive attribute categories, and taking the number of categories it contains as the minimum of the l value set, where the data table is the storage table of the retrieved data;
determining the l value set from its minimum and maximum.
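The patent names the information entropy method without detail. Under the common entropy l-diversity reading, every equivalence class must have sensitive-value entropy of at least log2(l), so the largest feasible l is bounded by 2 raised to the smallest per-class entropy; the smallest l is the category count of the least diverse class. A sketch under that assumed reading:

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of a list of sensitive values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def l_value_set(equivalence_classes, sensitive):
    """l_min: sensitive-category count of the class with the fewest categories.
    l_max: floor(2**H_min), where H_min is the smallest per-class entropy
    (entropy l-diversity needs H >= log2(l) in every class)."""
    category_counts = [len({r[sensitive] for r in ec}) for ec in equivalence_classes]
    l_min = min(category_counts)
    h_min = min(entropy([r[sensitive] for r in ec]) for ec in equivalence_classes)
    l_max = math.floor(2 ** h_min)
    return list(range(l_min, l_max + 1))

classes = [
    [{"d": "flu"}, {"d": "cold"}, {"d": "asthma"}, {"d": "flu"}],
    [{"d": "flu"}, {"d": "cold"}],
]
print(l_value_set(classes, "d"))  # [2] -- the second class caps both ends
```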
Optionally, determining the t value set specifically comprises:
adjusting the data table so that the adjusted table meets the maximum privacy leakage threshold and the data quality threshold, and determining the equivalence class set of the adjusted table;
determining the t value set from that equivalence class set; the t value set is the set of all Di values, where Di denotes the distance between the distribution of sensitive attribute values in the i-th equivalence class and the global distribution.
Optionally, judging whether the privacy parameter value of the anonymization scheme can be adjusted specifically comprises:
stepping through the value range of the privacy parameter in ascending order, starting from its minimum; once the parameter value would exceed the maximum of the range, no further adjustment of that parameter is made.
A privacy protection system for big data publication, the privacy protection system comprising:
a user information acquisition module, for obtaining the user's identity information, data requirement range, and data use description;
a user security level determination module, for determining the user security level according to the user's identity information and data use description;
a user-required data determination module, for retrieving the user-required data according to the user's data requirement range;
an anonymization scheme and privacy parameter determination module, for determining an anonymization scheme and its initial privacy parameter value according to the user security level and the security level to anonymization scheme lookup table; the anonymization schemes comprise directly providing the retrieved data, the k-anonymity scheme, the l-diversity scheme, and the t-closeness scheme, whose privacy parameters are the k value, the l value, and the t value respectively;
a privacy leakage probability and data quality value determination module, for determining the privacy leakage probability and the data quality value according to the anonymization scheme and its initial privacy parameter value;
a first judgment result module, for judging whether the privacy leakage probability is less than the maximum privacy leakage threshold and the data quality value is greater than the data quality threshold, obtaining a first judgment result; the maximum privacy leakage threshold is provided by the data provider, and the data quality threshold is provided by the user side;
a data publication module, for, when the first judgment result indicates that the privacy leakage probability is less than the maximum privacy leakage threshold and the data quality value is greater than the data quality threshold, applying the anonymization scheme and privacy parameter value that satisfied both thresholds to the retrieved user-required data to perform privacy protection processing, obtaining the data to be published, and publishing that data to the user;
a second judgment result module, for, when the first judgment result indicates that the privacy leakage probability is greater than or equal to the maximum privacy leakage threshold or the data quality value is less than or equal to the data quality threshold, judging whether the privacy parameter value of the anonymization scheme can be adjusted, obtaining a second judgment result;
a privacy parameter adjustment module, for, when the second judgment result indicates that the privacy parameter value can be adjusted, adjusting it and returning to the privacy leakage probability and data quality value determination module;
a third judgment result module, for, when the privacy parameter value cannot be adjusted, judging whether the anonymization scheme can be changed, obtaining a third judgment result;
an anonymization scheme adjustment module, for, when the third judgment result indicates that the anonymization scheme can be changed, changing it and returning to the privacy leakage probability and data quality value determination module;
a data quality threshold reduction module, for, when the third judgment result indicates that the anonymization scheme cannot be changed, lowering the data quality threshold and returning to the privacy leakage probability and data quality value determination module.
According to the specific embodiments provided by the present invention, the invention discloses the following technical effects:
The present invention provides a privacy protection method and system for big data publication. The method retrieves the user-required data according to the user's demand, determines an anonymization scheme and its corresponding privacy parameters according to the user security level, and uses the scheme and parameters to determine the privacy leakage probability and the data quality value. It then judges whether the privacy leakage probability is less than the maximum privacy leakage threshold and whether the data quality value is greater than the data quality threshold; if so, the anonymization scheme and its privacy parameters are applied directly to the retrieved user-required data for privacy protection and the processed data is published; otherwise, the anonymization scheme and its privacy parameters are adjusted, stopping once the condition is met. With the present invention, the most suitable anonymization method and privacy parameters can be chosen, so that the processed data both achieves the privacy protection effect desired by the data provider and satisfies the availability requirement of the data user.
Detailed description of the invention
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flow diagram of the anonymization-based big data publication privacy protection method of an embodiment of the present invention;
Fig. 2 is a structural diagram of the privacy protection system for big data publication of an embodiment of the present invention;
Fig. 3 is a structural diagram of the big data publication privacy protection platform of the present invention;
Fig. 4 is a flow diagram of the k value selection method in the k-anonymity algorithm of the present invention;
Fig. 5 is a flow diagram of the l value selection method in the l-diversity algorithm of the present invention;
Fig. 6 is a flow diagram of the t value selection method in the t-closeness algorithm of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below in combination with the drawings of the embodiments. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The object of the present invention is to provide a privacy protection method and system for big data publication that can choose the most suitable anonymization method and privacy parameters, so that the processed data both achieves the privacy protection effect desired by the data provider and satisfies the availability requirement of the data user.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the drawings and specific embodiments.
Explanation of terms
Anonymization: blurring data to achieve the purpose of privacy protection.
Explicit identifier attribute: an attribute that can uniquely identify a single individual, such as ID card number, name, or phone number.
Quasi-identifier attribute: an attribute that identifies an individual from one aspect, such as date of birth, gender, or address; taken together, quasi-identifier attributes may determine a specific individual.
Sensitive attribute: an attribute containing private information, such as health status, disease, or income.
Generalization: a typical technique for realizing anonymization. Its main idea is to reduce the precision of quasi-identifier attribute values, so that the number of tuples in the data table sharing the same quasi-identifier values increases.
Equivalence class/group: the set of tuples sharing the same quasi-identifier values; within an equivalence class, the probability that an attacker learns an individual's identity or sensitive information through the quasi-identifier attributes is greatly reduced.
k-anonymity algorithm: the k-anonymization algorithm.
l-diversity algorithm: the l-diversity algorithm.
t-closeness algorithm: the t-closeness algorithm.
Fig. 1 is a flow diagram of the anonymization-based big data publication privacy protection method of an embodiment of the present invention. As shown in Fig. 1, the method provided by the embodiment specifically includes the following steps.
Step 101: obtain the user's identity information, data requirement range, and data use description.
Step 102: determine the user security level according to the user's identity information and data use description, and retrieve the user-required data according to the user's data requirement range.
In the present invention, an SQL query statement is used to extract the data meeting the user's data requirement range from the source database and store it in the form of a data table.
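The retrieval step can be illustrated with Python's built-in sqlite3 module; the table name, columns, and requirement range are assumptions for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the source database
conn.execute("CREATE TABLE patients (name TEXT, age INTEGER, disease TEXT)")
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)",
                 [("alice", 34, "flu"), ("bob", 52, "asthma"), ("carol", 29, "flu")])

# The user's data requirement range expressed as an SQL query; the result
# set is materialized as the working data table for later anonymization.
# The explicit identifier (name) is excluded from the projection.
data_table = conn.execute(
    "SELECT age, disease FROM patients "
    "WHERE age BETWEEN 25 AND 40 ORDER BY age").fetchall()
print(data_table)  # [(29, 'flu'), (34, 'flu')]
```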
Step 103: determine an anonymization scheme and the initial privacy parameter value corresponding to that scheme according to the user security level and the security level to anonymization scheme lookup table; the anonymization schemes comprise directly providing the retrieved data, the k-anonymity scheme, the l-diversity scheme, and the t-closeness scheme; the privacy parameter of the k-anonymity scheme is the k value, that of the l-diversity scheme is the l value, and that of the t-closeness scheme is the t value.
Step 104: determine the privacy leakage probability and the data quality value according to the anonymization scheme and its initial privacy parameter value.
Next, the privacy protection effect is evaluated.
Step 105: judge whether the privacy leakage probability is less than the maximum privacy leakage threshold and the data quality value is greater than the data quality threshold, obtaining a first judgment result. If the first judgment result indicates that the privacy leakage probability is less than the maximum privacy leakage threshold and the data quality value is greater than the data quality threshold, execute step 106; if it indicates that the privacy leakage probability is greater than or equal to the maximum privacy leakage threshold or the data quality value is less than or equal to the data quality threshold, execute step 107. The maximum privacy leakage threshold is provided by the data provider, and the data quality threshold is provided by the user side.
Step 106: apply the anonymization scheme and privacy parameter value that satisfied both thresholds to the retrieved user-required data to perform privacy protection processing, obtain the data to be published, and publish that data to the user.
Step 107: judge whether the privacy parameter value of the anonymization scheme can still be adjusted, obtaining a second judgment result. If the second judgment result indicates that the privacy parameter value can be adjusted, execute step 108; if it cannot be adjusted, execute step 109.
Before judging whether the privacy parameter of the anonymization scheme can be adjusted, the value range of that parameter must first be determined, i.e. the k value set, the l value set and the t value set. The k value set is the value range of the parameter k in the k-anonymity scheme, the l value set is the value range of the parameter l in the l-diversity scheme, and the t value set is the value range of the parameter t in the t-closeness scheme.
Determining the k value set specifically includes:
Calculating the minimum value of the k value set according to the maximum privacy leakage threshold.
Determining the data quality threshold, and calculating the maximum value of the k value set from it; the data quality threshold is determined under the premise that the data table satisfies the maximum privacy leakage threshold. The data table is the storage table of the retrieved data.
Determining the k value set from its minimum and maximum values.
Determining the l value set specifically includes:
Determining the maximum value of the l value set according to the information entropy.
Determining the equivalence class of the retrieved data table with the fewest sensitive-attribute categories, and taking the number of sensitive-attribute categories it contains as the minimum value of the l value set. The data table is the storage table of the retrieved data.
Determining the l value set from its minimum and maximum values.
Determining the t value set specifically includes:
Adjusting the data table so that the adjusted table satisfies the maximum privacy leakage threshold and the data quality threshold, and determining the equivalence class set of the adjusted table.
Determining the t value set from that equivalence class set; the t value set is the set of all values D_i, where D_i denotes the distance between the distribution of sensitive-attribute values in the i-th equivalence class and the global distribution.
Step 107 specifically includes: starting from the minimum of the value range of the privacy parameter of the anonymization scheme, the values are tried in ascending order; once the privacy parameter value would exceed the maximum of the value range, no further adjustment of the privacy parameter is performed.
Step 108: adjust the privacy parameter value of the anonymization scheme and return to step 104, redetermining the privacy leakage probability and the data quality value with the anonymization scheme and the adjusted privacy parameter value.
Step 109: judge whether the anonymization scheme itself can be adjusted, obtaining a third judgment result. If the third judgment result indicates that the scheme can be adjusted, execute step 110; if it cannot, execute step 111.
Step 110: adjust the anonymization scheme and return to step 104, redetermining the privacy leakage probability and the data quality value with the adjusted scheme and its corresponding privacy parameter value.
Step 111: reduce the data quality threshold and return to step 104, stopping when the judgment condition of step 105 is satisfied.
In an actual data publication scenario, the chosen anonymization scheme differs with the data volume and data items required by the data provider and the user side, with the data mining method, and with the required security. Across the three classical anonymization algorithms, k-anonymity, l-diversity and t-closeness, the privacy protection effect increases, but so does the damage done to the data. The k-anonymity algorithm changes the data least and gives the best data availability, but the weakest privacy protection; the t-closeness algorithm gives the strongest privacy protection but changes the data most, so its availability is worst; the l-diversity algorithm sits between the two in both respects.
No single anonymization algorithm can solve every privacy problem; each has its own advantages and disadvantages, and a trade-off must be made. It is therefore necessary to select an appropriate anonymization algorithm for each data purpose. Table 1 is the comparison table that determines the data security level from the user identity and data purpose and selects the corresponding anonymization processing scheme.
Table 1. Security level and anonymization processing scheme comparison table
It can be seen that the anonymization-based big data publication privacy protection method provided by the embodiment of the present invention offers controlled data access for different service objects and different data purposes, and better satisfies both the user side's data availability requirement and the data provider's privacy protection requirement.
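The correspondence of Table 1, as spelled out in claim 3 (level 0 through level 3), can be expressed as a simple lookup; the scheme name strings used below are informal labels chosen here for illustration.

```python
# Security level -> anonymization scheme, per Table 1 / claim 3.
SCHEME_TABLE = {
    0: "direct-release",   # directly provide the retrieved data
    1: "k-anonymity",
    2: "l-diversity",
    3: "t-closeness",
}

def choose_scheme(security_level):
    """Return the anonymization scheme for a user's security level."""
    return SCHEME_TABLE[security_level]
```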
To achieve the above object, the present invention also provides a privacy protection system for big data publication.
Fig. 2 is a structural diagram of the big data publication privacy protection system of the embodiment of the present invention. As shown in Fig. 2, the system provided by the invention comprises:
A user information acquisition module 1, for obtaining the user's identity information, data requirement range and data purpose.
A user security level determination module 2, for determining the user security level according to the user's identity information and data purpose.
A user-requested-data determination module 3, for retrieving the user-requested data according to the user's data requirement range.
An anonymization scheme and privacy parameter determination module 4, for determining the anonymization scheme and its corresponding initial privacy parameter value according to the user security level and the security-level/anonymization-scheme comparison table. The anonymization scheme is one of: directly providing the retrieved data, k-anonymity processing, l-diversity processing, and t-closeness processing; their respective privacy parameters are the k value, the l value and the t value.
A privacy leakage probability and data quality value determination module 5, for determining the privacy leakage probability and the data quality value according to the anonymization scheme and its corresponding initial privacy parameter value.
A first judgment module 6, for judging whether the privacy leakage probability is less than the maximum privacy leakage threshold and whether the data quality value is greater than the data quality threshold, obtaining a first judgment result. The maximum privacy leakage threshold is given by the data provider; the data quality threshold is given by the user side.
A data publication module 7, for, when the first judgment result indicates that the privacy leakage probability is less than the maximum privacy leakage threshold and the data quality value is greater than the data quality threshold, applying the corresponding anonymization scheme and privacy parameter value to the retrieved user-requested data, obtaining publishable data, and publishing them to the user.
A second judgment module 8, for, when the first judgment result indicates that the privacy leakage probability is greater than or equal to the maximum privacy leakage threshold or the data quality value is less than or equal to the data quality threshold, judging whether the privacy parameter value of the anonymization scheme can be adjusted, obtaining a second judgment result.
A privacy parameter adjustment module 9, for, when the second judgment result indicates that the privacy parameter value of the anonymization scheme can be adjusted, adjusting it and returning to the privacy leakage probability and data quality value determination module 5.
A third judgment module 10, for, when the privacy parameter value of the anonymization scheme cannot be adjusted, judging whether the anonymization scheme itself can be adjusted, obtaining a third judgment result.
An anonymization scheme adjustment module 11, for, when the third judgment result indicates that the anonymization scheme can be adjusted, adjusting the anonymization scheme and returning to the privacy leakage probability and data quality value determination module 5.
A data quality threshold reduction module 12, for, when the third judgment result indicates that the anonymization scheme cannot be adjusted, reducing the data quality threshold and returning to the privacy leakage probability and data quality value determination module 5.
Fig. 3 is a structural diagram of the big data publication privacy protection platform of the present invention. As shown in Fig. 3, the user first logs in or registers, and the platform identifies the user's identity; the user then submits a data requirement range and data purpose. After the data layer receives the user's data requirement, it executes an SQL query and retrieves the data the user needs. The security level judgment module determines a security level for the user from the user's identity and data purpose, and passes the level information to the privacy protection processing part. According to the user's security level, the privacy protection processing part selects a suitable anonymization scheme from the three anonymization algorithms of differing protection strength. After the anonymization scheme is determined, its privacy parameter is selected. The protection effect is then evaluated against the data provider's privacy constraints; if the requirements are not met, the privacy parameter is adjusted and the evaluation repeated. Once parameter adjustment is complete, the privacy protection data processing module processes the retrieved data with the selected anonymization scheme and privacy parameter, forming publishable data, which the data publication module finally distributes to the user.
Choosing the k value in the k-anonymity algorithm.
The influence of the k value on the data table (the table storing the retrieved data, called the k-anonymous table in the k-anonymity scheme) is as follows. The larger k is, the larger each equivalence class in the k-anonymous table; to satisfy the k-anonymity constraint, more attribute values must be generalized (over wider ranges), so the data quality is worse. At the same time, each equivalence class corresponds to more entities, so the probability of guessing any entity's sensitive information is smaller and the privacy protection is stronger. The smaller k is, the smaller the equivalence classes in the k-anonymous table; fewer attribute values need to be generalized to satisfy the constraint, so the data quality is better. At the same time, each equivalence class corresponds to fewer entities, so the probability of guessing an entity's sensitive information is larger and the privacy protection weaker.
The choice of k is therefore extremely important and must balance the degree of privacy protection (the maximum privacy leakage threshold, given by the data provider) against data availability (the data quality threshold, given by the data demander, i.e. the user side); a poor choice degrades the privacy protection effect. The embodiment of the invention therefore provides a k-value selection method whose goal is a k that satisfies the privacy protection requirement and the data availability requirement simultaneously. To measure the degree of privacy protection, the embodiment proposes the maximum privacy leakage threshold P_max. Its calculation is the ratio of the maximum repetition count of a sensitive attribute value in an equivalence class of the k-anonymous table to the number of tuples in that class; its meaning is that the probability of deducing an entity's private information from any equivalence class must not exceed P_max.
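The leakage measure just described, the most repeated sensitive value in an equivalence class divided by the class size, maximized over all classes, can be sketched directly; the representation of classes as plain lists of sensitive values is an assumption for illustration.

```python
from collections import Counter

def leakage_probability(equivalence_classes):
    """Table-level privacy leakage probability of a k-anonymous table:
    for each equivalence class, the ratio of the most frequent sensitive
    value's count to the class size; the table value is the maximum.

    equivalence_classes -- list of lists of sensitive attribute values.
    """
    worst = 0.0
    for cls in equivalence_classes:
        top = Counter(cls).most_common(1)[0][1]  # most repeated sensitive value
        worst = max(worst, top / len(cls))
    return worst
```

A table satisfies the provider's constraint when this value is below P_max.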
Fig. 4 is a flow diagram of the k-value selection method in the k-anonymity algorithm of the present invention. As shown in Fig. 4, the k-value selection method includes:
Step 1: determine the maximum privacy leakage threshold. The maximum privacy leakage threshold P_max is given by the data provider.
Step 2: obtain the minimum k value; specifically, calculate the minimum of the k value set from the maximum privacy leakage threshold P_max. The minimum of the k value set is K_min = ⌈T / P_max⌉, where T denotes the maximum repetition count of a sensitive attribute value in an equivalence class of the data table and K_min denotes the minimum value of the k value set.
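Given the definition of P_max, requiring that the inference probability T/k not exceed P_max yields the bound K_min = ⌈T / P_max⌉ (the formula itself is an image in the original publication, so this is a reconstruction from the surrounding definitions):

```python
import math

def k_min(t_max_repeats, p_max):
    """K_min = ceil(T / P_max): the smallest k such that T repetitions of
    a sensitive value inside an equivalence class of size k keep the
    inference probability T/k at or below P_max."""
    return math.ceil(t_max_repeats / p_max)
```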
Step 3: determine the data quality threshold, i.e. the data quality threshold mentioned above.
As the measure of data availability, the embodiment of the present invention uses the discernibility metric as the evaluation criterion for data quality. To calculate the relationship between the parameter k and the data quality more accurately, the embodiment improves the data quality formula to C_DM = Σ_{i=1}^{N} Q_i², where Q_i denotes the scale of the i-th equivalence class in the data table and N is the number of equivalence classes. The smaller the C_DM value, the more uniform the equivalence class scales in the table and the better the table's data quality; the larger the C_DM value, the more the class scales differ and the worse the data quality. The table is processed to satisfy the K_min constraint, and the resulting initial C_DM value is taken as the data quality threshold.
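A minimal sketch of the quality measure, assuming the discernibility metric takes the classical sum-of-squared-equivalence-class-sizes form (the exact "improved" formula is an image not reproduced in the text):

```python
def discernibility(class_sizes):
    """Discernibility metric C_DM = sum of squared equivalence-class sizes.
    For a fixed number of tuples, smaller values correspond to more uniform
    classes and hence better data quality."""
    return sum(q * q for q in class_sizes)
```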
Step 4: obtain the maximum k value; specifically, calculate the maximum of the k value set from the data quality threshold: K_max is the largest k for which the table's C_DM value does not exceed the data quality threshold.
Step 5: obtain the optimal k value; specifically, from the minimum and maximum of the k value set, determine the selection range [K_min, K_max]. When k takes its minimum, the data quality of the table is best; when k takes its maximum, the privacy protection of the table is strongest. Starting from the minimum of the k value set, k is gradually increased without exceeding the maximum, until the requirements are met, yielding the optimal k value.
Note that if K_max is less than K_min, the data availability constraint and the privacy protection constraint cannot be satisfied simultaneously.
Choosing the l value in the l-diversity algorithm.
For the selection of the l value in the l-diversity algorithm, the prevailing method determines l from the information entropy. On the basis of the entropy-based method, the embodiment of the present invention proposes a selection method that also satisfies the given privacy protection constraint (the maximum privacy leakage threshold, given by the data provider) and the data quality constraint (the data quality threshold, given by the data demander, i.e. the user side).
Fig. 5 is a flow diagram of the l-value selection method in the l-diversity algorithm of the present invention. As shown in Fig. 5, the l-value selection method includes:
Step 1: calculate the maximum l value.
The maximum of the l value set is determined from the information entropy as follows. Let A be the quasi-identifier attributes and SA the sensitive attribute, with sensitive values S = {s_i, …, s_j} and equivalence class set E = {E_i, …, E_j}. With p(E_i, s) the frequency with which sensitive value s occurs in equivalence class E_i, the entropy of E_i is H(E_i) = -Σ_{s∈S} p(E_i, s) log p(E_i, s), and the maximum of the l value set is the largest l satisfying log l ≤ min_i H(E_i).
The information entropy reflects the distribution of the attribute: the larger the entropy, the more uniform the distribution of sensitive values in the equivalence class, and the harder it is to deduce a specific individual. From the above formula, the maximum of the l value set is obtained.
The minimum of the l value set is the number of sensitive-attribute categories contained in the equivalence class of the data table with the fewest such categories.
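Assuming the entropy l-diversity condition H(E_i) ≥ log l with natural logarithms (the logarithm base is not stated in the text), the entropy-based maximum l can be sketched as:

```python
import math
from collections import Counter

def entropy(cls):
    """Shannon entropy H(E_i) of the sensitive values in one equivalence
    class, using natural logarithms."""
    n = len(cls)
    return -sum((c / n) * math.log(c / n) for c in Counter(cls).values())

def l_max(equivalence_classes):
    """Entropy l-diversity requires H(E_i) >= log(l) for every class, so
    the largest admissible l is floor(exp(min_i H(E_i))).  A tiny epsilon
    guards against floating-point round-off."""
    h_min = min(entropy(c) for c in equivalence_classes)
    return math.floor(math.exp(h_min) + 1e-9)
```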
Step 2: starting from the minimum of the l value set, the value is gradually increased, terminating at the maximum. Specifically:
Judge whether the data table satisfies the l-diversity constraints, which comprise the privacy protection constraint and the data quality constraint. If satisfied, the privacy parameter adjustment ends; the privacy parameter is the l value obtained at this step. If not satisfied, modify the data table so that it satisfies the l-diversity constraint, i.e. add tuples whose quasi-identifier values equal those of the equivalence class with the fewest sensitive-attribute categories. The sensitive values of the added tuples are categories that occur in the data table but not in that equivalence class, until the table satisfies the l-diversity constraint.
Then calculate the data quality value of the table (its C_DM value) and judge whether it is less than the data quality threshold of the l-diversity data quality constraint. If so, the data availability of the l-diversity-processed table meets the user's requirement and the privacy parameter adjustment ends. If not, l is increased by 1, up to the maximum value.
If l has been tried up to the maximum value and the conditions are still not satisfied, the privacy parameter selection fails.
A privacy parameter obtained by this l-value selection method satisfies the given privacy protection constraint and the data quality constraint simultaneously. If the selection fails, the privacy protection constraint and the data quality constraint conflict, and the l-value selection method cannot satisfy both at once.
Choosing the t value in the t-closeness algorithm.
The l-diversity algorithm is a great improvement over the k-anonymity algorithm, but it still cannot avoid similarity attacks. A similarity attack means that when a certain sensitive value's proportion is too large, an attacker can deduce an individual's private information with high probability. Against this shortcoming, the t-closeness algorithm further considers the distribution of sensitive values: it requires that the difference between the distribution of sensitive values within any equivalence class and the distribution of that attribute over the entire data table not exceed a pre-given threshold t, thereby solving the similarity attack problem.
As with the privacy parameter selection method of the l-diversity algorithm, the goal is a t value satisfying the data availability constraint. The parameter t is adjusted iteratively until C_DM falls below the initial value (the C_DM value of the table satisfying the minimal t-closeness constraint); the data availability of the table then meets the user's requirement.
Fig. 6 is a flow diagram of the t-value selection method in the t-closeness algorithm of the present invention. As shown in Fig. 6, the t-value selection method includes:
Step 1: determine the equivalence class set. Let the data table satisfy the k-anonymity constraint, with equivalence class set E = {E_i, …, E_j}; P is the set of all values D_i, where D_i denotes the distance between the distribution of sensitive values in the i-th equivalence class and the global distribution.
Step 2: determine the t value set. Let D_min = min{D_i} and D_max = max{D_i}.
The distance is measured with EMD (Earth Mover's Distance), an algorithm for measuring the difference between distributions; here it denotes the minimum cost required to transform one distribution into the other.
Step 3: perform t-closeness processing on the user-requested data and calculate the data quality value; specifically, let the table satisfy the t-closeness constraint with t = D_min and calculate the table's data quality value.
Step 4: judge whether the table's data quality value is less than the data quality threshold. If so, the data availability constraint is met and the privacy parameter selection ends with the current value of t (initially D_min).
Step 5: if not satisfied, t is increased by D_min and the method returns to step 3, stopping when t ≥ D_max. If the condition is still unmet when t ≥ D_max, the privacy parameter selection fails.
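The t-selection loop of steps 3 to 5 can be sketched as follows; `d_quality` is a placeholder for the C_DM computation on the table processed under a given t-closeness constraint, and the epsilon on the loop bound is an assumption to tolerate floating-point accumulation.

```python
def choose_t(distances, d_quality, q_threshold):
    """Sketch of the t-selection loop: starting from D_min, increase t in
    steps of D_min until the table satisfying that t-closeness constraint
    meets the data quality threshold, or t passes D_max.

    distances   -- the D_i values of all equivalence classes
    d_quality   -- function t -> C_DM value of the table under that t
    q_threshold -- data quality threshold given by the user side
    """
    d_min, d_max = min(distances), max(distances)
    t = d_min
    while t <= d_max + 1e-12:
        if d_quality(t) < q_threshold:
            return t          # selection succeeds with the current t
        t += d_min
    return None               # selection fails
```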
Because the t-closeness constraint is stricter, its data quality requirement is harder to satisfy than with the other two methods, and its applicability is narrower. Data satisfying the t-closeness constraint are therefore difficult to use for applications such as data mining and data analysis; but because its protection of privacy is the strongest, it suits data publication scenarios with higher risk.
Specific examples are used herein to illustrate the principle and implementation of the present invention; the above embodiments are merely intended to help understand the method of the present invention and its core idea. Meanwhile, those skilled in the art may, according to the idea of the present invention, make changes in the specific implementation and the application scope. In conclusion, the content of this specification shall not be construed as limiting the present invention.
Claims (10)
1. A privacy protection method for big data publication, characterized in that the privacy protection method comprises:
Step 101: obtaining a user's identity information, data requirement range and data purpose;
Step 102: determining a user security level according to the user's identity information and data purpose, and retrieving user-requested data according to the user's data requirement range;
Step 103: determining an anonymization scheme and its corresponding initial privacy parameter value according to the user security level and a security-level/anonymization-scheme comparison table; the anonymization scheme comprising directly providing the retrieved data, k-anonymity processing, l-diversity processing and t-closeness processing; the privacy parameter value of the k-anonymity scheme being the k value, that of the l-diversity scheme being the l value, and that of the t-closeness scheme being the t value;
Step 104: determining a privacy leakage probability and a data quality value according to the anonymization scheme and its corresponding initial privacy parameter value;
Step 105: judging whether the privacy leakage probability is less than a maximum privacy leakage threshold and whether the data quality value is greater than a data quality threshold, obtaining a first judgment result; wherein the maximum privacy leakage threshold is given by the data provider and the data quality threshold is given by the user side;
Step 106: if the first judgment result indicates that the privacy leakage probability is less than the maximum privacy leakage threshold and the data quality value is greater than the data quality threshold, performing privacy protection processing on the retrieved user-requested data with the corresponding anonymization scheme and privacy parameter value, obtaining publishable data, and publishing the publishable data to the user;
Step 107: if the first judgment result indicates that the privacy leakage probability is greater than or equal to the maximum privacy leakage threshold or the data quality value is less than or equal to the data quality threshold, judging whether the privacy parameter value of the anonymization scheme can be adjusted, obtaining a second judgment result;
Step 108: if the second judgment result indicates that the privacy parameter value can be adjusted, adjusting it and returning to step 104, redetermining the privacy leakage probability and the data quality value with the anonymization scheme and the adjusted privacy parameter value;
Step 109: if the privacy parameter value of the anonymization scheme cannot be adjusted, judging whether the anonymization scheme can be adjusted, obtaining a third judgment result;
Step 110: if the third judgment result indicates that the anonymization scheme can be adjusted, adjusting the anonymization scheme and returning to step 104, redetermining the privacy leakage probability and the data quality value with the adjusted scheme and its corresponding privacy parameter value;
Step 111: if the third judgment result indicates that the anonymization scheme cannot be adjusted, reducing the data quality threshold and returning to step 104, stopping when the judgment condition of step 105 is satisfied.
2. The privacy protection method according to claim 1, characterized in that retrieving the user-requested data according to the user's data requirement range specifically comprises:
using an SQL query statement, extracting the data meeting the user's data requirement range from the source database and storing them in the form of a data table.
3. The privacy protection method according to claim 1, characterized in that the security-level/anonymization-scheme comparison table comprises four correspondences: when the security level is 0, the anonymization scheme is directly providing the retrieved data; when the security level is 1, the anonymization scheme is k-anonymity processing; when the security level is 2, the anonymization scheme is l-diversity processing; when the security level is 3, the anonymization scheme is t-closeness processing.
4. The privacy protection method according to claim 1, characterized in that, before judging whether the privacy parameter value of the anonymization scheme can be adjusted, the big data publication privacy protection method further comprises:
determining the value range of the privacy parameter value of the anonymization scheme.
5. The privacy protection method according to claim 4, characterized in that determining the value range of the privacy parameter value of the anonymization scheme specifically comprises:
determining a k value set, an l value set and a t value set; the k value set being the value range of the parameter k in the k-anonymity scheme, the l value set being the value range of the parameter l in the l-diversity scheme, and the t value set being the value range of the parameter t in the t-closeness scheme.
6. The privacy protection method according to claim 5, characterized in that determining the k value set specifically comprises:
calculating the minimum value of the k value set according to the maximum privacy leakage threshold;
determining the data quality threshold, and calculating the maximum value of the k value set from it, the data quality threshold being determined under the premise that the data table satisfies the maximum privacy leakage threshold, the data table being the storage table of the retrieved data; and
determining the k value set from its minimum and maximum values.
7. The privacy protection method according to claim 5, wherein determining the l value set specifically comprises:
determining the maximum value of the l value set according to information entropy;
determining the equivalence class with the fewest sensitive-attribute categories in the retrieved data table, and taking the number of sensitive-attribute categories contained in that equivalence class as the minimum value of the l value set, wherein the data table is the storage table of the retrieved data; and
determining the l value set according to the minimum value and the maximum value of the l value set.
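A minimal sketch of the claim-7 bounds. The claim only says the maximum comes from "information entropy"; the concrete bound below (entropy l-diversity requires H(S) ≥ log l, hence l ≤ exp(H) with natural logarithms) is an assumption:

```python
import math
from collections import Counter

def l_value_set(equivalence_classes, sensitive_global):
    """Candidate l values for the l-diversity scheme (sketch of claim 7).

    - Maximum: entropy l-diversity needs H(S) >= log(l) for the sensitive
      attribute S, so l is capped by floor(exp(H)) of the table-wide
      sensitive-value distribution (natural log assumed).
    - Minimum: the number of distinct sensitive values in the equivalence
      class that contains the fewest of them, as stated in the claim.
    """
    counts = Counter(sensitive_global)
    n = sum(counts.values())
    entropy = -sum(c / n * math.log(c / n) for c in counts.values())
    l_max = math.floor(math.exp(entropy))
    l_min = min(len(set(ec)) for ec in equivalence_classes)
    return list(range(l_min, l_max + 1))
```

The `equivalence_classes` argument (lists of sensitive values per class) and the flat `sensitive_global` column are illustrative data shapes, not structures named by the patent.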
8. The privacy protection method according to claim 6, wherein determining the t value set specifically comprises:
adjusting the data table so that the adjusted data table meets the maximum privacy leakage threshold and the data quality threshold, and determining the equivalence class set of the adjusted data table; and
determining the t value set according to the equivalence class set, wherein the t value set is the set of all Di values, and Di denotes the distance between the distribution of sensitive attribute values in the i-th equivalence class and the global distribution.
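The Di computation can be sketched as below. The claim only says "distance"; t-closeness conventionally uses the Earth Mover's Distance, which for categorical values under a uniform ground distance reduces to total variation distance — that reduction is the assumption made here:

```python
from collections import Counter

def t_value_set(equivalence_classes):
    """One candidate t value per equivalence class (sketch of claim 8):
    D_i = distance between the class's sensitive-value distribution and
    the table-wide (global) distribution, using total variation distance
    as an assumed stand-in for the Earth Mover's Distance.
    """
    all_vals = [v for ec in equivalence_classes for v in ec]
    n = len(all_vals)
    global_p = {v: c / n for v, c in Counter(all_vals).items()}
    d_values = []
    for ec in equivalence_classes:
        local = Counter(ec)
        m = len(ec)
        # total variation: half the L1 distance between the two distributions
        d = 0.5 * sum(abs(local.get(v, 0) / m - p) for v, p in global_p.items())
        d_values.append(d)
    return d_values
```

A table t-satisfies t-closeness exactly when every returned Di is at most t, so the set of Di values is a natural candidate range for t.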
9. The privacy protection method according to claim 4, wherein judging whether the privacy parameter value corresponding to the anonymization scheme can be adjusted specifically comprises:
within the value range of the privacy parameter value corresponding to the anonymization scheme, trying values successively in ascending order starting from the minimum value of the range; once the privacy parameter value corresponding to the anonymization scheme would exceed the maximum value of the range, no further adjustment of that privacy parameter value is performed.
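The ascending scan of claim 9 amounts to a simple bounded search; in this sketch the `meets_constraints` callback (leakage below threshold and quality above threshold) is an illustrative assumption, since the claims do not fix its form:

```python
def adjust_parameter(value_range, meets_constraints):
    """Try privacy-parameter values from the minimum of the range upward
    (sketch of claim 9). Returns the first admissible value, or None once
    the range is exhausted -- i.e. the parameter can no longer be adjusted.
    """
    for value in sorted(value_range):
        if meets_constraints(value):
            return value  # smallest value that satisfies both thresholds
    return None

# e.g. the smallest k in [2..10] whose 1/k leakage is below 0.3:
best = adjust_parameter(range(2, 11), lambda k: 1.0 / k < 0.3)
```

Scanning from the minimum upward favours the least-restrictive admissible parameter, which keeps data quality as high as possible.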
10. A privacy protection system for big data publication, characterized in that the privacy protection system comprises:
a user information acquisition module, configured to acquire the identity information, data requirement range, and data usage description of a user;
a user security level determination module, configured to determine the user security level according to the identity information and data usage description of the user;
a user-requested-data determination module, configured to retrieve the user-requested data according to the data requirement range of the user;
an anonymization scheme and privacy parameter determination module, configured to determine an anonymization scheme and its corresponding initial privacy parameter value according to the user security level and a comparison table of security levels and anonymization processing schemes, wherein the anonymization scheme comprises a scheme of directly providing the retrieved data, a k-anonymity processing scheme, an l-diversity processing scheme, and a t-closeness processing scheme; the privacy parameter value corresponding to the k-anonymity processing scheme is the k value, the privacy parameter value corresponding to the l-diversity processing scheme is the l value, and the privacy parameter value corresponding to the t-closeness processing scheme is the t value;
a privacy leakage probability and data quality value determination module, configured to determine the privacy leakage probability and the data quality value according to the anonymization scheme and its corresponding initial privacy parameter value;
a first judgment result obtaining module, configured to judge whether the privacy leakage probability is less than a maximum privacy leakage threshold and whether the data quality value is greater than a data quality threshold, to obtain a first judgment result, wherein the maximum privacy leakage threshold is provided by the data provider and the data quality threshold is provided by the user side;
a publication data publishing module, configured to, when the first judgment result indicates that the privacy leakage probability is less than the maximum privacy leakage threshold and the data quality value is greater than the data quality threshold, perform privacy protection processing on the retrieved user-requested data using the anonymization scheme and corresponding privacy parameter value under which those two conditions hold, to obtain publication data, and publish the publication data to the user;
a second judgment result obtaining module, configured to, when the first judgment result indicates that the privacy leakage probability is greater than or equal to the maximum privacy leakage threshold or the data quality value is less than or equal to the data quality threshold, judge whether the privacy parameter value corresponding to the anonymization scheme can be adjusted, to obtain a second judgment result;
a privacy parameter adjustment module, configured to, when the second judgment result indicates that the privacy parameter value corresponding to the anonymization scheme can be adjusted, adjust the privacy parameter value corresponding to the anonymization scheme and return to the privacy leakage probability and data quality value determination module;
a third judgment result obtaining module, configured to, when the privacy parameter value corresponding to the anonymization scheme cannot be adjusted, judge whether the anonymization scheme can be adjusted, to obtain a third judgment result;
an anonymization scheme adjustment module, configured to, when the third judgment result indicates that the anonymization scheme can be adjusted, adjust the anonymization scheme and return to the privacy leakage probability and data quality value determination module; and
a data quality threshold reduction module, configured to, when the third judgment result indicates that the anonymization scheme cannot be adjusted, reduce the data quality threshold and return to the privacy leakage probability and data quality value determination module.
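The module chain of claim 10 forms a three-level fallback loop: adjust the parameter, then the scheme, then relax the user-side quality threshold. A condensed sketch of that control flow follows; the `evaluate` callback (returning a leakage/quality pair for a scheme and parameter) and the 10% threshold-reduction step are illustrative assumptions, not part of the claims.

```python
def publish(data, schemes, max_leakage, quality_threshold, evaluate):
    """Sketch of the claim-10 control flow.

    `schemes` is an ordered list of (scheme_name, candidate_parameters);
    `evaluate` maps (data, scheme, parameter) to (leakage, quality).
    The nesting mirrors the modules: inner loop = privacy-parameter
    adjustment, middle loop = anonymization-scheme adjustment, outer
    loop = data-quality-threshold reduction.
    """
    while True:
        for scheme, param_values in schemes:
            for param in param_values:
                leakage, quality = evaluate(data, scheme, param)
                if leakage < max_leakage and quality > quality_threshold:
                    return scheme, param  # publish with this configuration
        # no scheme/parameter pair worked: relax the quality threshold
        quality_threshold *= 0.9  # assumed reduction step
```

Note the sketch loops forever if the leakage condition can never be met; a production version would bound the number of threshold reductions.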
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811356234.4A CN109446844B (en) | 2018-11-15 | 2018-11-15 | Privacy protection method and system for big data release |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109446844A true CN109446844A (en) | 2019-03-08 |
CN109446844B CN109446844B (en) | 2020-06-05 |
Family
ID=65553616
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811356234.4A Active CN109446844B (en) | 2018-11-15 | 2018-11-15 | Privacy protection method and system for big data release |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109446844B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103596133A (en) * | 2013-11-27 | 2014-02-19 | 哈尔滨工业大学深圳研究生院 | Location anonymous method and device for continuous queries and privacy protection system |
CN105512566A (en) * | 2015-11-27 | 2016-04-20 | 电子科技大学 | Health data privacy protection method based on K-anonymity |
WO2017008144A1 (en) * | 2015-07-15 | 2017-01-19 | Privacy Analytics Inc. | Re-identification risk measurement estimation of a dataset |
CN107145796A (en) * | 2017-04-24 | 2017-09-08 | 公安海警学院 | Track data k anonymities method for secret protection under a kind of uncertain environment |
CN107707530A (en) * | 2017-09-12 | 2018-02-16 | 福建师范大学 | A kind of method for secret protection and system of mobile intelligent perception |
CN108133146A (en) * | 2017-06-01 | 2018-06-08 | 徐州医科大学 | Sensitive Attributes l-diversity method for secret protection based on secondary division |
CN108268786A (en) * | 2016-12-30 | 2018-07-10 | 广东精点数据科技股份有限公司 | A kind of data desensitization technology based on T-Closeness algorithms |
CN108540936A (en) * | 2017-12-18 | 2018-09-14 | 西安电子科技大学 | Method for secret protection based on prediction |
Non-Patent Citations (1)
Title |
---|
SONG Jinling et al., "Optimal selection algorithm for the K value in the K-anonymity privacy protection model", Journal of Chinese Computer Systems * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110008742A (en) * | 2019-03-21 | 2019-07-12 | 九江学院 | It is a kind of to regularly publish the anonymous guard method of the leakage of the efficient Q value zero in private data for SRS |
CN110543782A (en) * | 2019-07-10 | 2019-12-06 | 暨南大学 | Method and system for realizing desensitization of data set based on k-anonymity algorithm |
CN110543782B (en) * | 2019-07-10 | 2022-03-29 | 暨南大学 | Method and system for realizing desensitization of data set based on k-anonymity algorithm |
CN110995592B (en) * | 2019-12-16 | 2021-09-07 | 北京信息科技大学 | Novel self-maintenance method and route forwarding method of undetermined interest table |
CN110995592A (en) * | 2019-12-16 | 2020-04-10 | 北京信息科技大学 | Novel self-maintenance method and route forwarding method of undetermined interest table |
CN111079185A (en) * | 2019-12-20 | 2020-04-28 | 南京医康科技有限公司 | Database information processing method and device, storage medium and electronic equipment |
CN111723396A (en) * | 2020-05-20 | 2020-09-29 | 华南理工大学 | SaaS-based general cloud data privacy protection platform and method |
CN111723396B (en) * | 2020-05-20 | 2023-02-10 | 华南理工大学 | SaaS-based universal cloud data privacy protection platform and method |
CN112487415A (en) * | 2020-12-09 | 2021-03-12 | 华控清交信息科技(北京)有限公司 | Method and device for detecting safety of computing task |
CN112487415B (en) * | 2020-12-09 | 2023-10-03 | 华控清交信息科技(北京)有限公司 | Method and device for detecting security of computing task |
CN112765659A (en) * | 2021-01-20 | 2021-05-07 | 丁同梅 | Data leakage protection method for big data cloud service and big data server |
CN112765659B (en) * | 2021-01-20 | 2021-09-21 | 曙光星云信息技术(北京)有限公司 | Data leakage protection method for big data cloud service and big data server |
CN115374460A (en) * | 2022-08-31 | 2022-11-22 | 北京华宜信科技有限公司 | Method for anonymously submitting data by multiple users |
CN115310135A (en) * | 2022-10-09 | 2022-11-08 | 北京中超伟业信息安全技术股份有限公司 | Storage data safe storage method and system based on hidden model |
Also Published As
Publication number | Publication date |
---|---|
CN109446844B (en) | 2020-06-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109446844A (en) | Privacy protection method and system for big data publication | |
Karr et al. | Secure regression on distributed databases | |
Wei et al. | Differential privacy-based location protection in spatial crowdsourcing | |
CN107395430B (en) | Cloud platform dynamic risk access control method | |
EP2930646B1 (en) | Systems and methods for anonymized user list counts | |
İmrohoroğlu et al. | On the political economy of income redistribution and crime | |
Wei et al. | Some geometric aggregation operators based on interval-valued intuitionistic fuzzy sets and their application to group decision making | |
US7818335B2 (en) | Selective privacy guarantees | |
US20110179011A1 (en) | Data obfuscation system, method, and computer implementation of data obfuscation for secret databases | |
US7769707B2 (en) | Data diameter privacy policies | |
CN106940777A (en) | An identity information privacy protection method based on sensitive-information measurement | |
WO2020177484A1 (en) | Localized difference privacy urban sanitation data report and privacy calculation method | |
CN109117669B (en) | Privacy protection method and system for MapReduce similar connection query | |
CN106980795A (en) | Privacy protection method for social network data | |
CN108600271A (en) | A privacy protection method for trust state assessment | |
CN112738172B (en) | Block chain node management method and device, computer equipment and storage medium | |
CN109154952A (en) | For protecting the method and system of storing data | |
CN109359480A (en) | A user privacy protection method and system for digital libraries | |
CN110110545A (en) | A spatial crowdsourcing quality control model based on location privacy protection and cheater detection | |
CN109524065A (en) | Medical data querying method, medical data platform and relevant apparatus | |
Kuang et al. | A privacy protection model of data publication based on game theory | |
CN108768968A (en) | A method and system for processing service requests based on a data security management engine | |
CN113806799B (en) | Block chain platform safety intensity assessment method and device | |
CN117171801B (en) | Efficient space query method and system with adjustable privacy protection intensity | |
Xu et al. | Privacy preserving online matching on ridesharing platforms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 2024-04-29; Patentee after: Beijing jiuweiwei'an Technology Co., Ltd., 1211, Building A, China International Science and Technology Exhibition Center, No. 12 Yumin Road, Chaoyang District, Beijing 100029, China; Patentee before: Beijing Information Science and Technology University, No. 12, Xiaoying East Road, Qinghe, Haidian District, Beijing, China |