CN107832631A - Privacy protection method and system for data publication - Google Patents
- Publication number
- CN107832631A
- Application number
- CN201711115389.4A
- Authority
- CN
- China
- Prior art keywords
- data
- equivalence class
- record
- diversity
- candidate
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6263—Protecting personal data, e.g. for financial or medical purposes during internet communication, e.g. revealing personal data from cookies
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Storage Device Security (AREA)
Abstract
The invention discloses a privacy protection method for data publication, comprising the following steps. S10: receive the data and perform a diversity judgment on the kinds of sensitive attribute values in the data, ensuring that the subsequent equivalence classes have the same diversity; S20: partition the data into equivalence classes according to the result of the diversity judgment; S30: split the data resulting from the equivalence class partition. The invention is simple and convenient to implement. Data processed with it have a higher degree of privacy protection, lower information loss and better usability, which makes the invention practical and able to resist a variety of privacy-snooping attacks.
Description
Technical field
The present invention relates to the field of information security protection, and more particularly to a privacy protection method and system for data publication.
Background art
With the rapid development of the Internet, people depend on networks more and more, and the volume of data is growing rapidly. While networks make life more convenient (shopping online, transferring money and booking air tickets can all be done quickly without leaving home), they also carry a substantial risk of information leakage. Personal privacy information, medical data, account passwords, bank card information, trade secrets and similar data are easily exploited once they spread over the Internet, leading to identity disclosure and financial loss, and in serious cases even endangering life and health. This shows the importance of information security protection. Since the PRISM affair, many countries have been strengthening network security, which brings new opportunities and challenges for data security and privacy protection.
To ensure data privacy, data must undergo privacy-protection processing before they are published and shared. A published data table is usually divided into three classes of attributes: (1) identifier attributes (Individually Identifier Attribute, ID), which identify an individual directly; (2) quasi-identifier attributes (Quasi-identifier, QI), which appear both in the private table and in external tables and can be linked to infer an individual's information; and (3) sensitive attributes (Sensitive Attribute, SA), which contain private information that the user does not want others to know.
When such a table is published, deleting only the ID attributes or the QI attributes cannot prevent privacy leakage: linking the published data with other data may still disclose identities and sensitive attribute values. In 2002, Sweeney et al. proposed the k-anonymity privacy model, which effectively prevents linking attacks; however, k-anonymity places no constraint on sensitive attribute values and therefore cannot prevent background-knowledge attacks or homogeneity attacks. To address these problems, l-diversity, (α, k)-anonymity, t-closeness and other models were proposed in succession. These models mainly process the data by generalization. This approach largely preserves the original semantics, but it causes information loss and reduces the utility of the data.
In recent years, clustering algorithms have been widely used in data mining. Privacy-preserving data publication requires each cluster in the published data set to be generalized to the same quasi-identifier values, which is quite similar to the clustering process in data mining; accordingly, there has been research on achieving l-diversity with clustering methods.
For example, the patent document with publication number CN104317904A discloses "a generalization method for weighted social networks", which includes: sorting the nodes in descending order of degree and grouping them; generalizing the weights of existing edges and computing the probability that each edge exists; after traversing all anonymous groups, extracting the sensitive attributes of all nodes to form sensitive attribute packages; computing the maximum similarity between the sensitive attribute packages of nodes and obtaining the generalized packages from a generalization tree; and traversing the K-weighted anonymous groups to finally obtain a scheme satisfying K-Weighted-inv-l-diversity anonymity. That invention takes edge weights and multiple sensitive attributes into account, which makes the privacy protection method more applicable to real social networks and protects multiple sensitive attributes in weighted graphs more completely.
As another example, the patent document with publication number CN106874788A discloses "a privacy protection method in sensitive data publication", which includes: receiving a data set from a user together with multiple corresponding generalization input trees; traversing each group of data in the data set and checking, for each column of that group, whether a corresponding generalization input tree exists; if it exists, looking up the node corresponding to the attribute value in that tree and writing the node's information into a coordinate array, thereby obtaining m rows of coordinate arrays; if it does not exist, writing the attribute value directly into the coordinate array; adding a flag bit with initial value 0 to each coordinate array; and establishing p clusters, with p rows randomly selected from the m rows of coordinate arrays as the centers of the p clusters. By clustering first and generalizing afterwards, that method improves computational efficiency and provides a safeguard for privacy-preserving data publication.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the above deficiencies of the prior art by providing a privacy protection method and system for data publication that are simple and convenient to implement and offer a higher degree of privacy protection, lower information loss and better usability.
To achieve these objects, the present invention adopts the following technical solution:
A privacy protection method for data publication, comprising the following steps:
S10: receive the data and perform a diversity judgment on the kinds of sensitive attribute values in the data;
S20: partition the data into equivalence classes according to the result of the diversity judgment;
S30: split the data resulting from the equivalence class partition.
Further, the diversity judgment on the kinds of sensitive attribute values in step S10 specifically consists of:
comparing the number of kinds of sensitive attribute values in the data with the diversity parameter L;
and step S20 comprises the following steps:
S21: if the number of kinds of sensitive attribute values is greater than or equal to the diversity parameter L, perform single equivalence class partition;
S22: if the number of kinds of sensitive attribute values is less than the diversity parameter L, perform candidate equivalence class partition.
Further, in step S10 the data are preprocessed after they are received and before the diversity judgment, comprising the following steps:
S11: standardize each quasi-identifier attribute value, mapping it to the range [0, 1];
S12: calculate the weight of each quasi-identifier attribute;
S13: calculate the integrated value of each record using the following formula:
$$W_i = \sum_{j=1}^{n} w_j \times x_{ij}, \quad 1 \le i \le \eta$$
where W_i denotes the integrated value, w_j the weight, x_ij the standardized quasi-identifier attribute value, n the number of quasi-identifier attributes, and η the number of records.
Further, the data preprocessing also comprises:
S14: sorting the records in the data by integrated value.
Further, the equivalence class partition performed on the data in step S20 comprises the following steps:
S201: in order of integrated value, successively assign a preset number of records to the same equivalence class;
S202: determine whether any records remain that have not gone through step S201;
S203: if such records remain, perform candidate equivalence class partition.
Further, performing candidate equivalence class partition specifically consists of:
determining whether a candidate equivalence class exists for the data; if it does, assigning the data to the best candidate equivalence class; if it does not, assigning the data to the optimal equivalence class.
Further, the method comprises the following step after step S30:
S40: join the split data on a common attribute and publish the result.
A privacy protection system for data publication, comprising:
a judgment module for receiving data and performing a diversity judgment on the kinds of sensitive attribute values in the data;
an equivalence class partition module for partitioning the data into equivalence classes according to the result of the diversity judgment;
a splitting module for splitting the data resulting from the equivalence class partition.
Further, the system also comprises:
a publishing module for joining the split data on a common attribute and publishing the result;
and the judgment module comprises:
a computing unit for calculating the integrated value of each record;
a sorting unit for sorting the records in the data by integrated value;
a comparing unit for comparing the number of kinds of sensitive attribute values in the data with the diversity parameter L.
Further, the equivalence class partition module comprises:
a first partition unit for performing single equivalence class partition;
a second partition unit for performing candidate equivalence class partition;
the first partition unit comprises:
a first judgment subunit for determining whether any records remain that have not been assigned, in order of integrated value, to an equivalence class of the preset size;
the second partition unit comprises:
a second judgment subunit for determining whether a candidate equivalence class exists for the data;
a classifying subunit for assigning the data to the best candidate equivalence class if a candidate equivalence class exists, and to the optimal equivalence class otherwise.
With the above technical solution, the beneficial effects of the present invention are as follows:
Partitioning the data into equivalence classes on the basis of diversity does not destroy the original information during partitioning, and grouping similar elements together gives the data better usability and makes it more standardized;
Splitting the data after equivalence class partition weakens the links between the data without changing the original values of the sensitive and quasi-identifier attributes, so data loss is small, while privacy after publication is better protected and the data are safer;
Dividing equivalence class partition into two different modes, single equivalence class partition and candidate equivalence class partition, allows the data to be processed in a more targeted and complete way and ensures privacy protection for every record;
Calculating the integrated value of each record and grouping the records with the highest similarity into one equivalence class according to integrated value makes the processing simple to implement, orders the data better and improves practicality; it also facilitates mining and integrating the data, greatly improves data usability, and keeps information loss low throughout the process;
Joining the split data on a common attribute before publication, where the common attribute can be defined from the equivalence class partition result, adds redundant information that interferes with the real information; this achieves data privacy protection with low data loss and a good protective effect.
Brief description of the drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings are described as follows:
Fig. 1 is a flowchart of a privacy protection method for data publication provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a privacy protection method for data publication provided by Embodiment 2 of the present invention;
Fig. 3 is a flowchart of a privacy protection method for data publication provided by Embodiment 3 of the present invention;
Fig. 4 is a block diagram of a privacy protection system for data publication provided by Embodiment 4 of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is further described below through specific embodiments with reference to the drawings; the present invention is not, however, limited to these embodiments.
Privacy-preserving data publication mainly processes the quasi-identifier attributes: whether generalization/suppression or clustering is used, the processing targets the quasi-identifier attributes.
The present invention mainly uses two techniques, clustering and splitting, to protect the privacy of published data, in particular static data.
Clustering groups similar elements together by analyzing the similarity of the data. Because different scenarios have different requirements, clustering algorithms also differ. The design of the clustering model is a key step in the present invention, and different clustering models use different clustering algorithms.
Splitting techniques offer new ideas and methods for privacy-preserving data publication. They do not change the original values of the sensitive or quasi-identifier attributes; instead, they achieve privacy protection by weakening the link between the sensitive attributes and the quasi-identifier attributes. Typically, the quasi-identifier attributes (QI) and sensitive attributes (SA) of a data set are split into two unconnected data sets that are then published; a data set can also be split into more than two data sets according to its internal correlations.
The present invention is described in more detail below:
Embodiment 1
As shown in Fig. 1, this embodiment provides a privacy protection method for data publication, the method comprising the following steps:
S10: receive the data and perform a diversity judgment on the kinds of sensitive attribute values in the data;
In this step, data with privacy protection requirements, usually a data table, are received. First, the identifier, quasi-identifier and sensitive attributes of the table are identified and the sensitive data in the original data are determined: the sensitive attribute values are examined and the number of distinct values in the sensitive attribute column is counted. Denote this number of kinds by S_c; the table is then partitioned differently depending on S_c. Typically a diversity parameter l is set, S_c is compared with l, and the records satisfying l-diversity are grouped together. Some implementations set two diversity parameters l and l′ with different values, dividing the sensitive attributes into two classes (e.g. primary and auxiliary sensitive attributes) so that the data can be partitioned in a more targeted way.
S20: partition the data into equivalence classes according to the result of the diversity judgment;
This step is the clustering of the data. Depending on the diversity of the kinds of sensitive attribute values, a different equivalence class partition mode is performed. For example, if S_c ≥ l, single equivalence class partition is performed: every l − 1 records in the table are placed in the same equivalence class, and the records in a class are highly similar to one another. If records in the original table remain unassigned after single equivalence class partition, candidate equivalence class partition is performed, which checks whether there is a candidate equivalence class that the remaining records can join; a candidate equivalence class is one with high similarity that contains the same sensitive attribute value. If S_c < l, candidate equivalence class partition is performed directly.
S30: split the data resulting from the equivalence class partition.
In this step, the data table clustered and classified in step S20 is split into several separate tables according to how the equivalence classes were formed; for example, it can be split by attribute category (identifier, quasi-identifier and sensitive attributes). This weakens the links between the data and thus achieves privacy protection.
After the steps of this embodiment are performed, the information distortion of the data table is low, the probability of privacy leakage after publication is low, and the data remain highly usable afterwards.
Embodiment 2
As shown in Fig. 2 the present embodiment and the difference of embodiment before are, the present embodiment provides a kind of with specific cluster
The method for secret protection of the data publication of algorithm, the species progress of the Sensitive Attributes value of data described in the step S10 are various
Property judge be specially:For the species of the Sensitive Attributes value of the data compared with diversity parameters L, L values are number set in advance
Value, generally, such as Sensitive Attributes value species be 7 kinds, then L values be set as 7 divided by 2 round after value 3, i.e. L=3;
Step S20 comprises the following steps:
S21: if the number of kinds of sensitive attribute values is greater than or equal to the diversity parameter L, perform single equivalence class partition. While forming each single equivalence class in this step, the method repeatedly checks whether the length of the current equivalence class is less than L; once it is no longer less than L, the next equivalence class is formed, which ensures that every equivalence class satisfies L-diversity.
S22: if the number of kinds of sensitive attribute values is less than the diversity parameter L, perform candidate equivalence class partition.
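Embodiment 2's example rule for L and its per-class L-diversity requirement might look like this in Python. Both function names are hypothetical. The rule L = ⌊S_c / 2⌋ generalizes the single worked example (7 kinds gives L = 3), which the text presents as typical rather than mandatory, and the check uses the simplest "distinct" form of l-diversity because the text does not say which form is meant.

```python
def default_diversity_parameter(s_c):
    """Example rule from Embodiment 2: L = S_c / 2, rounded down (7 -> 3)."""
    return s_c // 2


def is_l_diverse(group_sensitive_values, L):
    """Distinct l-diversity: a class qualifies if it holds at least L
    distinct sensitive attribute values."""
    return len(set(group_sensitive_values)) >= L


print(default_diversity_parameter(7))  # 3
print(default_diversity_parameter(6))  # 3 (Table 1 has 6 kinds of Disease)
print(is_l_diverse(["Gastritis", "Gastritis", "Cancer"], 2))  # True
print(is_l_diverse(["Gastritis", "Gastritis", "Cancer"], 3))  # False
```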
Embodiment 3
As shown in Fig. 3, this embodiment differs from Embodiment 1 in that, in step S10, the data are preprocessed after they are received and before the diversity judgment. The preprocessing accounts for the fact that different attributes carry different weights during data publication; clustering with weights is simple to implement and better reflects the similarity of the data. It specifically comprises the following steps:
S11: standardize each quasi-identifier attribute value, mapping it to the range [0, 1]. Numerical attribute values are mapped to [0, 1] with the range (min-max) standardization formula: new quasi-identifier value = (original quasi-identifier value − minimum) / (maximum − minimum);
Specifically, assume the original data table is T = <ID, QI_1, QI_2, ..., QI_n, SA>, with n quasi-identifier attributes and η records, i.e. η clustering objects, each with n elements;
Assuming the attribute domain is [x_min, x_max], the range standardization formula maps it to [0, 1] as follows:
$$x_{ij} = \frac{x'_{ij} - x_{\min}}{x_{\max} - x_{\min}}$$
where x_ij is the standardized quasi-identifier attribute value (x_ij has this meaning throughout what follows) and x′_ij is the value of the j-th quasi-identifier attribute (QI) of the i-th record, i.e. the original quasi-identifier attribute value;
Categorical attribute values are first mapped to natural numbers. For example, sex has two values, male and female, which are mapped to 1 and 2; the range standardization formula is then applied to map them to [0, 1]. Note that when a categorical attribute has only two values, its domain should be widened to [0, 3], which keeps the calculation error relatively small.
S12: calculate the weights of the quasi-identifier attributes. Each quasi-identifier attribute is a column of the data table, and the variance of each column reflects how tightly its values are clustered. When an attribute with smaller variance and an attribute with larger variance change by the same amount, the information loss of the larger-variance attribute is smaller, i.e. the attribute with larger variance receives a larger weight. This step therefore computes the weights with a variance-based formula, in which V_j (1 ≤ j ≤ n) is the variance of column j, avg_j (1 ≤ j ≤ n) is the mean of column j, and w_j (1 ≤ j ≤ n) is the weight of column j.
S13: calculate the integrated value of each record using the following formula:
$$W_i = \sum_{j=1}^{n} w_j \times x_{ij}, \quad 1 \le i \le \eta$$
where W_i denotes the integrated value, w_j the weight, x_ij the standardized quasi-identifier attribute value, n the number of quasi-identifier attributes, and η the number of records.
Further, the data preprocessing also comprises:
S14: sorting the records in the data by integrated value, in ascending or descending order.
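Steps S13 and S14 can be sketched as below. The weights (0.537942, 0.231029, 0.231029) are not given in the text; they are back-solved from Table 2 (they sum to 1 and reproduce its Integrated value column to about five decimal places), so treat them as illustrative. Sorting in ascending order reproduces the row order of Table 2.

```python
def integrated_value(x_row, weights):
    """W_i = sum_j w_j * x_ij over a record's standardized QI values."""
    if len(x_row) != len(weights):
        raise ValueError("one weight per quasi-identifier attribute is required")
    return sum(w * x for w, x in zip(weights, x_row))


# ASSUMPTION: weights back-solved from Table 2, for (Age, Race, Sex).
WEIGHTS = (0.537942, 0.231029, 0.231029)

# Standardized (Age, Race, Sex) values from Table 2, listed in Table 1 order.
records = {
    "Alicy": (0.152173913, 0.667, 0.333),
    "Lucy": (0.673913043, 0.333, 0.333),
    "Tom": (0.47826087, 0.667, 0.667),
    "Helen": (0.086956522, 0.333, 0.667),
    "David": (0.913043478, 0.333, 0.667),
    "Bob": (0.152173913, 0.667, 0.667),
    "Linda": (0.630434783, 0.667, 0.333),
}

# S13: integrated value of every record.
W = {name: integrated_value(x, WEIGHTS) for name, x in records.items()}

# S14: sort the records by integrated value (ascending).
order = sorted(W, key=W.get)
print(order)  # ['Helen', 'Alicy', 'Bob', 'Lucy', 'Tom', 'Linda', 'David']
print(round(W["Helen"], 4))  # 0.2778
```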
Assume the original data table is Table 1:
Table 1
Name | Age | Race | Sex | Disease |
Alicy | 21 | Black | F | Flu |
Lucy | 45 | White | F | HIV |
Tom | 36 | Black | M | Gastritis |
Helen | 18 | White | M | Obesity |
David | 56 | White | M | Cancer |
Bob | 21 | Black | M | Dyspepsia |
Linda | 43 | Black | F | Gastritis |
Here {Name} is the identifier attribute, the attribute set {Age, Race, Sex} is the quasi-identifier, and {Disease} is the sensitive attribute.
After the data preprocessing of this embodiment, the new data table is Table 2:
Table 2
Name | Age | Race | Sex | Disease | Integrated value |
Helen | 0.086956522 | 0.333 | 0.667 | Obesity | 0.277806565 |
Alicy | 0.152173913 | 0.667 | 0.333 | Flu | 0.312889739 |
Bob | 0.152173913 | 0.667 | 0.667 | Dyspepsia | 0.390053425 |
Lucy | 0.673913043 | 0.333 | 0.333 | HIV | 0.516391444 |
Tom | 0.47826087 | 0.667 | 0.667 | Gastritis | 0.565469295 |
Linda | 0.630434783 | 0.667 | 0.333 | Gastritis | 0.570166348 |
David | 0.913043478 | 0.333 | 0.667 | Cancer | 0.722193435 |
Further, the equivalence class partition performed on the data in step S20 comprises the following steps:
S201: in order of integrated value, successively assign a preset number of records to the same equivalence class. Forming equivalence classes by similarity of integrated value groups together the records that are most similar to one another.
S202: determine whether any records remain that have not gone through step S201; this ensures that the equivalence class partition covers all the data.
S203: if such records remain, perform candidate equivalence class partition.
Performing candidate equivalence class partition specifically consists of:
determining whether a candidate equivalence class exists for the data, where a candidate equivalence class is defined as the equivalence class with the smallest integrated value difference that contains the same sensitive attribute value;
if one exists, assigning the data to the best candidate equivalence class; if not, assigning the data to the optimal equivalence class, defined as the equivalence class with the smallest integrated value difference.
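The S201 to S203 partition with candidate and optimal equivalence classes might be sketched as follows. Several details are assumptions rather than statements of the text: the preset class size is taken as L − 1 (the size described in Embodiment 1, and consistent with the groups in Tables 4 and 5 for L = 3), the "integrated value difference" is measured against a class's mean integrated value, and a candidate class is any class already holding the same sensitive value. The names `partition` and `_gap` are hypothetical.

```python
def _gap(rec, cls):
    """Integrated value difference between a record and a class's mean."""
    return abs(rec[1] - sum(r[1] for r in cls) / len(cls))


def partition(records, class_size):
    """Partition records sorted by integrated value into equivalence classes.

    records: list of (name, integrated_value, sensitive_value), sorted by
    integrated value. Returns a list of classes (lists of records).
    """
    if class_size < 1:
        raise ValueError("class_size must be at least 1")
    # S201: consecutive runs of class_size records form equivalence classes.
    n_full = len(records) // class_size
    classes = [records[k * class_size:(k + 1) * class_size] for k in range(n_full)]

    # S202/S203: leftover records go through candidate partition.
    for rec in records[n_full * class_size:]:
        if not classes:
            classes.append([rec])
            continue
        # Candidate classes: those already holding the same sensitive value.
        candidates = [c for c in classes if any(r[2] == rec[2] for r in c)]
        # Best candidate class if one exists, otherwise the optimal class.
        target = min(candidates or classes, key=lambda c: _gap(rec, c))
        target.append(rec)
    return classes


# Table 2 rows, sorted by integrated value; L = 3, so class_size = L - 1 = 2.
table2 = [
    ("Helen", 0.277806565, "Obesity"), ("Alicy", 0.312889739, "Flu"),
    ("Bob", 0.390053425, "Dyspepsia"), ("Lucy", 0.516391444, "HIV"),
    ("Tom", 0.565469295, "Gastritis"), ("Linda", 0.570166348, "Gastritis"),
    ("David", 0.722193435, "Cancer"),
]
groups = partition(table2, class_size=2)
print([[r[0] for r in g] for g in groups])
# [['Helen', 'Alicy'], ['Bob', 'Lucy'], ['Tom', 'Linda', 'David']]
```

David (Cancer) matches no existing class, so he falls back to the optimal class, the one whose mean integrated value is closest. Embodiment 2 instead checks the class length against L; passing `class_size=L` follows that reading.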
After the equivalence class partition of this embodiment is applied to Table 2, the data are as shown in Table 3:
Table 3
In Table 3, GroupID is the equivalence class number; records with the same number belong to the same equivalence class.
After the equivalence class partition of this embodiment is complete, the sensitive attributes are separated from the quasi-identifier attributes with the splitting technique, dividing the data of Table 3 into the two tables shown in Table 4 and Table 5:
Table 4
Age | Race | Sex | GroupID |
18 | White | M | 1 |
21 | Black | F | 1 |
21 | Black | M | 2 |
45 | White | F | 2 |
36 | Black | M | 3 |
43 | Black | F | 3 |
56 | White | M | 3 |
Table 5
GroupID | Disease |
1 | Obesity |
1 | Flu |
2 | Dyspepsia |
2 | HIV |
3 | Gastritis |
3 | Gastritis |
3 | Cancer |
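The split into Tables 4 and 5 can be sketched like this, assuming the grouped records are available as dictionaries; the helper names `split_tables` and `rec` are hypothetical. The identifier attribute Name is dropped, and GroupID is the only link between the two tables.

```python
def split_tables(groups):
    """Split grouped records into a QI table and an SA table keyed by GroupID.

    groups: list of equivalence classes, each a list of dicts with the keys
    Name, Age, Race, Sex and Disease. GroupIDs are numbered from 1.
    """
    qi_table, sa_table = [], []
    for group_id, group in enumerate(groups, start=1):
        for r in group:
            # Name (the identifier attribute) is not published.
            qi_table.append((r["Age"], r["Race"], r["Sex"], group_id))
            sa_table.append((group_id, r["Disease"]))
    return qi_table, sa_table


def rec(name, age, race, sex, disease):
    """Build one record of Table 1 as a dict."""
    return {"Name": name, "Age": age, "Race": race, "Sex": sex, "Disease": disease}


groups = [
    [rec("Helen", 18, "White", "M", "Obesity"), rec("Alicy", 21, "Black", "F", "Flu")],
    [rec("Bob", 21, "Black", "M", "Dyspepsia"), rec("Lucy", 45, "White", "F", "HIV")],
    [rec("Tom", 36, "Black", "M", "Gastritis"), rec("Linda", 43, "Black", "F", "Gastritis"),
     rec("David", 56, "White", "M", "Cancer")],
]
table4, table5 = split_tables(groups)
print(table4[0])  # (18, 'White', 'M', 1)
print(table5[0])  # (1, 'Obesity')
```

Because Tables 4 and 5 keep the same row order, a reader could re-link rows by position; shuffling the rows within each group before publication is a common mitigation, although the text does not mention it.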
The method further comprises the following step after step S30:
S40: join the split data on a common attribute and publish the result.
In this step, the split data are joined with a Cartesian product. The redundant information this produces serves the purpose of privacy protection without destroying the original information, so the degree of information loss stays low.
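Table 6 is not reproduced in this text, so the following sketch shows only one reading of S40: every QI row is paired with every sensitive value that shares its GroupID, i.e. a Cartesian product within each equivalence class. The function name `join_by_group` is hypothetical. With the data of Tables 4 and 5 it yields 2×2 + 2×2 + 3×3 = 17 rows, so each QI row is linked to all sensitive values of its class.

```python
from collections import defaultdict


def join_by_group(qi_table, sa_table):
    """Pair every QI row with every sensitive value in the same GroupID."""
    values_by_group = defaultdict(list)
    for group_id, sensitive in sa_table:
        values_by_group[group_id].append(sensitive)

    joined = []
    for age, race, sex, group_id in qi_table:
        for sensitive in values_by_group[group_id]:
            joined.append((age, race, sex, group_id, sensitive))
    return joined


table4 = [(18, "White", "M", 1), (21, "Black", "F", 1), (21, "Black", "M", 2),
          (45, "White", "F", 2), (36, "Black", "M", 3), (43, "Black", "F", 3),
          (56, "White", "M", 3)]
table5 = [(1, "Obesity"), (1, "Flu"), (2, "Dyspepsia"), (2, "HIV"),
          (3, "Gastritis"), (3, "Gastritis"), (3, "Cancer")]

table6 = join_by_group(table4, table5)
print(len(table6))  # 17
```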
The data table after the Cartesian-product join is shown in Table 6:
Table 6
The privacy protection method for data publication provided by this embodiment arranges the data in a more orderly and complete way, is more practical, can resist a variety of attacks, and further protects the privacy of the published data.
Embodiment 4
As shown in Fig. 4, this embodiment provides a privacy protection system for data publication, used to implement the privacy protection methods for data publication of Embodiments 1 and 2 above. The system comprises:
Judgment module 100, for receiving data and performing a diversity judgment on the kinds of sensitive attribute values in the data;
Equivalence class partition module 200, for partitioning the data into equivalence classes according to the result of the diversity judgment;
Splitting module 300, for splitting the data resulting from the equivalence class partition.
The system also comprises:
Publishing module 400, for joining the split data on a common attribute and publishing the result;
The judgment module 100 comprises:
Computing unit 110, for calculating the integrated value of each record;
Sorting unit 120, for sorting the records in the data by integrated value;
Comparing unit 130, for comparing the number of kinds of sensitive attribute values in the data with the diversity parameter L.
The equivalence class partition module 200 comprises:
First partition unit 210, for performing single equivalence class partition;
Second partition unit 220, for performing candidate equivalence class partition;
The first partition unit 210 comprises:
First judgment subunit 211, for determining whether any records remain that have not been assigned, in order of integrated value, to an equivalence class of the preset size;
The second partition unit 220 comprises:
Second judgment subunit 221, for determining whether a candidate equivalence class exists for the data;
Classifying subunit 222, for assigning the data to the best candidate equivalence class if a candidate equivalence class exists, and to the optimal equivalence class otherwise.
The privacy protection system for data publication provided by this embodiment uses a more practical clustering algorithm that clusters the data with the highest similarity together, which keeps the data more complete and more usable. It also generalizes the data by means of input trees, which reduces data loss throughout the process, and after a reasonable split it rejoins the data, adding redundant information that interferes with the data, so that the degree of privacy protection is greatly improved.
The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the present invention belongs may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.
Claims (10)
1. A privacy protection method for data publication, characterized in that the method comprises the following steps:
S10: receiving data and performing a diversity judgment on the kinds of sensitive attribute values in the data;
S20: partitioning the data into equivalence classes according to the result of the diversity judgment;
S30: splitting the data resulting from the equivalence class partition.
2. The privacy protection method for data publication according to claim 1, characterized in that performing the diversity judgment on the number of distinct sensitive attribute values of the data in step S10 consists of comparing that number with a diversity parameter L;
and step S20 comprises the following steps:
S21: if the number of distinct sensitive attribute values is greater than or equal to the diversity parameter L, performing single equivalence class partition;
S22: if the number of distinct sensitive attribute values is less than the diversity parameter L, performing candidate equivalence class partition.
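The diversity judgment of claim 2 reduces to counting distinct sensitive values and comparing the count with L. The sketch below assumes the judgment is applied to whatever sequence of sensitive values is passed in; the function name `choose_partition_mode` and the returned labels `"single"` and `"candidate"` are hypothetical and do not come from the patent.

```python
from typing import Hashable, Sequence


def choose_partition_mode(sensitive_values: Sequence[Hashable], L: int) -> str:
    """Hypothetical sketch of the S10 diversity judgment and the S21/S22 branch.

    Counts the distinct sensitive attribute values and compares the count
    with the diversity parameter L.
    """
    if L < 1:
        raise ValueError("diversity parameter L must be a positive integer")
    distinct = len(set(sensitive_values))
    # S21: enough distinct sensitive values -> single equivalence class partition.
    if distinct >= L:
        return "single"
    # S22: too few distinct sensitive values -> candidate equivalence class partition.
    return "candidate"
```

For example, three records with the sensitive values `flu`, `HIV` and `cancer` satisfy L = 3, while `flu`, `flu`, `HIV` do not.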
3. The privacy protection method for data publication according to claim 1, characterized in that, after the data are received in step S10 and before the diversity judgment is performed, the data are preprocessed by the following steps:
S11: normalizing each quasi-identifier attribute value, mapping it into the range [0,1];
S12: calculating the weight of each quasi-identifier attribute;
S13: calculating the integrated value of each record according to the following formula:
$$W_i = \sum_{j=1}^{n} w_j \times x_{ij}, \quad 1 \le i \le \eta$$
where $W_i$ is the integrated value of the $i$-th record, $w_j$ is the weight of the $j$-th quasi-identifier attribute, $x_{ij}$ is the normalized value of the $j$-th quasi-identifier attribute of the $i$-th record, $n$ is the number of quasi-identifier attributes, and $\eta$ is the number of records.
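Steps S11 and S13 can be sketched as follows. Two assumptions are made here: the mapping to [0,1] in S11 is taken to be min-max scaling (the claim only states the target range), and the weights $w_j$ of S12 are passed in by the caller, because this passage does not say how they are computed. Both function names are hypothetical.

```python
from typing import List, Sequence


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """S11 under an assumed min-max scaling: map every quasi-identifier column into [0, 1]."""
    if not rows:
        return []
    n = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n)]
    highs = [max(r[j] for r in rows) for j in range(n)]
    return [
        [
            # A constant column carries no ordering information, so it maps to 0.
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n)
        ]
        for r in rows
    ]


def integrated_values(x: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """S13: W_i = sum over j = 1..n of w_j * x_ij, for each of the eta records.

    The weights w (step S12) are an input here; their derivation is not modelled.
    """
    if any(len(row) != len(w) for row in x):
        raise ValueError("each record needs one value per weighted attribute")
    return [sum(wj * xij for wj, xij in zip(w, row)) for row in x]
```

With the quasi-identifier rows `[20, 1000]`, `[30, 3000]`, `[40, 2000]` and equal weights of 0.5, the normalized rows are `[0, 0]`, `[0.5, 1]`, `[1, 0.5]` and the integrated values are 0, 0.75 and 0.75.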
4. The privacy protection method for data publication according to claim 3, characterized in that the data preprocessing further comprises:
S14: sorting the records in the data by integrated value.
5. The privacy protection method for data publication according to claim 4, characterized in that performing the equivalence class partition of the data in step S20 comprises the following steps:
S201: selecting, in order of integrated value, a predetermined number of records at a time and dividing them into the same equivalence class;
S202: determining whether any records remain that have not undergone step S201;
S203: if such records remain, performing candidate equivalence class partition.
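Steps S14 and S201 to S203 can be read as "sort by integrated value, cut into runs of a fixed size, and hand the remainder on". In the sketch below the predetermined number is called `k`, and records are identified by their index; both the name `single_partition` and the assumption that leftovers arise only when the record count is not a multiple of `k` are illustrative, not taken from the patent.

```python
from typing import List, Sequence, Tuple


def single_partition(integrated: Sequence[float], k: int) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical sketch of S14 + S201-S203 on record indices.

    Returns (classes, leftovers): classes holds runs of k record indices taken
    in order of integrated value; leftovers are the records that step S201
    could not place and that go on to candidate equivalence class partition.
    """
    if k < 1:
        raise ValueError("the predetermined number k must be positive")
    # S14: order records by integrated value (sorted() is stable, so ties keep input order).
    order = sorted(range(len(integrated)), key=lambda i: integrated[i])
    full = len(order) - len(order) % k
    # S201: each consecutive run of k records becomes one equivalence class.
    classes = [order[s:s + k] for s in range(0, full, k)]
    # S202/S203: any remaining records are passed to candidate partition.
    leftovers = order[full:]
    return classes, leftovers
```

For integrated values `[0.9, 0.1, 0.5, 0.3, 0.7]` and `k = 2`, the sorted order is records 1, 3, 2, 4, 0, giving the classes `[1, 3]` and `[2, 4]` with record 0 left over.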
6. The privacy protection method for data publication according to claim 2 or 5, characterized in that performing candidate equivalence class partition specifically comprises:
determining whether a candidate equivalence class exists for the data; if one exists, assigning the data to the best candidate equivalence class; if none exists, assigning the data to the optimal equivalence class.
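This passage does not define "candidate", "best candidate" or "optimal" equivalence class, so the sketch below uses assumed definitions purely for illustration: a candidate class is one that does not yet contain the record's sensitive value (so adding the record raises its diversity), and both the best candidate and the optimal class are the ones whose mean integrated value is closest to the record's. The function name `assign_leftover` is hypothetical.

```python
from typing import List, Sequence


def assign_leftover(
    record: int,
    classes: List[List[int]],
    integrated: Sequence[float],
    sensitive: Sequence[str],
) -> int:
    """Hypothetical sketch of claim 6: place one leftover record into an existing class.

    Assumed definitions: a candidate class does not yet contain the record's
    sensitive value; "best" and "optimal" mean the smallest distance between
    the record's integrated value and the class mean. Mutates `classes` and
    returns the index of the class that received the record.
    """
    if not classes:
        raise ValueError("at least one equivalence class is required")

    def distance(members: List[int]) -> float:
        centre = sum(integrated[i] for i in members) / len(members)
        return abs(integrated[record] - centre)

    all_ids = range(len(classes))
    candidates = [
        c for c in all_ids
        if sensitive[record] not in {sensitive[i] for i in classes[c]}
    ]
    # Best candidate class if any exists, otherwise the optimal class overall.
    pool = candidates if candidates else list(all_ids)
    target = min(pool, key=lambda c: distance(classes[c]))
    classes[target].append(record)
    return target
```

Under these assumptions a record can be sent to a class that is farther away in integrated value when that class is the only one that gains a new sensitive value, which is the trade-off between utility and diversity that the candidate step appears to target.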
7. The privacy protection method for data publication according to claim 1, characterized in that the method further comprises, after step S30:
S40: joining the split data by a common attribute and then publishing it.
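The claims do not say which columns are split apart in S30 or what the common attribute in S40 is. One plausible reading, assumed here for illustration only, is a vertical split into a quasi-identifier table and a sensitive-value table that share the equivalence class ID, followed by a join on that ID, so that each quasi-identifier row is linked only to the multiset of sensitive values of its class. The redundant-information perturbation mentioned in the embodiment is not modelled. The function name `split_and_join` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_and_join(
    classes: List[List[int]],
    qi_rows: Sequence[tuple],
    sensitive: Sequence[str],
) -> List[Tuple[tuple, int, List[str]]]:
    """Hypothetical S30 + S40: split columns into two tables, then join on the class ID."""
    # S30 (assumed): a quasi-identifier table and a sensitive table, both
    # carrying the equivalence class ID as the common attribute.
    qi_table = [(qi_rows[r], cid) for cid, members in enumerate(classes) for r in members]
    sa_table = [(cid, sensitive[r]) for cid, members in enumerate(classes) for r in members]
    # S40 (assumed): join on the class ID; each quasi-identifier row is linked to
    # the sorted sensitive values of its whole class, not to its own value.
    by_class: Dict[int, List[str]] = defaultdict(list)
    for cid, value in sa_table:
        by_class[cid].append(value)
    return [(qi, cid, sorted(by_class[cid])) for qi, cid in qi_table]
```

For two classes `[0, 1]` and `[2]`, both rows of the first class are published with the values `flu` and `hiv`, so an observer cannot tell which of the two belongs to which row.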
8. A privacy protection system for data publication, characterized in that the system comprises:
a judgment module for receiving data and performing a diversity judgment on the number of distinct sensitive attribute values in the data;
an equivalence class partition module for partitioning the data into equivalence classes according to the result of the diversity judgment;
a splitting module for splitting the result of the equivalence class partition.
9. The privacy protection system for data publication according to claim 8, characterized in that the system further comprises:
a publishing module for joining the split data by a common attribute and then publishing it;
and in that the judgment module comprises:
a computing unit for calculating the integrated value of each record;
a sorting unit for sorting the records in the data by integrated value;
a comparison unit for comparing the number of distinct sensitive attribute values in the data with the diversity parameter L.
10. The privacy protection system for data publication according to claim 8, characterized in that the equivalence class partition module comprises:
a first division unit for performing single equivalence class partition;
a second division unit for performing candidate equivalence class partition;
the first division unit comprises:
a first judgment sub-unit for determining whether any records remain that have not been processed by selecting, in order of integrated value, a predetermined number of records and dividing them into the same equivalence class;
the second division unit comprises:
a second judgment sub-unit for judging whether a candidate equivalence class exists for the data;
a classification sub-unit for assigning the data to the best candidate equivalence class if a candidate equivalence class exists, and otherwise to the optimal equivalence class.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711115389.4A CN107832631A (en) | 2017-11-13 | 2017-11-13 | The method for secret protection and system of a kind of data publication |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107832631A true CN107832631A (en) | 2018-03-23 |
Family ID: 61654266
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711115389.4A Withdrawn CN107832631A (en) | 2017-11-13 | 2017-11-13 | The method for secret protection and system of a kind of data publication |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107832631A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109492429A (en) * | 2018-10-30 | 2019-03-19 | 华南师范大学 | A kind of method for secret protection of data publication |
CN109726589A (en) * | 2018-12-22 | 2019-05-07 | 北京工业大学 | A kind of private data access method towards many intelligence cloud environments |
CN109857780A (en) * | 2019-01-17 | 2019-06-07 | 西北大学 | A kind of linear-orthogonal data dissemination method for statistical query attack |
CN110348238A (en) * | 2019-05-28 | 2019-10-18 | 北京邮电大学 | A kind of application oriented secret protection stage division and device |
CN110968887A (en) * | 2018-09-28 | 2020-04-07 | 第四范式(北京)技术有限公司 | Method and system for executing machine learning under data privacy protection |
CN111046431A (en) * | 2019-12-13 | 2020-04-21 | 支付宝(杭州)信息技术有限公司 | Data processing method, query method, device, electronic equipment and system |
CN111159730A (en) * | 2019-12-13 | 2020-05-15 | 支付宝(杭州)信息技术有限公司 | Data processing method, query method, device, electronic equipment and system |
CN111241581A (en) * | 2020-01-09 | 2020-06-05 | 山东师范大学 | Multi-sensitive attribute privacy protection method and system based on sensitivity layering |
- 2017-11-13 CN CN201711115389.4A (CN107832631A) not active, withdrawn
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110968887A (en) * | 2018-09-28 | 2020-04-07 | 第四范式(北京)技术有限公司 | Method and system for executing machine learning under data privacy protection |
CN110968887B (en) * | 2018-09-28 | 2022-04-05 | 第四范式(北京)技术有限公司 | Method and system for executing machine learning under data privacy protection |
CN109492429A (en) * | 2018-10-30 | 2019-03-19 | 华南师范大学 | A kind of method for secret protection of data publication |
CN109492429B (en) * | 2018-10-30 | 2020-10-16 | 华南师范大学 | Privacy protection method for data release |
CN109726589A (en) * | 2018-12-22 | 2019-05-07 | 北京工业大学 | A kind of private data access method towards many intelligence cloud environments |
CN109726589B (en) * | 2018-12-22 | 2021-11-12 | 北京工业大学 | Crowd-sourcing cloud environment-oriented private data access method |
CN109857780A (en) * | 2019-01-17 | 2019-06-07 | 西北大学 | A kind of linear-orthogonal data dissemination method for statistical query attack |
CN109857780B (en) * | 2019-01-17 | 2023-04-28 | 西北大学 | Linear-orthogonal data publishing method for statistical query attack |
CN110348238A (en) * | 2019-05-28 | 2019-10-18 | 北京邮电大学 | A kind of application oriented secret protection stage division and device |
CN111046431A (en) * | 2019-12-13 | 2020-04-21 | 支付宝(杭州)信息技术有限公司 | Data processing method, query method, device, electronic equipment and system |
CN111159730A (en) * | 2019-12-13 | 2020-05-15 | 支付宝(杭州)信息技术有限公司 | Data processing method, query method, device, electronic equipment and system |
CN111241581A (en) * | 2020-01-09 | 2020-06-05 | 山东师范大学 | Multi-sensitive attribute privacy protection method and system based on sensitivity layering |
Similar Documents
Publication | Title |
---|---|
CN107832631A (en) | The method for secret protection and system of a kind of data publication | |
Mei et al. | Sgnn: A graph neural network based federated learning approach by hiding structure | |
Nahar et al. | Sentiment analysis for effective detection of cyber bullying | |
Chen et al. | Differentially private transit data publication: a case study on the montreal transportation system | |
Duczmal et al. | A genetic algorithm for irregularly shaped spatial scan statistics | |
CN107092929B (en) | Criminal case association series-parallel method and system based on clustering technology | |
CN106909643A (en) | The social media big data motif discovery method of knowledge based collection of illustrative plates | |
Araújo et al. | Identifying important characteristics in the KDD99 intrusion detection dataset by feature selection using a hybrid approach | |
CN107992887A (en) | Classifier generation method, sorting technique, device, electronic equipment and storage medium | |
CN103136372B (en) | URL quick position, classification and filter method in network trusted sexual behaviour management | |
Kiabod et al. | TSRAM: A time-saving k-degree anonymization method in social network | |
CN107358116A (en) | A kind of method for secret protection in multi-sensitive attributes data publication | |
Li et al. | Intelligent anti-money laundering solution based upon novel community detection in massive transaction networks on spark | |
CN107483451A (en) | Based on serial parallel structural network secure data processing method and system, social networks | |
CN104836805A (en) | Network intrusion detection method based on fuzzy immune theory | |
Chi et al. | Privacy preserving record linkage in the presence of missing values | |
CN107070932B (en) | Anonymous method for preventing label neighbor attack in social network dynamic release | |
CN106874788A (en) | A kind of method for secret protection in sensitive data issue | |
Williams et al. | Black-box sparse adversarial attack via multi-objective optimisation | |
CN107291930A (en) | The computational methods of weight number | |
Wang et al. | An evolutionary computation-based machine learning for network attack detection in big data traffic | |
Zheng et al. | Tegdetector: a phishing detector that knows evolving transaction behaviors | |
CN106557983B (en) | Microblog junk user detection method based on fuzzy multi-class SVM | |
CN107704872A (en) | A kind of K means based on relatively most discrete dimension segmentation cluster initial center choosing method | |
Vasan et al. | Feature subset selection for intrusion detection using various rank-based algorithms |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
TA01 | Transfer of patent application right | Effective date of registration: 20200818; Address after: 318015 no.2-3167, zone a, Nonggang City, no.2388, Donghuan Avenue, Hongjia street, Jiaojiang District, Taizhou City, Zhejiang Province; Applicant after: Taizhou Jiji Intellectual Property Operation Co.,Ltd.; Address before: 201616 Shanghai city Songjiang District Sixian Road No. 3666; Applicant before: Phicomm (Shanghai) Co.,Ltd. |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20180323 |