CN107832631A - Privacy protection method and system for data publication - Google Patents

Privacy protection method and system for data publication


Publication number
CN107832631A
Authority
CN
China
Prior art keywords
data
equivalence class
record
diversity
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201711115389.4A
Other languages
Chinese (zh)
Inventor
唐雪琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taizhou Jiji Intellectual Property Operation Co.,Ltd.
Original Assignee
Shanghai Feixun Data Communication Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Feixun Data Communication Technology Co Ltd
Priority to CN201711115389.4A
Publication of CN107832631A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6263 Protecting personal data, e.g. for financial or medical purposes during internet communication, e.g. revealing personal data from cookies

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a privacy protection method for data publication, comprising the following steps: S10: receive data and perform a diversity judgment on the number of distinct sensitive attribute values in the data, so that the subsequent equivalence classes have the same diversity; S20: divide the data into equivalence classes according to the diversity judgment result; S30: segment the data resulting from the equivalence class division. The invention is simple to implement. Data processed with it have a higher degree of privacy protection, lower information loss, and better usability, and can resist a variety of privacy-snooping attacks.

Description

Privacy protection method and system for data publication
Technical field
The present invention relates to the field of information security protection, and more particularly to a privacy protection method and system for data publication.
Background
With the rapid development of the Internet, people depend on the network more and more, and the volume of data grows quickly. The network brings convenience: online shopping, money transfers, and air ticket booking can all be done quickly without leaving home. It also brings a substantial risk of information leakage. Personal privacy information, medical data, account passwords, bank card information, trade secrets, and the like are easily exploited once they spread over the network, leading to identity disclosure and property loss, and in serious cases even threatening life and health. The importance of information security protection is therefore evident. Since the PRISM incident, countries have been strengthening network security protection, which brings new opportunities and challenges for data security and privacy protection.
To ensure data privacy, data must undergo privacy protection processing when they are published and shared. At present, the attributes of a published data table are generally divided into three classes: (1) individual identifier attributes (Individually Identifier Attribute, ID), which can identify an individual; (2) quasi-identifier attributes (Quasi-identifier, QI), which appear both in the private table and in external tables and can be linked to infer information about an individual; (3) sensitive attributes (Sensitive Attribute, SA), which contain private information that the user does not want others to know.
Deleting only the QI attributes or the ID attributes when publishing data with these three classes of attributes cannot prevent privacy leakage: when the published data are linked with other data, identity information and sensitive attributes may still be disclosed. In 2002 Sweeney et al. proposed the k-anonymity privacy protection model, which effectively prevents linking attacks. However, k-anonymity places no constraint on sensitive attribute values, so it cannot prevent background knowledge attacks or homogeneity attacks. To address these problems, l-diversity, (α, k)-anonymity, t-closeness, and other models were proposed in succession. These models process data mainly through generalization. This approach largely preserves the original semantic information, but it causes information loss and reduces data utility.
In recent years, clustering algorithms have been widely used in data mining. Privacy-preserving data publication requires each cluster in the published data set to be generalized to the same quasi-identifier, which closely resembles the clustering process in data mining. This has led to research on using clustering methods to achieve l-diversity.
For example, the patent document with publication number CN104317904A discloses "a generalization method for weighted social networks", comprising: sorting the nodes in descending order of node degree and grouping them; generalizing the weights of existing edges and calculating the edge existence probability; traversing all anonymous groups and extracting the sensitive attributes of all nodes to form sensitive attribute bags; calculating the maximum similarity between the sensitive attribute bags of nodes and obtaining the generalized bag of each sensitive attribute bag from the generalization tree; and traversing the K-weight anonymous groups to finally obtain a result that satisfies K-Weighted-inv-l-diversity anonymity. That invention considers both edge weights and multiple sensitive attributes, so the privacy protection method is better suited to real social networks and protects multiple sensitive attributes in a weighted graph more completely.
As another example, the patent document with publication number CN106874788A discloses "a privacy protection method in sensitive data publication", comprising: receiving a data set from a user and the corresponding generalization input trees; traversing each group of data in the data set and judging in turn whether each column of data in the group has a corresponding generalization input tree; if so, looking up the corresponding node in that tree according to the attribute value of the data and writing the node information into a coordinate array, thereby obtaining m rows of coordinate arrays; if not, writing the attribute value of the data directly into the coordinate array; adding a flag bit with initial value 0 to each coordinate array; and establishing p clusters, with p rows randomly selected from the m rows of coordinate arrays as the center points of the p clusters. By clustering first and generalizing afterwards, that method improves computational efficiency and provides a guarantee for data privacy.
Summary of the invention
The technical problem to be solved by the present invention is to address the above deficiencies of the prior art by providing a privacy protection method and system for data publication that is simple to implement and offers a higher degree of privacy protection, lower information loss, and better usability.
To achieve this object, the present invention adopts the following technical solution:
A privacy protection method for data publication, comprising the following steps:
S10: receive data and perform a diversity judgment on the number of distinct sensitive attribute values in the data;
S20: divide the data into equivalence classes according to the diversity judgment result;
S30: segment the data resulting from the equivalence class division.
Further, the diversity judgment in step S10 is specifically: comparing the number of distinct sensitive attribute values of the data with a diversity parameter L;
Step S20 comprises the following steps:
S21: if the number of distinct sensitive attribute values is greater than or equal to the diversity parameter L, perform single equivalence class division;
S22: if the number of distinct sensitive attribute values is less than the diversity parameter L, perform candidate equivalence class division.
Further, in step S10, data preprocessing is performed after the data are received and before the diversity judgment, comprising the following steps:
S11: standardize each quasi-identifier attribute value, mapping it to the range [0, 1];
S12: calculate the weight of each quasi-identifier attribute;
S13: calculate the integrated value of each record, using the following formula:
$$W_i = \sum_{j=1}^{n} w_j \times x_{ij}, \quad 1 \le i \le \eta$$
where W_i is the integrated value of the i-th record, w_j is the weight of the j-th quasi-identifier attribute, x_ij is the standardized quasi-identifier attribute value, n is the number of quasi-identifier attributes, and η is the number of records.
Further, the data preprocessing also comprises:
S14: sort the records in the data by integrated value.
Further, the equivalence class division of the data in step S20 comprises the following steps:
S201: in order of integrated value, select a preset number of records at a time and place them in the same equivalence class;
S202: judge whether any remaining records have not gone through step S201;
S203: if any remaining records have not gone through step S201, perform candidate equivalence class division.
Further, performing candidate equivalence class division is specifically:
Judge whether a candidate equivalence class exists for the data; if so, assign the data to the best candidate equivalence class; if not, assign the data to the optimal equivalence class.
Further, the following step is included after step S30:
S40: join the segmented data on a common attribute and then publish them.
A privacy protection system for data publication, comprising:
A judgment module, configured to receive data and perform a diversity judgment on the number of distinct sensitive attribute values in the data;
An equivalence class division module, configured to divide the data into equivalence classes according to the diversity judgment result;
A segmentation module, configured to segment the data resulting from the equivalence class division.
Further, the system also comprises:
A publication module, configured to join the segmented data on a common attribute and then publish them;
The judgment module comprises:
A computing unit, configured to calculate the integrated value of each record;
A sorting unit, configured to sort the records in the data by integrated value;
A comparison unit, configured to compare the number of distinct sensitive attribute values of the data with the diversity parameter L.
Further, the equivalence class division module comprises:
A first division unit, configured to perform single equivalence class division;
A second division unit, configured to perform candidate equivalence class division;
The first division unit comprises:
A first judgment subunit, configured to judge whether any remaining records have not yet been selected, a preset number at a time in order of integrated value, and placed in the same equivalence class;
The second division unit comprises:
A second judgment subunit, configured to judge whether a candidate equivalence class exists for the data;
A classification subunit, configured to assign the data to the best candidate equivalence class if a candidate equivalence class exists, and otherwise to assign the data to the optimal equivalence class.
With the above technical solution, the beneficial effects of the invention are:
Dividing the data into equivalence classes on the basis of diversity does not destroy the original information of the data. Grouping similar elements together gives the data better usability and makes them more regular;
Separating the data after equivalence class division reduces the association between data items without changing the original values of the sensitive attributes or quasi-identifier attributes, so data loss is small and privacy is better protected after publication, making the data more secure;
Dividing equivalence classes in two different ways, single equivalence class division and candidate equivalence class division, processes the data in a more targeted and complete manner and ensures the privacy protection of every record;
Calculating the integrated value of each record and gathering the records with the best similarity into one equivalence class according to integrated value makes the processing simple to implement and the data more orderly and more practical. It also facilitates mining and integration of the data themselves, greatly improves data usability, and keeps information loss low throughout;
The segmented data are joined on a common attribute before publication. The common attribute can be customized according to the equivalence class division result. In general, redundant information is added to obscure the real information, which achieves data privacy protection with low data loss and a good protection effect.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings are as follows:
Fig. 1 is a flow chart of the privacy protection method for data publication provided by Embodiment 1 of the present invention;
Fig. 2 is a flow chart of the privacy protection method for data publication provided by Embodiment 2 of the present invention;
Fig. 3 is a flow chart of the privacy protection method for data publication provided by Embodiment 3 of the present invention;
Fig. 4 is a block diagram of the privacy protection system for data publication provided by Embodiment 4 of the present invention.
Detailed description of the embodiments
The following specific embodiments, together with the drawings, further describe the technical solution of the present invention, but the present invention is not limited to these embodiments.
Privacy-preserving data publication mainly processes the quasi-identifier attributes. Whether a generalization/hiding method or a clustering method is used, the processing targets the quasi-identifier attributes.
The present invention mainly uses two techniques, clustering and segmentation, to protect the privacy of published data, in particular static data.
Clustering groups similar elements together by analyzing the similarity of the data. Different scenarios have different requirements, so the clustering algorithms also differ. Designing the clustering model is a key step in implementing the invention, and different clustering models have different clustering algorithms.
Segmentation offers a new approach to privacy protection in data publication. It does not change the original values of the sensitive attributes or the quasi-identifier attributes; it achieves privacy protection by weakening the association between them. Typically, segmentation splits the quasi-identifier attributes (QI) and the sensitive attributes (SA) of a data set into two unlinked data sets before publication. The data set can also be split into several data sets according to its correlation patterns.
The present invention is described in more detail as follows:
Embodiment 1
As shown in Fig. 1, this embodiment provides a privacy protection method for data publication, comprising the following steps:
S10: receive data and perform a diversity judgment on the number of distinct sensitive attribute values in the data;
In this step, data that require privacy protection are received, usually as a data table. First, the individual identifier attributes, quasi-identifier attributes, and sensitive attributes of the table are identified, and the sensitive data in the original data are determined. That is, the sensitive attribute values are inspected and the number of distinct values in the sensitive attribute column is counted. Denote this number S_c. The data table is divided differently depending on the size of S_c. Typically a diversity parameter l is set, the relationship between S_c and l is judged, and the records that satisfy l-diversity are grouped. Some approaches set two diversity parameters l and l' with different values, dividing the sensitive attributes into two classes (for example primary and auxiliary sensitive attributes) so that the data can then be divided in a more targeted way.
S20: divide the data into equivalence classes according to the diversity judgment result;
This step is the clustering process. Depending on the diversity of the sensitive attribute values, a different equivalence class division is performed. For example, if S_c ≥ l, single equivalence class division is performed: every l-1 records in the data table that have high similarity to one another are placed in the same equivalence class. If records in the original data table remain unclassified after single equivalence class division ends, candidate equivalence class division is performed, which checks whether there is a candidate equivalence class that the record can join. A candidate equivalence class is defined as a class that has high similarity to the record and contains the same sensitive attribute value. If S_c < l, candidate equivalence class division is performed directly.
S30: segment the data resulting from the equivalence class division.
In this step, the data table clustered in step S20 is split into several separate data tables according to the equivalence class division. The table can be split by individual identifier attributes, quasi-identifier attributes, and sensitive attributes, which reduces the association between data items and thereby achieves privacy protection.
After the steps of this embodiment are performed, the data table has low information distortion, a low probability of privacy leakage after publication, and good usability afterwards.
Embodiment 2
As shown in Fig. 2, this embodiment differs from the previous embodiment in that it provides a privacy protection method for data publication with a specific clustering algorithm. The diversity judgment in step S10 is specifically: the number of distinct sensitive attribute values of the data is compared with a diversity parameter L, where L is a preset value. Typically, if there are 7 distinct sensitive attribute values, L is set to 7 divided by 2 and rounded down, i.e. L = 3;
Step S20 comprises the following steps:
S21: if the number of distinct sensitive attribute values is greater than or equal to the diversity parameter L, perform single equivalence class division. During single equivalence class division in this step, a loop checks whether the length of the current equivalence class is less than L; once it exceeds L, the next equivalence class division begins, which ensures that each equivalence class satisfies L-diversity.
S22: if the number of distinct sensitive attribute values is less than the diversity parameter L, perform candidate equivalence class division.
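The comparison in S10 and the branch in S21/S22 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the rule L = S_c // 2 follows the "7 gives 3" example above, and the sample values are the Disease column of Table 1 in Embodiment 3.

```python
def distinct_sensitive_count(sensitive_values):
    """Number of distinct sensitive attribute values (S_c in Embodiment 1)."""
    return len(set(sensitive_values))


def default_diversity_parameter(sc):
    """Preset L as half of S_c, rounded down (assumption from the 7 -> 3 example)."""
    return sc // 2


def choose_division(sensitive_values, L):
    """S21/S22: pick the division mode by comparing S_c with L."""
    sc = distinct_sensitive_count(sensitive_values)
    return "single" if sc >= L else "candidate"


# Disease column of Table 1 (Gastritis appears twice).
diseases = ["Flu", "HIV", "Gastritis", "Obesity", "Cancer", "Dyspepsia", "Gastritis"]
sc = distinct_sensitive_count(diseases)
L = default_diversity_parameter(sc)
print(sc, L, choose_division(diseases, L))  # 6 3 single
```

When L is supplied independently of the data (for example by a publishing policy), `choose_division` can return "candidate", which corresponds to step S22.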
Embodiment 3
As shown in Fig. 3, this embodiment differs from Embodiment 1 in that, in step S10, data preprocessing is performed after the data are received and before the diversity judgment. The preprocessing reflects the fact that different attributes carry different weights in data publication. Clustering with weights is simple to implement and better reflects the similarity of the data. It comprises the following steps:
S11: standardize each quasi-identifier attribute value, mapping it to the range [0, 1]. Numeric attribute values are mapped to [0, 1] with the range (min-max) standardization formula: new quasi-identifier attribute value = (original quasi-identifier attribute value - minimum) / (maximum - minimum);
Specifically, assume the original data table is T = <ID, QI_1, QI_2, ..., QI_n, SA> with n quasi-identifier attributes and η records, i.e. η objects to cluster, each with n features;
Assume the attribute's domain is [x_min, x_max]. The range standardization formula that maps values to [0, 1] is:
$$x_{ij} = \frac{x'_{ij} - x_{\min}}{x_{\max} - x_{\min}}$$
where x_ij is the standardized quasi-identifier attribute value (the same meaning applies below) and x'_ij is the value of the j-th quasi-identifier attribute (QI) of the i-th record, i.e. the original quasi-identifier attribute value;
For categorical attributes, the attribute values are first mapped to natural numbers. For example, sex has two values, male and female, which are mapped to 1 and 2. The range standardization formula is then applied to map them to [0, 1]. Note that when an attribute has only two categories, its domain must be widened to [0, 3], which keeps the calculation error relatively small.
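Step S11 can be sketched as below. The function names are hypothetical. Two points are assumptions read off Table 2 rather than stated in the text: the Age values there are consistent with a domain of [14, 60] (18 maps to 0.086956522), and the Sex values are consistent with the codes F = 1 and M = 2 on the widened domain [0, 3]. How attributes with more than two categories are scaled is also an assumption.

```python
def min_max(value, lo, hi):
    """Range standardization: map value from the domain [lo, hi] to [0, 1]."""
    if hi == lo:
        raise ValueError("domain must have non-zero width")
    return (value - lo) / (hi - lo)


def normalize_category(value, codes):
    """Map a category to its natural-number code, then to [0, 1].

    A two-category attribute uses the widened domain [0, 3] described in
    the text. For more categories the code range itself is used (assumption).
    """
    code = codes[value]
    if len(codes) == 2:
        return min_max(code, 0, 3)
    return min_max(code, min(codes.values()), max(codes.values()))


# Age 18 on the assumed domain [14, 60].
print(round(min_max(18, 14, 60), 9))                         # 0.086956522
# Sex with assumed codes F = 1, M = 2.
print(round(normalize_category("F", {"F": 1, "M": 2}), 3))   # 0.333
print(round(normalize_category("M", {"F": 1, "M": 2}), 3))   # 0.667
```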
S12: calculate the weight of each quasi-identifier attribute. Each quasi-identifier attribute occupies a column of the data table, and the variance of a column reflects how tightly its values are clustered. When an attribute with smaller variance and an attribute with larger variance change by the same amount, the attribute with larger variance loses less information, so it is given a larger weight. This step therefore computes the weights from the variance; the calculation formula is as follows:
where V_j (1 ≤ j ≤ n) is the variance of column j, avg_j (1 ≤ j ≤ n) is the mean of column j, and w_j (1 ≤ j ≤ n) is the weight of column j.
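The weight formula itself is not reproduced in this text, so the following is only one plausible reading of S12, labeled as an assumption throughout: each column's population variance V_j is computed around its mean avg_j, and the weights are the variances normalized to sum to 1, i.e. w_j = V_j / sum of all V_k. The function name and the equal-weight fallback are hypothetical, and this sketch is not claimed to reproduce the integrated values of Table 2.

```python
from statistics import pvariance


def variance_weights(columns):
    """Weights proportional to per-column variance (assumed normalization).

    columns: one list of standardized values per quasi-identifier attribute.
    """
    variances = [pvariance(col) for col in columns]  # V_j around avg_j
    total = sum(variances)
    if total == 0:
        # All columns constant: fall back to equal weights (assumption).
        return [1 / len(columns)] * len(columns)
    return [v / total for v in variances]


# A column that varies gets all the weight when the other is constant.
print(variance_weights([[0.0, 1.0], [0.5, 0.5]]))  # [1.0, 0.0]
```

Under this reading a column with larger variance receives a larger weight, which matches the qualitative rule stated in S12.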
S13: calculate the integrated value of each record, using the following formula:
$$W_i = \sum_{j=1}^{n} w_j \times x_{ij}, \quad 1 \le i \le \eta$$
where W_i is the integrated value of the i-th record, w_j is the weight of the j-th quasi-identifier attribute, x_ij is the standardized quasi-identifier attribute value, n is the number of quasi-identifier attributes, and η is the number of records.
Further, the data preprocessing also comprises:
S14: sort the records in the data by integrated value, in ascending or descending order.
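Steps S13 and S14 amount to a weighted sum followed by a sort. A minimal sketch, with hypothetical function names and made-up weights and records chosen only to keep the arithmetic easy to follow:

```python
def integrated_value(values, weights):
    """W_i: weighted sum of one record's standardized quasi-identifier values."""
    if len(values) != len(weights):
        raise ValueError("one weight per quasi-identifier attribute is required")
    return sum(w * x for w, x in zip(weights, values))


def sort_by_integrated_value(records, weights, descending=False):
    """S14: return (key, W_i) pairs ordered by integrated value.

    records: list of (key, standardized_values) pairs.
    """
    scored = [(key, integrated_value(vals, weights)) for key, vals in records]
    return sorted(scored, key=lambda pair: pair[1], reverse=descending)


# Hypothetical weights and records, not taken from the patent's tables.
weights = [0.5, 0.25, 0.25]
records = [("a", [0.2, 0.4, 0.8]), ("b", [0.0, 0.4, 0.4])]
for key, score in sort_by_integrated_value(records, weights):
    print(key, round(score, 3))
```

Records with close integrated values end up adjacent after sorting, which is what lets the later division step take consecutive records as one equivalence class.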
Assume the original data table is Table 1:
Table 1
Name   Age  Race   Sex  Disease
Alicy  21   Black  F    Flu
Lucy   45   White  F    HIV
Tom    36   Black  M    Gastritis
Helen  18   White  M    Obesity
David  56   White  M    Cancer
Bob    21   Black  M    Dyspepsia
Linda  43   Black  F    Gastritis
Here {Name} is the individual identifier attribute, the attribute set {Age, Race, Sex} forms the quasi-identifier attributes, and {Disease} is the sensitive attribute.
After the data preprocessing of this embodiment, the new data table is Table 2:
Table 2
Name   Age          Race   Sex    Disease    Integrated value
Helen  0.086956522  0.333  0.667  Obesity    0.277806565
Alicy  0.152173913  0.667  0.333  Flu        0.312889739
Bob    0.152173913  0.667  0.667  Dyspepsia  0.390053425
Lucy   0.673913043  0.333  0.333  HIV        0.516391444
Tom    0.47826087   0.667  0.667  Gastritis  0.565469295
Linda  0.630434783  0.667  0.333  Gastritis  0.570166348
David  0.913043478  0.333  0.667  Cancer     0.722193435
Further, the equivalence class division of the data in step S20 comprises the following steps:
S201: in order of integrated value, select a preset number of records at a time and place them in the same equivalence class. Dividing equivalence classes by the similarity of integrated values clusters together the records that are most similar to one another.
S202: judge whether any remaining records have not gone through step S201. This ensures that the equivalence class division covers all the data.
S203: if any remaining records have not gone through step S201, perform candidate equivalence class division.
Performing candidate equivalence class division is specifically:
Judge whether a candidate equivalence class exists for the data. A candidate equivalence class is defined as the equivalence class with the smallest integrated value difference that also contains the same sensitive attribute value;
If one exists, assign the data to the best candidate equivalence class; if not, assign the data to the optimal equivalence class, which is defined as the equivalence class with the smallest integrated value difference.
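Steps S201 to S203 and the candidate rule can be sketched on the Table 2 data as follows. Several choices here are assumptions: the function name is hypothetical, the preset number of records per class is taken as L - 1 = 2 (from Embodiment 1's "every l-1 records", with L = 3), and the "integrated value difference" between a record and a class is measured as the smallest absolute difference to any member of the class.

```python
def divide_equivalence_classes(records, group_size):
    """records: (name, integrated_value, sensitive_value), sorted by integrated value."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    full = len(records) // group_size * group_size
    # S201: consecutive records form one equivalence class.
    classes = [list(records[i:i + group_size]) for i in range(0, full, group_size)]
    leftovers = records[full:]  # S202: records not yet placed
    if not classes:
        return [list(leftovers)] if leftovers else []
    for rec in leftovers:  # S203: candidate equivalence class division
        def gap(cls):
            return min(abs(rec[1] - other[1]) for other in cls)
        # Candidate classes contain the same sensitive value; otherwise all classes compete.
        candidates = [c for c in classes if any(o[2] == rec[2] for o in c)]
        min(candidates or classes, key=gap).append(rec)
    return classes


table2 = [
    ("Helen", 0.277806565, "Obesity"), ("Alicy", 0.312889739, "Flu"),
    ("Bob", 0.390053425, "Dyspepsia"), ("Lucy", 0.516391444, "HIV"),
    ("Tom", 0.565469295, "Gastritis"), ("Linda", 0.570166348, "Gastritis"),
    ("David", 0.722193435, "Cancer"),
]
groups = divide_equivalence_classes(table2, group_size=2)
print([[name for name, _, _ in g] for g in groups])
# [['Helen', 'Alicy'], ['Bob', 'Lucy'], ['Tom', 'Linda', 'David']]
```

David is the only leftover record. No class contains Cancer, so he joins the class with the closest integrated value, which yields group sizes of 2, 2, and 3, the same sizes as the GroupID columns of Tables 4 and 5.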
After the equivalence class division of this embodiment is applied to Table 2, the data are as shown in Table 3:
Table 3
where GroupID is the equivalence class number; records with the same number belong to the same equivalence class.
After equivalence class division ends in this embodiment, segmentation is used to separate the sensitive attributes from the quasi-identifier attributes, splitting the data of Table 3 into the two tables shown as Table 4 and Table 5:
Table 4
Age  Race   Sex  GroupID
18   White  M    1
21   Black  F    1
21   Black  M    2
45   White  F    2
36   Black  M    3
43   Black  F    3
56   White  M    3
Table 5
GroupID  Disease
1        Obesity
1        Flu
2        Dyspepsia
2        HIV
3        Gastritis
3        Gastritis
3        Cancer
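The split into Table 4 and Table 5 can be sketched as a projection of each grouped record onto two column sets that share only GroupID. The function name is hypothetical, and the three sample rows are assembled from Tables 1, 4, and 5.

```python
def split_by_attribute_class(rows, qi_cols, sa_cols, group_col="GroupID"):
    """Split grouped records into a quasi-identifier table and a sensitive table.

    Identifier columns such as Name are not copied into either table.
    """
    qi_table = [{**{c: r[c] for c in qi_cols}, group_col: r[group_col]} for r in rows]
    sa_table = [{group_col: r[group_col], **{c: r[c] for c in sa_cols}} for r in rows]
    return qi_table, sa_table


rows = [
    {"Name": "Helen", "Age": 18, "Race": "White", "Sex": "M", "Disease": "Obesity", "GroupID": 1},
    {"Name": "Alicy", "Age": 21, "Race": "Black", "Sex": "F", "Disease": "Flu", "GroupID": 1},
    {"Name": "Bob", "Age": 21, "Race": "Black", "Sex": "M", "Disease": "Dyspepsia", "GroupID": 2},
]
qi_table, sa_table = split_by_attribute_class(rows, ["Age", "Race", "Sex"], ["Disease"])
print(qi_table[0])  # {'Age': 18, 'Race': 'White', 'Sex': 'M', 'GroupID': 1}
print(sa_table[0])  # {'GroupID': 1, 'Disease': 'Obesity'}
```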
The following step is included after step S30:
S40: join the segmented data on a common attribute and then publish them.
In this step, the segmented data are joined by Cartesian product. The join generates redundant information that achieves privacy protection without destroying information, so information loss stays low.
The data table after the Cartesian product join is shown in Table 6:
Table 6
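The join in S40 can be sketched on the data of Tables 4 and 5, taking GroupID as the common attribute (an assumption based on it being the only column the two tables share). Within each group, every quasi-identifier row is paired with every sensitive row, so a group of size k contributes k × k rows. The function name is hypothetical.

```python
def join_on_group(qi_table, sa_table, key="GroupID"):
    """Per-group Cartesian product of the two segmented tables."""
    by_group = {}
    for sa in sa_table:
        by_group.setdefault(sa[key], []).append(sa)
    return [{**qi, **sa} for qi in qi_table for sa in by_group.get(qi[key], [])]


table4 = [
    {"Age": a, "Race": r, "Sex": s, "GroupID": g}
    for a, r, s, g in [
        (18, "White", "M", 1), (21, "Black", "F", 1),
        (21, "Black", "M", 2), (45, "White", "F", 2),
        (36, "Black", "M", 3), (43, "Black", "F", 3), (56, "White", "M", 3),
    ]
]
table5 = [
    {"GroupID": g, "Disease": d}
    for g, d in [
        (1, "Obesity"), (1, "Flu"), (2, "Dyspepsia"), (2, "HIV"),
        (3, "Gastritis"), (3, "Gastritis"), (3, "Cancer"),
    ]
]
joined = join_on_group(table4, table5)
print(len(joined))  # 17 rows: 2*2 + 2*2 + 3*3
print(joined[0])    # {'Age': 18, 'Race': 'White', 'Sex': 'M', 'GroupID': 1, 'Disease': 'Obesity'}
```

In the joined table each quasi-identifier row appears alongside every sensitive value of its group, so a reader cannot tell which pairing is the real one.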
The privacy protection method for data publication provided by this embodiment organizes the data in a more orderly and complete way, is more practical, can resist a variety of attacks, and further protects the privacy of the published data.
Embodiment 4
As shown in Fig. 4, this embodiment provides a privacy protection system for data publication, which implements the privacy protection method for data publication of Embodiments 1 and 2 above. The system comprises:
A judgment module 100, configured to receive data and perform a diversity judgment on the number of distinct sensitive attribute values in the data;
An equivalence class division module 200, configured to divide the data into equivalence classes according to the diversity judgment result;
A segmentation module 300, configured to segment the data resulting from the equivalence class division.
The system also comprises:
A publication module 400, configured to join the segmented data on a common attribute and then publish them;
The judgment module 100 comprises:
A computing unit 110, configured to calculate the integrated value of each record;
A sorting unit 120, configured to sort the records in the data by integrated value;
A comparison unit 130, configured to compare the number of distinct sensitive attribute values of the data with the diversity parameter L.
The equivalence class division module 200 comprises:
A first division unit 210, configured to perform single equivalence class division;
A second division unit 220, configured to perform candidate equivalence class division;
The first division unit 210 comprises:
A first judgment subunit 211, configured to judge whether any remaining records have not yet been selected, a preset number at a time in order of integrated value, and placed in the same equivalence class;
The second division unit 220 comprises:
A second judgment subunit 221, configured to judge whether a candidate equivalence class exists for the data;
A classification subunit 222, configured to assign the data to the best candidate equivalence class if a candidate equivalence class exists, and otherwise to assign the data to the optimal equivalence class.
This embodiment provides a privacy protection system for data publication. It uses a more practical clustering algorithm to cluster the data with the best similarity, so data integrity and usability are higher. The data are not altered by generalization, so data loss over the whole process is reduced. After a reasonable segmentation, the data are joined again so that the added redundant information obscures them, which greatly improves the degree of privacy protection.
The specific embodiments described herein merely illustrate the spirit of the invention. Those skilled in the art to which the invention belongs may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims (10)

1. A privacy protection method for data publication, characterized in that the method comprises the following steps:
S10: receiving data, and performing a diversity judgment on the number of distinct sensitive attribute values of the data;
S20: performing equivalence class partition on the data according to the diversity judgment result;
S30: performing data segmentation on the result of the equivalence class partition.
2. The privacy protection method for data publication according to claim 1, characterized in that the diversity judgment on the sensitive attribute values of the data in step S10 is specifically: comparing the number of distinct sensitive attribute values of the data with a diversity parameter L;
Step S20 comprises the following steps:
S21: if the number of distinct sensitive attribute values is greater than or equal to the diversity parameter L, performing single equivalence class partition;
S22: if the number of distinct sensitive attribute values is less than the diversity parameter L, performing candidate equivalence class partition.
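The branch in S21/S22 can be illustrated with a minimal sketch. The function name `choose_partition_strategy` and the returned labels are hypothetical and chosen only for this illustration; the claim itself fixes only the comparison against the diversity parameter L.

```python
from typing import Hashable, Iterable


def choose_partition_strategy(sensitive_values: Iterable[Hashable], l: int) -> str:
    """Pick the partition branch of step S20 (illustrative helper, not from the patent).

    Counts the distinct sensitive attribute values and compares the count
    with the diversity parameter l, as described in S21 and S22.
    """
    distinct = len(set(sensitive_values))
    # S21: enough distinct values -> single equivalence class partition.
    # S22: too few distinct values -> candidate equivalence class partition.
    return "single" if distinct >= l else "candidate"


if __name__ == "__main__":
    diseases = ["flu", "hiv", "flu", "cold"]  # three distinct values
    print(choose_partition_strategy(diseases, 3))
    print(choose_partition_strategy(diseases, 4))
```

With three distinct values, the first call takes the S21 branch and the second takes the S22 branch.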
3. The privacy protection method for data publication according to claim 1, characterized in that in step S10, data preprocessing is performed after the data are received and before the diversity judgment, comprising the following steps:
S11: standardizing each quasi-identifier attribute value, so that the quasi-identifier attribute value is mapped to the range [0, 1];
S12: calculating the weight of each quasi-identifier attribute;
S13: calculating the integrated value of each record, using the following formula:
$$W_i = \sum_{j=1}^{n} w_j \times x_{ij}, \quad 1 \le i \le \eta$$
where W_i denotes the integrated value of the i-th record, w_j denotes the weight of the j-th quasi-identifier attribute, x_ij denotes the standardized value of the j-th quasi-identifier attribute of the i-th record, n is the number of quasi-identifier attributes, and η is the number of records.
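Steps S11 and S13 can be sketched as follows. This is an illustration under stated assumptions: min-max scaling is assumed as the standardization (the claim only requires a mapping to [0, 1]), and the weights of S12 are supplied by the caller because the claim does not say how they are computed. The function names are hypothetical.

```python
from typing import List, Sequence


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Map one quasi-identifier column to [0, 1] (assumed min-max scaling)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no ordering information; map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def integrated_values(records: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Compute W_i = sum_j w_j * x_ij for every record i (step S13)."""
    columns = list(zip(*records))                      # one tuple per attribute
    norm_columns = [min_max_normalize(c) for c in columns]
    norm_rows = zip(*norm_columns)                     # back to one tuple per record
    return [sum(w * x for w, x in zip(weights, row)) for row in norm_rows]


if __name__ == "__main__":
    # Two numeric quasi-identifiers per record, e.g. age and income (made-up data).
    data = [(20, 1000), (30, 3000), (40, 2000)]
    print(integrated_values(data, weights=(0.5, 0.5)))
```

Normalizing per column before weighting keeps attributes with large numeric ranges from dominating the integrated value.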
4. The privacy protection method for data publication according to claim 3, characterized in that the data preprocessing further comprises:
S14: sorting the records in the data by integrated value.
5. The privacy protection method for data publication according to claim 4, characterized in that performing the equivalence class partition of the data in step S20 comprises the following steps:
S201: selecting a predetermined number of records in order of integrated value and dividing them into the same equivalence class;
S202: determining whether any remaining records have not undergone step S201;
S203: if remaining records have not undergone step S201, performing candidate equivalence class partition.
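The sort of S14 and the grouping of S201 to S203 can be sketched as below. Ascending order and the parameter name `group_size` (standing in for the "predetermined number") are assumptions made for the illustration; records are referred to by their index in the input list.

```python
from typing import List, Sequence, Tuple


def partition_by_integrated_value(
    values: Sequence[float], group_size: int
) -> Tuple[List[List[int]], List[int]]:
    """Group record indices into equivalence classes of `group_size` records.

    Returns the full classes (S201) and the leftover record indices that did
    not fill a class (detected in S202, handed to the candidate step in S203).
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    # S14: order the records by integrated value (ascending order is assumed).
    order = sorted(range(len(values)), key=lambda i: values[i])
    full = len(order) - len(order) % group_size
    # S201: take consecutive runs of group_size records as one equivalence class.
    classes = [order[s:s + group_size] for s in range(0, full, group_size)]
    # S202/S203: whatever is left over still needs candidate partitioning.
    leftovers = order[full:]
    return classes, leftovers


if __name__ == "__main__":
    w = [0.9, 0.1, 0.5, 0.3, 0.7]
    print(partition_by_integrated_value(w, group_size=2))
```

Because neighbouring records in the sorted order have similar integrated values, each class groups records that are close to one another, which is the property the description relies on to limit information loss.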
6. The privacy protection method for data publication according to claim 2 or 5, characterized in that performing the candidate equivalence class partition is specifically:
judging whether a candidate equivalence class exists for the data; if one exists, assigning the data to the best candidate equivalence class; if none exists, assigning the data to the optimal equivalence class.
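The claim does not define here what makes a class a "candidate" or which class is "best", so the following sketch rests entirely on assumptions: a class counts as a candidate if it does not yet contain the record's sensitive value, and "best" or "optimal" means the class whose mean integrated value is closest to the record's. All names are hypothetical.

```python
from typing import Dict, Hashable, List


def assign_leftover(value: float, sensitive: Hashable,
                    classes: List[Dict[str, list]]) -> int:
    """Assign one leftover record to an equivalence class and return its index.

    Each class is a dict with "values" (integrated values) and "sensitive"
    (sensitive attribute values) of its members. `classes` must be non-empty
    and every class must already hold at least one record.
    """
    def distance(c: Dict[str, list]) -> float:
        mean = sum(c["values"]) / len(c["values"])
        return abs(mean - value)

    # Assumed candidate rule: the class gains a new sensitive value.
    candidates = [i for i, c in enumerate(classes)
                  if sensitive not in c["sensitive"]]
    # Fall back to all classes when no candidate exists.
    pool = candidates if candidates else list(range(len(classes)))
    best = min(pool, key=lambda i: distance(classes[i]))
    classes[best]["values"].append(value)
    classes[best]["sensitive"].append(sensitive)
    return best


if __name__ == "__main__":
    groups = [
        {"values": [0.1, 0.2], "sensitive": ["flu", "cold"]},
        {"values": [0.8, 0.9], "sensitive": ["hiv", "flu"]},
    ]
    print(assign_leftover(0.85, "hiv", groups))
```

Under the assumed rule the record goes to the first class even though the second is numerically closer, because only the first class gains a new sensitive value; a different candidate criterion would change this outcome.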
7. The privacy protection method for data publication according to claim 1, characterized in that the following step is further included after step S30:
S40: joining the segmented data on a common attribute and then publishing the joined data.
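Segmentation (S30) followed by a join on a common attribute (S40) can be sketched as follows. The sketch assumes that the data are split into a quasi-identifier table and a sensitive-value table, and that the common attribute is an equivalence class identifier; neither detail is fixed by the claim, and the function names are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

Record = Tuple[tuple, Hashable]  # (quasi-identifier values, sensitive value)


def split_tables(records: Sequence[Record], class_ids: Sequence[int]):
    """S30 (assumed form): split records into two tables sharing a class id."""
    qi_table = [(gid, qi) for (qi, _), gid in zip(records, class_ids)]
    sens_table = [(gid, s) for (_, s), gid in zip(records, class_ids)]
    return qi_table, sens_table


def join_on_class(qi_table, sens_table) -> List[tuple]:
    """S40: join the two tables on the class id before publication.

    Within a class every quasi-identifier row pairs with every sensitive
    value, so the join adds redundant rows that hide the true pairing.
    """
    return [(gid, qi, s)
            for gid, qi in qi_table
            for other_gid, s in sens_table
            if other_gid == gid]


if __name__ == "__main__":
    rows = [(("30", "NY"), "flu"), (("31", "NY"), "cold"), (("50", "LA"), "hiv")]
    qi, sens = split_tables(rows, class_ids=[0, 0, 1])
    for row in join_on_class(qi, sens):
        print(row)
```

In this made-up example the class with two records yields four joined rows, only two of which match the original data, which corresponds to the redundant information that the description says perturbs the published data.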
8. A privacy protection system for data publication, characterized in that the system includes:
a judgment module, for receiving data and performing a diversity judgment on the number of distinct sensitive attribute values of the data;
an equivalence class partition module, for performing equivalence class partition on the data according to the diversity judgment result;
a segmentation module, for performing data segmentation on the result of the equivalence class partition.
9. The privacy protection system for data publication according to claim 8, characterized in that the system further includes:
a publication module, for joining the segmented data on a common attribute and then publishing the joined data;
The judgment module includes:
a computing unit, for calculating the integrated value of each record;
a sorting unit, for sorting the records in the data by integrated value;
a comparing unit, for comparing the number of distinct sensitive attribute values of the data with the diversity parameter L.
10. The privacy protection system for data publication according to claim 8, characterized in that the equivalence class partition module includes:
a first division unit, for performing single equivalence class partition;
a second division unit, for performing candidate equivalence class partition;
The first division unit includes:
a first judgment sub-unit, for determining whether any remaining records have not yet undergone the step of selecting a predetermined number of records in order of integrated value and dividing them into the same equivalence class;
The second division unit includes:
a second judgment sub-unit, for judging whether a candidate equivalence class exists for the data;
a classification sub-unit, for assigning the data to the best candidate equivalence class if a candidate equivalence class exists, and to the optimal equivalence class if no candidate equivalence class exists.
CN201711115389.4A 2017-11-13 2017-11-13 The method for secret protection and system of a kind of data publication Withdrawn CN107832631A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711115389.4A CN107832631A (en) 2017-11-13 2017-11-13 The method for secret protection and system of a kind of data publication

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711115389.4A CN107832631A (en) 2017-11-13 2017-11-13 The method for secret protection and system of a kind of data publication

Publications (1)

Publication Number Publication Date
CN107832631A true CN107832631A (en) 2018-03-23

Family

ID=61654266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711115389.4A Withdrawn CN107832631A (en) 2017-11-13 2017-11-13 The method for secret protection and system of a kind of data publication

Country Status (1)

Country Link
CN (1) CN107832631A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492429A (en) * 2018-10-30 2019-03-19 华南师范大学 A kind of method for secret protection of data publication
CN109726589A (en) * 2018-12-22 2019-05-07 北京工业大学 A kind of private data access method towards many intelligence cloud environments
CN109857780A (en) * 2019-01-17 2019-06-07 西北大学 A kind of linear-orthogonal data dissemination method for statistical query attack
CN110348238A (en) * 2019-05-28 2019-10-18 北京邮电大学 A kind of application oriented secret protection stage division and device
CN110968887A (en) * 2018-09-28 2020-04-07 第四范式(北京)技术有限公司 Method and system for executing machine learning under data privacy protection
CN111046431A (en) * 2019-12-13 2020-04-21 支付宝(杭州)信息技术有限公司 Data processing method, query method, device, electronic equipment and system
CN111159730A (en) * 2019-12-13 2020-05-15 支付宝(杭州)信息技术有限公司 Data processing method, query method, device, electronic equipment and system
CN111241581A (en) * 2020-01-09 2020-06-05 山东师范大学 Multi-sensitive attribute privacy protection method and system based on sensitivity layering

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968887A (en) * 2018-09-28 2020-04-07 第四范式(北京)技术有限公司 Method and system for executing machine learning under data privacy protection
CN110968887B (en) * 2018-09-28 2022-04-05 第四范式(北京)技术有限公司 Method and system for executing machine learning under data privacy protection
CN109492429A (en) * 2018-10-30 2019-03-19 华南师范大学 A kind of method for secret protection of data publication
CN109492429B (en) * 2018-10-30 2020-10-16 华南师范大学 Privacy protection method for data release
CN109726589A (en) * 2018-12-22 2019-05-07 北京工业大学 A kind of private data access method towards many intelligence cloud environments
CN109726589B (en) * 2018-12-22 2021-11-12 北京工业大学 Crowd-sourcing cloud environment-oriented private data access method
CN109857780A (en) * 2019-01-17 2019-06-07 西北大学 A kind of linear-orthogonal data dissemination method for statistical query attack
CN109857780B (en) * 2019-01-17 2023-04-28 西北大学 Linear-orthogonal data publishing method for statistical query attack
CN110348238A (en) * 2019-05-28 2019-10-18 北京邮电大学 A kind of application oriented secret protection stage division and device
CN111046431A (en) * 2019-12-13 2020-04-21 支付宝(杭州)信息技术有限公司 Data processing method, query method, device, electronic equipment and system
CN111159730A (en) * 2019-12-13 2020-05-15 支付宝(杭州)信息技术有限公司 Data processing method, query method, device, electronic equipment and system
CN111241581A (en) * 2020-01-09 2020-06-05 山东师范大学 Multi-sensitive attribute privacy protection method and system based on sensitivity layering

Similar Documents

Publication Publication Date Title
CN107832631A (en) The method for secret protection and system of a kind of data publication
Mei et al. Sgnn: A graph neural network based federated learning approach by hiding structure
Nahar et al. Sentiment analysis for effective detection of cyber bullying
Chen et al. Differentially private transit data publication: a case study on the montreal transportation system
Duczmal et al. A genetic algorithm for irregularly shaped spatial scan statistics
CN107092929B (en) Criminal case association series-parallel method and system based on clustering technology
CN106909643A (en) The social media big data motif discovery method of knowledge based collection of illustrative plates
Araújo et al. Identifying important characteristics in the KDD99 intrusion detection dataset by feature selection using a hybrid approach
CN107992887A (en) Classifier generation method, sorting technique, device, electronic equipment and storage medium
CN103136372B (en) URL quick position, classification and filter method in network trusted sexual behaviour management
Kiabod et al. TSRAM: A time-saving k-degree anonymization method in social network
CN107358116A (en) A kind of method for secret protection in multi-sensitive attributes data publication
Li et al. Intelligent anti-money laundering solution based upon novel community detection in massive transaction networks on spark
CN107483451A (en) Based on serial parallel structural network secure data processing method and system, social networks
CN104836805A (en) Network intrusion detection method based on fuzzy immune theory
Chi et al. Privacy preserving record linkage in the presence of missing values
CN107070932B (en) Anonymous method for preventing label neighbor attack in social network dynamic release
CN106874788A (en) A kind of method for secret protection in sensitive data issue
Williams et al. Black-box sparse adversarial attack via multi-objective optimisation
CN107291930A (en) The computational methods of weight number
Wang et al. An evolutionary computation-based machine learning for network attack detection in big data traffic
Zheng et al. Tegdetector: a phishing detector that knows evolving transaction behaviors
CN106557983B (en) Microblog junk user detection method based on fuzzy multi-class SVM
CN107704872A (en) A kind of K means based on relatively most discrete dimension segmentation cluster initial center choosing method
Vasan et al. Feature subset selection for intrusion detection using various rank-based algorithms

Legal Events

Date Code Title Description
PB01 Publication
TA01 Transfer of patent application right

Effective date of registration: 20200818

Address after: 318015 no.2-3167, zone a, Nonggang City, no.2388, Donghuan Avenue, Hongjia street, Jiaojiang District, Taizhou City, Zhejiang Province

Applicant after: Taizhou Jiji Intellectual Property Operation Co.,Ltd.

Address before: 201616 Shanghai city Songjiang District Sixian Road No. 3666

Applicant before: Phicomm (Shanghai) Co.,Ltd.

SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20180323