CN110334757A - Privacy-preserving clustering method and computer storage medium for big data analysis
- Publication number
- CN110334757A (application CN201910565540.7A)
- Authority
- CN
- China
- Prior art keywords
- privacy budget
- sequence
- privacy protection
- privacy
- big data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Probability & Statistics with Applications (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a privacy-preserving clustering method and a computer storage medium for big data analysis. The method comprises the following steps: normalizing the data and selecting the initial center points; calculating the minimum privacy budget and allocating the privacy budget sequence; assigning each sample point to its nearest center point; generating Laplace noise and adding it to the parameters used when updating the center points; and iterating until the difference between the sums of squared errors of two adjacent iterations is less than a threshold or the maximum number of iterations is reached. By adding noise that follows a Laplace distribution to the intermediate parameters computed while the clustering algorithm executes, the invention protects the sensitive information in the data set and solves the problem that the clustering algorithm leaks such information during execution. It improves the privacy budget allocation of differentially private clustering algorithms, raises the availability of the clustering results at the same degree of privacy protection, and solves the problem of privacy leakage in big data cluster mining.
Description
Technical field
The present invention relates to a privacy-preserving clustering method and a computer storage medium, and more particularly to a privacy-preserving clustering method and a computer storage medium for big data analysis.
Background art
Data mining is attracting increasing attention: mining and analyzing massive data with machine learning algorithms can yield a large amount of valuable new knowledge and new rules. Cluster analysis, one of the more commonly used methods in data mining, is widely applied in scenarios such as data preprocessing, target-group classification, pattern recognition and image segmentation. K-means is the simplest, most effective and most commonly used algorithm in big data cluster analysis. During its execution, however, updating the centroids requires computing the number of samples in each cluster and the sum of each attribute, and these operations can leak sensitive information about the data set.
Differential privacy is a data privacy protection technique that perturbs data by adding noise while retaining the statistical properties of the data. Combining differential privacy with a clustering algorithm can therefore keep the sensitive information of the data set from being leaked while still producing relatively accurate clustering results. Existing privacy-preserving clustering algorithms have shortcomings: random selection of the initial points and overly fast consumption of the privacy budget both make the availability of the clustering results unsatisfactory. In addition, the problem that conventional privacy budget allocation tends to produce excessive random noise remains unsolved.
Summary of the invention
Object of the invention: the technical problem to be solved by the present invention is to provide a privacy-preserving clustering method and a computer storage medium for big data analysis that solve the problem that conventional privacy budget allocation tends to produce excessive random noise and thereby degrades the quality of the clustering results. The invention improves the privacy budget allocation of differentially private clustering algorithms by proposing an arithmetic-progression privacy budget allocation, which raises the availability of the clustering results at the same degree of privacy protection and solves the privacy leakage problem in big data cluster mining.
Technical solution: the privacy-preserving clustering method for big data analysis according to the present invention comprises the following steps:
(1) normalize the data in the data set;
(2) divide the data set evenly into k subsets and randomly select one sample point in each subset as an initial center point;
(3) set the total privacy budget ε and the maximum number of iterations t_m, and calculate the minimum privacy budget ε_m and the number of iterations t = ε/ε_m; if t > t_m, allocate the privacy budget sequence by arithmetic-progression privacy budget allocation; if t ≤ t_m, allocate it by average privacy budget allocation; this yields the privacy budget sequence ε_p, where 1 ≤ p ≤ t_m;
(4) for every sample point in the data set, calculate its Euclidean distance to each of the k center points and assign the sample point to the nearest center point, dividing the data set into k clusters C = {C_1, C_2, …, C_k};
(5) generate a Laplace-distributed random number according to the corresponding term of the privacy budget sequence ε_p;
(6) for each cluster C_j, where 1 ≤ j ≤ k, calculate the number of sample points num in the cluster and the vector sum sum of the sample points, and add noise to each of them to obtain num′ and sum′, the noise being the Laplace-distributed random number from step (5);
(7) update the center point of each cluster C_j to sum′/num′, where 1 ≤ j ≤ k;
(8) calculate the sum of squared errors; if the absolute value of the difference between the sums of squared errors of the current and the previous iteration is less than a set threshold, or the number of iterations reaches the upper limit t_m, terminate and output the clustering result; otherwise return to step (4) and carry out the next iteration.
Further, the minimum privacy budget ε_m in step (3) is calculated as:
$$\varepsilon_m = \sqrt{\frac{500k^3}{N^2}\left(d + \sqrt[3]{4d\rho^2}\right)^3}$$
where N is the number of records in the data set, d is the dimension of the data, and ρ is the average value of the centroid estimate in each dimension.
Further, the arithmetic-progression privacy budget allocation in step (3) is specifically:
The total privacy budget ε is decomposed into an increasing arithmetic progression of length t_m whose first term is ε_m and whose terms sum to ε; reversing the progression gives the privacy budget sequence ε_p.
Further, the average privacy budget allocation in step (3) is specifically:
The total privacy budget ε is decomposed into a sequence of length t_m with equal terms, and this sequence is the privacy budget sequence ε_p.
Further, the random number in step (5) follows a Laplace distribution with location parameter 0 and scale parameter b, where b = (d+1)/ε′, d is the dimension of the data, and ε′ is the value looked up at the position of the privacy budget sequence ε_p that corresponds to the current iteration.
Further, the initial center point in step (2) is obtained by randomly selecting a sample point in each subset and then adding random noise to it.
The computer storage medium of the present invention stores a computer program which, when executed by a computer processor, implements the above privacy-preserving clustering method for big data analysis.
Beneficial effects: the present invention has the following technical effects:
1. The privacy budget sequence is generated by arithmetic-progression privacy budget allocation: the minimum privacy budget ε_m is calculated first, and the privacy budget sequence is then computed with the sum formula and the general-term formula of an arithmetic progression. The resulting sequence is smooth, which solves the problem that the privacy budget is consumed too quickly in existing methods;
2. Arithmetic-progression privacy budget allocation distributes the total privacy budget linearly, which solves the problem that existing methods allocate too much privacy budget in the early iterations and too little in the later ones. When the total privacy budget is very small, even smaller than the minimum privacy budget ε_m, the invention uses average allocation, which avoids as far as possible an allocated privacy budget so small that it impairs the execution of the algorithm. Compared with existing methods, the invention offers higher clustering availability and better clustering quality.
Brief description of the drawings
Fig. 1 is a flowchart of the method of an embodiment of the present invention;
Fig. 2 is a flowchart of the arithmetic-progression privacy budget allocation of the present invention;
Fig. 3 compares the clustering availability index of the method of the present invention with that of the comparison algorithms;
Fig. 4 compares the privacy budget sequences produced by the method of the present invention, the bisection allocation method and the series-sum allocation method.
Detailed description of the embodiments
The flowchart of the method of this embodiment is shown in Fig. 1. The method is carried out in the following steps:
Step 1: take the existing Image.csv data set, which comes from the clustering datasets of the School of Computing, University of Eastern Finland (http://cs.joensuu.fi/sipu/datasets/). Denote the data set by D; the number of records N is 34112 and the data dimension d is 3, i.e. each record has 3 attributes. The total privacy budget ε controls the degree of privacy protection: the smaller ε is set, the larger the added noise and the higher the degree of privacy protection. Here the total privacy budget ε is set to 0.8 and the number of clusters k to 3, and each record is regarded as a sample point in d-dimensional space. Each dimension of the data set D is normalized to [0,1].
Data normalization scales each dimension of the data into [0,1] by the following formula:
$$x' = \frac{x - \min}{\max - \min}$$
where, for any one dimension of the data, x is a value in that dimension, min and max are the minimum and maximum values in that dimension, and x′ is the normalized value.
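As an illustration only, the normalization above can be sketched in Python as follows. The function name `min_max_normalize` and the handling of a constant column (mapped to 0) are assumptions and are not taken from the embodiment.

```python
from typing import List


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every column of `rows` into [0, 1] using x' = (x - min) / (max - min)."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    maxs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        new_row = []
        for x, lo, hi in zip(row, mins, maxs):
            # Assumption: a constant column (max == min) is mapped to 0.0
            # to avoid dividing by zero.
            new_row.append((x - lo) / (hi - lo) if hi > lo else 0.0)
        normalized.append(new_row)
    return normalized


data = [[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]]
scaled = min_max_normalize(data)
```

Each column is scaled independently, which matches the per-dimension wording of the formula.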
Step 2: divide the preprocessed data set D evenly into k subsets {S_1, S_2, …, S_k}, randomly select one sample point o_i from each subset S_i, where 1 ≤ i ≤ k, and add random noise to it to obtain the initial center points {u_1, u_2, …, u_k}. Here the data set D is divided evenly into 3 subsets {S_1, S_2, S_3}, one sample point is randomly selected from each subset, and noise is added to obtain the initial center points. The result is:
u_1 = [0 0.08130081 0.00473934]
u_2 = [0.44230769 0.27235772 0.16587678]
u_3 = [0.65384615 0.43089431 0.1943128].
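A minimal sketch of this initialization is given below. The embodiment does not state how the data set is split or which noise distribution is used, so the consecutive equal-sized slices, the uniform noise, the clipping to [0, 1] and the names `initial_centers` and `noise_width` are all assumptions made for illustration.

```python
import random
from typing import List, Sequence


def initial_centers(data: Sequence[Sequence[float]], k: int,
                    noise_width: float, rng: random.Random) -> List[List[float]]:
    """Pick one random sample per subset and perturb it (hypothetical helper)."""
    size = len(data) // k
    centers = []
    for i in range(k):
        # Assumption: the k subsets are consecutive slices of equal size;
        # the last slice absorbs any remainder.
        start = i * size
        end = (i + 1) * size if i < k - 1 else len(data)
        point = rng.choice(data[start:end])
        # Assumption: uniform noise in [-noise_width, noise_width],
        # clipped so the center stays inside the normalized range [0, 1].
        centers.append([min(1.0, max(0.0, x + rng.uniform(-noise_width, noise_width)))
                        for x in point])
    return centers


rng = random.Random(0)
data = [[rng.random(), rng.random(), rng.random()] for _ in range(30)]
centers = initial_centers(data, k=3, noise_width=0.05, rng=rng)
```

Drawing one point per subset rather than k points from the whole data set spreads the starting centers across the data, which is the stated motivation for avoiding purely random initial points.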
Step 3: obtain the privacy budget sequence ε_p, where 1 ≤ p ≤ t_m. Set the maximum number of iterations t_m, calculate the minimum privacy budget ε_m, and from it calculate the number of iterations t = ε/ε_m. If t > t_m, allocate the privacy budget sequence by arithmetic-progression privacy budget allocation; if t ≤ t_m, allocate it by average privacy budget allocation. This finally yields the privacy budget sequence {ε_1, ε_2, …, ε_{t_m}}. The allocation procedure is shown in Fig. 2.
The minimum privacy budget ε_m is calculated as:
$$\varepsilon_m = \sqrt{\frac{500k^3}{N^2}\left(d + \sqrt[3]{4d\rho^2}\right)^3}$$
where N is the number of records in the given data set, d is the dimension, k is the number of clusters, and ρ is the average value of the centroid estimate in each dimension; when the data are normalized to [0,1], its value is 0.45.
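The following sketch evaluates the minimum privacy budget for the values of this embodiment. It assumes the expression ε_m = sqrt(500·k³/N² · (d + (4dρ²)^(1/3))³), a form known from the differentially private k-means literature; with N = 34112, d = 3, k = 3 and ρ = 0.45 it reproduces the value 0.031 used in this embodiment. The function name `min_privacy_budget` is hypothetical.

```python
import math


def min_privacy_budget(n_records: int, dim: int, k: int, rho: float = 0.45) -> float:
    """Minimum per-iteration privacy budget (assumed closed form, see lead-in)."""
    cube_root_term = (4.0 * dim * rho ** 2) ** (1.0 / 3.0)
    return math.sqrt(500.0 * k ** 3 / n_records ** 2 * (dim + cube_root_term) ** 3)


eps_m = min_privacy_budget(n_records=34112, dim=3, k=3)
# The embodiment rounds eps_m to three decimals before dividing.
t = 0.8 / round(eps_m, 3)
```

Because ε_m shrinks as N grows, a larger data set tolerates more iterations for the same total budget.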
Arithmetic-progression privacy budget allocation decomposes the total privacy budget into an increasing arithmetic progression of length t_m, each term of which is the privacy budget consumed in the corresponding iteration. Concretely, the ε_m obtained in step 3 is taken as the first term a_1 of the progression and the total privacy budget ε as the sum S_n of all its terms; the common difference d_t can then be calculated from the following formulas:
$$S_n = \frac{n(a_1 + a_n)}{2},$$
$$a_n = a_1 + (n-1)d_t,$$
where n = t_m. Once the common difference d_t has been obtained, the increasing arithmetic progression of length t_m follows, and reversing it gives the required privacy budget sequence. The privacy budget sequence is not necessarily consumed in full.
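The common difference can also be written in closed form. The short derivation below is added for illustration and uses only the quantities defined above: substituting the general-term formula into the sum formula with a_1 = ε_m, S_n = ε and n = t_m (assuming t_m > 1) gives

$$\varepsilon = t_m\varepsilon_m + \frac{t_m(t_m-1)}{2}\,d_t \quad\Longrightarrow\quad d_t = \frac{2(\varepsilon - t_m\varepsilon_m)}{t_m(t_m-1)}.$$

Since d_t > 0 exactly when ε/ε_m > t_m, the progression is increasing only under the condition t > t_m, which is the case in which this allocation is used.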
Average privacy budget allocation divides the total privacy budget evenly over the maximum number of iterations, so the privacy budget consumed in each iteration is ε/t_m. Average allocation can also be regarded as a special arithmetic-progression allocation whose common difference is 0.
Specifically, the maximum number of iterations t_m is set to 8 and the minimum privacy budget is calculated as ε_m = 0.031, so t = ε/ε_m = 25.806. Because t > t_m, the privacy budget sequence ε_p, where 1 ≤ p ≤ 8, is calculated by arithmetic-progression privacy budget allocation. The common difference d_t = 0.0197 is calculated first, the value of each term is then calculated from the general-term formula of the arithmetic progression, and reversing the terms finally gives the required decreasing privacy budget sequence: {0.169, 0.14928571, 0.12957143, 0.10985714, 0.09014286, 0.07042857, 0.05071429, 0.031}.
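The two allocation modes and the worked example above can be sketched as follows. The function name `allocate_budget` and the guard for t_m = 1 are assumptions introduced for this illustration; the arithmetic itself follows the sum and general-term formulas of an arithmetic progression.

```python
from typing import List


def allocate_budget(eps_total: float, eps_min: float, t_max: int) -> List[float]:
    """Return the per-iteration privacy budgets (hypothetical helper name)."""
    t = eps_total / eps_min
    if t > t_max and t_max > 1:
        # From S_n = n*a_1 + n*(n-1)/2 * d_t with a_1 = eps_min,
        # S_n = eps_total and n = t_max.
        diff = 2.0 * (eps_total - t_max * eps_min) / (t_max * (t_max - 1))
        increasing = [eps_min + i * diff for i in range(t_max)]
        return increasing[::-1]  # reversed: the largest budget is spent first
    # Average allocation: every iteration receives eps_total / t_max.
    return [eps_total / t_max] * t_max


sequence = allocate_budget(eps_total=0.8, eps_min=0.031, t_max=8)
flat = allocate_budget(eps_total=0.1, eps_min=0.031, t_max=8)
```

In both branches the terms sum to the total privacy budget ε, so running all t_m iterations spends exactly ε and stopping early spends less.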
Step 4: for every point in the data set D, calculate its Euclidean distance to each of the k center points and assign the sample point to the nearest center point, dividing the data set D into k clusters C = {C_1, C_2, …, C_k}.
Specifically, for every point in D the Euclidean distance to each of the 3 center points is calculated and the sample point is assigned to the nearest one, dividing D into 3 clusters C = {C_1, C_2, C_3}.
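The assignment step can be sketched as below. The function name `assign_to_nearest` is hypothetical, and ties are resolved in favor of the lowest cluster index, which is an assumption; the embodiment does not say how ties are handled.

```python
import math
from typing import List, Sequence


def assign_to_nearest(data: Sequence[Sequence[float]],
                      centers: Sequence[Sequence[float]]) -> List[List[Sequence[float]]]:
    """Group every sample point with its nearest center (Euclidean distance)."""
    clusters: List[List[Sequence[float]]] = [[] for _ in centers]
    for point in data:
        distances = [math.dist(point, center) for center in centers]
        # index() returns the first minimum, so ties go to the lowest index.
        clusters[distances.index(min(distances))].append(point)
    return clusters


points = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
clusters = assign_to_nearest(points, centers=[[0.0, 0.0], [1.0, 1.0]])
```

This step reads the raw data but releases nothing, so no noise is added here; the perturbation is applied only to the counts and sums computed afterwards.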
Step 5: calculate the noise to be added in the current iteration. The noise is a random number following a Laplace distribution with location parameter 0 and scale parameter b, denoted Lap(b), where b = Δf/ε′, Δf is the sensitivity and ε′ is the privacy budget. The probability density function of the Laplace distribution is
$$p(x) = \frac{1}{2b}\exp\left(-\frac{|x|}{b}\right).$$
Here the sensitivity depends on the dimension of the data, Δf = d + 1, and the privacy budget ε′ is the value looked up at the position of the privacy budget sequence ε_p that corresponds to the current iteration, so the noise is expressed as Lap(Δf/ε′).
Specifically, the privacy budget ε_p for the current iteration is looked up in the privacy budget sequence obtained in step 3, and the sensitivity is Δf = 3 + 1 = 4. In the first iteration ε_1 is 0.169 and the noise is Lap(4/0.169); in the second iteration ε_2 is 0.1493 and the noise is Lap(4/0.1493); and so on.
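A minimal sketch of the noise generation is shown below. It samples Lap(b) as b times the difference of two independent unit-rate exponential variables, a standard identity; this sampling technique and the names `laplace_scale` and `sample_laplace` are choices made for the illustration and are not prescribed by the embodiment.

```python
import random


def laplace_scale(dim: int, eps_iter: float) -> float:
    """Scale parameter b = (d + 1) / eps' for the current iteration."""
    return (dim + 1) / eps_iter


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Draw one value from a Laplace distribution with location 0 and the given scale.

    The difference of two independent Exp(1) variables is Laplace(0, 1).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


rng = random.Random(42)
b1 = laplace_scale(dim=3, eps_iter=0.169)  # first iteration: 4 / 0.169
samples = [sample_laplace(b1, rng) for _ in range(20000)]
mean_abs = sum(abs(s) for s in samples) / len(samples)
```

For a Laplace distribution the expected absolute value equals b, so `mean_abs` should land close to `b1` (about 23.7); later iterations use smaller budgets and therefore draw larger noise.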
Step 6: for each cluster C_j, where 1 ≤ j ≤ k, calculate the number of sample points num in the cluster and the vector sum sum of the sample points, and add the noise from step 5 to each of them to obtain num′ and sum′. Specifically, for each cluster C_j, where 1 ≤ j ≤ 3, the number of sample points num and the vector sum sum are calculated. The results for the first iteration are:
Cluster C_1: num is 1406 and the vector sum is [240.29 177.76 107.42];
Cluster C_2: num is 12301 and the vector sum is [4665.25 3686.47 2473.31];
Cluster C_3: num is 20405 and the vector sum is [13469.21 11385.21 8768.39];
The noise from step 5 is then added to each of them to obtain num′ and sum′. The noise added in the first iteration is Lap(4/0.169), and the results are:
Cluster C_1: num′ is 1421.99 and the vector sum′ is [284.77 190.18 108.46];
Cluster C_2: num′ is 12281.82 and the vector sum′ is [4688.87 3697.67 2566.92];
Cluster C_3: num′ is 20396.29 and the vector sum′ is [13466.97 11402.30 8739.17];
Step 7: update the center of each cluster C_j to u_j′ = sum′/num′, where 1 ≤ j ≤ 3. The centers after the update in the first iteration are:
u_1′ = [0.20026401 0.13374381 0.07627629]
u_2′ = [0.38177298 0.30106816 0.20900154]
u_3′ = [0.66026546 0.55903804 0.42846875].
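Steps 6 and 7 can be sketched together as follows. Drawing an independent noise value for the count and for each coordinate of the sum is an assumption, as are the helper names `perturb` and `update_center`; the final line recomputes the first updated center from the noisy count and sum listed above for cluster C_1.

```python
import random
from typing import List, Sequence, Tuple


def sample_laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) as the scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def perturb(cluster: Sequence[Sequence[float]], eps_iter: float,
            rng: random.Random) -> Tuple[float, List[float]]:
    """Return the noisy count num' and noisy vector sum sum' of a non-empty cluster."""
    dim = len(cluster[0])
    scale = (dim + 1) / eps_iter
    # Assumption: independent noise for the count and for every coordinate.
    num_noisy = len(cluster) + sample_laplace(scale, rng)
    sum_noisy = [sum(p[j] for p in cluster) + sample_laplace(scale, rng)
                 for j in range(dim)]
    return num_noisy, sum_noisy


def update_center(num_noisy: float, sum_noisy: Sequence[float]) -> List[float]:
    """New center u' = sum' / num'."""
    return [s / num_noisy for s in sum_noisy]


# Noisy values of cluster C_1 in the first iteration, as listed above.
center_1 = update_center(1421.99, [284.77, 190.18, 108.46])
num_n, sum_n = perturb([[0.2, 0.4], [0.6, 0.8]], eps_iter=0.169, rng=random.Random(1))
```

For very small clusters the noisy count can come out close to zero or negative; the text does not say how that case is treated, so the sketch leaves it unhandled.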
Step 8: calculate the sum of squared errors. If the absolute value of the difference between the sums of squared errors of the current and the previous iteration is less than the set threshold, or the number of iterations reaches the upper limit t_m, terminate and output the clustering result; otherwise return to step 4 and continue. The sum of squared errors is the sum, over every cluster, of the squared distances between each point in the cluster and the center of that cluster. The threshold can be chosen freely and determines the number of iterations. In theory it could be set to 0, but because of the randomness of the noise a threshold of 0 would lead to too many iterations, so the threshold can be relaxed appropriately; here it is set to 100.
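The stopping test can be sketched as below. The sketch uses the standard definition of the sum of squared errors (squared Euclidean distances to the cluster center); the function names and the argument order are hypothetical.

```python
from typing import Sequence


def sum_squared_errors(clusters: Sequence[Sequence[Sequence[float]]],
                       centers: Sequence[Sequence[float]]) -> float:
    """Sum of squared Euclidean distances between each point and its cluster center."""
    total = 0.0
    for cluster, center in zip(clusters, centers):
        for point in cluster:
            total += sum((x - c) ** 2 for x, c in zip(point, center))
    return total


def should_stop(sse_now: float, sse_prev: float, threshold: float,
                iteration: int, t_max: int) -> bool:
    """Stop when the SSE change is below the threshold or the iteration cap is reached."""
    return abs(sse_now - sse_prev) < threshold or iteration >= t_max


sse = sum_squared_errors([[[0.0, 0.0], [2.0, 0.0]]], [[1.0, 0.0]])
```

Because the centers are noisy, the SSE does not decrease monotonically, which is why a non-zero threshold and an iteration cap are both needed.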
The method of this embodiment is compared with two existing algorithms. For each value of ε, each of the three algorithms is run 10 times, and the F-measure between its result and the result of the standard k-means algorithm is calculated to evaluate the clustering availability of the algorithm. The range of the F-measure is [0,1]; the closer the value is to 1, the more similar the clustering result of the algorithm is to the noise-free standard result, i.e. the higher the clustering availability. Fig. 3 compares the F-measure of the three algorithms on the Image data set.
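The text does not say which variant of the F-measure is used. As one possibility, the sketch below computes a pair-counting F-measure between a reference labelling (for example, the standard k-means result) and a noisy labelling; the choice of this variant and the name `pairwise_f_measure` are assumptions.

```python
from itertools import combinations
from typing import Sequence


def pairwise_f_measure(reference: Sequence[int], predicted: Sequence[int]) -> float:
    """Pair-counting F-measure: a pair is positive when both points share a cluster."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(reference)), 2):
        same_ref = reference[i] == reference[j]
        same_pred = predicted[i] == predicted[j]
        if same_ref and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_ref:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


same = pairwise_f_measure([0, 0, 1, 1], [1, 1, 0, 0])
partial = pairwise_f_measure([0, 0, 1, 1], [0, 0, 0, 1])
```

Pair counting is insensitive to how the clusters are numbered, so a relabelled but otherwise identical clustering scores 1. The loop visits every pair of points, so for a data set of the size used here a sampled or contingency-table version would be preferable.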
Fig. 4 compares the privacy budget sequences allocated by the method of this embodiment and by the two existing methods. The two existing methods have already consumed most of the total privacy budget in the early iterations, so very little privacy budget is left for the middle and later iterations, and an overly small privacy budget tends to introduce so much noise that convergence of the algorithm is affected. The privacy budget sequence allocated by the method of the present invention, by contrast, is distributed linearly, so the privacy budget available in the middle iterations is still fairly ample and excessive noise is less likely to disturb convergence.
The present invention is a privacy-preserving clustering method for big data analysis. It improves the privacy budget allocation of existing differentially private clustering algorithms: by using arithmetic-progression privacy budget allocation, it solves problems of existing methods such as overly fast privacy budget consumption and excessive noise in the later iterations, and improves the availability of the clustering results at the same degree of privacy protection. The invention can be applied to the cluster analysis of big data to keep personal information from being leaked during the process. For example, medical data, commercial consumption data and location data contain a large amount of private user information; when such data are mined by clustering, the method of the present invention effectively guards against privacy leakage during data acquisition and algorithm execution while retaining the statistical properties and mining utility of the data.
If an embodiment of the present invention is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk or an optical disc. The embodiments of the present invention are thus not limited to any specific combination of hardware and software.
Correspondingly, an embodiment of the present invention also provides a computer storage medium on which a computer program is stored. When the computer program is executed by a processor, the aforementioned privacy-preserving clustering method for big data analysis can be implemented. For example, the computer storage medium is a computer-readable storage medium.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system or a computer program product. The present application may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Claims (7)
1. A privacy-preserving clustering method for big data analysis, characterized by comprising the following steps:
(1) normalize the data in the data set;
(2) divide the data set evenly into k subsets and randomly select one sample point in each subset as an initial center point;
(3) set the total privacy budget ε and the maximum number of iterations t_m, and calculate the minimum privacy budget ε_m and the number of iterations t = ε/ε_m; if t > t_m, allocate the privacy budget sequence by the arithmetic-progression privacy budget allocation method; if t ≤ t_m, allocate it by the average privacy budget allocation method; this yields the privacy budget sequence ε_p, where 1 ≤ p ≤ t_m;
(4) for every sample point in the data set, calculate its Euclidean distance to each of the k center points and assign the sample point to the nearest center point, dividing the data set into k clusters C = {C_1, C_2, …, C_k};
(5) generate a Laplace-distributed random number according to the corresponding term of the privacy budget sequence ε_p;
(6) for each cluster C_j, where 1 ≤ j ≤ k, calculate the number of sample points num in the cluster and the vector sum sum of the sample points, and add noise to each of them to obtain num′ and sum′, the noise being the Laplace-distributed random number from step (5);
(7) update the center point of each cluster C_j to sum′/num′, where 1 ≤ j ≤ k;
(8) calculate the sum of squared errors; if the absolute value of the difference between the sums of squared errors of the current and the previous iteration is less than a set threshold, or the number of iterations reaches the upper limit t_m, terminate and output the clustering result; otherwise return to step (4) and carry out the next iteration.
2. The privacy-preserving clustering method for big data analysis according to claim 1, characterized in that the minimum privacy budget ε_m in step (3) is calculated as:
$$\varepsilon_m = \sqrt{\frac{500k^3}{N^2}\left(d + \sqrt[3]{4d\rho^2}\right)^3}$$
where N is the number of records in the data set, d is the dimension of the data, and ρ is the average value of the centroid estimate in each dimension.
3. The privacy-preserving clustering method for big data analysis according to claim 1, characterized in that the arithmetic-progression privacy budget allocation method in step (3) is specifically:
The total privacy budget ε is decomposed into an increasing arithmetic progression of length t_m whose first term is ε_m and whose terms sum to ε; reversing the progression gives the privacy budget sequence ε_p.
4. The privacy-preserving clustering method for big data analysis according to claim 1, characterized in that the average privacy budget allocation method in step (3) is specifically:
The total privacy budget ε is decomposed into a sequence of length t_m with equal terms, and this sequence is the privacy budget sequence ε_p.
5. The privacy-preserving clustering method for big data analysis according to claim 1, characterized in that the random number in step (5) follows a Laplace distribution with location parameter 0 and scale parameter b, where b = (d+1)/ε′, d is the dimension of the data, and ε′ is the value looked up at the position of the privacy budget sequence ε_p that corresponds to the current iteration.
6. The privacy-preserving clustering method for big data analysis according to claim 1, characterized in that the initial center point in step (2) is obtained by randomly selecting a sample point in each subset and then adding random noise to it.
7. A computer storage medium on which a computer program is stored, characterized in that the computer program, when executed by a computer processor, implements the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910565540.7A CN110334757A (en) | 2019-06-27 | 2019-06-27 | Privacy-preserving clustering method and computer storage medium for big data analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910565540.7A CN110334757A (en) | 2019-06-27 | 2019-06-27 | Privacy-preserving clustering method and computer storage medium for big data analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110334757A (en) | 2019-10-15 |
Family
ID=68144509
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910565540.7A (Pending) CN110334757A (en) | 2019-06-27 | 2019-06-27 | Privacy-preserving clustering method and computer storage medium for big data analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110334757A (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110750725A (en) * | 2019-10-24 | 2020-02-04 | 河北经贸大学 | Privacy-protecting user portrait generation method, terminal device and storage medium |
CN111242196A (en) * | 2020-01-06 | 2020-06-05 | 广西师范大学 | Differential privacy protection method for interpretable deep learning |
CN111563272A (en) * | 2020-04-30 | 2020-08-21 | 支付宝实验室(新加坡)有限公司 | Information statistical method and device |
CN111444545B (en) * | 2020-06-12 | 2020-09-04 | 支付宝(杭州)信息技术有限公司 | Method and device for clustering private data of multiple parties |
CN111914285A (en) * | 2020-06-09 | 2020-11-10 | 深圳大学 | Geographical distributed graph calculation method and system based on differential privacy |
CN112202542A (en) * | 2020-09-30 | 2021-01-08 | 清华-伯克利深圳学院筹备办公室 | Data perturbation method, device and storage medium |
CN112199722A (en) * | 2020-10-15 | 2021-01-08 | 南京邮电大学 | K-means-based differential privacy protection clustering method |
CN112347088A (en) * | 2020-10-28 | 2021-02-09 | 南京邮电大学 | Data reliability optimization method, storage medium and equipment |
CN112613065A (en) * | 2020-12-02 | 2021-04-06 | 北京明朝万达科技股份有限公司 | Data sharing method and device based on differential privacy protection |
CN112767693A (en) * | 2020-12-31 | 2021-05-07 | 北京明朝万达科技股份有限公司 | Vehicle driving data processing method and device |
CN113094751A (en) * | 2021-04-21 | 2021-07-09 | 山东大学 | Personalized privacy data processing method, device, medium and computer equipment |
CN113537308A (en) * | 2021-06-29 | 2021-10-22 | 中国海洋大学 | Two-stage k-means clustering processing system and method based on localized differential privacy |
CN113609523A (en) * | 2021-07-29 | 2021-11-05 | 南京邮电大学 | Vehicle networking private data protection method based on block chain and differential privacy |
CN114117540A (en) * | 2022-01-25 | 2022-03-01 | 广州天鹏计算机科技有限公司 | Big data analysis processing method and system |
CN114817985A (en) * | 2022-04-22 | 2022-07-29 | 广东电网有限责任公司 | Privacy protection method, device, equipment and storage medium for electricity consumption data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106778314A (en) * | 2017-03-01 | 2017-05-31 | 全球能源互联网研究院 | Distributed differential privacy protection method based on k-means |
CN108280491A (en) * | 2018-04-18 | 2018-07-13 | 南京邮电大学 | K-means clustering method for differential privacy protection |
CN108549904A (en) * | 2018-03-28 | 2018-09-18 | 西安理工大学 | Differential privacy protection K-means clustering method based on silhouette coefficient |
- 2019-06-27: CN application CN201910565540.7A, publication CN110334757A (en), status Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106778314A (en) * | 2017-03-01 | 2017-05-31 | 全球能源互联网研究院 | Distributed differential privacy protection method based on k-means |
CN108549904A (en) * | 2018-03-28 | 2018-09-18 | 西安理工大学 | Differential privacy protection K-means clustering method based on silhouette coefficient |
CN108280491A (en) * | 2018-04-18 | 2018-07-13 | 南京邮电大学 | K-means clustering method for differential privacy protection |
Non-Patent Citations (3)
Title |
---|
C. DWORK: "Differential privacy", 《PROCEEDINGS OF 39TH INTERNATIONAL COLLOQUIUM ON AUTOMATA, LANGUAGES AND PROGRAMMING》 * |
SU D ET AL: "Differentially private k-means clustering", 《PROCEEDINGS OF THE SIXTH ACM CONFERENCE ON DATA AND APPLICATION SECURITY AND PRIVACY》 * |
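SHANG TAO ET AL: "Big data decision tree algorithm based on arithmetic-progression privacy budget allocation" (基于等差隐私预算分配的大数据决策树算法), 《ADVANCED ENGINEERING SCIENCES》 (工程科学与技术) * |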
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110750725A (en) * | 2019-10-24 | 2020-02-04 | 河北经贸大学 | Privacy-protecting user portrait generation method, terminal device and storage medium |
CN111242196A (en) * | 2020-01-06 | 2020-06-05 | 广西师范大学 | Differential privacy protection method for interpretable deep learning |
CN111242196B (en) * | 2020-01-06 | 2022-06-21 | 广西师范大学 | Differential privacy protection method for interpretable deep learning |
CN111563272A (en) * | 2020-04-30 | 2020-08-21 | 支付宝实验室(新加坡)有限公司 | Information statistical method and device |
CN111914285A (en) * | 2020-06-09 | 2020-11-10 | 深圳大学 | Geographical distributed graph calculation method and system based on differential privacy |
CN111914285B (en) * | 2020-06-09 | 2022-06-17 | 深圳大学 | Geographic distributed graph calculation method and system based on differential privacy |
WO2021248937A1 (en) * | 2020-06-09 | 2021-12-16 | 深圳大学 | Geographically distributed graph computing method and system based on differential privacy |
CN111444545B (en) * | 2020-06-12 | 2020-09-04 | 支付宝(杭州)信息技术有限公司 | Method and device for clustering private data of multiple parties |
CN112202542A (en) * | 2020-09-30 | 2021-01-08 | 清华-伯克利深圳学院筹备办公室 | Data perturbation method, device and storage medium |
CN112199722A (en) * | 2020-10-15 | 2021-01-08 | 南京邮电大学 | K-means-based differential privacy protection clustering method |
CN112347088B (en) * | 2020-10-28 | 2024-02-20 | 南京邮电大学 | Data credibility optimization method, storage medium and equipment |
CN112347088A (en) * | 2020-10-28 | 2021-02-09 | 南京邮电大学 | Data reliability optimization method, storage medium and equipment |
CN112613065B (en) * | 2020-12-02 | 2024-08-20 | 北京明朝万达科技股份有限公司 | Data sharing method and device based on differential privacy protection |
CN112613065A (en) * | 2020-12-02 | 2021-04-06 | 北京明朝万达科技股份有限公司 | Data sharing method and device based on differential privacy protection |
CN112767693A (en) * | 2020-12-31 | 2021-05-07 | 北京明朝万达科技股份有限公司 | Vehicle driving data processing method and device |
CN113094751A (en) * | 2021-04-21 | 2021-07-09 | 山东大学 | Personalized privacy data processing method, device, medium and computer equipment |
CN113537308B (en) * | 2021-06-29 | 2023-11-03 | 中国海洋大学 | Two-stage k-means clustering processing system and method based on localized differential privacy |
CN113537308A (en) * | 2021-06-29 | 2021-10-22 | 中国海洋大学 | Two-stage k-means clustering processing system and method based on localized differential privacy |
CN113609523B (en) * | 2021-07-29 | 2022-04-01 | 南京邮电大学 | Vehicle networking private data protection method based on block chain and differential privacy |
CN113609523A (en) * | 2021-07-29 | 2021-11-05 | 南京邮电大学 | Vehicle networking private data protection method based on block chain and differential privacy |
CN114117540B (en) * | 2022-01-25 | 2022-04-29 | 广州天鹏计算机科技有限公司 | Big data analysis processing method and system |
CN114117540A (en) * | 2022-01-25 | 2022-03-01 | 广州天鹏计算机科技有限公司 | Big data analysis processing method and system |
CN114817985A (en) * | 2022-04-22 | 2022-07-29 | 广东电网有限责任公司 | Privacy protection method, device, equipment and storage medium for electricity consumption data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110334757A (en) | Secret protection clustering method and computer storage medium towards big data analysis | |
Got et al. | Hybrid filter-wrapper feature selection using whale optimization algorithm: A multi-objective approach | |
Arora et al. | Analysis of k-means and k-medoids algorithm for big data | |
Mahmud et al. | Improvement of K-means clustering algorithm with better initial centroids based on weighted average | |
Chen et al. | APSCAN: A parameter free algorithm for clustering | |
Davis et al. | Grids versus graphs: Partitioning space for improved taxi demand-supply forecasts | |
CN107783998A (en) | The method and device of a kind of data processing | |
Al Abd Alazeez et al. | EDDS: An enhanced density-based method for clustering data streams | |
Bharanidharan et al. | Improved chicken swarm optimization to classify dementia MRI images using a novel controlled randomness optimization algorithm | |
Wang et al. | M2SPL: Generative multiview features with adaptive meta-self-paced sampling for class-imbalance learning | |
Kwedlo | A hybrid steady-state evolutionary algorithm using random swaps for Gaussian model-based clustering | |
CN117407921A (en) | Differential privacy histogram release method and system based on must-connect and don-connect constraints | |
Zhang et al. | Scalegcn: Efficient and effective graph convolution via channel-wise scale transformation | |
Kanezashi et al. | An incremental local-first community detection method for dynamic graphs | |
CN104899232A (en) | Cooperative clustering method and cooperative clustering equipment | |
CN105589896B (en) | Data digging method and device | |
CN114298245A (en) | Anomaly detection method and device, storage medium and computer equipment | |
Lingras | Evolutionary rough K-means clustering | |
Cai et al. | The multi-task learning with an application of Pareto improvement | |
Azad et al. | Modified constrained differential evolution for solving nonlinear global optimization problems | |
Rahman et al. | AWST: A Novel Attribute Weight Selection Technique for Data Clustering. | |
Santos et al. | A comparative study of GPU metaheuristics for data clustering | |
Fahim et al. | Unsupervised Space Partitioning for Nearest Neighbor Search | |
Yu et al. | Genetic-based K-means algorithm for selection of feature variables | |
Chen et al. | Power Grid Missing Data Filling Method Based on Historical Data Mining Assisted Multi-dimensional Scenario Analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20191015 |