CN108280491A - A k-means clustering method for differential privacy protection - Google Patents

A k-means clustering method for differential privacy protection

Info

Publication number
CN108280491A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810347108.6A
Other languages
Chinese (zh)
Other versions
CN108280491B (en)
Inventor
杨庚
胡闯
白云璐
王璇
唐海霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongguan Mengda Group Co.,Ltd.
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-04-18
Filing date
2018-04-18
Publication date
2018-07-13
Application filed by Nanjing Post and Telecommunication University
Priority to CN201810347108.6A
Publication of CN108280491A
Application granted
Publication of CN108280491B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a k-means clustering method for differential privacy protection, comprising: preprocessing the data; letting C denote the set of cluster centers and φ(C, X) the sum of squared errors for a given data set X and center set C; comparing φ(C, X) with the best value found so far; looping until retry exceeds the given maximum number of retries retry_max, then returning the optimal center set C_best; traversing every point in the data set X and assigning it to its nearest center; setting the random noise to be added; recomputing the sum and the number of data points in each cluster, adding noise, and finally updating the cluster centroids; and repeating these steps until the sum of squared errors converges or the number of iterations reaches the upper limit. The invention adds suitable random noise following a specific distribution during the iterations of the k-means clustering algorithm, so that the clustering result is distorted to a certain degree, which protects privacy while preserving the usability of the data.

Description

A k-means clustering method for differential privacy protection
Technical field
The present invention relates to privacy protection and clustering methods, and in particular to a k-means clustering method for differential privacy protection, belonging to the field of information security technology.
Background art
With the rapid development of cloud computing and big data, data mining technology has made significant progress in both research and application. As one of the important methods of data mining, clustering algorithms can uncover implicit, previously unknown knowledge and rules, and have important potential value for operational decision-making over large volumes of related data. At the same time, however, the disclosure of sensitive information within this large volume of data poses incalculable threats and losses to users. How to protect data privacy during cluster analysis has therefore become a hot topic in data mining and data privacy protection. With the introduction and development of privacy protection technology, differential privacy has become a popular privacy protection technique. Differential privacy is achieved through a noise mechanism: random noise is added to the output to protect the data. The larger the added noise, the safer the data but the lower its usability, and vice versa.
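For reference, the standard definition of ε-differential privacy can be written as below. The symbols M (a randomized mechanism), D and D′ (neighboring data sets that differ in one record) and S (a set of possible outputs) are notation introduced here for illustration and do not appear in the patent text.

```latex
\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S]
\quad \text{for all neighboring } D, D' \text{ and all } S \subseteq \mathrm{Range}(M)
```

A smaller ε forces the two output distributions closer together, which requires more noise. This is the trade-off between safety and usability described above.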
As one of the most commonly used clustering methods, the k-means algorithm is simple to implement and clusters quickly. However, traditional differentially private k-means algorithms (such as the differentially private k-means algorithm and the differentially private k-means++ algorithm) are sensitive to the choice of initial centers, and the choice of the number of clusters k is somewhat blind, which reduces the usability of the clustering result.
Summary of the invention
The problem to be solved by the present invention is to address the deficiencies described in the background art by proposing a k-means clustering method for differential privacy protection. Suitable random noise following a specific distribution is added during the iterations of the k-means clustering algorithm, so that the clustering result is distorted to a certain degree, which protects privacy while preserving the usability of the data. The method is simple, easy to operate, and places no restriction on the size or attributes of the data set.
The method of the present invention runs the k-means++ algorithm on the data set and takes its result as input. It then alternates a series of non-local "jumps" with runs of the traditional k-means algorithm to obtain optimized initial cluster centers, and uses this center set to carry out the iterative clustering process in which noise is added to the centers. The differential privacy technique used by the invention defines an extremely strict attack model and gives a rigorous mathematical proof and quantitative expression of the privacy risk. At the same time, the differential privacy mechanism achieves a good balance between the usability of the k-means clustering result and the level of privacy protection.
The k-means clustering method for differential privacy protection of the present invention comprises the following steps:
Step 1: Preprocess the sample data;
Step 2: Let C denote the set of cluster centers, and let φ(C, X) denote the sum of squared errors for the given sample data set X and cluster center set C, where x is a data point in the sample data set and c is a center in the center set:
$$\varphi(C, X) = \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert^2 \qquad (2)$$
Let retry denote the number of retries, retry_max the maximum number of retries, φ_best the updated sum of squared errors, and C_best the updated center set. Run the k-means++ algorithm on the data set X, then store the smallest sum of squared errors φ(C, X) obtained so far in φ_best and the corresponding best cluster center set C in C_best. Set retry_max = m, m ∈ {0, 1, 2, …}, and initialize retry = 0;
Step 3: While retry ≤ retry_max, let λ denote the "least useful" center and C_i the centroid of cluster i; let C_μ denote the centroid of cluster μ, where μ is the center with the largest within-cluster sum of squared distances, and let d_μ denote the average distance of cluster μ. Let o denote a small random offset, u a random vector on the d-dimensional unit hypersphere, and ϵ the offset factor, where o = ϵ d_μ u. Then set λ = μ + o and μ = μ − o;
Step 4: Run the traditional k-means algorithm with the center set C obtained in step 3 as the initial center set, and evaluate φ(C, X). If φ(C, X) is less than φ_best, set φ_best = φ(C, X), C_best = C, and retry = 0; otherwise exit the current pass of the loop and set retry = retry + 1 and C_best = C;
Step 5: Repeat steps 3 and 4 until retry exceeds the given maximum number of retries retry_max, then return the optimal center set C_best;
Step 6: Traverse every point in the data set X, compute the distance from each point to every center, and assign the point to its nearest center, thereby dividing X into k clusters;
Step 7: Set the random noise to be added:
The random noise is Laplace noise, that is, the noise follows the Laplace distribution Lap(b) with b = Δf/ε, where Δf is the global sensitivity and ε is the privacy budget. Lap(b) denotes the Laplace distribution with location parameter 0 and scale parameter b, whose probability density function is
$$p(\eta) = \frac{1}{2b} \exp\left(-\frac{|\eta|}{b}\right)$$
where η is the random variable;
Step 8: Recompute the sum of the data points and the number of points in each cluster and add noise Lap(b) to obtain sum′ = sum + Lap(b) and num′ = num + Lap(b); finally update the centroid of each cluster to sum′/num′;
Step 9: Repeat steps 7 and 8 until the sum of squared errors converges or the number of iterations reaches the upper limit. The smaller the sum of squared errors, the more separated and compact the clusters.
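The noisy centroid update of steps 7 and 8 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `laplace` and `noisy_centroids` are hypothetical, each coordinate of the sum is assumed to receive an independent Lap(b) draw, and keeping the old center when the noisy count is close to zero is an assumption, since the text does not say how that case is handled.

```python
import random

def laplace(b, rng):
    """Draw one sample from Lap(b): location 0, scale b."""
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)

def noisy_centroids(points, labels, centers, b, rng):
    """Step 8: per cluster, centroid = (sum + Lap(b)) / (num + Lap(b))."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    nums = [0] * len(centers)
    for p, j in zip(points, labels):      # accumulate per-cluster sum and count
        nums[j] += 1
        for i in range(dim):
            sums[j][i] += p[i]
    updated = []
    for j, old in enumerate(centers):
        num_noisy = nums[j] + laplace(b, rng)
        if abs(num_noisy) < 1e-9:
            # Assumption: keep the old center when the noisy count is ~0.
            updated.append(list(old))
        else:
            updated.append([(sums[j][i] + laplace(b, rng)) / num_noisy
                            for i in range(dim)])
    return updated
```

Here `labels[i]` is the cluster index assigned to `points[i]` in step 6. With a large scale b the noisy count can become small or negative, so a practical implementation would need a policy for that case.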
In step 1, the data are preprocessed as follows:
Let the sample data set be X, the dimension of the sample space be d, and the number of samples be n. Determine the proportional relationship between the attributes of the samples. Using the maximum value Max and minimum value Min of the original data, standardize the data by normalization: each record is a d-dimensional vector, and every dimension is scaled so that the data lie in the d-dimensional space [0, 1]^d, as shown in formula (1):
$$y(l) = \frac{f(l) - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}} \qquad (1)$$
where Min and Max are the minimum and maximum values of dimension l, f(l) is the data in dimension l, and y(l) is the scaled data in dimension l.
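A minimal sketch of this min-max scaling is shown below. The function name `min_max_scale` is hypothetical, and mapping a constant dimension (Max equal to Min) to 0.0 is an assumption, since the text does not cover that case.

```python
def min_max_scale(rows):
    """Scale every dimension of rows into [0, 1] as in formula (1)."""
    dims = range(len(rows[0]))
    lo = [min(r[l] for r in rows) for l in dims]   # Min per dimension
    hi = [max(r[l] for r in rows) for l in dims]   # Max per dimension
    scaled = []
    for r in rows:
        scaled.append([
            # Assumption: a constant dimension (Max == Min) maps to 0.0.
            (r[l] - lo[l]) / (hi[l] - lo[l]) if hi[l] > lo[l] else 0.0
            for l in dims
        ])
    return scaled
```

For example, `min_max_scale([[0, 10], [5, 20], [10, 30]])` scales both columns independently so that each spans [0, 1].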
In step 3, the offset factor ϵ is set to 0.01.
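The non-local "jump" of step 3 can be sketched as follows, under stated assumptions. The names `random_unit_vector` and `jump` are hypothetical. The indices `lam` and `mu` and the value `d_mu` are passed in by the caller, because the formulas that select them are not reproduced in this text.

```python
import math
import random

def random_unit_vector(d, rng):
    """Uniform random direction on the d-dimensional unit hypersphere."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 0.0:                     # resample in the (rare) zero case
            return [x / norm for x in v]

def jump(centers, lam, mu, d_mu, rng, eps=0.01):
    """Move center lam next to center mu, using the offset o = eps * d_mu * u."""
    u = random_unit_vector(len(centers[mu]), rng)
    o = [eps * d_mu * x for x in u]
    moved = [list(c) for c in centers]     # leave the input untouched
    moved[lam] = [m + x for m, x in zip(centers[mu], o)]   # lambda = mu + o
    moved[mu] = [m - x for m, x in zip(centers[mu], o)]    # mu = mu - o
    return moved
```

After the jump the two centers sit symmetrically around the old position of center μ, at a distance of 2·ϵ·d_μ from each other, so the subsequent k-means run can split the cluster with the largest error.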
In step 6, dist(x, y) denotes the distance between points x and y, x_i and y_i denote the values of the i-th dimension of x and y, and dim denotes the dimension of the points. The distance between two points is the Euclidean distance, computed as in formula (3):
$$dist(x, y) = \sqrt{\sum_{i=1}^{dim} (x_i - y_i)^2} \qquad (3)$$
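The distance computation and the nearest-center assignment of step 6 can be sketched as below. The function names `dist` and `assign` are chosen for illustration; ties are broken in favor of the lower center index, which is an assumption.

```python
import math

def dist(x, y):
    """Euclidean distance between two points, as in formula (3)."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def assign(points, centers):
    """Step 6: label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]
```

The returned list has one cluster index per point, which partitions X into at most k clusters.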
In step 7, different data sets need different numbers of iterations before the clustering algorithm meets the convergence condition:
(a) If the number of iterations N is fixed, each iteration consumes a privacy budget of ε/N, and adding noise Lap((d+1)N/ε) in each iteration yields ε-differential privacy;
(b) If the number of iterations N is unknown, the privacy budget consumed is adjusted continually during the iterations.
Early iterations influence the clustering result more than later ones, so the privacy budget ε is spent incrementally during clustering: the first iteration is allocated a budget of ε/2 with noise Lap(2(d+1)/ε), and each subsequent iteration consumes half the budget of the previous one, until the last iteration completes.
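The two allocation rules can be sketched as follows. The function names are hypothetical, and the noise scale (d + 1) divided by the per-iteration budget is read off from the expressions Lap((d+1)N/ε) and Lap(2(d+1)/ε) above rather than stated explicitly in the text.

```python
def fixed_schedule(eps, d, n_iter):
    """Case (a): each of N iterations gets eps / N and scale (d + 1) * N / eps."""
    return [(eps / n_iter, (d + 1) * n_iter / eps) for _ in range(n_iter)]

def halving_schedule(eps, d):
    """Case (b): yield (budget, scale) pairs for eps/2, eps/4, eps/8, ..."""
    budget = eps / 2.0
    while True:
        yield budget, (d + 1) / budget    # scale grows as the budget shrinks
        budget /= 2.0
```

The generator form of `halving_schedule` suits the unknown-N case: the clustering loop draws one (budget, scale) pair per iteration and stops whenever it converges.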
The beneficial effects of the present invention are as follows:
To make the k-means clustering algorithm secure, a clustering algorithm based on differential privacy is designed by adding suitable noise to the centers in the k-means algorithm, and the algorithm is proven to satisfy differential privacy. Compared with existing differentially private k-means algorithms, the method of the invention takes the result of the k-means++ algorithm as input and then alternates a series of non-local "jumps" with runs of the traditional k-means algorithm to improve the choice of initial centers. This effectively avoids the blindness in choosing k and the sensitivity to initial points, and can reduce the number of iterations, which improves the usability of the clustering while protecting privacy.
Description of the drawings
Fig. 1 is a schematic diagram of the experimental data used to test the performance of the differentially private k-means clustering algorithm provided by the invention;
Fig. 2 is a flowchart of the k-means clustering method for differential privacy protection provided by the invention.
Detailed description of the embodiments
The implementation of the technical solution of the invention is described in further detail below with reference to the drawings. It should be understood that these examples merely illustrate the invention and do not limit its scope; after reading the present disclosure, modifications of various equivalent forms made by those skilled in the art fall within the scope defined by the claims appended to this application.
The k-means clustering method for differential privacy of the present invention takes the result of the k-means++ algorithm as input, then alternates a series of non-local "jumps" with runs of the traditional k-means algorithm to improve the choice of initial centers, and uses the Laplace mechanism of differential privacy to add suitable random noise following a specific distribution during the iterations of the k-means clustering algorithm, so that the clustering result is distorted to a certain degree, which protects privacy while preserving the usability of the data. The method is simple and easy to operate, and it is proven theoretically to satisfy ε-differential privacy. It effectively avoids the blindness in choosing k and the sensitivity to initial points and can reduce the number of iterations, which improves the usability of the clustering while protecting privacy. It is applicable to data publishing and privacy protection for data sets of different sizes and dimensions.
Referring to Fig. 2, the specific implementation is as follows:
Step 1: Obtain the sample data set housec8.txt, which stores the three color values of a house image. The number of samples is 34112 and the number of attributes is 3, giving the image sample set X = {x1, x2, …, x34112}. Scale every dimension of the data to the interval [0, 1] using formula (1). 20 rows are taken from the scaled data set, as follows:
x1=[0. 0.08130081 0.00473934] x50=[0.00961538 0.0203252 0.00947867]
x102=[0.02403846 0.01626016 0.03317536] x155=[0.02403846 0.0203252 0.01895735]
x250=[0.03365385 0.06910569 0.00473934] x350=[0.03365385 0.11788618 0.01895735]
x1000=[0.04326923 0.03658537 0.07109005] x3020=[0.04326923 0.10569106 0.02369668]
x5030=[0.04807692 0.03658537 0.02843602] x6000=[0.05288462 0.06097561 0.04265403]
x9843=[0.05288462 0.08130081 0.04265403] x10345=[0.05288462 0.09349593 0.01895735]
x18546=[0.05769231 0.01219512 0.04265403] x20345=[0.05769231 0.05284553 0.04739336]
x24675=[0.05769231 0.06097561 0.07582938] x26546=[0.05769231 0.06910569 0.02843602]
x29654=[0.0625 0.06097561 0.10900474] x30000=[0.0625 0.06910569 0.07582938]
Step 2: Run the k-means++ algorithm on the data set X, obtaining φ_best = 0.024421323538 and C_best = [[0.33290311 0.25585707 0.15738572][0.61027347 0.44192056 0.28916641] [0.70476998 0.79096867 0.75084066]]
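As an illustration of this step, the error measure of formula (2) and the seeding phase of k-means++ can be sketched as below. The names `sq_dist`, `phi` and `kmeans_pp_seed` are hypothetical, and only the seeding is shown: the k-means iterations that follow it are omitted.

```python
import random

def sq_dist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def phi(centers, points):
    """Formula (2): sum over points of the squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

def kmeans_pp_seed(points, k, rng):
    """k-means++ seeding: pick each new center with probability proportional to D(x)^2."""
    centers = [list(rng.choice(points))]          # first center: uniform at random
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0.0:                   # every point already is a center
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centers
```

Points far from every chosen center are more likely to become the next center, which spreads the initial centers across the data.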
Step 3: Starting from the C_best obtained in step 2, select better initial centers according to steps 3 to 5 of the technical solution. The result is
[[0.59984837 0.42572074 0.28687944][0.70476998 0.79096867 0.75084066] [0.59510802 0.42289839 0.28504289]]
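The retry loop of steps 3 to 5 can be sketched as a small driver. This is an interpretation, not the patented procedure: `improve_centers` is a hypothetical name, the callables `kmeans`, `jump` and `phi` are supplied by the caller, and on a failed attempt the sketch keeps the previous C_best and discards the candidate, whereas the text writes C_best = C in that branch.

```python
def improve_centers(points, centers, kmeans, jump, phi, retry_max):
    """Alternate non-local jumps with k-means runs and keep the best center set."""
    c_best = kmeans(points, centers)
    phi_best = phi(c_best, points)
    retry = 0
    while retry <= retry_max:
        candidate = kmeans(points, jump(c_best))   # step 3 then step 4
        value = phi(candidate, points)
        if value < phi_best:
            # Improvement: remember it and reset the retry counter.
            phi_best, c_best, retry = value, candidate, 0
        else:
            # Assumption: the failed candidate is discarded.
            retry += 1
    return c_best, phi_best
```

Because retry is reset after every improvement, the loop ends only after retry_max + 1 consecutive attempts have failed to lower the error.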
Step 4: Set the random noise to be added. The total differential privacy budget is taken as ε ∈ [0.1, 1]. Since the experimental data set has d = 3 attributes and the number of iterations is unknown, formula (4) gives a budget of ε/2 with noise Lap(8/ε) for the first iteration and a budget of ε/4 with noise Lap(16/ε) for the second. Each subsequent iteration consumes half the budget of the previous one, until the last iteration completes.
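The halving allocation can be checked with a short calculation. Assuming that the noise scale is (d + 1) divided by the budget of the iteration, and writing ε_t and b_t for the budget and scale of the t-th iteration and T for the total number of iterations (notation introduced here for illustration):

```latex
\varepsilon_t = \frac{\varepsilon}{2^{t}}, \qquad
b_t = \frac{d+1}{\varepsilon_t} = \frac{2^{t}(d+1)}{\varepsilon}, \qquad
\sum_{t=1}^{T} \varepsilon_t = \varepsilon\left(1 - 2^{-T}\right) < \varepsilon
```

With d = 3 this gives b_1 = 8/ε and b_2 = 16/ε, and the total budget spent stays below ε however many iterations are run.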
Step 5: Using the noise set in step 4, add noise to the sum of the data points and to the number of points of each cluster, then update the centroids. In the first iteration, the sums of the 3 attributes over the points of the 3 clusters are collected in the matrix sum, and the numbers of points in the 3 clusters are
num = [23101 3600 7411]
From step 4, the noise added in the first iteration is Lap(8/ε), so the centroids after the first update are (sum + Lap(8/ε))/(num + Lap(8/ε)). The result is
[[0.6381381 0.53917325 0.41654519][0.24548196 0.1825142 0.10771185] [0.39675444 0.31281772 0.20345833]]
The results of the subsequent iterations are not detailed here. The experiment converges after the 40th iteration, and the final set of cluster centers obtained under differential privacy is
[[0.5893953 0.4049542 0.27335089][0.59260889 0.40021986 0.27148743] [0.70471501 0.790935 0.75081869]]
Step 6: Evaluate the clustering performance. Since the selected data set provides reference classes, the F-measure is used to evaluate clustering performance. The F-measure takes values in [0, 1], and a larger value means the algorithm gives better clustering usability.
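The text does not spell out which F-measure is used. One common definition for clustering, assumed here purely for illustration, matches each reference class to the cluster that gives the best F score and weights the classes by size. The function name `f_measure` is hypothetical.

```python
from collections import Counter

def f_measure(classes, clusters):
    """Class-weighted best-match F-measure between reference classes and clusters."""
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(classes, clusters))     # points shared by (class, cluster)
    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_both = overlap[(cls, clu)]
            if n_both:
                precision = n_both / n_clu
                recall = n_both / n_cls
                best = max(best, 2 * precision * recall / (precision + recall))
        total += n_cls / n * best                 # weight by class size
    return total
```

Under this definition a clustering that reproduces the reference classes exactly scores 1, and the score does not depend on how the cluster labels are named.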
The differentially private clustering algorithm proposed by the invention is compared here with two other differentially private clustering algorithms, DPk-means and DPk-means++. For each value of ε, each of the three algorithms is run 50 times on the data set and the corresponding F-measure results are averaged, as shown in Fig. 1 (where the red line is the result of the algorithm provided by the invention).
As the figure shows, at the same privacy level ε the F-measure of the proposed differentially private clustering algorithm is considerably higher than that of the other two algorithms, which shows that the invention achieves higher clustering usability at the same level of privacy protection. The larger the privacy budget, the higher the clustering usability, but the lower the level of privacy.
In summary, the present invention proposes a k-means clustering method for differential privacy. The scheme takes the result of the k-means++ algorithm as input, then alternates a series of non-local "jumps" with runs of the traditional k-means algorithm to improve the choice of initial centers, and uses the Laplace mechanism of differential privacy to add suitable random noise following a specific distribution during the iterations of the k-means clustering algorithm, so that the clustering result is distorted to a certain degree, which protects privacy while preserving the usability of the data.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the invention, and these improvements and modifications shall also be regarded as falling within the scope of protection of the invention.

Claims (6)

1. A k-means clustering method for differential privacy protection, characterized by comprising the following steps:
Step 1: Preprocess the sample data;
Step 2: Let C denote the set of cluster centers, and let φ(C, X) denote the sum of squared errors for the given sample data set X and cluster center set C, where x is a data point in the sample data set and c is a center in the center set:
$$\varphi(C, X) = \sum_{x \in X} \min_{c \in C} \lVert x - c \rVert^2 \qquad (2)$$
Let retry denote the number of retries, retry_max the maximum number of retries, φ_best the updated sum of squared errors, and C_best the updated center set. Run the k-means++ algorithm on the data set X, then store the smallest sum of squared errors φ(C, X) obtained so far in φ_best and the corresponding best cluster center set C in C_best. Set retry_max = m, m ∈ {0, 1, 2, …}, and initialize retry = 0;
Step 3: While retry ≤ retry_max, let λ denote the "least useful" center and C_i the centroid of cluster i; let C_μ denote the centroid of cluster μ, where μ is the center with the largest within-cluster sum of squared distances, and let d_μ denote the average distance of cluster μ. Let o denote a small random offset, u a random vector on the d-dimensional unit hypersphere, and ϵ the offset factor, where o = ϵ d_μ u. Then set λ = μ + o and μ = μ − o;
Step 4: Run the traditional k-means algorithm with the center set C obtained in step 3 as the initial center set, and evaluate φ(C, X). If φ(C, X) is less than φ_best, set φ_best = φ(C, X), C_best = C, and retry = 0; otherwise exit the current pass of the loop and set retry = retry + 1 and C_best = C;
Step 5: Repeat steps 3 and 4 until retry exceeds the given maximum number of retries retry_max, then return the optimal center set C_best;
Step 6: Traverse every point in the data set X, compute the distance from each point to every center, and assign the point to its nearest center, thereby dividing X into k clusters;
Step 7: Set the random noise to be added:
The random noise is Laplace noise, that is, the noise follows the Laplace distribution Lap(b) with b = Δf/ε, where Δf is the global sensitivity and ε is the privacy budget. Lap(b) denotes the Laplace distribution with location parameter 0 and scale parameter b, whose probability density function is
$$p(\eta) = \frac{1}{2b} \exp\left(-\frac{|\eta|}{b}\right)$$
where η is the random variable;
Step 8: Recompute the sum of the data points and the number of points in each cluster and add noise Lap(b) to obtain sum′ = sum + Lap(b) and num′ = num + Lap(b); finally update the centroid of each cluster to sum′/num′;
Step 9: Repeat steps 7 and 8 until the sum of squared errors converges or the number of iterations reaches the upper limit. The smaller the sum of squared errors, the more separated and compact the clusters.
2. The k-means clustering method for differential privacy protection according to claim 1, characterized in that, in step 1, the data are preprocessed as follows:
Let the sample data set be X, the dimension of the sample space be d, and the number of samples be n. Determine the proportional relationship between the attributes of the samples. Using the maximum value Max and minimum value Min of the original data, standardize the data by normalization: each record is a d-dimensional vector, and every dimension is scaled so that the data lie in the d-dimensional space [0, 1]^d, as shown in formula (1):
$$y(l) = \frac{f(l) - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}} \qquad (1)$$
where Min and Max are the minimum and maximum values of dimension l, f(l) is the data in dimension l, and y(l) is the scaled data in dimension l.
3. The k-means clustering method for differential privacy protection according to claim 1, characterized in that, in step 3, the offset factor ϵ is set to 0.01.
4. The k-means clustering method for differential privacy protection according to claim 1, characterized in that, in step 6, dist(x, y) denotes the distance between points x and y, x_i and y_i denote the values of the i-th dimension of x and y, and dim denotes the dimension of the points; the distance between two points is the Euclidean distance, computed as in formula (3):
$$dist(x, y) = \sqrt{\sum_{i=1}^{dim} (x_i - y_i)^2} \qquad (3)$$
5. The k-means clustering method for differential privacy protection according to claim 1, characterized in that, in step 7, different data sets need different numbers of iterations before the clustering algorithm meets the convergence condition:
(a) If the number of iterations N is fixed, each iteration consumes a privacy budget of ε/N, and adding noise Lap((d+1)N/ε) in each iteration yields ε-differential privacy;
(b) If the number of iterations N is unknown, the privacy budget consumed is adjusted continually during the iterations.
6. The k-means clustering method for differential privacy protection according to claim 5, characterized in that early iterations influence the clustering result more than later ones, so the privacy budget ε is spent incrementally during clustering: the first iteration is allocated a budget of ε/2 with noise Lap(2(d+1)/ε), and each subsequent iteration consumes half the budget of the previous one, until the last iteration completes.
CN201810347108.6A 2018-04-18 2018-04-18 K-means clustering method for differential privacy protection Active CN108280491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810347108.6A CN108280491B (en) 2018-04-18 2018-04-18 K-means clustering method for differential privacy protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810347108.6A CN108280491B (en) 2018-04-18 2018-04-18 K-means clustering method for differential privacy protection

Publications (2)

Publication Number Publication Date
CN108280491A true CN108280491A (en) 2018-07-13
CN108280491B CN108280491B (en) 2020-03-06

Family

ID=62811644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810347108.6A Active CN108280491B (en) 2018-04-18 2018-04-18 K-means clustering method for differential privacy protection

Country Status (1)

Country Link
CN (1) CN108280491B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388972A (en) * 2018-10-29 2019-02-26 山东科技大学 Medical data Singular variance difference method for secret protection based on OPTICS cluster
CN109495476A (en) * 2018-11-19 2019-03-19 中南大学 A kind of data flow difference method for secret protection and system based on edge calculations
CN109615021A (en) * 2018-12-20 2019-04-12 暨南大学 A kind of method for protecting privacy based on k mean cluster
CN109784092A (en) * 2019-01-23 2019-05-21 北京工业大学 A kind of recommended method based on label and difference secret protection
CN110097119A (en) * 2019-04-30 2019-08-06 西安理工大学 Difference secret protection support vector machine classifier algorithm based on dual variable disturbance
CN110334757A (en) * 2019-06-27 2019-10-15 南京邮电大学 Secret protection clustering method and computer storage medium towards big data analysis
CN110516476A (en) * 2019-08-31 2019-11-29 贵州大学 Geographical indistinguishable location privacy protection method based on frequent location classification
CN111027585A (en) * 2019-10-25 2020-04-17 南京大学 K-means algorithm hardware realization method and system based on k-means + + centroid initialization
CN111242194A (en) * 2020-01-06 2020-06-05 广西师范大学 Differential privacy protection method for affinity propagation clustering
CN111931235A (en) * 2020-08-18 2020-11-13 重庆邮电大学 Differential privacy protection method and system under error constraint condition
CN112199722A (en) * 2020-10-15 2021-01-08 南京邮电大学 K-means-based differential privacy protection clustering method
CN112560984A (en) * 2020-12-25 2021-03-26 广西师范大学 Differential privacy protection method for self-adaptive K-Nets clustering
CN113516199A (en) * 2021-07-30 2021-10-19 山西清众科技股份有限公司 Image data generation method based on differential privacy
CN113537308A (en) * 2021-06-29 2021-10-22 中国海洋大学 Two-stage k-means clustering processing system and method based on localized differential privacy
CN113704787A (en) * 2021-08-30 2021-11-26 国网江苏省电力有限公司营销服务中心 Privacy protection clustering method based on differential privacy
CN113792760A (en) * 2021-08-19 2021-12-14 北京爱笔科技有限公司 Cluster analysis method and device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778314A (en) * 2017-03-01 2017-05-31 全球能源互联网研究院 A kind of distributed difference method for secret protection based on k means
CN107423636A (en) * 2017-07-06 2017-12-01 北京航空航天大学 A kind of difference privacy K mean cluster method based on MapReduce
CN107766740A (en) * 2017-10-20 2018-03-06 辽宁工业大学 A kind of data publication method based on difference secret protection under Spark frameworks
CN107862220A (en) * 2017-11-28 2018-03-30 河海大学 Anonymous Synergistic method based on difference privacy under a kind of MapReduce frameworks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DONG SU et al.: "Differentially Private k-Means Clustering", 《HTTPS://WWW.RESEARCHGATE.NET/PUBLICATION/275363962》 *
JUN REN et al.: "DPLK-means: A novel Differential Privacy K-means Mechanism", 《2017 IEEE SECOND INTERNATIONAL CONFERENCE ON DATA SCIENCE IN CYBERSPACE》 *
高瑜 (Gao Yu) et al.: "DPk-medoids clustering algorithm based on differential privacy protection", 《Computer Technology and Development》 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109388972A (en) * 2018-10-29 2019-02-26 山东科技大学 Medical data singular-variance differential privacy protection method based on OPTICS clustering
CN109495476A (en) * 2018-11-19 2019-03-19 中南大学 A data stream differential privacy protection method and system based on edge computing
CN109615021A (en) * 2018-12-20 2019-04-12 暨南大学 A privacy protection method based on k-means clustering
CN109615021B (en) * 2018-12-20 2022-09-27 暨南大学 Privacy information protection method based on k-means clustering
CN109784092A (en) * 2019-01-23 2019-05-21 北京工业大学 A recommendation method based on labels and differential privacy protection
CN110097119A (en) * 2019-04-30 2019-08-06 西安理工大学 Differential privacy protection support vector machine classifier algorithm based on dual variable perturbation
CN110334757A (en) * 2019-06-27 2019-10-15 南京邮电大学 Privacy protection clustering method and computer storage medium for big data analysis
CN110516476A (en) * 2019-08-31 2019-11-29 贵州大学 Geographical indistinguishable location privacy protection method based on frequent location classification
CN111027585A (en) * 2019-10-25 2020-04-17 南京大学 K-means algorithm hardware implementation method and system based on k-means++ centroid initialization
CN111242194A (en) * 2020-01-06 2020-06-05 广西师范大学 Differential privacy protection method for affinity propagation clustering
CN111242194B (en) * 2020-01-06 2022-03-08 广西师范大学 Differential privacy protection method for affinity propagation clustering
CN111931235A (en) * 2020-08-18 2020-11-13 重庆邮电大学 Differential privacy protection method and system under error constraint condition
CN111931235B (en) * 2020-08-18 2021-10-22 重庆邮电大学 Differential privacy protection method and system under error constraint condition
CN112199722A (en) * 2020-10-15 2021-01-08 南京邮电大学 K-means-based differential privacy protection clustering method
CN112560984B (en) * 2020-12-25 2022-04-05 广西师范大学 Differential privacy protection method for self-adaptive K-Nets clustering
CN112560984A (en) * 2020-12-25 2021-03-26 广西师范大学 Differential privacy protection method for self-adaptive K-Nets clustering
CN113537308A (en) * 2021-06-29 2021-10-22 中国海洋大学 Two-stage k-means clustering processing system and method based on localized differential privacy
CN113537308B (en) * 2021-06-29 2023-11-03 中国海洋大学 Two-stage k-means clustering processing system and method based on localized differential privacy
CN113516199A (en) * 2021-07-30 2021-10-19 山西清众科技股份有限公司 Image data generation method based on differential privacy
CN113516199B (en) * 2021-07-30 2022-07-15 山西清众科技股份有限公司 Image data generation method based on differential privacy
CN113792760A (en) * 2021-08-19 2021-12-14 北京爱笔科技有限公司 Cluster analysis method and device, computer equipment and storage medium
CN113704787A (en) * 2021-08-30 2021-11-26 国网江苏省电力有限公司营销服务中心 Privacy protection clustering method based on differential privacy
CN113704787B (en) * 2021-08-30 2023-12-29 国网江苏省电力有限公司营销服务中心 Privacy protection clustering method based on differential privacy

Also Published As

Publication number Publication date
CN108280491B (en) 2020-03-06

Similar Documents

Publication Publication Date Title
CN108280491A (en) A k-means clustering method for differential privacy protection
Got et al. Hybrid filter-wrapper feature selection using whale optimization algorithm: A multi-objective approach
CN104809408B (en) A histogram publishing method based on differential privacy
US10733332B2 (en) Systems for solving general and user preference-based constrained multi-objective optimization problems
CN108427891A (en) Neighborhood recommendation method based on differential privacy protection
CN110334757A (en) Privacy protection clustering method and computer storage medium for big data analysis
CN110266672B (en) Network intrusion detection method based on information entropy and confidence degree downsampling
CN110555316A (en) Privacy protection table data sharing algorithm based on cluster anonymity
CN113011888B (en) Abnormal transaction behavior detection method, device, equipment and medium for digital currency
CN109117669B (en) Privacy protection method and system for MapReduce similar connection query
CN112001788B (en) Credit card illegal fraud identification method based on RF-DBSCAN algorithm
CN111444232A (en) Method for mining digital currency exchange address and storage medium
CN112101452B (en) Access right control method and device
CN108549904A (en) Differential privacy protection K-means clustering method based on silhouette coefficient
CN109271421A (en) A big data clustering method based on MapReduce
CN102930275A (en) Remote sensing image feature selection method based on Cramer's V index
CN106096052A (en) A consumer clustering method for WeChat marketing
CN114187112A (en) Training method of account risk model and determination method of risk user group
CN114491644A (en) Differential privacy data publishing method meeting personalized privacy budget allocation
Hallaji et al. Constrained generative adversarial learning for dimensionality reduction
CN108959956A (en) Differentially private data publishing method based on Bayesian network
CN115630964A (en) Construction method of high-dimensional private data-oriented correlation data transaction framework
Wedashwara et al. Combination of genetic network programming and knapsack problem to support record clustering on distributed databases
CN116821913A (en) Intelligent contract vulnerability detection method based on utility adjustment strategy
CN105824785A (en) Rapid abnormal point detection method based on penalized regression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200131

Address after: Room 701-703, 7th Floor, Goldman Technology Building, Phase II, Goldman Tech Park, No. 5 Longxi Road, Zhouxi, Nancheng District, Dongguan City, Guangdong Province, 523000

Applicant after: Dongguan Mengda Plasticizing Technology Co., Ltd.

Address before: No. 9 Yuen Road, Qixia District, Nanjing City, Jiangsu Province, 210003

Applicant before: NANJING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS

GR01 Patent grant
CP03 Change of name, title or address

Address after: 523000 room 1301, unit 2, building 4, Tian'an Digital City, No. 1, Huangjin Road, Nancheng street, Dongguan City, Guangdong Province

Patentee after: Dongguan Mengda Group Co.,Ltd.

Address before: Room 701-703, 7th floor, Goldman Sachs technology building, phase II, Goldman Sachs Technology Park, 5 Longxi Road, Zhouxi, Nancheng District, Dongguan City, Guangdong Province, 523000

Patentee before: DONGGUAN MENGDA PLASTICIZING SCIENCE & TECHNOLOGY Co.,Ltd.