CN110334757A - Privacy-preserving clustering method and computer storage medium for big data analysis - Google Patents

Privacy-preserving clustering method and computer storage medium for big data analysis

Info

Publication number
CN110334757A
Authority
CN
China
Prior art keywords
privacy budget
sequence
secret protection
privacy
big data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910565540.7A
Other languages
Chinese (zh)
Inventor
徐小龙
范泽轩
孙雁飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN201910565540.7A
Publication of CN110334757A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a privacy-preserving clustering method for big data analysis and a computer storage medium. The method comprises the following steps: normalizing the data and selecting initial center points; calculating the minimum privacy budget and allocating a privacy budget sequence; assigning each sample point to its nearest center point; generating Laplace noise; adding the noise to the parameters used when updating the center points; and iterating until the difference between the sums of squared errors of two adjacent iterations is less than a threshold or the maximum number of iterations is reached. The invention protects sensitive information in the data set by adding Laplace-distributed noise to the intermediate parameters of the clustering algorithm, which addresses the leakage of sensitive data set information during execution of the clustering algorithm. It improves the privacy budget allocation of differentially private clustering algorithms, improves the availability of the clustering result at the same level of privacy protection, and addresses privacy leakage in big data cluster mining.

Description

Privacy-preserving clustering method and computer storage medium for big data analysis
Technical field
The present invention relates to a privacy-preserving clustering method and a computer storage medium, and more particularly to a privacy-preserving clustering method and computer storage medium for big data analysis.
Background art
At present, data mining is receiving increasing attention. Mining and analyzing massive data with machine learning algorithms can yield a large amount of valuable new knowledge and new rules. Cluster analysis is a commonly used method in data mining and is widely applied in scenarios such as data preprocessing, target group classification, pattern recognition, and image segmentation. K-means is the simplest, most effective, and most commonly used algorithm in big data cluster analysis. However, during execution of the algorithm, updating the centroids requires computing the number of samples in each cluster and the sum of each attribute, and these operations can leak sensitive information about the data set.
Differential privacy is a data privacy protection technique that perturbs data by adding noise while retaining the statistical properties of the data. Combining differential privacy with a clustering algorithm can therefore protect the sensitive information of the data set from leakage while still producing a relatively accurate clustering result. Existing privacy-preserving clustering algorithms have some shortcomings: random selection of the initial points and overly fast consumption of the privacy budget both make the availability of the clustering result unsatisfactory. In addition, the problem that conventional privacy budget allocation tends to produce excessive random noise remains unsolved.
Summary of the invention
Object of the invention: the technical problem to be solved by the invention is to provide a privacy-preserving clustering method for big data analysis and a computer storage medium that address the tendency of conventional privacy budget allocation to produce excessive random noise, which degrades the quality of the clustering result. The invention improves the privacy budget allocation of differentially private clustering algorithms by proposing an arithmetic privacy budget allocation scheme, improves the availability of the clustering result at the same level of privacy protection, and addresses privacy leakage in big data cluster mining.
Technical solution: the privacy-preserving clustering method for big data analysis according to the invention comprises the following steps:
(1) normalize the data in the data set;
(2) divide the data set evenly into k subsets, and randomly select one sample point in each subset as an initial center point;
(3) set the total privacy budget ε and the maximum number of iterations t_m, and calculate the minimum privacy budget ε_m and the number of iterations t = ε/ε_m; if t > t_m, allocate the privacy budget sequence using the arithmetic privacy budget allocation scheme; if t ≤ t_m, allocate it using the average privacy budget allocation scheme; this yields the privacy budget sequence ε_p, where 1 ≤ p ≤ t_m;
(4) for every sample point in the data set, calculate its Euclidean distance to each of the k center points and assign the sample point to the nearest center point, dividing the data set into k clusters C = {C_1, C_2, …, C_k};
(5) generate Laplace-distributed random numbers according to the corresponding term of the privacy budget sequence ε_p;
(6) for each cluster C_j, where 1 ≤ j ≤ k, calculate the number of sample points in the cluster, num, and the vector sum of the sample points, sum, and add noise to each of them to obtain num' and sum', the noise being the Laplace-distributed random numbers of step (5);
(7) update the center point of each cluster C_j to sum'/num', where 1 ≤ j ≤ k;
(8) calculate the sum of squared errors; if the absolute value of the difference between the sums of squared errors of the current and the previous iteration is less than the set threshold, or the number of iterations reaches the upper limit t_m, end execution and obtain the clustering result; otherwise, go to step (4) and execute the next iteration.
Further, the minimum privacy budget ε_m in step (3) is calculated by the following formula:
where N is the number of records in the data set, d is the dimension of the data, and ρ is the average value of the centroid estimate in each dimension.
Further, the arithmetic privacy budget allocation scheme in step (3) is specifically:
The total privacy budget ε is decomposed into an increasing arithmetic progression of length t_m whose first term is ε_m and whose terms sum to ε; reversing this progression gives the privacy budget sequence ε_p.
Further, the average privacy budget allocation scheme in step (3) is specifically:
The total privacy budget ε is decomposed into a constant sequence of length t_m, and this sequence is the privacy budget sequence ε_p.
Further, the random number in step (5) is drawn from a Laplace distribution with location parameter 0 and scale parameter b, where b = (d+1)/ε', d is the dimension of the data, and ε' is the value looked up at the position in the privacy budget sequence ε_p that corresponds to the current iteration number.
Further, the initial center point in step (2) is obtained by randomly selecting a sample point in each subset and then adding random noise to it.
The computer storage medium according to the invention stores a computer program which, when executed by a computer processor, implements the above privacy-preserving clustering method for big data analysis.
Beneficial effects: the invention has the following technical effects:
1. The privacy budget sequence is generated using the arithmetic privacy budget allocation method: the minimum privacy budget ε_m is calculated first, and the privacy budget sequence is then calculated using the sum formula and the general term formula of an arithmetic progression. The resulting sequence changes gradually, which solves the problem of overly fast privacy budget consumption in existing methods;
2. With the arithmetic privacy budget allocation method, the total privacy budget is distributed linearly, which solves the problem that existing methods allocate too much privacy budget in early iterations and too little in later ones. When the total privacy budget is very small, even smaller than the minimum privacy budget ε_m, the invention uses the average allocation method to avoid, as far as possible, allocated budgets so small that they impair execution of the algorithm. Compared with existing methods, the invention achieves higher clustering availability and better clustering quality.
Brief description of the drawings
Fig. 1 is a flowchart of the method of an embodiment of the invention;
Fig. 2 is a flowchart of the arithmetic privacy budget allocation method of the invention;
Fig. 3 compares the clustering availability index of the method of the invention with that of the comparison algorithms;
Fig. 4 compares the privacy budget sequences produced by the method of the invention, the halving allocation method, and the series-sum allocation method.
Detailed description of the embodiments
The flowchart of the method of this embodiment is shown in Fig. 1. The method is implemented according to the following steps:
Step 1: The Image.csv data set is taken from the clustering data sets of the School of Computing, University of Eastern Finland (http://cs.joensuu.fi/sipu/datasets/). Denote the data set by D; the number of records N is 34112 and the data dimension d is 3, i.e. each record has 3 attributes. The total privacy budget ε controls the degree of privacy protection: the smaller ε is, the larger the added noise and the higher the degree of privacy protection. Here the total privacy budget ε is set to 0.8 and the number of clusters k is 3. Each record is regarded as a sample point in d-dimensional space. Every dimension of data set D is normalized to [0,1].
Data normalization scales every dimension of the data into [0,1] using the following formula:
$$x' = \frac{x - \min}{\max - \min}$$
where, for any one dimension of the data, x is a value in that dimension, min and max are the minimum and maximum values of that dimension, and x' is the normalized value.
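The min-max normalization of step 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name `min_max_normalize` and the handling of constant columns are assumptions introduced here.

```python
import numpy as np

def min_max_normalize(data):
    """Scale every column of `data` into [0, 1] via x' = (x - min) / (max - min)."""
    data = np.asarray(data, dtype=float)
    col_min = data.min(axis=0)
    col_max = data.max(axis=0)
    span = col_max - col_min
    # Assumption (not in the source): a constant column maps to 0
    # instead of triggering a division by zero.
    span[span == 0] = 1.0
    return (data - col_min) / span

# Toy usage with two attributes per record.
normalized = min_max_normalize([[0, 10], [5, 20], [10, 30]])
```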
Step 2: The preprocessed data set D is divided evenly into k subsets {S_1, S_2, …, S_k}. One sample point o_i is randomly selected from each subset S_i, where 1 ≤ i ≤ k, and random noise is added to it to obtain the initial center points {u_1, u_2, …, u_k}. Here, data set D is divided evenly into 3 subsets {S_1, S_2, S_3}, one sample point is randomly selected from each subset, and noise is added to obtain the initial center points, with the following result:
u1[0 0.08130081 0.00473934]
u2[0.44230769 0.27235772 0.16587678]
u3[0.65384615 0.43089431 0.1943128]。
Step 3: Obtain the privacy budget sequence ε_p, where 1 ≤ p ≤ t_m. Set the maximum number of iterations t_m, calculate the minimum privacy budget ε_m, and from it calculate the number of iterations t = ε/ε_m. If t > t_m, the privacy budget sequence is allocated using the arithmetic privacy budget allocation scheme; if t ≤ t_m, it is allocated using the average privacy budget allocation scheme. The result is the privacy budget sequence {ε_1, ε_2, …, ε_{t_m}}. The allocation flow is shown in Fig. 2.
The minimum privacy budget ε_m is calculated by the following formula:
where N is the number of records in the given data set, d is the dimension, k is the number of clusters, and ρ is the average value of the centroid estimate in each dimension; when the data are normalized to [0,1], ρ takes the value 0.45.
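The formula itself is not reproduced in this text. One candidate, stated here purely as an assumption, is the minimum-budget expression found in the differentially private k-means literature, ε_m = sqrt(500·k³/N² · (d + ∛(4dρ²))³). It uses exactly the quantities N, d, k and ρ listed above, and with the values of this embodiment it rounds to the ε_m = 0.031 reported in the worked example of this step. The patent's own formula may differ. The function name `minimum_privacy_budget` is hypothetical.

```python
import math

def minimum_privacy_budget(n_records, dim, k, rho=0.45):
    """Candidate formula (an assumption, not confirmed by the source text):
    eps_m = sqrt(500 * k^3 / N^2 * (d + cbrt(4 * d * rho^2))^3)."""
    cube_root = (4.0 * dim * rho ** 2) ** (1.0 / 3.0)
    return math.sqrt(500.0 * k ** 3 / n_records ** 2 * (dim + cube_root) ** 3)

# Values from the embodiment: N = 34112, d = 3, k = 3, rho = 0.45.
eps_m = minimum_privacy_budget(34112, 3, 3)
```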
The arithmetic privacy budget allocation scheme decomposes the total privacy budget into an increasing arithmetic progression of length t_m, in which each term is the privacy budget consumed in the corresponding iteration. Concretely, the ε_m obtained in step 3 is taken as the first term a_1 of the progression and the total privacy budget ε as the sum S_n of all its terms. The common difference d_t can then be calculated from the sum formula and the general term formula of an arithmetic progression:
$$S_n = n a_1 + \frac{n(n-1)}{2} d_t,$$
$$a_n = a_1 + (n-1) d_t,$$
Once the common difference d_t is obtained, the increasing arithmetic progression of length t_m follows, and reversing it gives the required privacy budget sequence. The privacy budget sequence is not necessarily used up completely.
The average privacy budget allocation scheme divides the total privacy budget evenly over the maximum number of iterations, so the privacy budget consumed in each iteration is ε/t_m. Average allocation can also be regarded as a special arithmetic allocation with common difference 0.
Specifically, the maximum number of iterations t_m is set to 8 and the minimum privacy budget is calculated as ε_m = 0.031, so t = ε/ε_m = 25.806. Because t > t_m, the privacy budget sequence ε_p, where 1 ≤ p ≤ 8, is calculated using the arithmetic privacy budget allocation method. The common difference is first calculated as d_t = 0.0197, the value of each term is then calculated from the general term formula of the arithmetic progression, and finally the progression is reversed to obtain the required decreasing privacy budget sequence: {0.169, 0.14928571, 0.12957143, 0.10985714, 0.09014286, 0.07042857, 0.05071429, 0.031}.
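The allocation rule of step 3 can be sketched as follows. Solving the arithmetic-series sum S_n = n·a_1 + n(n−1)/2·d_t for the common difference gives d_t = 2(ε − t_m·ε_m) / (t_m(t_m − 1)). The function name `allocate_privacy_budget` and the special case for t_m = 1 are assumptions added for illustration.

```python
def allocate_privacy_budget(eps_total, eps_min, t_max):
    """Return the per-iteration privacy budgets, largest first."""
    if t_max == 1:
        # Assumption: with a single iteration the whole budget is spent at once.
        return [eps_total]
    t = eps_total / eps_min
    if t > t_max:
        # Arithmetic allocation: first term eps_min, terms summing to eps_total.
        d_t = 2.0 * (eps_total - t_max * eps_min) / (t_max * (t_max - 1))
        increasing = [eps_min + i * d_t for i in range(t_max)]
        return increasing[::-1]  # reversed, so early iterations get more budget
    # Average allocation: equal share for every iteration.
    return [eps_total / t_max] * t_max

# Values from the embodiment: eps = 0.8, eps_m = 0.031, t_m = 8.
sequence = allocate_privacy_budget(0.8, 0.031, 8)
```

With the embodiment's inputs the first and last entries are 0.169 and 0.031 and the entries sum to the total budget 0.8, in line with the sequence listed above.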
Step 4: For every point in data set D, calculate its Euclidean distance to each of the k center points and assign the sample point to the nearest center point, dividing data set D into k clusters C = {C_1, C_2, …, C_k}.
Specifically, for every point in data set D, its Euclidean distance to each of the 3 center points is calculated and the sample point is assigned to the nearest center point, dividing data set D into 3 clusters C = {C_1, C_2, C_3}.
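The assignment of step 4 can be sketched with NumPy broadcasting. This is an illustrative sketch; the function name `assign_to_nearest_center` is hypothetical.

```python
import numpy as np

def assign_to_nearest_center(points, centers):
    """Return, for each point, the index of its nearest center (Euclidean distance)."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # distances[i, j] is the Euclidean distance from point i to center j.
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return distances.argmin(axis=1)

# Toy usage: three 2-dimensional points, two centers.
labels = assign_to_nearest_center([[0, 0], [1, 1], [0.9, 0.8]], [[0, 0], [1, 1]])
```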
Step 5: Calculate the noise to be added in the current iteration. The noise is a random number drawn from a Laplace distribution with location parameter 0 and scale parameter b, denoted Lap(b), where b = Δf/ε', Δf is the sensitivity, and ε' is the privacy budget. The probability density function of the Laplace distribution is $f(x \mid b) = \frac{1}{2b}\exp\left(-\frac{|x|}{b}\right)$. Here the sensitivity depends on the data dimension, Δf = d + 1, and the privacy budget ε' is the value looked up at the position in the privacy budget sequence ε_p that corresponds to the current iteration number, so the noise is expressed as Lap(Δf/ε').
Specifically, the privacy budget ε_p corresponding to the iteration number is looked up in the privacy budget sequence obtained in step 3, and the sensitivity is Δf = 3 + 1 = 4. In the first iteration ε_1 is 0.169, so the noise is Lap(4/0.169); in the second iteration ε_2 is 0.1493, so the noise is Lap(4/0.1493); and so on.
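The noise generation of step 5 can be sketched as follows, using the scale b = (d + 1)/ε' from the text. The function names and the use of NumPy's `Generator.laplace` are choices made for this illustration, not taken from the patent.

```python
import numpy as np

def laplace_scale(dim, eps_iter):
    """Scale parameter b = sensitivity / budget, with sensitivity d + 1."""
    return (dim + 1) / eps_iter

def sample_laplace_noise(dim, eps_iter, size, rng):
    """Draw `size` samples from Lap(b) with location 0."""
    return rng.laplace(loc=0.0, scale=laplace_scale(dim, eps_iter), size=size)

# First iteration of the embodiment: d = 3, eps_1 = 0.169, so b = 4 / 0.169.
rng = np.random.default_rng(0)  # fixed seed only to make the sketch repeatable
noise = sample_laplace_noise(3, 0.169, size=4, rng=rng)
```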
Step 6: For each cluster C_j, where 1 ≤ j ≤ k, calculate the number of sample points in the cluster, num, and the vector sum of the sample points, sum, and add the noise of step 5 to each of them to obtain num' and sum'. Specifically, for each cluster C_j, where 1 ≤ j ≤ 3, num and sum are calculated. The results of the first iteration are:
Cluster C_1: num = 1406, sum = [240.29 177.76 107.42];
Cluster C_2: num = 12301, sum = [4665.25 3686.47 2473.31];
Cluster C_3: num = 20405, sum = [13469.21 11385.21 8768.39];
The noise of step 5 is then added to each of them to obtain num' and sum'. The noise added in the first iteration is Lap(4/0.169), and the results are:
Cluster C_1: num' = 1421.99, sum' = [284.77 190.18 108.46];
Cluster C_2: num' = 12281.82, sum' = [4688.87 3697.67 2566.92];
Cluster C_3: num' = 20396.29, sum' = [13466.97 11402.30 8739.17];
Step 7: Update the center of each cluster C_j to u_j' = sum'/num', where 1 ≤ j ≤ 3. The centers updated in the first iteration are:
u1′[0.20026401 0.13374381 0.07627629]
u2′[0.38177298 0.30106816 0.20900154]
u3′[0.66026546 0.55903804 0.42846875]。
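Steps 6 and 7 together can be sketched as below: Laplace noise is added to each cluster's count and vector sum, and the new center is their ratio. The function name `noisy_center_update` is hypothetical, and the sketch does not handle the case where the noisy count is zero or negative, which the source text does not discuss.

```python
import numpy as np

def noisy_center_update(points, labels, k, scale, rng):
    """Return k centers computed as (sum + noise) / (num + noise) per cluster."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    dim = points.shape[1]
    new_centers = np.empty((k, dim))
    for j in range(k):
        members = points[labels == j]
        noisy_num = len(members) + rng.laplace(0.0, scale)            # num'
        noisy_sum = members.sum(axis=0) + rng.laplace(0.0, scale, size=dim)  # sum'
        new_centers[j] = noisy_sum / noisy_num                        # u_j' = sum'/num'
    return new_centers

# Check against the first-iteration figures for cluster C_1 in the text.
u1_check = np.array([284.77, 190.18, 108.46]) / 1421.99
```

Dividing the published sum' of cluster C_1 by its num' gives values that agree with the listed u_1' to about four decimal places; the small residual comes from the rounding of the published inputs.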
Step 8: Calculate the sum of squared errors. If the absolute value of the difference between the sums of squared errors of the current and the previous iteration is less than the set threshold, or the number of iterations reaches the upper limit t_m, execution ends and the clustering result is obtained; otherwise, go to step 4 and continue. The sum of squared errors here refers to the sum of the distances between each point in a cluster and the center point of that cluster. The threshold can be set freely, and it determines the number of iterations. In theory it could be set to 0, but because of the randomness of the noise, a threshold of 0 would lead to too many iterations, so the threshold can be relaxed appropriately; here it is set to 100.
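The stopping rule of step 8 can be sketched as follows. As an assumption, the error is computed here as the sum of squared Euclidean distances, which is the usual meaning of "sum of squared errors"; the text describes it as a sum of distances. The function names are hypothetical.

```python
import numpy as np

def sum_of_squared_errors(points, centers, labels):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    labels = np.asarray(labels)
    return float(np.sum((points - centers[labels]) ** 2))

def should_stop(sse, prev_sse, iteration, t_max, threshold):
    """Stop when the iteration cap is hit or the error change falls below the threshold."""
    if iteration >= t_max:
        return True
    # No previous error exists in the first iteration, so it can only stop via the cap.
    return prev_sse is not None and abs(sse - prev_sse) < threshold

# Toy usage: two points sharing one center.
sse = sum_of_squared_errors([[0, 0], [2, 0]], [[1, 0]], [0, 0])
```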
The method of this embodiment is compared with two existing algorithms. For each value of ε, each of the three algorithms is run 10 times, and the F-measure index is calculated between its results and the result of the standard K-means algorithm in order to evaluate the clustering availability of the algorithm. The range of the F-measure is [0,1]; the closer it is to 1, the more similar the algorithm's clustering result is to the standard noise-free result, and the higher the clustering availability. The comparison of the F-measure index of the three algorithms on the Image data set is shown in Fig. 3.
Fig. 4 compares the privacy budget sequences allocated by the method of this embodiment and by the two existing methods. In the early iterations the two existing methods have already consumed most of the total privacy budget, so very little privacy budget is allocated to the middle and late iterations, and an overly small privacy budget tends to produce large noise that impairs convergence of the algorithm. The privacy budget sequence allocated by the method of the invention is distributed linearly, so the privacy budget allocated to the middle iterations is more sufficient and excessive noise is less likely to disturb convergence.
The invention is a privacy-preserving clustering method for big data analysis. It improves the privacy budget allocation of existing differentially private clustering algorithms: by using the arithmetic privacy budget allocation scheme, it addresses problems of existing methods such as overly fast privacy budget consumption and excessive noise in late iterations, and it improves the availability of the clustering result at the same level of privacy protection. The invention can be applied to cluster analysis of big data to protect personal information from being leaked during that process. For example, medical data, commercial consumption data, and location data contain a large amount of private user information; when cluster mining is performed on such data, the method of the invention can effectively guard against privacy leakage during data acquisition and algorithm execution while retaining the statistical properties and mining utility of the data.
If the embodiments of the invention are implemented in the form of software function modules and sold or used as an independent product, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods of the embodiments of the invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. Thus, the embodiments of the invention are not limited to any specific combination of hardware and software.
Correspondingly, the embodiments of the invention also provide a computer storage medium on which a computer program is stored. When the computer program is executed by a processor, the aforementioned privacy-preserving clustering method for big data analysis can be implemented. For example, the computer storage medium is a computer-readable storage medium.
Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Claims (7)

1. A privacy-preserving clustering method for big data analysis, characterized in that it comprises the following steps:
(1) normalize the data in the data set;
(2) divide the data set evenly into k subsets, and randomly select one sample point in each subset as an initial center point;
(3) set the total privacy budget ε and the maximum number of iterations t_m, and calculate the minimum privacy budget ε_m and the number of iterations t = ε/ε_m; if t > t_m, allocate the privacy budget sequence using the arithmetic privacy budget allocation method; if t ≤ t_m, allocate it using the average privacy budget allocation method; this yields the privacy budget sequence ε_p, where 1 ≤ p ≤ t_m;
(4) for every sample point in the data set, calculate its Euclidean distance to each of the k center points and assign the sample point to the nearest center point, dividing the data set into k clusters C = {C_1, C_2, …, C_k};
(5) generate Laplace-distributed random numbers according to the corresponding term of the privacy budget sequence ε_p;
(6) for each cluster C_j, where 1 ≤ j ≤ k, calculate the number of sample points in the cluster, num, and the vector sum of the sample points, sum, and add noise to each of them to obtain num' and sum', the noise being the Laplace-distributed random numbers of step (5);
(7) update the center point of each cluster C_j to sum'/num', where 1 ≤ j ≤ k;
(8) calculate the sum of squared errors; if the absolute value of the difference between the sums of squared errors of the current and the previous iteration is less than the set threshold, or the number of iterations reaches the upper limit t_m, end execution and obtain the clustering result; otherwise, go to step (4) and execute the next iteration.
2. The privacy-preserving clustering method for big data analysis according to claim 1, characterized in that the minimum privacy budget ε_m in step (3) is calculated by the following formula:
where N is the number of records in the data set, d is the dimension of the data, and ρ is the average value of the centroid estimate in each dimension.
3. The privacy-preserving clustering method for big data analysis according to claim 1, characterized in that the arithmetic privacy budget allocation method in step (3) is specifically:
the total privacy budget ε is decomposed into an increasing arithmetic progression of length t_m whose first term is ε_m and whose terms sum to ε, and reversing this progression gives the privacy budget sequence ε_p.
4. The privacy-preserving clustering method for big data analysis according to claim 1, characterized in that the average privacy budget allocation method in step (3) is specifically:
the total privacy budget ε is decomposed into a constant sequence of length t_m, and this sequence is the privacy budget sequence ε_p.
5. The privacy-preserving clustering method for big data analysis according to claim 1, characterized in that the random number in step (5) is drawn from a Laplace distribution with location parameter 0 and scale parameter b, where b = (d+1)/ε', d is the dimension of the data, and ε' is the value looked up at the position in the privacy budget sequence ε_p that corresponds to the current iteration number.
6. The privacy-preserving clustering method for big data analysis according to claim 1, characterized in that the initial center point in step (2) is obtained by randomly selecting a sample point in each subset and then adding random noise to it.
7. A computer storage medium on which a computer program is stored, characterized in that the computer program, when executed by a computer processor, implements the method according to any one of claims 1 to 6.
CN201910565540.7A 2019-06-27 2019-06-27 Secret protection clustering method and computer storage medium towards big data analysis Pending CN110334757A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910565540.7A CN110334757A (en) 2019-06-27 2019-06-27 Secret protection clustering method and computer storage medium towards big data analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910565540.7A CN110334757A (en) 2019-06-27 2019-06-27 Secret protection clustering method and computer storage medium towards big data analysis

Publications (1)

Publication Number Publication Date
CN110334757A true CN110334757A (en) 2019-10-15

Family

ID=68144509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910565540.7A Pending CN110334757A (en) 2019-06-27 2019-06-27 Secret protection clustering method and computer storage medium towards big data analysis

Country Status (1)

Country Link
CN (1) CN110334757A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750725A (en) * 2019-10-24 2020-02-04 河北经贸大学 Privacy-protecting user portrait generation method, terminal device and storage medium
CN111242196A (en) * 2020-01-06 2020-06-05 广西师范大学 Differential privacy protection method for interpretable deep learning
CN111563272A (en) * 2020-04-30 2020-08-21 支付宝实验室(新加坡)有限公司 Information statistical method and device
CN111444545B (en) * 2020-06-12 2020-09-04 支付宝(杭州)信息技术有限公司 Method and device for clustering private data of multiple parties
CN111914285A (en) * 2020-06-09 2020-11-10 深圳大学 Geographical distributed graph calculation method and system based on differential privacy
CN112202542A (en) * 2020-09-30 2021-01-08 清华-伯克利深圳学院筹备办公室 Data perturbation method, device and storage medium
CN112199722A (en) * 2020-10-15 2021-01-08 南京邮电大学 K-means-based differential privacy protection clustering method
CN112347088A (en) * 2020-10-28 2021-02-09 南京邮电大学 Data reliability optimization method, storage medium and equipment
CN112613065A (en) * 2020-12-02 2021-04-06 北京明朝万达科技股份有限公司 Data sharing method and device based on differential privacy protection
CN112767693A (en) * 2020-12-31 2021-05-07 北京明朝万达科技股份有限公司 Vehicle driving data processing method and device
CN113094751A (en) * 2021-04-21 2021-07-09 山东大学 Personalized privacy data processing method, device, medium and computer equipment
CN113537308A (en) * 2021-06-29 2021-10-22 中国海洋大学 Two-stage k-means clustering processing system and method based on localized differential privacy
CN113609523A (en) * 2021-07-29 2021-11-05 南京邮电大学 Vehicle networking private data protection method based on block chain and differential privacy
CN114117540A (en) * 2022-01-25 2022-03-01 广州天鹏计算机科技有限公司 Big data analysis processing method and system
CN114817985A (en) * 2022-04-22 2022-07-29 广东电网有限责任公司 Privacy protection method, device, equipment and storage medium for electricity consumption data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778314A (en) * 2017-03-01 2017-05-31 全球能源互联网研究院 A distributed differential privacy protection method based on k-means
CN108280491A (en) * 2018-04-18 2018-07-13 南京邮电大学 A k-means clustering method for differential privacy protection
CN108549904A (en) * 2018-03-28 2018-09-18 西安理工大学 Differential privacy protection K-means clustering method based on silhouette coefficient

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778314A (en) * 2017-03-01 2017-05-31 全球能源互联网研究院 A distributed differential privacy protection method based on k-means
CN108549904A (en) * 2018-03-28 2018-09-18 西安理工大学 Differential privacy protection K-means clustering method based on silhouette coefficient
CN108280491A (en) * 2018-04-18 2018-07-13 南京邮电大学 A k-means clustering method for differential privacy protection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
C. DWORK: "Differential privacy", 《PROCEEDINGS OF 39TH INTERNATIONAL COLLOQUIUM ON AUTOMATA, LANGUAGES AND PROGRAMMING》 *
SU D ET AL: "Differentially private k-means clustering", 《PROCEEDINGS OF THE SIXTH ACM CONFERENCE ON DATA AND APPLICATION SECURITY AND PRIVACY》 *
SHANG Tao et al.: "Big data decision tree algorithm based on arithmetic-progression privacy budget allocation", 《工程科学与技术》 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750725A (en) * 2019-10-24 2020-02-04 河北经贸大学 Privacy-protecting user portrait generation method, terminal device and storage medium
CN111242196A (en) * 2020-01-06 2020-06-05 广西师范大学 Differential privacy protection method for interpretable deep learning
CN111242196B (en) * 2020-01-06 2022-06-21 广西师范大学 Differential privacy protection method for interpretable deep learning
CN111563272A (en) * 2020-04-30 2020-08-21 支付宝实验室(新加坡)有限公司 Information statistical method and device
CN111914285A (en) * 2020-06-09 2020-11-10 深圳大学 Geographical distributed graph calculation method and system based on differential privacy
CN111914285B (en) * 2020-06-09 2022-06-17 深圳大学 Geographic distributed graph calculation method and system based on differential privacy
WO2021248937A1 (en) * 2020-06-09 2021-12-16 深圳大学 Geographically distributed graph computing method and system based on differential privacy
CN111444545B (en) * 2020-06-12 2020-09-04 支付宝(杭州)信息技术有限公司 Method and device for clustering private data of multiple parties
CN112202542A (en) * 2020-09-30 2021-01-08 清华-伯克利深圳学院筹备办公室 Data perturbation method, device and storage medium
CN112199722A (en) * 2020-10-15 2021-01-08 南京邮电大学 K-means-based differential privacy protection clustering method
CN112347088B (en) * 2020-10-28 2024-02-20 南京邮电大学 Data credibility optimization method, storage medium and equipment
CN112347088A (en) * 2020-10-28 2021-02-09 南京邮电大学 Data reliability optimization method, storage medium and equipment
CN112613065B (en) * 2020-12-02 2024-08-20 北京明朝万达科技股份有限公司 Data sharing method and device based on differential privacy protection
CN112613065A (en) * 2020-12-02 2021-04-06 北京明朝万达科技股份有限公司 Data sharing method and device based on differential privacy protection
CN112767693A (en) * 2020-12-31 2021-05-07 北京明朝万达科技股份有限公司 Vehicle driving data processing method and device
CN113094751A (en) * 2021-04-21 2021-07-09 山东大学 Personalized privacy data processing method, device, medium and computer equipment
CN113537308B (en) * 2021-06-29 2023-11-03 中国海洋大学 Two-stage k-means clustering processing system and method based on localized differential privacy
CN113537308A (en) * 2021-06-29 2021-10-22 中国海洋大学 Two-stage k-means clustering processing system and method based on localized differential privacy
CN113609523B (en) * 2021-07-29 2022-04-01 南京邮电大学 Vehicle networking private data protection method based on block chain and differential privacy
CN113609523A (en) * 2021-07-29 2021-11-05 南京邮电大学 Vehicle networking private data protection method based on block chain and differential privacy
CN114117540B (en) * 2022-01-25 2022-04-29 广州天鹏计算机科技有限公司 Big data analysis processing method and system
CN114117540A (en) * 2022-01-25 2022-03-01 广州天鹏计算机科技有限公司 Big data analysis processing method and system
CN114817985A (en) * 2022-04-22 2022-07-29 广东电网有限责任公司 Privacy protection method, device, equipment and storage medium for electricity consumption data

Similar Documents

Publication Publication Date Title
CN110334757A (en) Secret protection clustering method and computer storage medium towards big data analysis
Got et al. Hybrid filter-wrapper feature selection using whale optimization algorithm: A multi-objective approach
Arora et al. Analysis of k-means and k-medoids algorithm for big data
Mahmud et al. Improvement of K-means clustering algorithm with better initial centroids based on weighted average
Chen et al. APSCAN: A parameter free algorithm for clustering
Davis et al. Grids versus graphs: Partitioning space for improved taxi demand-supply forecasts
CN107783998A (en) A data processing method and device
Al Abd Alazeez et al. EDDS: An enhanced density-based method for clustering data streams
Bharanidharan et al. Improved chicken swarm optimization to classify dementia MRI images using a novel controlled randomness optimization algorithm
Wang et al. M2SPL: Generative multiview features with adaptive meta-self-paced sampling for class-imbalance learning
Kwedlo A hybrid steady-state evolutionary algorithm using random swaps for Gaussian model-based clustering
CN117407921A (en) Differential privacy histogram release method and system based on must-link and cannot-link constraints
Zhang et al. Scalegcn: Efficient and effective graph convolution via channel-wise scale transformation
Kanezashi et al. An incremental local-first community detection method for dynamic graphs
CN104899232A (en) Cooperative clustering method and cooperative clustering equipment
CN105589896B (en) Data mining method and device
CN114298245A (en) Anomaly detection method and device, storage medium and computer equipment
Lingras Evolutionary rough K-means clustering
Cai et al. The multi-task learning with an application of Pareto improvement
Azad et al. Modified constrained differential evolution for solving nonlinear global optimization problems
Rahman et al. AWST: A Novel Attribute Weight Selection Technique for Data Clustering.
Santos et al. A comparative study of GPU metaheuristics for data clustering
Fahim et al. Unsupervised Space Partitioning for Nearest Neighbor Search
Yu et al. Genetic-based K-means algorithm for selection of feature variables
Chen et al. Power Grid Missing Data Filling Method Based on Historical Data Mining Assisted Multi-dimensional Scenario Analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191015