CN117216796B - Energy big data privacy protection method based on privacy class - Google Patents


Info

Publication number
CN117216796B
Authority
CN
China
Prior art keywords
privacy
data
record
class
privacy protection
Legal status
Active
Application number
CN202311238880.1A
Other languages
Chinese (zh)
Other versions
CN117216796A (en)
Inventor
张宸
于翔
詹昕
王春蕾
刘钰
严安
刘全
崔惠
Current Assignee
Yangzhou Power Supply Branch Of State Grid Jiangsu Electric Power Co ltd
Original Assignee
Yangzhou Power Supply Branch Of State Grid Jiangsu Electric Power Co ltd
Application filed by Yangzhou Power Supply Branch Of State Grid Jiangsu Electric Power Co ltd
Priority to CN202311238880.1A
Publication of CN117216796A
Application granted
Publication of CN117216796B
Legal status: Active

Landscapes

  • Complex Calculations (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an energy big data privacy protection method based on privacy class, comprising the following steps: first, the features of an original data set are classified by privacy class marking; second, the data set is clustered with a K-means++ clustering algorithm suited to the characteristics of power grid data; finally, a genetic algorithm searches for the optimal noise parameter configuration for each category, balancing the anonymization effect against the accuracy of query results. The method provides finer-grained privacy protection according to the sensitivity and importance of the data, improves the utility of the published data, and effectively differentiates the sensitivity of the query function.

Description

Energy big data privacy protection method based on privacy class
Technical Field
The invention belongs to the field of data privacy protection, and particularly relates to an energy big data privacy protection method based on a privacy class.
Background
With the rapid development of information technologies such as the Internet of Things and artificial intelligence, the smart grid is becoming increasingly intelligent and information-based, and it affects many aspects of social life. As an important infrastructure, the smart grid aggregates large amounts of sensitive power data. Such data typically need to be released and shared externally so that research institutions outside the grid can analyze and mine them. While grid enterprises provide personalized services to data users in this way, the sharing also brings potential privacy leakage and data security problems.
Privacy-preserving technology has received extensive attention from industry, and many researchers have studied this problem. Existing privacy protection techniques fall into three categories: techniques based on data encryption, techniques based on restricted release, and techniques based on data distortion.
Encryption-based techniques typically use symmetric encryption, asymmetric encryption, homomorphic encryption, and similar schemes to protect data cryptographically, so that only authorized users can decrypt and access the data. However, these algorithms involve complex data processing and incur large computational overhead.
Restricted-release techniques limit and control what data are published, but an attacker with background knowledge can still infer private information from associated data.
Distortion-based techniques protect user privacy by moderately perturbing or distorting the original data, but this can reduce the usability and accuracy of the data.
The existing differential privacy model perturbs data by adding noise to query results, so that even an attacker who obtains all data other than the target data cannot easily infer an individual's private information. However, the added noise can make query results inaccurate and cause information loss, reducing the utility of the published data.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art.
The technical scheme of the invention is as follows: an energy big data privacy protection method based on privacy class, the method comprising the following steps:
Step 1, carrying out privacy class marking and screening on an original data set: selecting some attributes as a feature set, using the Pearson correlation coefficient to select candidate attributes highly correlated with the feature set, and combining the two into the output data set;
Step 2, classifying the data set with a K-means++ clustering algorithm;
Step 3, searching for the optimal noise parameter configuration for each category with a genetic algorithm, and applying differential privacy to each group of classified data.
Step 1 comprises the following steps:
Step 1-1, extracting an original power data set from a power grid database;
Step 1-2, letting W be the power data set, containing n data records, with privacy class list rank = {1, 2, 3, 4, 5}; the power grid privacy classes are divided into 5 levels, each record contains several attributes (A_1, ..., A_n), and level 5 is the highest privacy level;
Step 1-3, placing the attributes whose privacy level is higher than the threshold μ into a feature set F = {f_1, f_2, ..., f_k}, and taking the remaining attributes F' as the candidate set;
Step 1-4, outputting the final data set classified by privacy level.
Step 1-4 comprises the following steps:
Step 1-4-1, calculating the Pearson correlation coefficient r between a candidate-set variable f' and a feature-set variable f:
$$r = \frac{\sum_{i=1}^{n}(f'_i - \bar{f}')(f_i - \bar{f})}{\sqrt{\sum_{i=1}^{n}(f'_i - \bar{f}')^2}\,\sqrt{\sum_{i=1}^{n}(f_i - \bar{f})^2}}$$
where n is the number of records and the bars denote means over the n records;
Step 1-4-2, comparing the Pearson correlation coefficient with a threshold ρ, and selecting the candidate attributes highly correlated with the feature set as set B;
Step 1-4-3, forming the final data set D = F ∪ B.
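Steps 1-4-1 to 1-4-3 can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: it assumes that a candidate attribute joins B when the absolute value of its correlation with at least one feature-set attribute exceeds ρ (the text does not say whether r or |r| is compared, or how several feature attributes are combined). The function names and the attribute names in the example are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # constant column: r is undefined, treat it as uncorrelated
    return cov / (norm_x * norm_y)


def build_output_set(columns, feature_set, candidate_set, rho):
    """Return D = F ∪ B, where B holds the candidates correlated with F above rho.

    Assumption: a candidate is kept if |r| > rho for at least one attribute of F.
    """
    b_set = [
        c for c in candidate_set
        if any(abs(pearson(columns[c], columns[f])) > rho for f in feature_set)
    ]
    return list(feature_set) + b_set


# Hypothetical columns, one value per record.
columns = {
    "consumption": [1, 2, 3, 4, 5],
    "peak_load": [2, 4, 6, 8, 10],
    "region_code": [3, 1, 4, 1, 5],
}
D = build_output_set(columns, ["consumption"], ["peak_load", "region_code"], rho=0.8)
print(D)  # ['consumption', 'peak_load']
```

Here peak_load is perfectly correlated with consumption and joins B, while region_code (r ≈ 0.35) stays out of the output set.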
Step 2 comprises:
Step 2-1, randomly selecting a record t from the data as the first cluster center, based on the attributes with the highest privacy class;
Step 2-2, calculating the Euclidean distance between each record x_i (i = 1, ..., n) and the current cluster center x_j over the M attributes:
$$d_{ij} = \sqrt{\sum_{m=1}^{M}(x_{im} - x_{jm})^2}$$
where x_im and x_jm are the m-th attribute of record i and record j, respectively, and M is the number of attributes;
Step 2-3, calculating the probability P_i that each record is selected as the next cluster center, by the following formula:
where d_ij is the Euclidean distance between record i and record j, and n is the number of records;
Step 2-4, selecting the next cluster center by roulette-wheel selection;
Step 2-5, repeating steps 2-2 to 2-4 until K cluster centers have been selected;
Step 2-6, dividing the data set into K classes with the K-means algorithm.
Step 2-6 comprises the following steps:
Step 2-6-1, calculating the distance between each record and each cluster center, and assigning each record to its nearest cluster center;
Step 2-6-2, calculating the mean of all records assigned to each class, and taking that mean as the new cluster center of the class;
Step 2-6-3, repeating the above two steps until the cluster centers no longer change, at which point the data set is divided into K classes.
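Step 2 can be sketched in Python as below. This is a minimal illustration, not the patented implementation. Because the formula for P_i is not reproduced in this text, the sketch assumes the standard k-means++ rule, in which each record's selection weight is its squared distance to the nearest already-chosen center. It also picks the first center uniformly at random and clusters on whatever attributes the records contain; restricting them to the highest-privacy attributes is left to the caller. All function names are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """d_ij = sqrt(sum over the M attributes of (x_im - x_jm)^2)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def kmeans_pp_init(records, k, rng):
    """Steps 2-1 to 2-5: choose k initial centers (standard k-means++ weighting assumed)."""
    centers = [rng.choice(records)]  # step 2-1: first center uniformly at random
    while len(centers) < k:
        # Steps 2-2/2-3: squared distance of each record to its nearest chosen center.
        weights = [min(euclidean(r, c) for c in centers) ** 2 for r in records]
        if sum(weights) == 0:
            centers.append(rng.choice(records))  # every record coincides with a center
        else:
            # Step 2-4: roulette-wheel selection proportional to the weights.
            centers.append(rng.choices(records, weights=weights, k=1)[0])
    return centers


def kmeans(records, k, seed=0, max_iter=100):
    """Step 2-6: Lloyd iterations from k-means++ centers; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(c) for c in kmeans_pp_init(records, k, rng)]
    labels = []
    for _ in range(max_iter):
        # Step 2-6-1: assign each record to its nearest center.
        labels = [min(range(k), key=lambda j: euclidean(r, centers[j])) for r in records]
        # Step 2-6-2: move each center to the mean of the records assigned to it.
        new_centers = []
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # empty cluster: keep its old center
        # Step 2-6-3: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


records = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(records, k=2, seed=0)
print(labels)
```

On these two well-separated groups the first three records and the last three end up in different classes; which class receives index 0 depends on the random seed.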
Step 3 comprises:
Step 3-1, searching for the optimal noise parameter configuration for each category with a genetic algorithm;
Step 3-2, applying differential privacy protection to each classified record.
Step 3-1 comprises:
Step 3-1-1, defining a multi-objective fitness function fit:
fit(x)=w1*Anon(x)+w2*Accuracy(x)
where x is the noise parameter configuration, consisting of the differential privacy budget ε and the global sensitivity Δf; Anon(x) is the degree of data anonymization computed for configuration x; Accuracy(x) is the accuracy of query results computed for configuration x; and w1 and w2 are weight coefficients;
Step 3-1-2, initializing the population: randomly generating an initial population P = {x_1, x_2, ..., x_N} of N individuals, each representing a noise parameter configuration, i.e. x_i = (ε_i, Δf_i);
Each individual of the initial population P is binary-encoded (decimal to binary) by the mapping:
x′=g(x)
where the mapping g maps the decimal value x to the binary value x′;
Step 3-1-3, evaluating fitness: calculating the fitness value fit_i = fit(x_i) for each individual x_i ∈ P;
Step 3-1-4, selection: selecting parent individuals from population P by roulette-wheel selection to build the parent set Q = {q_1, q_2, ..., q_N};
Step 3-1-5, crossover: applying a crossover operator to the individuals in the parent set Q to generate the offspring set R = {r_1, r_2, ..., r_N};
Step 3-1-6, mutation: introducing randomness into the individuals of the offspring set R with a mutation operator, producing the mutated offspring set M = {m_1, m_2, ..., m_N};
Step 3-1-7, evaluating fitness: evaluating each offspring m_i ∈ M and calculating its fitness value fit_i = fit(m_i);
Step 3-1-8, updating the population: forming a new population P = {p_1, p_2, ..., p_N} from the individuals of M with high fitness;
Step 3-1-9, repeating steps 3-1-4 to 3-1-8 until the maximum number of iterations is reached or an individual with satisfactory fitness is found;
Step 3-1-10, returning the individual with the best fitness value, i.e. the optimal noise parameter configuration;
Step 3-1-11, performing the above operations on each of the K groups of data to obtain K optimal noise parameter configurations.
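The genetic search of steps 3-1-2 to 3-1-10 can be sketched in Python as follows, under several stated assumptions. The text does not define Anon(x) or Accuracy(x), so toy_fitness is a hypothetical stand-in that scores the Laplace scale b = Δf/ε (more noise raises a capped anonymity score and lowers an accuracy score). The search ranges, the 10-bit fixed-point encoding used for g, single-point crossover, bit-flip mutation, and the elitist update (keeping the fittest individuals from parents and offspring together) are likewise illustrative choices, not the patent's specification. Following the description, both ε and Δf are treated as search variables.

```python
import random

BITS = 10                  # bits per gene (assumed precision of the encoding g)
EPS_RANGE = (0.01, 2.0)    # hypothetical search range for the privacy budget ε
DF_RANGE = (0.1, 5.0)      # hypothetical search range for the global sensitivity Δf


def encode(value, lo, hi, bits=BITS):
    """g(x): map a decimal value in [lo, hi] to a fixed-width binary string."""
    level = round((value - lo) / (hi - lo) * (2 ** bits - 1))
    return format(level, f"0{bits}b")


def decode(code, lo, hi):
    """Inverse of encode: map a binary string back to a decimal value in [lo, hi]."""
    return lo + int(code, 2) / (2 ** len(code) - 1) * (hi - lo)


def to_config(chrom):
    """A chromosome is the ε bits followed by the Δf bits."""
    return decode(chrom[:BITS], *EPS_RANGE), decode(chrom[BITS:], *DF_RANGE)


def toy_fitness(config, w1=0.5, w2=0.5, target_scale=2.0):
    """Hypothetical fit(x) = w1*Anon(x) + w2*Accuracy(x), both driven by b = Δf/ε."""
    eps, delta_f = config
    b = delta_f / eps
    anon = min(1.0, b / target_scale)  # more noise -> more anonymity, capped at 1
    accuracy = 1.0 / (1.0 + b)         # more noise -> less accurate query results
    return w1 * anon + w2 * accuracy


def genetic_search(fitness, pop_size=30, generations=60, p_cross=0.8, p_mut=0.02, seed=0):
    """Return the best (ε, Δf) found; pop_size should be even so parents pair up."""
    rng = random.Random(seed)
    length = 2 * BITS

    def score(chrom):
        return fitness(to_config(chrom))

    # Step 3-1-2: random initial population of binary chromosomes.
    pop = ["".join(rng.choice("01") for _ in range(length)) for _ in range(pop_size)]
    for _ in range(generations):
        # Steps 3-1-3/3-1-4: roulette-wheel selection, weights = fitness (must be > 0).
        parents = rng.choices(pop, weights=[score(c) for c in pop], k=pop_size)
        # Step 3-1-5: single-point crossover on consecutive parent pairs.
        children = []
        for a, b in zip(parents[::2], parents[1::2]):
            if rng.random() < p_cross:
                cut = rng.randrange(1, length)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            children += [a, b]
        # Step 3-1-6: bit-flip mutation, each bit flipped with probability p_mut.
        mutated = [
            "".join("10"[int(bit)] if rng.random() < p_mut else bit for bit in c)
            for c in children
        ]
        # Steps 3-1-7/3-1-8: keep the fittest pop_size chromosomes (elitist assumption).
        pop = sorted(pop + mutated, key=score, reverse=True)[:pop_size]
    # Step 3-1-10: return the configuration of the fittest individual.
    return to_config(max(pop, key=score))


eps, delta_f = genetic_search(toy_fitness)
print(f"epsilon={eps:.3f}, delta_f={delta_f:.3f}, b={delta_f / eps:.3f}")
```

For step 3-1-11 the search would be run once per cluster with a group-specific fitness function, for example `[genetic_search(f) for f in group_fitness_functions]`, where group_fitness_functions is a hypothetical list of K such callables.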
Step 3-2 comprises:
Step 3-2-1, adding noise to each data item in a group with the Laplace mechanism to achieve data privacy protection:
q′=q+Laplace(b)
where b = Δf/ε is the scale (degree of dispersion) of the Laplace noise, and the global sensitivity Δf and privacy budget ε are those of the optimal noise parameter configuration found by the genetic algorithm in step 3-1; q is the data before privacy protection and q′ the data after privacy protection;
Step 3-2-2, repeating step 3-2-1 for each of the K groups, each group using its own optimal noise parameter configuration;
Step 3-2-3, outputting the privacy-protected data set.
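Step 3-2 can be sketched in Python using only the standard library. Since Python's random module has no Laplace sampler, the sketch draws Laplace(0, b) noise as the difference of two independent exponential variables with mean b, which has exactly that distribution. The function names, group values, and (ε, Δf) pairs are hypothetical placeholders for the K clusters and the configurations returned by step 3-1.

```python
import random


def laplace_noise(b, rng):
    """Draw one sample from Laplace(0, b) as the difference of two Exp(mean=b) draws."""
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def protect_groups(groups, configs, seed=None):
    """Apply q' = q + Laplace(b), b = Δf/ε, to every value of every group.

    groups[k] is the list of values in cluster k; configs[k] = (ε_k, Δf_k).
    """
    rng = random.Random(seed)
    protected = []
    for values, (eps, delta_f) in zip(groups, configs):
        b = delta_f / eps  # noise scale for this group
        protected.append([q + laplace_noise(b, rng) for q in values])
    return protected


# Two hypothetical clusters with their own (ε, Δf) configurations.
groups = [[10.0] * 2000, [50.0] * 2000]
configs = [(1.0, 1.0), (0.5, 2.0)]  # scales b = 1.0 and b = 4.0
noisy = protect_groups(groups, configs, seed=1)
```

The second group, with the smaller ε and larger Δf, receives noise whose mean absolute deviation is four times that of the first group, which is how the per-group configurations make the protection non-uniform.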
The privacy classes comprise:
Level-5 privacy class: the user's electricity usage time, electricity consumption, and electricity usage behavior;
Level-4 privacy class: sensitive personal identity information such as the user's name, ID card number, and address;
Level-3 privacy class: the user's real-time location and movement trajectory;
Level-2 privacy class: the user's electricity usage habits, activity periods, and energy-saving awareness;
Level-1 privacy class: total electricity usage statistics and trend analysis for a particular region or country.
Compared with the prior art, the invention has the following remarkable advantages:
1) A non-uniform privacy protection strategy is adopted: personalized protection measures are applied to different types of data according to their characteristics and privacy requirements. A genetic algorithm searches for the optimal noise parameter configuration, which is optimized for each data group, giving more flexible and finer-grained privacy protection.
2) Multiple objectives, such as the degree of data anonymization and the accuracy of query results, are considered jointly. Through the definition of the fitness function and the genetic-algorithm optimization, data privacy is protected while query accuracy is preserved as far as possible.
3) Because the genetic algorithm optimizes the noise parameter configuration separately for each data group, the method adapts better to the characteristics and privacy requirements of different data sets, improving both the privacy protection effect and data processing efficiency.
4) The method scales and adapts well to grid data privacy problems: it handles grid data sets of different sizes and complexity and can be customized to actual requirements, giving it strong generality.
The invention addresses privacy protection in data release and sharing by combining cluster analysis with differential privacy. It effectively differentiates the sensitivity of the query function and improves the utility of the published data.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a flowchart of feature classification based on privacy classes in the present invention;
FIG. 3 is a flowchart of the clustering and heterogeneous privacy preserving method in an embodiment of the present invention;
FIG. 4 is a functional block diagram of the present invention.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
As shown in FIGS. 1-4, the invention provides an energy big data privacy protection method based on privacy class, comprising the following steps:
Step 1, marking and screening the privacy classes of an original data set: selecting some attributes as a feature set, using the Pearson correlation coefficient to select candidate attributes highly correlated with the feature set, and combining the two into the output data set.
Step 1 preprocesses the data so that the method better fits real application scenarios. For example, when power data are released to a bank, not all of the information is highly private: the privacy level of power consumption time may be low, so it needs no protection. Privacy protection is thus targeted at the data that require it.
Step 2, classifying the data set with a K-means++ clustering algorithm.
Clustering the data facilitates the subsequent privacy operations.
Step 3, performing non-uniform privacy protection: a genetic algorithm searches for the optimal noise parameter configuration for each category, and differential privacy is applied to each group of classified data.
Traditional methods do not consider non-uniform privacy protection: all data receive uniform protection, and fine-grained differences between data are ignored.
The method adapts to grid data sets of different sizes and complexity, can be customized to actual requirements, and has strong generality and adaptability.
Further, step 1 comprises:
Step 1-1, extracting an original power data set from a power grid database;
Step 1-2, letting W be the power data set, containing n data records, with privacy class list rank = {1, 2, 3, 4, 5}. According to privacy level, the invention divides grid privacy into 5 classes; each record contains several attributes (A_1, ..., A_n), and level 5 is the highest privacy level.
(1) Level-5 privacy class (first-level privacy): such information, including the user's lifestyle and behavior patterns, is generally considered the most private and must be assigned a high privacy level; examples are detailed records of the user's electricity usage time, electricity consumption, and electricity usage behavior.
(2) Level-4 privacy class (second-level privacy): sensitive personal identity information such as the user's name, ID card number, and address, which must be assigned a high privacy level and strictly protected.
(3) Level-3 privacy class (third-level privacy): information on the user's whereabouts and range of movement, which requires a high level of protection; examples are sensitive location data such as the user's real-time location and movement trajectory.
(4) Level-2 privacy class (fourth-level privacy): information that may reveal the user's lifestyle and behavioral characteristics and is assigned a general privacy level, such as electricity usage habits, activity periods, and energy-saving awareness.
(5) Level-1 privacy class (fifth-level privacy): information typically used for energy planning and decision-making, which requires a lower level of protection, such as total electricity usage statistics and trend analysis for a particular region or country.
Step 1-3, placing the attributes whose privacy level is higher than the threshold μ into a feature set F = {f_1, f_2, ..., f_k}, and taking the remaining attributes F' as the candidate set.
The threshold μ is a hyperparameter that can be chosen manually in the interval [1, 5].
Step 1-4, outputting the final data set classified by privacy level.
Further, step 1-4 comprises:
Step 1-4-1, calculating the Pearson correlation coefficient r between a candidate-set variable f' and a feature-set variable f:
$$r = \frac{\sum_{i=1}^{n}(f'_i - \bar{f}')(f_i - \bar{f})}{\sqrt{\sum_{i=1}^{n}(f'_i - \bar{f}')^2}\,\sqrt{\sum_{i=1}^{n}(f_i - \bar{f})^2}}$$
where n is the number of records and the bars denote means over the n records;
Step 1-4-2, comparing the Pearson correlation coefficient with a threshold ρ, and selecting the candidate attributes highly correlated with the feature set as set B;
Step 1-4-3, forming the final data set D = F ∪ B.
From D = F ∪ B, the data set to be processed consists of the feature set F with a high privacy level and the set B of attributes highly correlated with it. The purpose is to apply privacy protection only to these data and to ignore irrelevant (low-privacy-level) data, improving the efficiency of privacy computation and data release.
FIG. 2 is a flowchart of feature classification based on privacy classes in this embodiment: privacy class marking and screening of the original data set yield the data most correlated with the feature set, improving the relevance and usefulness of the data.
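Step 1-3 amounts to a threshold split on the privacy levels. The Python sketch below is illustrative: the function and attribute names are hypothetical, each attribute is given the level of the matching class in the five-level scheme above, and "higher than μ" is read as a strict inequality.

```python
def split_by_privacy_level(levels, mu):
    """Step 1-3: attributes with level > mu form F; the rest form the candidate set F'."""
    feature_set = [attr for attr, level in levels.items() if level > mu]
    candidate_set = [attr for attr, level in levels.items() if level <= mu]
    return feature_set, candidate_set


# Hypothetical attributes labelled with the five privacy classes.
levels = {
    "consumption": 5,     # electricity consumption (level 5)
    "id_number": 4,       # ID card number (level 4)
    "location": 3,        # real-time location (level 3)
    "usage_habit": 2,     # electricity usage habit (level 2)
    "regional_total": 1,  # regional usage statistics (level 1)
}
F, F_prime = split_by_privacy_level(levels, mu=3)
print(F)        # ['consumption', 'id_number']
print(F_prime)  # ['location', 'usage_habit', 'regional_total']
```

Raising μ shrinks F toward the most sensitive attributes; with μ = 5 no attribute qualifies, so every attribute becomes a candidate.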
Further, step 2 comprises:
Step 2-1, randomly selecting a record t from the data as the first cluster center, based on the attributes with the highest privacy class;
Step 2-2, calculating the Euclidean distance between each record x_i (i = 1, ..., n) and the current cluster center x_j over the M attributes:
$$d_{ij} = \sqrt{\sum_{m=1}^{M}(x_{im} - x_{jm})^2}$$
where x_im and x_jm are the m-th attribute of record i and record j, respectively, and M is the number of attributes;
Step 2-3, calculating the probability P_i that each record is selected as the next cluster center, by the following formula:
where d_ij is the Euclidean distance between record i and record j, and n is the number of records;
Step 2-4, selecting the next cluster center by roulette-wheel selection, under which records at a larger distance have a larger probability of being chosen as a cluster center;
Step 2-5, repeating steps 2-2 to 2-4 until K cluster centers have been selected;
Step 2-6, dividing the data set into K classes with the K-means algorithm.
Further, step 2-6 comprises:
Step 2-6-1, calculating the distance between each record and each cluster center, and assigning each record to its nearest cluster center;
Step 2-6-2, calculating the mean of all records assigned to each class, and taking that mean as the new cluster center of the class;
Step 2-6-3, repeating the above two steps until the cluster centers no longer change, at which point the data set is divided into K classes.
Step 2-6 divides the records into classes, and differential privacy is then applied to each class, forming a "heterogeneous privacy preserving policy".
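Roulette-wheel selection, used both for choosing cluster centers in step 2-4 and for choosing parents in step 3-1-4, can be sketched as follows. The function is a hypothetical helper: it takes non-negative weights (for step 2-4, the selection probabilities P_i or any quantity that grows with distance; for step 3-1-4, the fitness values) and a uniform random number u in [0, 1), and returns index i with probability proportional to weights[i]. Passing u explicitly keeps the function deterministic and easy to check.

```python
import bisect
import itertools


def roulette_pick(weights, u):
    """Return index i with probability weights[i] / sum(weights), for u uniform in [0, 1)."""
    cumulative = list(itertools.accumulate(weights))  # the wheel's slot boundaries
    return bisect.bisect_right(cumulative, u * cumulative[-1])


# Slots of width 1, 3 and 6 on a wheel of total size 10.
weights = [1, 3, 6]
print([roulette_pick(weights, u) for u in (0.05, 0.2, 0.95)])  # [0, 1, 2]
```

Feeding it random.random() as u returns index 2 about 60% of the time: the larger the weight (a more distant record, or a fitter individual), the more likely the pick, and a zero weight is never picked.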
Further, step 3 comprises:
Step 3-1, searching for the optimal noise parameter configuration for each category with a genetic algorithm;
Traditional methods perturb all data with uniform noise; the invention implements a non-uniform privacy protection strategy by setting specific noise parameters for each class of records.
Step 3-2, applying differential privacy protection to each classified record.
Further, step 3-1 comprises:
Step 3-1-1, jointly considering two factors, the invention defines a multi-objective fitness function fit:
fit(x)=w1*Anon(x)+w2*Accuracy(x)
where x is the noise parameter configuration, consisting of the differential privacy budget ε and the global sensitivity Δf; Anon(x) is the degree of data anonymization computed for configuration x; Accuracy(x) is the accuracy of query results computed for configuration x; and w1 and w2 are weight coefficients;
Step 3-1-2, initializing the population: randomly generating an initial population P = {x_1, x_2, ..., x_N} of N individuals, each representing a noise parameter configuration, i.e. x_i = (ε_i, Δf_i);
Each individual of the initial population P is binary-encoded (decimal to binary) by the mapping:
x′=g(x)
where the mapping g maps the decimal value x to the binary value x′;
Step 3-1-3, evaluating fitness: calculating the fitness value fit_i = fit(x_i) for each individual x_i ∈ P;
Step 3-1-4, selection: selecting parent individuals from population P by roulette-wheel selection, in which individuals with higher fitness are more likely to be selected, to build the parent set Q = {q_1, q_2, ..., q_N};
Step 3-1-5, crossover: the crossover operator randomly exchanges the codes at the same positions between pairs of individuals in the parent set Q to generate new offspring, forming the offspring set R = {r_1, r_2, ..., r_N};
Step 3-1-6, mutation: the mutation operator introduces randomness into the individuals of the offspring set R by randomly changing the code at one position, producing the mutated offspring set M = {m_1, m_2, ..., m_N};
Step 3-1-7, evaluating fitness: evaluating each offspring m_i ∈ M and calculating its fitness value fit_i = fit(m_i);
Step 3-1-8, updating the population: forming a new population P = {p_1, p_2, ..., p_N} from the individuals of M with high fitness;
Step 3-1-9, repeating steps 3-1-4 to 3-1-8 until the maximum number of iterations is reached or an individual with satisfactory fitness is found;
Step 3-1-10, returning the individual with the best fitness value, i.e. the optimal noise parameter configuration (differential privacy budget, global sensitivity);
Step 3-1-11, performing the above operations on each of the K groups of data to obtain K optimal noise parameter configurations.
Further, step 3-2 comprises:
Step 3-2-1, adding noise to each data item in a group with the Laplace mechanism to achieve data privacy protection:
q′=q+Laplace(b)
where b = Δf/ε is the scale (degree of dispersion) of the Laplace noise, and the global sensitivity Δf and privacy budget ε are those of the optimal noise parameter configuration found by the genetic algorithm in step 3-1; q is the data before privacy protection and q′ the data after privacy protection;
Step 3-2-2, repeating step 3-2-1 for each of the K groups, each group using its own optimal noise parameter configuration;
Step 3-2-3, outputting the privacy-protected data set.
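A short worked example, with illustrative numbers that are not taken from the patent, shows how the configuration found by the genetic algorithm controls the noise. For a group whose optimal configuration is Δf = 1 and ε = 0.5, the Laplace scale and the resulting spread are:

```latex
b = \frac{\Delta f}{\varepsilon} = \frac{1}{0.5} = 2, \qquad
p(z) = \frac{1}{2b}\, e^{-|z|/b}, \qquad
\mathbb{E}\,|z| = b = 2, \qquad
\operatorname{Var}(z) = 2b^2 = 8 .
```

For a less sensitive group configured with ε = 2 and the same Δf, b = 0.5 and Var(z) = 0.5, so its released values are sixteen times less variable. A smaller privacy budget or a larger sensitivity therefore means stronger protection and less accurate query results, which is the trade-off the fitness function weighs.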
FIG. 3 is a flowchart of the clustering and non-uniform privacy preserving method in this embodiment. The data are divided into clusters by cluster analysis on the attributes with the highest privacy class, and non-uniform differential privacy is applied to each cluster. This improves data utility while protecting data privacy.
In summary, the method provides non-uniform privacy protection and addresses privacy protection in data release and sharing by combining cluster analysis with differential privacy. Non-uniform protection remedies shortcomings of traditional methods such as inaccurate query results, information loss, and low utility of published data. The method effectively improves data utility while protecting data privacy.
The foregoing has outlined and described the basic principles, features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (5)

1. An energy big data privacy protection method based on privacy class, characterized by comprising the following steps:
Step 1, carrying out privacy class marking and screening on an original data set: selecting some attributes as a feature set, using the Pearson correlation coefficient to select candidate attributes highly correlated with the feature set, and combining the two into the output data set;
Step 1 comprises the following steps:
Step 1-1, extracting an original power data set from a power grid database;
Step 1-2, letting W be the power data set, containing n data records, with privacy class list rank = {1, 2, 3, 4, 5}; the power grid privacy classes are divided into 5 levels, each record contains several attributes (A_1, ..., A_n), and level 5 is the highest privacy level;
Step 1-3, placing the attributes whose privacy level is higher than the threshold μ into a feature set F = {f_1, f_2, ..., f_k}, and taking the remaining attributes F' as the candidate set;
Step 1-4, outputting the final data set classified by privacy level;
Step 2, classifying the data set with a K-means++ clustering algorithm;
Step 3, searching for the optimal noise parameter configuration for each category with a genetic algorithm, and applying differential privacy to each group of classified data;
Step 3 comprises:
Step 3-1, searching for the optimal noise parameter configuration for each category with a genetic algorithm;
Step 3-2, applying differential privacy protection to each classified record;
Step 3-1 comprises:
Step 3-1-1, defining a multi-objective fitness function fit:
fit(x)=w1*Anon(x)+w2*Accuracy(x)
where x is the noise parameter configuration, consisting of the differential privacy budget ε and the global sensitivity Δf; Anon(x) is the degree of data anonymization computed for configuration x; Accuracy(x) is the accuracy of query results computed for configuration x; and w1 and w2 are weight coefficients;
Step 3-1-2, initializing the population: randomly generating an initial population P = {x_1, x_2, ..., x_N} of N individuals, each representing a noise parameter configuration, i.e. x_i = (ε_i, Δf_i);
Each individual of the initial population P is binary-encoded (decimal to binary) by the mapping:
x′=g(x)
where the mapping g maps the decimal value x to the binary value x′;
Step 3-1-3, evaluating fitness: calculating the fitness value fit_i = fit(x_i) for each individual x_i ∈ P;
Step 3-1-4, selection: selecting parent individuals from population P by roulette-wheel selection to build the parent set Q = {q_1, q_2, ..., q_N};
Step 3-1-5, crossover: applying a crossover operator to the individuals in the parent set Q to generate the offspring set R = {r_1, r_2, ..., r_N};
Step 3-1-6, mutation: introducing randomness into the individuals of the offspring set R with a mutation operator, producing the mutated offspring set M = {m_1, m_2, ..., m_N};
Step 3-1-7, evaluating fitness: evaluating each offspring m_i ∈ M and calculating its fitness value fit_i = fit(m_i);
Step 3-1-8, updating the population: forming a new population P = {p_1, p_2, ..., p_N} from the individuals of M with high fitness;
Step 3-1-9, repeating steps 3-1-4 to 3-1-8 until the maximum number of iterations is reached or an individual with satisfactory fitness is found;
Step 3-1-10, returning the individual with the best fitness value, i.e. the optimal noise parameter configuration;
Step 3-1-11, performing the above operations on each of the K groups of data to obtain K optimal noise parameter configurations;
Step 3-2 comprises:
Step 3-2-1, adding noise to each data item in a group with the Laplace mechanism to achieve data privacy protection:
q′=q+Laplace(b)
where b = Δf/ε is the scale (degree of dispersion) of the Laplace noise, and the global sensitivity Δf and privacy budget ε are those of the optimal noise parameter configuration found by the genetic algorithm in step 3-1; q is the data before privacy protection and q′ the data after privacy protection;
Step 3-2-2, repeating step 3-2-1 for each of the K groups, each group using its own optimal noise parameter configuration;
Step 3-2-3, outputting the privacy-protected data set.
2. The privacy class-based energy big data privacy protection method according to claim 1, wherein in step 1-4, comprising:
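Step 1-4-1, calculating the Pearson correlation coefficient $r$ between the candidate set variable $f'$ and the feature set variable $f$:
$$r = \frac{n\sum_{i=1}^{n} f_i f'_i - \sum_{i=1}^{n} f_i \sum_{i=1}^{n} f'_i}{\sqrt{n\sum_{i=1}^{n} f_i^2 - \left(\sum_{i=1}^{n} f_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} f_i'^2 - \left(\sum_{i=1}^{n} f'_i\right)^2}}$$
where $n$ is the number of records;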
Step 1-4-2, comparing the Pearson correlation coefficient with a threshold $\rho$ and screening out, as feature set $B$, the candidate variables that are highly correlated with the feature set;
Step 1-4-3, forming the final dataset d=f.
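Steps 1-4-1 and 1-4-2 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the variable names are hypothetical, and "high correlation" is assumed to mean $|r| > \rho$ against at least one feature variable, which the text does not state explicitly.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Note: a constant column (zero variance) makes r undefined.
    return cov / (sx * sy)


def screen_candidates(features, candidates, rho):
    """Step 1-4-2 (assumed rule): keep candidates with |r| > rho for some feature."""
    selected = {}
    for name, column in candidates.items():
        if any(abs(pearson(column, f)) > rho for f in features.values()):
            selected[name] = column
    return selected
```

For instance, with feature `{"load": [1, 2, 3, 4]}` and threshold 0.8, a candidate `[2, 4, 6, 8.5]` is kept while an alternating `[1, -1, 1, -1]` (with $r \approx -0.45$) is discarded.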
3. The privacy class-based energy big data privacy protection method according to claim 2, wherein step 2 comprises:
Step 2-1, randomly selecting a record $t$ from the data as the first cluster center, according to the attribute with the highest privacy class;
Step 2-2, calculating the Euclidean distance between each record $x_i$ ($i = 1, \dots, n$) and the current cluster-center record $x_j$ over the $M$ attributes:
$$d_{ij} = \sqrt{\sum_{m=1}^{M} \left(x_{im} - x_{jm}\right)^2}$$
where $x_{im}$ and $x_{jm}$ denote the $m$-th attribute of the $i$-th record and of the $j$-th record respectively, and $M$ is the number of attributes;
Step 2-3, calculating the probability $P_i$ that each record is selected as the next cluster center, according to the following formula:
where $d_{ij}$ denotes the Euclidean distance between record $i$ and record $j$, and $n$ is the number of records;
Step 2-4, selecting the next cluster center by roulette-wheel selection;
Step 2-5, repeating steps 2-2 to 2-4 until K cluster centers have been selected;
Step 2-6, partitioning the data set into K classes with the K-means algorithm.
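The center-selection steps 2-1 to 2-5 resemble k-means++ seeding and can be sketched as below. The probability formula of step 2-3 is not recoverable from this text, so the sketch assumes the standard k-means++ rule: each record is weighted by the squared distance to its nearest already-chosen center. Records are tuples of numeric attributes, and `first_index` is a hypothetical parameter standing in for the privacy-class-based choice of the first center in step 2-1.

```python
import math
import random


def euclidean(x, y):
    # Step 2-2: Euclidean distance over the M attributes of two records.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def roulette_index(weights, rng):
    # Step 2-4: roulette-wheel selection, index i with probability w_i / sum(w).
    pick = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if pick < acc:
            return i
    return len(weights) - 1  # only reached if every weight is zero


def select_centers(records, k, first_index=None, seed=None):
    rng = random.Random(seed)
    # Step 2-1: first center (hypothetical parameter; random if not given).
    if first_index is None:
        first_index = rng.randrange(len(records))
    centers = [records[first_index]]
    # Step 2-5: repeat until k centers have been chosen.
    while len(centers) < k:
        # Step 2-3 (assumed k-means++ rule): weight = squared distance to the
        # nearest center already chosen, so chosen centers get weight 0.
        weights = [min(euclidean(r, c) for c in centers) ** 2 for r in records]
        centers.append(records[roulette_index(weights, rng)])
    return centers
```

Weighting by distance makes far-away records more likely to become centers, which spreads the initial centers across the data before the K-means iterations of step 2-6 begin.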
4. The privacy class-based energy big data privacy protection method according to claim 3, wherein step 2-6 comprises:
Step 2-6-1, calculating the distance between each record and each cluster center, and assigning each record to its nearest cluster center;
Step 2-6-2, calculating the mean of all records assigned to each class and taking that mean as the new cluster center of the class;
Step 2-6-3, repeating the above two steps until the cluster centers no longer change, whereupon the data set is divided into K classes.
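Steps 2-6-1 to 2-6-3 describe the standard Lloyd iteration of K-means. A minimal Python sketch, assuming records and initial centers are tuples of numeric attributes and adding a hypothetical `max_iter` safeguard that the text does not mention:

```python
import math


def assign(records, centers):
    # Step 2-6-1: index of the nearest center for every record.
    return [min(range(len(centers)), key=lambda j: math.dist(r, centers[j]))
            for r in records]


def kmeans(records, centers, max_iter=100):
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(records, centers)
        # Step 2-6-2: each new center is the mean of its assigned records.
        new_centers = []
        for j, c in enumerate(centers):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(c)  # keep an empty cluster's previous center
        # Step 2-6-3: stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return assign(records, centers), centers
```

For example, the records `[(0, 0), (0, 1), (10, 10), (10, 11)]` started from centers `[(0, 0), (10, 10)]` converge after two passes to labels `[0, 0, 1, 1]` and centers `[(0.0, 0.5), (10.0, 10.5)]`.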
5. The privacy class-based energy big data privacy protection method according to claim 1, wherein the privacy classes comprise:
Level-5 privacy class: the user's electricity usage time, electricity consumption and electricity usage behavior;
Level-4 privacy class: sensitive personal identity information, such as the user's name, ID card number and address;
Level-3 privacy class: the user's real-time location and movement trajectory;
Level-2 privacy class: the user's electricity usage habits, activity periods and energy-saving awareness;
Level-1 privacy class: total electricity usage statistics and trend analysis for a predetermined area or country.
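The five privacy classes can be encoded as an ordered type so that step 2-1 can pick the attribute with the highest privacy class. This is a hypothetical sketch: the attribute names are invented for illustration, and a larger level number is assumed to mean a higher privacy class.

```python
from enum import IntEnum


class PrivacyClass(IntEnum):
    LEVEL_1 = 1  # aggregate usage statistics and trends for an area or country
    LEVEL_2 = 2  # usage habits, activity periods, energy-saving awareness
    LEVEL_3 = 3  # real-time location and movement trajectory
    LEVEL_4 = 4  # identity data: name, ID card number, address
    LEVEL_5 = 5  # usage time, consumption and usage behavior


# Hypothetical attribute names for an energy data set.
ATTRIBUTE_CLASS = {
    "regional_total_kwh": PrivacyClass.LEVEL_1,
    "active_hours": PrivacyClass.LEVEL_2,
    "gps_track": PrivacyClass.LEVEL_3,
    "id_number": PrivacyClass.LEVEL_4,
    "hourly_kwh": PrivacyClass.LEVEL_5,
}


def highest_privacy_attribute(attributes):
    """Return the attribute with the highest privacy class (cf. step 2-1)."""
    return max(attributes, key=lambda a: ATTRIBUTE_CLASS[a])
```

Using `IntEnum` keeps the classes comparable as integers, so `max` orders attributes directly by their level.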
CN202311238880.1A 2023-09-22 2023-09-22 Energy big data privacy protection method based on privacy class Active CN117216796B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311238880.1A CN117216796B (en) 2023-09-22 2023-09-22 Energy big data privacy protection method based on privacy class

Publications (2)

Publication Number Publication Date
CN117216796A CN117216796A (en) 2023-12-12
CN117216796B true CN117216796B (en) 2024-05-28

Family

ID=89044070

Country Status (1)

Country Link
CN (1) CN117216796B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114491644A (en) * 2022-02-15 2022-05-13 Liaoning University of Technology (辽宁工业大学) Differential privacy data publishing method meeting personalized privacy budget allocation
CN116186757A (en) * 2022-12-21 2023-05-30 Nanjing University of Aeronautics and Astronautics (南京航空航天大学) Method for publishing condition feature selection differential privacy data with enhanced utility

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
WO2019122854A1 (en) * 2017-12-18 2019-06-27 Privitar Limited Data product release method or system

Non-Patent Citations (2)

Title
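Liu Xiaoqian (刘晓迁); Li Qianmu (李千目). Differential privacy protection data publishing method based on clustering anonymization (基于聚类匿名化的差分隐私保护数据发布方法). Journal on Communications (通信学报), 2016, (05), full text. *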

Also Published As

Publication number Publication date
CN117216796A (en) 2023-12-12

Similar Documents

Publication Publication Date Title
CN111199016B (en) Daily load curve clustering method for improving K-means based on DTW
Eseye et al. Machine learning based integrated feature selection approach for improved electricity demand forecasting in decentralized energy systems
Li et al. Efficiency analysis of machine learning intelligent investment based on K-means algorithm
CN111324642A (en) Model algorithm type selection and evaluation method for power grid big data analysis
Zhou et al. Multiobjective evolutionary algorithms: A survey of the state of the art
Wu et al. A sparse Gaussian process regression model for tourism demand forecasting in Hong Kong
CN110245783B (en) Short-term load prediction method based on C-means clustering fuzzy rough set
Wang et al. Manifold interpolation for large-scale multiobjective optimization via generative adversarial networks
Han et al. Short-term forecasting of individual residential load based on deep learning and K-means clustering
Mehmanpazir et al. Development of an evolutionary fuzzy expert system for estimating future behavior of stock price
Eseye et al. Adaptive predictor subset selection strategy for enhanced forecasting of distributed PV power generation
Almannaa et al. A novel supervised clustering algorithm for transportation system applications
Geetha et al. Prediction of domestic power peak demand and consumption using supervised machine learning with smart meter dataset
Shi et al. Handling uncertainty in financial decision making: a clustering estimation of distribution algorithm with simplified simulation
CN113780679A (en) Load prediction method and device based on ubiquitous power Internet of things
CN116187835A (en) Data-driven-based method and system for estimating theoretical line loss interval of transformer area
Liu et al. An improved fuzzy trajectory clustering method for exploring urban travel patterns
Yang et al. An Evidence Combination Method based on DBSCAN Clustering.
Game et al. Optimized Decision tree rules using divergence based grey wolf optimization for big data classification in health care
Xu et al. PSARE: A RL-based online participant selection scheme incorporating area coverage ratio and degree in mobile crowdsensing
Yin et al. A fuzzy clustering based collaborative filtering algorithm for time-aware POI recommendation
CN117495109B (en) Power stealing user identification system based on neural network
Wang et al. Dynamic multiobjective squirrel search algorithm based on decomposition with evolutionary direction prediction and bidirectional memory populations
Chen et al. Knowledge-inspired subdomain adaptation for cross-domain knowledge transfer
CN113435101A (en) Power failure prediction method for support vector machine based on particle swarm optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant