CN117235770A - Power data sharing analysis system and method based on differential privacy - Google Patents

Power data sharing analysis system and method based on differential privacy

Info

Publication number
CN117235770A
Authority
CN
China
Prior art keywords
data
power data
privacy
power
data set
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311411585.1A
Other languages
Chinese (zh)
Inventor
王红凯
毛冬
王嘉琦
饶涵宇
陈祖歌
谢裕清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
Original Assignee
Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
Application filed by Information and Telecommunication Branch of State Grid Zhejiang Electric Power Co Ltd
Priority to CN202311411585.1A
Publication of CN117235770A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the technical field of power data processing, and in particular to a power data sharing analysis system and method based on differential privacy. The system comprises a data processing module, a differential privacy allocation module and a sharing display module. The data processing module performs prediction on a power sample data set to generate a power data prediction data set, and homomorphically encrypts the power sample data set and the power data prediction data set to obtain a power data homomorphic encryption data set. The differential privacy allocation module inputs the power data homomorphic encryption data set into a multi-layer neural network model for differential privacy enhancement processing to obtain a class probability distribution, and allocates privacy budgets according to the class probability distribution. The sharing display module shares the power data homomorphic encryption data set according to the privacy budget allocation result. By combining homomorphic encryption, differential privacy and related techniques, the invention balances data privacy protection against data analysis, and maintains data accuracy and time correlation while protecting the privacy of power data.

Description

Power data sharing analysis system and method based on differential privacy
Technical Field
The invention relates to the technical field of power data processing, in particular to a power data sharing analysis system and method based on differential privacy.
Background
With the continuous development of information technology, data collection and sharing in power systems have become increasingly important. Accurate prediction and analysis of power data matter greatly for energy supply, scheduling, management and planning. However, sharing and analyzing power data raises privacy concerns, particularly when the data contains users' private information, and differential privacy techniques have been developed to protect user privacy.
Existing privacy protection methods for power data mainly include data desensitization, data encryption and data masking, but these methods sacrifice some of the accuracy and usefulness of the data. Because power data usually has time-series characteristics, traditional privacy protection methods also degrade the time correlation and predictive value of the data. Traditional data encryption keeps data confidential in transit, but the data must be decrypted for analysis, which easily exposes sensitive information. In desensitization and masking, the data is modified or partially hidden, which reduces its accuracy and integrity and therefore impairs accurate prediction and analysis of power data.
In addition, some prior-art methods still carry a risk of privacy disclosure. For example, although desensitized data contains no direct identifiers, part of the sensitive information may still be re-identified by combining it with external information and background knowledge. A method that protects data privacy while maintaining the accuracy and time correlation of the data is therefore needed.
Disclosure of Invention
The invention provides a power data sharing analysis system and method based on differential privacy, which solve the technical problem that the existing power data privacy protection method cannot meet the requirements of protecting the power data privacy and simultaneously maintaining the data accuracy and time correlation.
In order to solve the technical problems, the invention provides a power data sharing analysis system and method based on differential privacy.
In a first aspect, the invention provides a differential privacy-based power data sharing analysis system, which comprises a data processing module, a differential privacy allocation module and a sharing display module;
the data processing module is used for obtaining a plurality of sample power data time sequences to form a power sample data set, performing prediction according to the power sample data set to generate a power data prediction data set, and homomorphically encrypting the power sample data set and the power data prediction data set to obtain a power data homomorphic encryption data set; the power data homomorphic encryption data set comprises a plurality of homomorphically encrypted sample power data time sequences and sample power data prediction sequences;
the differential privacy allocation module is used for inputting the power data homomorphic encryption data set into a pre-established multi-layer neural network model for differential privacy enhancement processing to obtain a class probability distribution, and allocating a privacy budget to each sequence in the power data homomorphic encryption data set according to the class probability distribution to obtain a privacy budget allocation result;
and the sharing display module is used for responding to a received sharing instruction and sharing the power data homomorphic encryption data set according to the privacy budget allocation result.
In a further embodiment, the multi-layer neural network model comprises a number of hidden layers; inputting the power data homomorphic encryption data set into the pre-established multi-layer neural network model for differential privacy enhancement processing to obtain the class probability distribution is specifically:
each hidden layer of the multi-layer neural network model performs feature extraction on each sequence in the power data homomorphic encryption data set, random privacy noise and a differential privacy mechanism are introduced at each hidden layer, and the activation output of the last hidden layer is mapped into a probability space to obtain the class probability distribution; the random privacy noise is added to the input of the hidden layer, and the differential privacy mechanism is applied to the output of each hidden layer of the multi-layer neural network model.
In a further embodiment, the formula of adding the random privacy noise in the input of the hidden layer is expressed as:
where $a_i^{(l)}$ represents the output of the $i$-th sequence of the power data homomorphic encryption data set at the $l$-th hidden layer; $a_i^{(l-1)}$ represents the output of the $i$-th sequence at the $(l-1)$-th hidden layer; the clipping function limits its argument to the interval $[-C, C]$; the noise term is the random privacy noise of the $l$-th hidden layer weight matrix; $C$ represents the truncation parameter; $\sigma_l$ represents the activation function of the $l$-th hidden layer; $\Delta f$ represents the sensitivity; $\epsilon$ represents the privacy budget; $L$ represents the total number of hidden layers;
incremental privacy noise is added to the gradient $\nabla W^{(l)}$ of the weight matrix $W^{(l)}$ to obtain a composite weight matrix gradient $\widetilde{\nabla} W^{(l)}$, and the weight matrix is updated according to the composite weight matrix gradient:
where $W^{(l)}$ represents the weight matrix; $\nabla W^{(l)}$ represents the gradient of the weight matrix; $\widetilde{\nabla} W^{(l)}$ represents the composite weight matrix gradient; $\eta$ represents the learning rate.
In a further embodiment, the deploying the differential privacy mechanism in each hidden layer output result of the multi-layer neural network model is specifically:
where $\mathrm{Lap}$ represents the Laplace distribution; $\Delta_x$ represents the privacy parameter range.
In a further embodiment, the predicting is performed according to the power sample data set, and a power data prediction data set is generated, specifically:
smoothing each sample power data time sequence in the power sample data set to obtain a smoothed power data time sequence;
performing wavelet transformation on the smooth power data time sequence to obtain a frequency domain coefficient;
calculating the power data prediction information gain of each frequency domain coefficient, and screening out the frequency domain coefficient corresponding to the highest value of the power data prediction information gain as a power input characteristic;
and predicting the power input characteristics by adopting a support vector machine regression model to obtain corresponding power data prediction values, thereby forming a power data prediction data set.
In a further embodiment, the support vector machine regression model is specifically:
$$f(x) = \sum_{q=1}^{N} w_q x_q + b$$
where $f(x)$ represents the support vector machine regression model function; $w_q$ represents the weight of the $q$-th power input feature; $x_q$ represents the $q$-th power input feature; $b$ represents the bias; $N$ represents the number of frequency domain coefficients corresponding to the highest values of the power data prediction information gain;
the loss function adopted by the support vector machine regression model in the training stage is as follows:
where Loss represents the loss function of the support vector machine regression model; $w$ represents the weight vector of the support vector machine regression model; $\|\cdot\|^2$ represents the square of the L2 norm; $C$ represents the regularization parameter; $\epsilon$ represents the tolerance threshold; $t$ represents the $t$-th time point; $L_t$ represents the sample power data at the $t$-th time point.
In a further embodiment, the frequency domain coefficients are calculated as:
wherein,
where $C_j$ represents the frequency domain coefficient at the $j$-th scale; $\bar{L}_t$ represents the smoothed power data at the $t$-th time point; $\psi(\cdot)$ represents the wavelet basis function; $F$ represents the wavelet period; $k$ represents the radius of the smoothing window.
In a second aspect, the present invention provides a method for power data sharing analysis based on differential privacy, the method comprising the steps of:
acquiring a plurality of sample power data time sequences to form a power sample data set, performing prediction according to the power sample data set to generate a power data prediction data set, and homomorphically encrypting the power sample data set and the power data prediction data set to obtain a power data homomorphic encryption data set; the power data homomorphic encryption data set comprises a plurality of homomorphically encrypted sample power data time sequences and sample power data prediction sequences;
inputting the power data homomorphic encryption data set into a pre-established multi-layer neural network model for differential privacy enhancement processing to obtain a class probability distribution, and allocating a privacy budget to each sequence in the power data homomorphic encryption data set according to the class probability distribution to obtain a privacy budget allocation result;
and, in response to a received sharing instruction, sharing the power data homomorphic encryption data set according to the privacy budget allocation result.
In a third aspect, the present invention also provides a computer device, including a processor and a memory, where the processor is connected to the memory, the memory is used to store a computer program, and the processor is used to execute the computer program stored in the memory, so that the computer device performs steps for implementing the method.
In a fourth aspect, the present invention also provides a computer readable storage medium having stored therein a computer program which when executed by a processor performs the steps of the above method.
The invention provides a power data sharing analysis system and a method based on differential privacy, wherein the system carries out homomorphic encryption on a power sample data set and a power data prediction data set through a data processing module to obtain a power data homomorphic encryption data set; differential privacy enhancement processing and privacy budget allocation are carried out through a differential privacy allocation module; and sharing the homomorphic encrypted data through a sharing display module. Compared with the prior art, the system not only protects the data privacy through homomorphic encryption, differential privacy and other technologies, but also can enable different samples to obtain different privacy protection levels according to the privacy requirements, thereby realizing accurate prediction and sharing display of the power data and fully balancing the requirements of the data privacy and data analysis.
Drawings
FIG. 1 is a block diagram of a differential privacy-based power data sharing analysis system provided by an embodiment of the present invention;
fig. 2 is a schematic flow chart of a power data sharing analysis method based on differential privacy according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following embodiments, including the drawings, are given for illustration only and are not to be construed as limiting the scope of the invention, since many variations are possible without departing from its spirit and scope.
Referring to fig. 1, an embodiment of the present invention provides a power data sharing analysis system based on differential privacy, and as shown in fig. 1, the system includes a data processing module 101, a differential privacy allocation module 102, and a sharing display module 103.
The data processing module 101 is configured to obtain a power sample data set including a plurality of sample power data time sequences, perform prediction according to the power sample data set to generate a power data prediction data set, and homomorphically encrypt the power sample data set and the power data prediction data set to obtain a power data homomorphic encryption data set. The power data homomorphic encryption data set comprises a plurality of homomorphically encrypted sample power data time sequences and sample power data prediction sequences; homomorphic encryption allows computation on data in its encrypted state without decrypting it. The prediction step that generates the power data prediction values is specifically as follows:
smoothing each sample power data time sequence in the power sample data set to obtain a smoothed power data time sequence;
performing wavelet transformation on the smooth power data time sequence to obtain a frequency domain coefficient;
calculating the power data prediction information gain of each frequency domain coefficient, and screening out the frequency domain coefficient corresponding to the highest value of the power data prediction information gain as a power input characteristic;
and predicting the power input characteristics by adopting a support vector machine regression model to obtain a power data predicted value.
In a specific embodiment, each sample power data time sequence $L_t$ in the power sample data set is smoothed with a moving average to obtain the smoothed power data time sequence $\bar{L}_t$. The specific formula is as follows:
$$\bar{L}_t = \frac{1}{2k+1} \sum_{s=t-k}^{t+k} L_s$$
where $L_t$ is the sample power data at the $t$-th time point; $\bar{L}_t$ is the smoothed power data at the $t$-th time point; $k$ is the radius of the smoothing window;
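The smoothing step can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: it assumes a centered window of radius k, and the edge handling (shrinking the window at the ends of the series), the function name `smooth_moving_average` and the sample readings are all hypothetical, since the source does not specify them.

```python
import numpy as np

def smooth_moving_average(series, k):
    """Centered moving average with window radius k (up to 2k+1 samples)."""
    x = np.asarray(series, dtype=float)
    out = np.empty_like(x)
    for t in range(len(x)):
        # Assumption: near the ends, the window shrinks to the samples that exist.
        lo, hi = max(0, t - k), min(len(x), t + k + 1)
        out[t] = x[lo:hi].mean()
    return out

load = [10.0, 12.0, 11.0, 15.0, 14.0]  # made-up power readings
smoothed = smooth_moving_average(load, k=1)
```

A larger radius k removes more short-term fluctuation but also blurs genuine load changes, so k would normally be chosen with the sampling interval of the meter data in mind.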
then, this embodiment applies a wavelet transform to the smoothed power data time sequence $\bar{L}_t$ to obtain the frequency domain coefficients $C_j$; the specific calculation formula of the frequency domain coefficients is as follows:
where $j$ represents the wavelet scale; $C_j$ represents the frequency domain coefficient at the $j$-th scale; $\psi(\cdot)$ represents the wavelet basis function; $F$ represents the wavelet period;
Based on information gain theory, this embodiment calculates the importance of each frequency domain coefficient $C_j$ for power data prediction, which gives the power data prediction information gain of each coefficient. The target variable $Y$ of the power data prediction is determined first, and its initial entropy $H(Y)$ is calculated; the initial entropy represents the uncertainty of the target variable and is obtained from the probability distribution of the target variable over its categories. The data set is then divided into subsets according to the frequency domain coefficient $C_j$, and for each frequency domain coefficient the conditional entropy $H(Y \mid C_j)$ is calculated, which represents the remaining uncertainty of the target variable once the coefficient is known. The information gain is defined as the initial entropy minus the conditional entropy, $IG(C_j) = H(Y) - H(Y \mid C_j)$, and measures the importance of each frequency domain coefficient for power data prediction. The calculated information gains are sorted from high to low to determine the frequency domain coefficients that are most helpful for the power data prediction task.
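The entropy-based ranking described above can be sketched as follows. The sketch assumes a discrete (categorical) target and splits each coefficient into quantile bins to form the subsets; the binning rule, the function names and the toy data are assumptions made for illustration, because the source does not say how the data set is divided.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a discrete label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(coeff, target, n_bins=2):
    """IG = H(Y) - H(Y | coeff), with coeff split into quantile bins (assumed rule)."""
    coeff = np.asarray(coeff, dtype=float)
    target = np.asarray(target)
    edges = np.quantile(coeff, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(coeff, edges)
    h_cond = 0.0
    for b in np.unique(bins):
        mask = bins == b
        # Weight each subset's entropy by the fraction of samples it holds.
        h_cond += mask.mean() * entropy(target[mask])
    return entropy(target) - h_cond

# Toy data: column 0 separates the two target classes, column 1 does not.
coeffs = np.array([[0.1, 0.1], [0.2, 0.9], [0.9, 0.2], [1.0, 1.0]])
target = np.array([0, 0, 1, 1])
gains = [information_gain(coeffs[:, j], target) for j in range(coeffs.shape[1])]
ranked = sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)
```

The `ranked` list orders coefficient indices from most to least informative, which corresponds to the high-to-low sorting used to pick the power input features.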
The frequency domain coefficients with the highest importance, that is, those with the highest power data prediction information gain, are selected as the power input features. A support vector machine regression model is used as the prediction model: the power input features are input into the model to obtain a power data prediction sequence, and the prediction sequences form the power data prediction data set. The model is trained with the support vector machine regression algorithm, whose objective is to find a linear or nonlinear function that fits the target values to within a tolerance margin in the feature space. In this embodiment, the support vector machine regression model is specifically:
$$f(x) = \sum_{q=1}^{N} w_q x_q + b$$
where $f(x)$ represents the support vector machine regression model function; $w_q$ represents the weight of the $q$-th power input feature; $x_q$ represents the $q$-th power input feature; $b$ represents the bias; and $N$ represents the number of frequency domain coefficients corresponding to the highest values of the power data prediction information gain.
Specifically, in the frequency domain analysis, the frequency domain coefficients with the highest information gain are selected because they are regarded as the features that most strongly affect power data prediction. Using them as features focuses the model on the frequency components with significant influence, which helps improve prediction accuracy. The support vector machine regression model in this embodiment is a linear combination: the selected frequency domain coefficients are weighted and summed, so the model output represents the combined influence of the input features. Each frequency domain coefficient is multiplied by a corresponding weight, and the weights represent the importance of each feature to the prediction. The bias term acts as a prediction baseline, an additional constant that lets the model adapt to offsets in the actual data. The loss function adopted by the support vector machine regression model in the training stage is as follows:
where Loss represents the loss function of the support vector machine regression model; $w$ represents the weight vector of the support vector machine regression model; $\|\cdot\|^2$ represents the square of the L2 norm; $C$ represents the regularization parameter; $\epsilon$ represents the tolerance threshold; $t$ represents the $t$-th time point.
Specifically, the term in $\|w\|^2$ is the L2 regularization term, which keeps the model weights from becoming too large and thereby avoids overfitting. The term $C \sum_t \max(0, |L_t - f(x_t)| - \epsilon)$ is the penalty on the prediction error of each sample; this form is the $\epsilon$-insensitive loss of support vector machine regression, a hinge-style loss. For each sample, the absolute difference between the actual power data value $L_t$ and the model prediction $f(x_t)$ is computed and compared with the tolerance threshold $\epsilon$; taking $\max(0, \cdot)$ means that a sample is penalized only when the difference exceeds $\epsilon$. Summing the penalty terms over all samples therefore penalizes exactly those samples whose prediction error exceeds the tolerance threshold.
The L2 regularization term controls model complexity by constraining the size of the weight vector, which helps avoid overfitting the training set and improves generalization. The hinge loss term is the penalty on prediction errors: when the prediction error is below the tolerance threshold, the loss term is 0 and the model is not penalized; when the prediction error exceeds the threshold, the loss grows linearly with the excess, so the model concentrates on the larger errors. Because the loss grows linearly rather than quadratically, abnormal values pull on the fit less than they would under a squared-error loss. The regularization parameter controls the balance between the regularization term and the hinge loss term: a larger value increases the influence of the hinge loss, so the model fits the data more closely, while a smaller value emphasizes regularization and guards against overfitting. Adjusting it therefore trades fit against generalization. In summary, the training loss combines a regularization term and a hinge loss term in order to control model complexity and prevent overfitting, handle abnormal values and improve robustness, balance regularization against fitting, and tolerate small prediction errors, with the tolerance threshold setting the model's sensitivity to error. Optimizing this loss function yields a regression model with stronger generalization capability.
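As a rough illustration of how the two terms interact, the following sketch evaluates a linear model and an ε-insensitive loss on made-up data. The 1/2 factor on the regularization term follows the common textbook convention for support vector regression and is an assumption here, as are the function names and all numeric values; the source text does not give the coefficient.

```python
import numpy as np

def svr_predict(X, w, b):
    """Linear model: weighted sum of the input features plus a bias."""
    return X @ w + b

def svr_loss(X, y, w, b, C=1.0, eps=0.1):
    """Regularization term plus C times the summed epsilon-insensitive penalties."""
    residual = np.abs(y - svr_predict(X, w, b))
    # Errors inside the tolerance band contribute nothing to the penalty.
    penalty = np.maximum(0.0, residual - eps).sum()
    # Assumption: 1/2 factor on ||w||^2, as in the usual SVR formulation.
    return 0.5 * float(w @ w) + C * float(penalty)

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy feature rows
w = np.array([1.0, 2.0])
b = 0.5
y = np.array([1.55, 2.0, 3.5])                      # toy targets
loss = svr_loss(X, y, w, b, C=1.0, eps=0.1)
```

In this toy case only the second sample falls outside the tolerance band (error 0.5, excess 0.4), so it is the only one that contributes to the penalty; raising `C` makes that excess count for more relative to the weight norm.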
The differential privacy allocation module 102 is configured to input the power data homomorphic encryption data set into a pre-established multi-layer neural network model for differential privacy enhancement processing, obtain a class probability distribution, and perform privacy budget allocation on each sequence in the power data homomorphic encryption data set according to the class probability distribution, so as to obtain a privacy budget allocation result.
In a specific embodiment, let $D$ denote the power data homomorphic encryption data set, which contains $N$ homomorphically encrypted sample power data time sequences and sample power data prediction sequences; the $i$-th sequence in the data set is denoted $x_i$, where $i = 1, \dots, N$. The multi-layer neural network model comprises $L$ hidden layers, and the weight matrix of the $l$-th hidden layer is defined as $W^{(l)} \in \mathbb{R}^{h_l \times h_{l-1}}$, where $h_l$ is the number of neurons of the $l$-th hidden layer. Inputting the power data homomorphic encryption data set into the pre-established multi-layer neural network model for differential privacy enhancement processing to obtain the class probability distribution is specifically as follows: each hidden layer of the model performs feature extraction on each sequence in the data set, and random privacy noise and a differential privacy mechanism are introduced at each hidden layer. The random privacy noise is added to the input of the hidden layer to strengthen differential privacy, the differential privacy mechanism is applied to the output of each hidden layer to avoid information leakage, and the activation output of the last hidden layer is mapped into a probability space to obtain the class probability distribution. The feature extraction performed by each hidden layer on each sequence is as follows:
For the first hidden layer, the output result is:
$$a_i^{(1)} = \sigma\left(W^{(1)} x_i + b^{(1)}\right)$$
where $a_i^{(1)}$ represents the output of the $i$-th sequence of the power data homomorphic encryption data set at the first hidden layer; $W^{(1)}$ represents the weight matrix of the first hidden layer; $\sigma$ represents the activation function; $x_i$ represents the $i$-th sequence in the power data homomorphic encryption data set; $b^{(1)}$ represents the bias vector of the first hidden layer.
For the $l$-th hidden layer, the output result is:
$$a_i^{(l)} = \sigma\left(W^{(l)} a_i^{(l-1)} + b^{(l)}\right)$$
where $a_i^{(l)}$ represents the output of the $i$-th sequence at the $l$-th hidden layer; $W^{(l)}$ represents the weight matrix of the $l$-th hidden layer; $a_i^{(l-1)}$ represents the output of the $i$-th sequence at the $(l-1)$-th hidden layer; $\sigma$ represents the activation function; $b^{(l)}$ represents the bias vector of the $l$-th hidden layer.
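The layer-by-layer feature extraction can be sketched as a plain forward pass. This sketch works on ordinary floating-point arrays rather than on homomorphically encrypted sequences, uses `tanh` as a stand-in activation, and draws random weights; all of those choices, and the name `forward_hidden`, are assumptions for illustration only.

```python
import numpy as np

def forward_hidden(x, weights, biases, activation=np.tanh):
    """Return the output of every hidden layer for one input sequence x."""
    a = np.asarray(x, dtype=float)
    outputs = []
    for W, b in zip(weights, biases):
        # Each layer applies an affine map followed by the activation.
        a = activation(W @ a + b)
        outputs.append(a)
    return outputs

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]  # h_1 = 4, h_2 = 2
biases = [np.zeros(4), np.zeros(2)]
x_i = np.array([0.2, -0.1, 0.4])                              # toy input sequence
layer_outputs = forward_hidden(x_i, weights, biases)
```

Keeping every layer's output in a list makes it straightforward to perturb each one separately, which is where the noise steps described next would attach.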
The process of adding noise to the input of each hidden layer is expressed using the following formula:
where $a_i^{(l)}$ represents the output of the $i$-th sequence of the power data homomorphic encryption data set at the $l$-th hidden layer; $a_i^{(l-1)}$ represents the output of the $i$-th sequence at the $(l-1)$-th hidden layer; the clipping function limits its argument to the interval $[-C, C]$; the noise term is the random privacy noise of the $l$-th hidden layer weight matrix $W^{(l)}$; $C$ represents the truncation parameter; $\sigma_l$ represents the activation function of the $l$-th hidden layer; $\Delta f$ represents the sensitivity; $\epsilon$ represents the privacy budget; $L$ represents the total number of hidden layers.
Specifically, this embodiment uses clipped noise (Clip Noise) when adding noise at each hidden layer. Clipping keeps the perturbation within a predetermined range, so that data privacy is protected without introducing excessive disturbance. The noise is sampled from a normal distribution whose scale is set by $\sigma_l$, the privacy noise parameter of each hidden layer. Adding noise to the input of each hidden layer makes it difficult for an attacker to infer the original data or individual information accurately, which raises the level of privacy protection. The calculation of $\sigma_l$ rests on the core idea of differential privacy, which protects individuals by adding noise calibrated to the sensitivity of the computation. The sensitivity $\Delta f$ measures how much the model output can change when the input data changes slightly, and can usually be obtained by bounding the range of the function. The privacy budget $\epsilon$ controls the level of privacy protection and determines the amount of noise added. Combining the sensitivity and the privacy budget gives the privacy noise parameter $\sigma_l$ of each hidden layer, which specifies how much disturbance is added.
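A minimal sketch of clipping followed by normally distributed noise is shown below. The exact formula is not recoverable from the text, so the calibration `sigma = delta_f / epsilon` is a placeholder assumption that only captures the stated dependence on sensitivity and privacy budget; it is not a calibrated Gaussian mechanism and carries no formal privacy guarantee. The function name and values are hypothetical.

```python
import numpy as np

def clip_and_perturb(a_prev, C, delta_f, epsilon, rng):
    """Clip a layer input to [-C, C], then add zero-mean normal noise."""
    clipped = np.clip(a_prev, -C, C)
    # Assumption: noise scale grows with sensitivity and shrinks with budget.
    sigma = delta_f / epsilon
    noisy = clipped + rng.normal(0.0, sigma, size=clipped.shape)
    return clipped, noisy

rng = np.random.default_rng(42)
a_prev = np.array([-3.0, 0.5, 2.0])  # toy output of the previous layer
clipped, noisy = clip_and_perturb(a_prev, C=1.0, delta_f=1.0, epsilon=2.0, rng=rng)
```

Clipping first bounds how far any single input can move the layer, which is what makes a finite sensitivity, and therefore a finite noise scale, possible.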
Meanwhile, this embodiment applies the differential privacy mechanism to the output of each hidden layer of the multi-layer neural network model, that is, it adds Laplace noise to the output of each hidden layer, specifically as follows:
where $\mathrm{Lap}$ represents the Laplace distribution; $\Delta_x$ represents the privacy parameter range.
Specifically, the Laplace distribution is a probability distribution commonly used in differential privacy. Its density peaks at zero, and the noise it supplies perturbs the data enough to strengthen privacy protection. Adding Laplace noise to the model output preserves the usability of the data analysis while protecting individual privacy; the noise-adding step is itself an application of a differential privacy mechanism. Adding noise to the output of each hidden layer markedly strengthens privacy protection: because the noise makes the analysis result uncertain, an attacker cannot accurately infer individual data.
In the formula, Δf and Δ_x denote the sensitivity and the parameter range, respectively. The sensitivity is the maximum change in the model output when the input data changes slightly. Adjusting Δf and Δ_x controls the spread of the Laplace noise and therefore the noise intensity. ε is the privacy budget, which determines the strength of the Laplace noise. A smaller ε means stricter privacy protection and more noise, which can reduce the accuracy of the analysis result; a larger ε means less noise and better accuracy, at the cost of weaker privacy protection. Adjusting ε therefore locates a balance point between privacy protection and data accuracy.
In summary, this embodiment applies the differential privacy mechanism to the output of each hidden layer by adding Laplace noise. By adjusting the sensitivity, the parameter range and the privacy budget, the process balances analysis accuracy against privacy protection, so that individual privacy is protected while useful data analysis results are still provided.
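The effect of ε on the noise can be illustrated with a minimal Python sketch. It assumes the textbook Laplace mechanism, in which the noise scale is Δf/ε; the exact formula of this embodiment, including how Δ_x enters it, is not reproduced here, and the function name `add_laplace_noise` is hypothetical.

```python
import numpy as np

def add_laplace_noise(hidden_output, sensitivity, epsilon, rng):
    """Perturb a hidden-layer output with zero-mean Laplace noise.

    Assumption: scale = sensitivity / epsilon, the standard Laplace
    mechanism. The embodiment's own formula may differ.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    noise = rng.laplace(loc=0.0, scale=scale, size=hidden_output.shape)
    return hidden_output + noise

rng = np.random.default_rng(0)
h = np.zeros(10_000)  # stand-in for a hidden-layer output
strict = add_laplace_noise(h, sensitivity=1.0, epsilon=0.1, rng=rng)
loose = add_laplace_noise(h, sensitivity=1.0, epsilon=10.0, rng=rng)
print(strict.std(), loose.std())
```

In this sketch the run with the smaller ε shows a much wider spread of perturbed values, which matches the trade-off described above: stricter privacy is paid for with lower accuracy.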
In this embodiment, the output of the final hidden layer is mapped to a class probability distribution using a Softmax function, as follows:
where the left-hand side of the formula denotes the class probability distribution obtained by mapping the output of the last hidden layer; K denotes the number of classes; w_k denotes the weight vector of class k; and y_i denotes the discrete class label of the i-th sequence in the power data homomorphic encryption data set, with y_i taking an integer value between 1 and K.
Specifically, Softmax is an activation function commonly used for multi-class classification that maps a vector to a probability distribution. Its output is a K-dimensional vector whose k-th entry is the probability of class k. Passing the output of the neural network through Softmax therefore yields a probability for each class, so that the network output takes the form of a class probability distribution. In the formula, the output of the final hidden layer is mapped by Softmax to the probability that sample i belongs to class k. This distribution expresses the network's estimated probability for each class and so conveys its confidence in each class, which helps data analysts interpret the output of the multi-layer neural network model and make more accurate judgments and decisions in practice.
Softmax exponentiates the raw network outputs and normalizes them, which guarantees that every probability lies in [0, 1] and that the probabilities of all classes sum to 1. This gives the output of the multi-layer neural network model better interpretability and applicability.
It should be noted that the mapping to the class probability distribution is performed on the premise of protecting individual privacy: even in the output probability distribution, the probability of each class is protected by differential privacy. Data analysts can therefore base decisions on the probability information while remaining within the limits of the privacy protection, which balances the requirements of privacy protection and data analysis.
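The exponentiate-and-normalize step can be sketched in Python as follows. The sketch assumes that the score of class k is the inner product of w_k with the final hidden-layer output, which is the usual reading of "weight vector of class k"; the function name and the example values are illustrative only.

```python
import numpy as np

def class_probabilities(h_last, W):
    """Map a final hidden-layer output to a probability vector over K classes.

    h_last: shape (d,), output of the last hidden layer.
    W:      shape (K, d), row k is the weight vector of class k (assumed form).
    """
    logits = W @ h_last
    logits = logits - logits.max()  # shift for numerical stability; result unchanged
    exp = np.exp(logits)
    return exp / exp.sum()

h_last = np.array([0.5, -1.2, 2.0])
W = np.array([[1.0, 0.0, 0.5],
              [0.2, 0.3, -0.1],
              [-0.5, 1.0, 0.0]])
p = class_probabilities(h_last, W)
print(p, p.sum())
```

Note that the array index is zero-based, whereas the class label y_i in the text runs from 1 to K.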
In this embodiment, incremental privacy noise is added to the gradient of the weight matrix W^(l) to obtain a composite weight matrix gradient, and the weight matrix is updated according to the composite weight matrix gradient:
where η represents a learning rate.
Specifically, this embodiment adds incremental privacy noise to the gradient of the weight matrix to obtain the composite weight matrix gradient. The incremental privacy noise protects the gradient information to a certain extent, making it difficult for an attacker to infer how the weights change; even an attacker who can observe the model output will find it hard to recover the model weights accurately, which strengthens the privacy of the model. The weight matrix is then updated from the composite gradient using the learning rate η. This update is the core of neural network training, and because of the incremental privacy noise it is carried out within the limits of the privacy protection. The learning rate η determines the step size of each weight update: a suitable learning rate helps the model converge during training, whereas an excessively large one can cause the weights to oscillate or fail to converge. Under differential privacy, choosing a suitable learning rate is therefore critical to balancing privacy protection against model performance.
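A noisy weight update of this kind might look like the Python sketch below. Two points are assumptions rather than statements of the embodiment: the incremental noise is drawn from a Laplace distribution with scale Δf/ε, and the update has the plain gradient-descent form W ← W − η·(gradient + noise). The function name is hypothetical.

```python
import numpy as np

def noisy_weight_update(W, grad, lr, sensitivity, epsilon, rng):
    """One gradient-descent step on a noise-perturbed ("composite") gradient."""
    # Assumed noise model: Laplace with scale sensitivity / epsilon.
    noise = rng.laplace(0.0, sensitivity / epsilon, size=grad.shape)
    composite_grad = grad + noise
    return W - lr * composite_grad

rng = np.random.default_rng(1)
W = np.ones((2, 3))
grad = np.full((2, 3), 0.5)

private = noisy_weight_update(W, grad, lr=0.1, sensitivity=1.0, epsilon=0.5, rng=rng)
# With a very large epsilon the noise vanishes and the step approaches plain SGD.
near_exact = noisy_weight_update(W, grad, lr=0.1, sensitivity=1.0, epsilon=1e12, rng=rng)
print(private)
print(near_exact)
```

The learning rate scales the noise as well as the gradient, which is one reason the choice of η interacts with the privacy setting.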
In this embodiment, noise is added at each hidden layer to increase data privacy and prevent malicious attackers from inferring personal information through analysis, and the differential privacy distribution module applies a differential privacy protection mechanism to the output of each hidden layer. It should be noted that the noise added in this embodiment may be random noise that follows a given distribution. The noise blurs individual data and limits how accurately an analyst can infer it, making it harder to extract individual information from the shared data; this protection mechanism helps prevent privacy disclosure and privacy inference attacks. If the data needs to be aggregated to reduce disclosure risk, the sharing display unit may perform an aggregation operation, which may be a simple sum or average or a more complex statistical operation. Aggregation removes detail from the data and so further enhances privacy.
In this embodiment, a privacy budget is allocated to each sequence in the power data homomorphic encryption data set so that the differential privacy level is adapted to each sequence. After the allocation, the privacy budget is updated according to the following formula:
where ε_new denotes the updated privacy budget.
Specifically, the privacy budget update formula indicates that the privacy budget is reduced by a certain amount after each weight update. In a differential privacy mechanism the privacy budget must be adjusted in line with the noise that has been added, so as to keep noise control and privacy protection in balance; this adjustment keeps the mechanism adaptive. As the weight updates proceed, the added noise gradually affects model performance, and updating the privacy budget allows the noise magnitude to be adjusted dynamically to suit both model training and privacy protection. As the privacy budget decreases, the amount of added noise gradually increases in order to protect individual privacy; however, larger noise may degrade model performance and the accuracy of the data analysis. By tracking the weight updates and the added noise in this way, the update formula lets the differential privacy mechanism adjust the degree of privacy protection according to the progress of model training and the needs of data analysis.
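The relationship between a shrinking budget and growing noise can be sketched in Python. The update rule used here, a fixed decrement per weight update with a small positive floor, is purely hypothetical; the embodiment's own update formula is not reproduced in this text. The noise scale Δf/ε is likewise the standard Laplace-mechanism assumption.

```python
def update_privacy_budget(epsilon, decrement, floor=1e-6):
    """Return the budget after one weight update.

    Hypothetical rule: subtract a fixed decrement, never going below `floor`.
    """
    return max(epsilon - decrement, floor)

def laplace_scale(sensitivity, epsilon):
    """Noise scale implied by the current budget (assumed: sensitivity / epsilon)."""
    return sensitivity / epsilon

epsilon = 1.0
scales = []
for step in range(5):
    scales.append(laplace_scale(1.0, epsilon))
    epsilon = update_privacy_budget(epsilon, decrement=0.1)

print(scales)
print(epsilon)
```

Each successive entry in `scales` is larger than the previous one, mirroring the statement that the added noise grows as the privacy budget decreases.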
The differential privacy distribution module provided by this embodiment analyzes the power data homomorphic encryption data set with the multi-layer neural network model while protecting privacy. Each hidden layer of the model performs feature extraction on each sequence in the data set; noise is added to the hidden-layer inputs to strengthen differential privacy, and a differential privacy mechanism is applied to each hidden-layer output to help control information leakage. The output of the final hidden layer is then mapped to a class probability distribution by the Softmax function.
In this embodiment, the sharing display module is configured to respond to a received sharing instruction and share the power data homomorphic encryption data set according to the privacy budget allocation result.
In a specific embodiment, the differential privacy distribution module has already performed feature extraction, noise addition and differential privacy protection on the power data, and the processed data is passed as input to the sharing display unit so that the predicted values of the power data can be displayed through the sharing display module. It should be noted that the sharing display unit needs to decrypt the power data homomorphic encryption data set before the data can be displayed and shared. Decryption converts the encrypted data back to its original form for subsequent analysis and display, and is performed with the corresponding private key. The decrypted power data, which may include the predicted values of the power data and other features, is then ready for display. Depending on the implementation, the sharing display unit may present the data to users or related personnel as charts, tables, images and so on, which helps them understand the prediction results and the trends and changes in the power data. Before displaying the data, the sharing display unit also needs to consider enhanced sharing, that is, applying additional privacy protection to the shared data.
Traditional privacy protection methods face a conflict between data privacy protection and data sharing: privacy protection may distort the data and impair the data analysis. The embodiment of the invention uses a multi-layer neural network model to extract features from and process the power data, and adds noise in the hidden layers to strengthen data privacy by means of differential privacy. Each sample is thereby processed individually through the differential privacy mechanism, and the noise is added in a way that is intended to preserve data accuracy while improving the ability to predict the power data.
This embodiment also uses a feature selection method based on information gain theory to select the frequency domain coefficients of highest importance as features, which reduces the input dimension and the computational complexity, and uses support vector machine regression as the prediction model to further improve generalization ability and prediction accuracy.
Because power data generally has time-series characteristics, the embodiment of the invention mines the time-series information in the data through wavelet transformation and other methods, improving the data analysis. The privacy budget is allocated individually to each sample, which better suits the characteristics of time-series data. The differential privacy allocation unit controls the noise added according to the privacy budget of each encrypted sample, so that privacy protection is customized: different samples obtain different privacy protection levels according to their privacy requirements, balancing the requirements of data privacy and data analysis.
In conclusion, through homomorphic encryption, differential privacy and other technologies, this embodiment protects data privacy while achieving accurate prediction and shared display of the power data, balancing data privacy protection against data analysis.
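The smoothing and wavelet steps of the prediction pipeline can be illustrated with a small NumPy-only sketch. It makes several assumptions that the text does not fix: a centered moving average with edge padding for the smoothing, and a single level of the Haar wavelet as the basis. The information gain screening and the support vector machine regression are left out, and the sample readings are invented for illustration.

```python
import numpy as np

def moving_average(series, radius):
    """Centered moving average with window 2*radius + 1 (edge values repeated)."""
    padded = np.pad(series, radius, mode="edge")
    window = np.ones(2 * radius + 1) / (2 * radius + 1)
    return np.convolve(padded, window, mode="valid")

def haar_step(series):
    """One level of the orthonormal Haar transform; length must be even."""
    if len(series) % 2:
        raise ValueError("series length must be even")
    pairs = series.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)  # low-frequency part
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)  # high-frequency part
    return approx, detail

# Hypothetical meter readings with one spike.
load = np.array([3.0, 3.2, 2.9, 5.5, 3.1, 3.0, 2.8, 3.1])
smooth = moving_average(load, radius=1)
approx, detail = haar_step(smooth)
print(smooth)
print(approx, detail)
```

The approximation and detail coefficients together play the role of the frequency domain coefficients from which a later step would select the most informative features.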
The embodiment of the invention provides a power data sharing analysis system based on differential privacy, comprising a data processing module, a differential privacy distribution module and a sharing display module. The data processing module is used for acquiring a plurality of sample power data time sequences to form a power sample data set, predicting according to the power sample data set to generate a power data prediction data set, and homomorphically encrypting the power sample data set and the power data prediction data set to obtain a power data homomorphic encryption data set. The differential privacy distribution module is used for inputting the power data homomorphic encryption data set into a pre-established multi-layer neural network model for differential privacy enhancement processing to obtain a class probability distribution, and allocating a privacy budget to each sequence in the power data homomorphic encryption data set according to the class probability distribution to obtain a privacy budget allocation result. The sharing display module is used for responding to a received sharing instruction and sharing the power data homomorphic encryption data set according to the privacy budget allocation result. Compared with the prior art, the system uses homomorphic encryption, differential privacy and other technologies to protect data privacy while achieving accurate prediction and shared display of the power data, balancing data privacy protection against data analysis; it also generates the predicted values of the power data through wavelet transformation and other methods, fully mining the time-series information in the data.
The differential-privacy-based power data sharing analysis system can be applied in practical scenarios such as external joint computation, private data protection, efficient encrypted computation and communication cost reduction. By adopting differential privacy, it hides each user's individual electricity consumption while overcoming technical difficulties such as the negative impact on the usability of high-frequency smart meter readings; by combining differential privacy with deep learning, it preserves both the temporal correlation of the data and privacy protection.
In one embodiment, as shown in fig. 2, an embodiment of the present application provides a power data sharing analysis method based on differential privacy, the method including the steps of:
S1, acquiring a plurality of sample power data time sequences to form a power sample data set, predicting according to the power sample data set to generate a power data prediction data set, and homomorphically encrypting the power sample data set and the power data prediction data set to obtain a power data homomorphic encryption data set; the power data homomorphic encryption data set comprises a plurality of homomorphic encryption sample power data time sequences and a sample power data prediction sequence;
S2, inputting the power data homomorphic encryption data set into a pre-established multi-layer neural network model for differential privacy enhancement processing to obtain a class probability distribution, and allocating a privacy budget to each sequence in the power data homomorphic encryption data set according to the class probability distribution to obtain a privacy budget allocation result;
S3, responding to a received sharing instruction, and sharing the power data homomorphic encryption data set according to the privacy budget allocation result.
It should be noted that the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application.
For specific limitations of the differential-privacy-based power data sharing analysis method, refer to the limitations of the differential-privacy-based power data sharing analysis system above, which are not repeated here. Those of ordinary skill in the art will appreciate that the various modules and steps described in connection with the disclosed embodiments of the application may be implemented in hardware, software, or a combination of both. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the application provides a power data sharing analysis method based on differential privacy. The method uses a multi-layer neural network model to extract features from and process the power data, and adds noise in the hidden layers to strengthen data privacy by means of differential privacy. Each sample is thereby processed individually through the differential privacy mechanism: different samples obtain different privacy protection levels according to their privacy requirements, which balances the requirements of data privacy and data analysis and customizes the privacy protection. The method also generates the predicted values of the power data through wavelet transformation, a support vector machine regression model and other methods, fully mining the time-series information in the power data and further improving the generalization ability and prediction accuracy of the model.
FIG. 3 is a diagram of a computer device including a memory, a processor, and a transceiver connected by a bus, according to an embodiment of the present application; the memory is used to store a set of computer program instructions and data and the stored data may be transferred to the processor, which may execute the program instructions stored by the memory to perform the steps of the above-described method.
The memory may comprise volatile memory or nonvolatile memory, or both; the processor may be a central processing unit, a microprocessor, an application-specific integrated circuit, a programmable logic device, or a combination thereof. By way of example and not limitation, the programmable logic device may be a complex programmable logic device, a field-programmable gate array, generic array logic, or any combination thereof.
In addition, the memory may be a physically separate unit or may be integrated with the processor.
It will be appreciated by those of ordinary skill in the art that the structure shown in FIG. 3 is merely a block diagram of part of the structure related to the present solution and does not limit the computer device to which the solution may be applied; a particular computer device may include more or fewer components than those shown, may combine some of the components, or may have a different arrangement of components.
In one embodiment, an embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described method.
The power data sharing system based on the differential privacy realizes prediction and homomorphic encryption on a power sample data set through a data processing module; the privacy budget of each encryption sample is distributed through the differential privacy distribution module, and the addition of noise is reasonably controlled, so that the customization of privacy protection is realized; the sharing display module is used for sharing the homomorphic encryption data set of the power data, so that the data privacy protection is improved, and meanwhile, the data accuracy and the time correlation can be maintained.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any usable medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., SSD), etc.
Those skilled in the art will appreciate that implementing all or part of the above described embodiment methods may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed, may comprise the steps of embodiments of the methods described above.
The foregoing examples represent only a few preferred embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the application. It should be noted that modifications and substitutions can be made by those skilled in the art without departing from the technical principles of the present application, and such modifications and substitutions should also be considered to be within the scope of the present application. Therefore, the protection scope of the patent of the application is subject to the protection scope of the claims.

Claims (10)

1. A power data sharing analysis system based on differential privacy, characterized in that: the system comprises a data processing module, a differential privacy distribution module and a sharing display module;
the data processing module is used for obtaining a plurality of sample power data time sequences to form a power sample data set, predicting according to the power sample data set to generate a power data prediction data set, and homomorphic encrypting the power sample data set and the power data prediction data set to obtain a power data homomorphic encryption data set; the power data homomorphic encryption data set comprises a plurality of homomorphic encryption sample power data time sequences and a sample power data prediction sequence;
The differential privacy distribution module is used for inputting the electric power data homomorphic encryption data set into a pre-established multi-layer neural network model for differential privacy enhancement processing to obtain category probability distribution, and carrying out privacy budget distribution on each sequence in the electric power data homomorphic encryption data set according to the category probability distribution to obtain a privacy budget distribution result;
and the sharing display module is used for responding to the received sharing instruction and sharing the homomorphic encryption data set of the power data according to the privacy budget allocation result.
2. The differential privacy-based power data sharing analysis system of claim 1, wherein the multi-layer neural network model comprises a plurality of hidden layers; inputting the homomorphic encryption data set of the power data into a pre-established multi-layer neural network model for differential privacy enhancement processing to obtain category probability distribution, wherein the category probability distribution comprises the following specific steps of:
each hidden layer of the multi-layer neural network model is used for carrying out feature extraction on each sequence in the homomorphic encryption data set of the power data, random privacy noise and a differential privacy mechanism are introduced into each hidden layer, and the activation output of the last hidden layer is mapped into a probability space to obtain category probability distribution; the random privacy noise is added into the input of the hidden layer, and the differential privacy mechanism is deployed in each hidden layer output result of the multi-layer neural network model.
3. The differential privacy-based power data sharing analysis system of claim 2, wherein the formula for adding the random privacy noise to the input of the hidden layer is expressed as:
where the terms of the formula denote, in order: the output of the i-th sequence of the power data homomorphic encryption data set at the l-th hidden layer; the output of the i-th sequence of the power data homomorphic encryption data set at the (l-1)-th hidden layer; a clipping function that restricts its argument to the interval [-C, C]; and the random privacy noise of the l-th hidden layer weight matrix; C denotes the truncation parameter; σ_l denotes the activation function of the l-th hidden layer; Δf denotes the sensitivity; ε denotes the privacy budget; L denotes the total number of hidden layers;
incremental privacy noise is added to the gradient of the weight matrix W^(l) to obtain a composite weight matrix gradient, and the weight matrix is updated according to the composite weight matrix gradient:
where W^(l) denotes the weight matrix; the two gradient terms denote the gradient of the weight matrix and the composite weight matrix gradient, respectively; η denotes the learning rate.
4. The power data sharing analysis system according to claim 3, wherein the deploying the differential privacy mechanism in each hidden layer output result of the multi-layer neural network model is specifically:
where Lap denotes the Laplace distribution and Δ_x denotes the privacy parameter range.
5. The differential privacy-based power data sharing analysis system of claim 1, wherein the predicting from the power sample dataset generates a power data prediction dataset, specifically:
smoothing each sample power data time sequence in the power sample data set to obtain a smoothed power data time sequence;
performing wavelet transformation on the smooth power data time sequence to obtain a frequency domain coefficient;
calculating the power data prediction information gain of each frequency domain coefficient, and screening out the frequency domain coefficient corresponding to the highest value of the power data prediction information gain as a power input characteristic;
and predicting the power input characteristics by adopting a support vector machine regression model to obtain corresponding power data prediction values, thereby forming a power data prediction data set.
6. The differential privacy-based power data sharing analysis system of claim 5, wherein the support vector machine regression model is specifically:
where f(x) denotes the support vector machine regression function; w_q denotes the weight of a power input feature; the feature term denotes the q-th power input feature; b denotes the bias; n denotes the number of frequency domain coefficients corresponding to the highest value of the power data prediction information gain;
the loss function adopted by the support vector machine regression model in the training stage is as follows:
where Loss denotes the loss function of the support vector machine regression model; w denotes the weight vector of the support vector machine regression model; ‖·‖² denotes the squared L2 norm; a denotes the regularization parameter; ε denotes the tolerance threshold; t denotes the t-th time point; L_t denotes the sample power data time sequence at the t-th time point.
7. The differential privacy-based power data sharing analysis system of claim 1, wherein the frequency domain coefficients are calculated as:
wherein,
where C_j denotes the frequency domain coefficient at the j-th scale; the smoothed term denotes the smoothed power data time sequence at the t-th time point; ψ(·) denotes the wavelet basis function; F denotes the wavelet period; K denotes the radius of the smoothing window.
8. A differential privacy-based power data sharing analysis method, the method comprising the steps of:
acquiring a plurality of sample power data time sequences to form a power sample data set, predicting according to the power sample data set to generate a power data prediction data set, and homomorphic encrypting the power sample data set and the power data prediction data set to obtain a power data homomorphic encryption data set; the power data homomorphic encryption data set comprises a plurality of homomorphic encryption sample power data time sequences and a sample power data prediction sequence;
Inputting the electric power data homomorphic encryption data set into a pre-established multi-layer neural network model for differential privacy enhancement processing to obtain category probability distribution, and carrying out privacy budget allocation on each sequence in the electric power data homomorphic encryption data set according to the category probability distribution to obtain a privacy budget allocation result;
and responding to the received sharing instruction, and sharing the power data homomorphic encryption data set according to the privacy budget allocation result.
9. A computer device, characterized by: comprising a processor and a memory, the processor being connected to the memory, the memory being for storing a computer program, the processor being for executing the computer program stored in the memory to cause the computer device to perform the method of claim 8.
10. A computer-readable storage medium, characterized by: the computer readable storage medium has stored therein a computer program which, when executed, implements the method of claim 8.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311411585.1A CN117235770A (en) 2023-10-27 2023-10-27 Power data sharing analysis system and method based on differential privacy

Publications (1)

Publication Number Publication Date
CN117235770A true CN117235770A (en) 2023-12-15

Family

ID=89086153


Country Status (1)

Country Link
CN (1) CN117235770A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117892357A (en) * 2024-03-15 2024-04-16 大连优冠网络科技有限责任公司 Energy big data sharing and distribution risk control method based on differential privacy protection
CN117892357B (en) * 2024-03-15 2024-05-31 国网河南省电力公司经济技术研究院 Energy big data sharing and distribution risk control method based on differential privacy protection

Similar Documents

Publication Publication Date Title
Tran et al. Differentially private and fair deep learning: A lagrangian dual approach
Mireshghallah et al. Shredder: Learning noise distributions to protect inference privacy
JP7002404B2 (en) Neural network that discovers latent factors from data
US20230297847A1 (en) Machine-learning techniques for factor-level monotonic neural networks
JP2016531513A (en) Method and apparatus for utility-aware privacy protection mapping using additive noise
CN117235770A (en) Power data sharing analysis system and method based on differential privacy
Liu et al. Stolenencoder: stealing pre-trained encoders in self-supervised learning
Liu et al. Face image publication based on differential privacy
CN116996272A (en) Network security situation prediction method based on improved sparrow search algorithm
CN113254988A (en) High-dimensional sensitive data privacy classified protection publishing method, system, medium and equipment
US20230046601A1 (en) Machine learning models with efficient feature learning
CN115956244A (en) Apparatus and method for secure private data aggregation
CN116595553A (en) Encryption method of intelligent electric meter of Internet of things with differential privacy protection
Liu [Retracted] Privacy Protection Technology Based on Machine Learning and Intelligent Data Recognition
WO2022199612A1 (en) Learning to transform sensitive data with variable distribution preservation
US20220027711A1 (en) System and method for mitigating generalization loss in deep neural network for time series classification
WO2023107134A1 (en) Explainable machine learning based on time-series transformation
CN114547686A (en) High-dimensional mass data release privacy protection method
Mandala et al. PSV-GWO: Particle swarm velocity aided GWO for privacy preservation of data
Zhang et al. A Differential privacy image publishing method based on wavelet transform
Nobi et al. Adversarial attacks in machine learning based access control
Hızal et al. IoT-based Smart Home Security System with Machine Learning Models
Liu et al. Research on fingerprint image differential privacy protection publishing method based on wavelet transform and singular value decomposition technology
Heo et al. Personalized DP-SGD using Sampling Mechanisms
US20240106627A1 (en) Computer-implemented method for providing an encrypted dataset providing a global trained function, computer-implemented method for recovering personal information, computer system and computer program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination