CN116595553A - Encryption method of intelligent electric meter of Internet of things with differential privacy protection - Google Patents


Info

Publication number
CN116595553A
Authority
CN
China
Prior art keywords: data, noise, privacy, differential, cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310563375.8A
Other languages
Chinese (zh)
Inventor
曹献炜
常兴智
张军
王再望
党政军
谭忠
马强
林福平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningxia LGG Instrument Co Ltd
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202310563375.8A
Publication of CN116595553A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G06F18/15 Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6263 Protecting personal data, e.g. for financial or medical purposes during internet communication, e.g. revealing personal data from cookies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S40/00 Systems for electrical power generation, transmission, distribution or end-user application management characterised by the use of communication or information technologies, or communication or information technology specific aspects supporting them
    • Y04S40/20 Information technology specific aspects, e.g. CAD, simulation, modelling, system security

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Hardware Design (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses an encryption method for an Internet of Things smart meter with differential privacy protection, which comprises the following steps: STEP1, data preprocessing: performing preprocessing operations to ensure the quality and usability of the data; STEP2, cluster analysis: clustering the preprocessed data with the K-means algorithm and dividing the data into different groups. Beneficial effects include: 1. The usefulness and accuracy of the data are maintained: methods such as probability distribution fitting and the exponentially weighted moving average reduce the influence of the noise on the data to a minimum, so the data remain highly accurate and useful and still support statistical analysis and data processing to a certain extent. 2. Data quality and availability are balanced: the data preprocessing and noise-adjustment methods balance privacy protection against data quality and availability, so the data retain good quality and usability while privacy is protected.

Description

Encryption method of intelligent electric meter of Internet of things with differential privacy protection
Technical Field
The invention relates to the technical field of Internet of Things electricity meters, and in particular to an encryption method for an Internet of Things smart meter with differential privacy protection.
Background
The Internet of Things smart meter is an electricity meter that monitors and records electric energy usage in real time and, through its connection to the Internet, provides remote monitoring, remote control, data analysis, and other functions. However, as big data and personal privacy protection become increasingly important, protecting the privacy of the large volume of energy-usage data generated by Internet of Things smart meters becomes especially critical. Differential privacy is a technique for protecting personal data privacy and is widely applied to Internet of Things smart meters.
Differential privacy adds noise to personal data so that the true information of a particular individual cannot be inferred from the data. Adding noise to the energy-usage data collected by the Internet of Things smart meter therefore protects user privacy. The noise may be random noise or noise generated by a specific privacy-preserving algorithm. After noise is added, even an attacker who obtains the energy-usage data cannot accurately restore the original user information. To further strengthen privacy protection, the Internet of Things smart meter can also aggregate the energy-usage data, either by statistically analyzing a group of users' data or by aggregating the data into multiple intervals, thereby reducing the likelihood that an attacker obtains individual information.
Differential privacy protects individual data privacy by introducing noise and perturbation. Applied to the Internet of Things smart meter, it perturbs the energy-usage data slightly in order to protect user privacy; even if an attacker obtains part of the data, the electricity usage of an individual cannot be accurately inferred.
In summary, the Internet of Things smart meter uses differential privacy protection technology and, through noise addition, data aggregation, differential privacy, anonymization, and similar means, protects users' private information while still enabling effective collection and analysis of energy-usage data. These privacy protection techniques effectively reduce the possibility that an attacker obtains a user's individual information while preserving the accuracy and usability of the energy-usage data.
However, through the inventors' long-term work and research, the following technical problems have been found in conventional differential privacy protection technology and need to be solved:
1. Data quality and availability: conventional techniques may face data quality and availability problems during data preprocessing and noise addition. Because conventional methods typically use fixed noise parameters or random noise addition, the noise strength may not adapt to different data characteristics and privacy requirements, which reduces data quality or renders the data unusable.
2. Risk of privacy disclosure: noise addition in conventional techniques may still carry a privacy disclosure risk. The way in which noise is added may be insufficient to protect individual privacy, so the data may still allow sensitive information to be identified, increasing the risk of privacy disclosure.
3. Noise distortion: conventional methods may introduce large noise distortion during noise addition. Excessive noise distorts the data, makes it difficult to extract useful information, and reduces the usability and accuracy of the data.
4. Handling of time-series characteristics: conventional techniques may not sufficiently consider the temporal characteristics of the data when processing time-series data. Smoothing and time-series analysis are significant for Internet of Things smart meter data, but conventional methods may not fully exploit the temporal characteristics of the data.
Therefore, an encryption method for an Internet of Things smart meter with differential privacy protection is provided.
Disclosure of Invention
In view of this, the embodiments of the invention provide an encryption method for an Internet of Things smart meter with differential privacy protection, so as to solve or mitigate the technical problems existing in the prior art and at least provide a beneficial alternative;
The technical scheme of the embodiment of the invention is realized as follows:
First aspect
An encryption method for an Internet of Things smart meter with differential privacy protection comprises the following steps:
STEP1, data preprocessing: performing preprocessing operations to ensure the quality and usability of the data;
STEP2, clustering analysis: clustering the preprocessed data by using a K-means algorithm, and dividing the data into different groups;
STEP3, principal component analysis: applying a Principal Component Analysis (PCA) method to the data within each cluster group, reducing the data dimension and retaining the main features;
STEP4, probability distribution fitting: carrying out probability distribution fitting on the dimensionality reduced data in each cluster group so as to adjust the noise distribution;
STEP5, adding differential privacy noise: adding Laplacian or Gaussian noise to the data in each cluster group according to the fitted probability distribution and privacy budget;
STEP6, exponentially weighted moving average: combining the data added with noise with an exponential weighted moving average method, and smoothing the time sequence data;
STEP7, differential data release: calculating the differential values of the noise data and publishing the differential data instead of the original data, so as to reduce the risk of privacy leakage;
STEP8, data sharing and access control: designing a safe data sharing and access control mechanism, and ensuring that the privacy of a user is protected in the whole data processing flow;
STEP9, periodically evaluating and optimizing the designed differential privacy protection method, and checking the privacy protection degree and the data availability; the method can be adjusted to enhance the protection effect if necessary.
In an embodiment of the encryption method for the Internet of Things smart meter with differential privacy protection, the method comprises the following steps: in STEP1, data preprocessing operations such as data cleaning, denoising, and outlier processing are performed to ensure the quality and usability of the input data. Various data preprocessing techniques may be used, such as data cleaning algorithms, filtering algorithms, and anomaly detection algorithms.
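As a purely illustrative sketch of such preprocessing (the patent does not prescribe any particular algorithm), the following C++ fragment fills missing readings and clamps implausible values; the NAN marker for missing samples and the 0 to 10 kWh plausibility range are assumptions made for this example only.

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    // Raw hourly readings in kWh; NAN marks a missing sample, 25.0 is an implausibly large reading.
    std::vector<double> readings = {0.5, 0.6, NAN, 0.8, 25.0, 0.9, 1.2};

    // Fill missing values with the mean of the valid readings.
    double sum = 0.0; int valid = 0;
    for (double v : readings) if (!std::isnan(v)) { sum += v; ++valid; }
    double mean = sum / valid;
    for (double& v : readings) if (std::isnan(v)) v = mean;

    // Clamp readings to an assumed plausible range of [0, 10] kWh (simple outlier handling).
    for (double& v : readings) {
        if (v < 0.0) v = 0.0;
        if (v > 10.0) v = 10.0;
    }
    for (double v : readings) std::printf("%.3f ", v);
    std::printf("\n");
}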
In STEP2, the K-means algorithm is adopted to perform cluster analysis on the preprocessed data. The K-means algorithm divides the data into K different groups such that the similarity of data points within each group is maximized and the group-to-group similarity is minimized. The data can be divided into different cluster groups, and a basis is provided for the subsequent steps.
In STEP3, a Principal Component Analysis (PCA) method is applied to the data within each cluster group. PCA is a dimension-reduction technique that maps raw data to a low-dimensional space through linear transformation while preserving the main features. PCA can reduce data dimension, eliminate redundant information and extract main components of data.
In STEP4, probability distribution fitting is performed on the dimensionality-reduced data within each cluster group. By applying a probability distribution model, such as a Laplace distribution or a Gaussian distribution, to the reduced-dimension data, the distribution of the noise can be adjusted to better match the characteristics of the data.
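For illustration only (the patent does not fix a particular estimator), the following C++ sketch fits a Laplace distribution by its maximum-likelihood estimates (location = median, scale = mean absolute deviation from the median) and a Gaussian distribution by the sample mean and standard deviation; the example values are hypothetical.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <numeric>
#include <vector>

struct LaplaceFit { double mu; double b; };       // location and scale
struct GaussianFit { double mean; double sigma; };

// Maximum-likelihood Laplace fit: mu = (upper) median, b = mean |x - mu|.
LaplaceFit fitLaplace(std::vector<double> x) {
    std::sort(x.begin(), x.end());
    double mu = x[x.size() / 2];
    double b = 0.0;
    for (double v : x) b += std::fabs(v - mu);
    return {mu, b / x.size()};
}

// Gaussian fit via the sample mean and (population) standard deviation.
GaussianFit fitGaussian(const std::vector<double>& x) {
    double mean = std::accumulate(x.begin(), x.end(), 0.0) / x.size();
    double var = 0.0;
    for (double v : x) var += (v - mean) * (v - mean);
    return {mean, std::sqrt(var / x.size())};
}

int main() {
    std::vector<double> cluster = {0.5, 0.6, 0.4, 0.8, 2.0, 2.5, 0.9, 1.2};   // hypothetical reduced-dimension values
    LaplaceFit lf = fitLaplace(cluster);
    GaussianFit gf = fitGaussian(cluster);
    std::printf("Laplace: mu=%.3f b=%.3f; Gaussian: mean=%.3f sigma=%.3f\n", lf.mu, lf.b, gf.mean, gf.sigma);
}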
In STEP5, differential privacy noise is added to the data within each cluster group according to the probability distribution and privacy budget of the fit. The size of the noise may be determined from the privacy budget using either laplace noise or gaussian noise. The addition of noise makes it impossible to accurately infer the original data from the noise data, thereby protecting user privacy.
In STEP6, the data to which noise has been added is combined with an exponential weighted moving average method, and time-series data is smoothed. The exponential weighted moving average method considers the time dependence of the data, can smooth the data, reduce the influence of noise on the data, and improve the usability of the data.
In STEP7, the differential value of the noise data is calculated, and differential data is distributed instead of the original data. By publishing the differential data, the risk of revealing privacy can be further reduced, as an attacker cannot directly restore the original data from the differential data.
In STEP8, a secure data sharing and access control mechanism is designed. Ensuring that only authorized users can access the protected data and securely manage the sharing of the data. Encryption algorithms and access control policies may be employed to protect the confidentiality and integrity of data. The encryption algorithm can ensure that even if the data is acquired by an unauthorized user during the data transmission and storage process, the original data cannot be decrypted. Meanwhile, the access control strategy can limit the access right to the data, and only authorized users can acquire specific data, so that the security of the data is ensured.
In STEP9, the differential privacy preserving method is periodically evaluated and optimized. By evaluating the effects of the privacy preserving method, including the degree of privacy preservation and the availability of data, the actual performance of the method can be known and potential problems can be found. If necessary, the method can be adjusted and optimized to improve privacy protection effect and data availability so as to meet actual application requirements.
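A minimal sketch of what such an evaluation might compute (illustrative only; the metric, the numbers, and the per-release budget are assumptions, not prescribed by the patent): the mean absolute error between the original and the published series as a data-availability measure, and the cumulative privacy budget under basic sequential composition.

#include <cmath>
#include <cstdio>
#include <vector>

// Mean absolute error between the original and the privacy-protected series,
// used here as a simple data-availability metric.
double meanAbsError(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += std::fabs(a[i] - b[i]);
    return s / a.size();
}

int main() {
    std::vector<double> original  = {0.5, 0.6, 0.4, 0.8, 2.0};          // hypothetical readings
    std::vector<double> published = {0.55, 0.52, 0.47, 0.83, 1.94};     // hypothetical noise-added, smoothed values
    double epsilonPerRelease = 1.0;                                     // assumed privacy budget per release
    int releases = 24;                                                  // e.g. one release per hour
    // Under basic sequential composition, the per-release budgets add up.
    std::printf("MAE = %.3f, cumulative privacy budget = %.1f\n",
                meanAbsError(original, published), epsilonPerRelease * releases);
}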
In one of the above embodiments:
in STEP2, the K-means algorithm is used:
STEP2.1, initializing the cluster centers: randomly selecting K data points as the initial cluster centers c_1, c_2, ..., c_K;
STEP2.2, assign each data point to the nearest cluster center: calculating the distance between each data point and all the cluster centers, and distributing the data points to the cluster center closest to the data point;
STEP2.3, updating a clustering center: and recalculating the average value of each cluster as a new cluster center according to the distribution result.
STEP2.4, repeating STEP2.2 to STEP2.3 until the cluster centers no longer change.
In the above embodiment, in STEP2, cluster analysis using the K-means algorithm comprises the following steps:
1.1. initializing a clustering center: k data points were randomly selected as initial cluster centers. These initial cluster centers may be selected randomly from the dataset, or by other methods. The selection of the initial cluster center has an impact on the clustering results and thus requires careful selection.
1.2. Each data point is assigned to the nearest cluster center: for each data point, the distance between it and all cluster centers is calculated and the data point is assigned to the cluster center closest to it. Common distance measurement methods include euclidean distance, manhattan distance, and the like. Thus, each data point is assigned to a cluster center, forming an initial cluster result.
1.3. Updating a clustering center: and recalculating the average value of each cluster as a new cluster center according to the distribution result of the data points. The mean value of all data points in each cluster is calculated for each cluster, and a new cluster center is obtained. This process will update the location of the cluster center.
1.4. Repeating steps 1.2 and 1.3: steps 1.2 and 1.3 are repeated until the cluster centers no longer change or a predetermined number of iterations is reached. In each iteration, the clustering result is gradually optimized by reassigning the data points and updating the cluster centers, so that the cluster centers represent the data points in their clusters more accurately.
Through multiple iterations, the K-means algorithm continuously adjusts the position of the clustering center until the clustering center is stable. The finally obtained clustering result can be used as the division of a data set, and a foundation is provided for subsequent data processing and privacy protection.
In one of the above embodiments:
in STEP3, principal Component Analysis (PCA):
STEP3.1, calculating a covariance matrix Cov (X) of the data matrix X;
STEP3.2, performing eigenvalue decomposition on the covariance matrix Cov(X) to obtain eigenvalues λ_1, λ_2, ..., λ_n and corresponding eigenvectors v_1, v_2, ..., v_n;
STEP3.3, selecting feature vectors corresponding to the first k feature values to form a projection matrix P;
STEP3.4, multiplying the original data matrix X by the projection matrix P to obtain a data matrix Y after dimension reduction.
In the above embodiment, in STEP3, Principal Component Analysis (PCA) comprises the following steps:
3.1. calculating covariance matrix Cov (X) of data matrix X: the covariance matrix Cov (X) is calculated using the preprocessed data matrix X as input. The covariance matrix reflects the linear correlation between the data and is the basis for PCA analysis.
3.2. Eigenvalue decomposition is performed on the covariance matrix Cov(X): eigenvalue decomposition of Cov(X) yields the eigenvalues λ_1, λ_2, ..., λ_n and the corresponding eigenvectors v_1, v_2, ..., v_n. The eigenvalues represent the variance of the data along the individual principal components, and the eigenvectors represent the directions of the principal components.
3.3. Selecting feature vectors corresponding to the first k feature values to form a projection matrix P: and selecting the eigenvectors corresponding to the first k largest eigenvalues according to the eigenvalue sizes to form a projection matrix P. The principal components corresponding to these eigenvectors represent the most important variance contributions in the data.
3.4. Multiplying the original data matrix X by the projection matrix P to obtain the dimensionality-reduced data matrix Y: Y = XP, where Y is the reduced-dimension data matrix, X is the original data matrix, and P is the projection matrix. The dimensionality of Y is lower, the main features are retained, and the dimension of the data is reduced.
Through PCA dimension reduction, the raw data can be represented in a lower dimensional space for subsequent processing and analysis. The data after dimension reduction can better reflect the structure and the characteristics of the data, reduce redundant information and provide more effective data for subsequent probability distribution fitting and privacy protection.
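As an illustration of these four steps, the following C++ sketch assumes the third-party Eigen linear-algebra library is available (the patent does not prescribe a library, the toy matrix is hypothetical, and centering the data before projection is a common convention added here rather than stated in the patent text).

#include <Eigen/Dense>
#include <iostream>

// PCA sketch: center the data, compute the covariance matrix, take the eigenvectors
// belonging to the k largest eigenvalues, and project the data onto them.
Eigen::MatrixXd pcaReduce(const Eigen::MatrixXd& X, int k) {
    Eigen::RowVectorXd mean = X.colwise().mean();
    Eigen::MatrixXd Xc = X.rowwise() - mean;                                  // centered data
    Eigen::MatrixXd cov = (Xc.transpose() * Xc) / double(X.rows() - 1);       // covariance matrix Cov(X)
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(cov);                   // eigenvalues in ascending order
    Eigen::MatrixXd P = es.eigenvectors().rightCols(k);                       // projection matrix of top-k eigenvectors
    return Xc * P;                                                            // reduced-dimension data Y
}

int main() {
    Eigen::MatrixXd X(4, 3);                                                  // toy data: 4 samples, 3 features
    X << 0.5, 0.6, 0.4,
         0.8, 2.0, 2.5,
         0.9, 1.2, 1.0,
         0.7, 0.3, 0.2;
    std::cout << pcaReduce(X, 2) << std::endl;                                // 4 x 2 projected data
}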
In one of the above embodiments, in STEP5, the differential privacy noise is added as follows:
STEP5.1, determining the scale parameter b of the Laplace noise or the standard deviation σ of the Gaussian noise according to the privacy budget ε and the data sensitivity;
STEP5.2, generating a corresponding amount of laplace or gaussian noise for the data within each cluster group;
STEP5.3, adding the generated noise to the data.
In the above embodiment, in STEP5, the Laplace or Gaussian noise is added according to the privacy budget and the data sensitivity as follows:
5.1. The scale parameter b of the Laplace noise or the standard deviation σ of the Gaussian noise is determined from the privacy budget ε and the data sensitivity: b or σ is obtained by mathematical derivation or statistical analysis from the given privacy budget ε and data sensitivity. The privacy budget ε is the parameter that controls the noise intensity and the degree of privacy protection, while the data sensitivity characterizes how much the released statistic can change when a single individual's record changes.
5.2. Generating a corresponding amount of laplace or gaussian noise for the data within each cluster group: based on the amount of data within each cluster group, a corresponding amount of Laplacian or Gaussian noise is generated for each data point. The generated noise is characteristic of a laplace or gaussian distribution and has appropriate scale parameters.
5.3. Adding the generated noise to the data: the generated laplace or gaussian noise is added to the data within each cluster group. Noise may be added to the corresponding data points in a point-by-point addition. Thus, each data point is added with corresponding noise, so that the aim of privacy protection is fulfilled.
By adding noise to the data, individual features in the data can be blurred so that the original data cannot be accurately inferred from the noisy data, thereby protecting the privacy of the user. The Laplace noise has better privacy protection effect, and the intensity of the noise can be controlled more accurately under the condition of privacy budget permission. Gaussian noise is also used in some cases, which has the characteristics of continuity and smoothness, and is suitable for certain data distribution situations.
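The following C++ sketch illustrates STEP5.1 to STEP5.3 (illustrative only: the parameter values are assumed, b = sensitivity/ε is the standard Laplace-mechanism choice, and σ = sensitivity·sqrt(2·ln(1.25/δ))/ε is one common analytic choice for the Gaussian mechanism rather than anything mandated by the patent). Laplace noise is generated as the difference of two exponential samples.

#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    double epsilon = 1.0;                 // privacy budget (assumed)
    double sensitivity = 1.0;             // data sensitivity (assumed)
    double delta = 1e-5;                  // only needed for the Gaussian mechanism

    double b = sensitivity / epsilon;                                          // Laplace scale parameter
    double sigma = sensitivity * std::sqrt(2.0 * std::log(1.25 / delta)) / epsilon;  // Gaussian standard deviation

    std::mt19937 rng(42);
    std::exponential_distribution<double> expo(1.0 / b);   // difference of two Exp(1/b) samples ~ Laplace(0, b)
    std::normal_distribution<double> gauss(0.0, sigma);

    std::vector<double> cluster = {0.5, 0.6, 0.4, 0.8, 2.0};  // reduced-dimension values of one cluster group (hypothetical)
    for (double& v : cluster) {
        double laplaceNoise = expo(rng) - expo(rng);
        v += laplaceNoise;                 // alternatively: v += gauss(rng);
    }
    for (double v : cluster) std::printf("%.3f ", v);
    std::printf("\n");
}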
In one of the above embodiments:
STEP6: exponentially Weighted Moving Average (EWMA):
setting: smoothing parameter α (0 < α < 1);
for each data point x_t in the time series;
calculating the weighted moving average: y_t = α·x_t + (1 - α)·y_(t-1).
In the above embodiment, in STEP6, the Exponentially Weighted Moving Average (EWMA) method comprises the following steps:
6.1. setting a smoothing parameter alpha: first, a smoothing parameter α is set, where 0< α <1. The smoothing parameter α determines the weight distribution between the current data point and the past weighted moving average. A larger alpha value indicates a higher weight to the current data point and a smaller alpha value indicates a higher weight to the past weighted moving average.
6.2. For each data point x_t in the time series: for each data point x_t in the time series, the following calculation steps are performed.
6.3. Calculating the weighted moving average: the weighted moving average is calculated with the formula y_t = α·x_t + (1 - α)·y_(t-1), where y_t is the weighted moving average at the current data point x_t and y_(t-1) is the weighted moving average at the previous data point.
By weighted averaging the current data point with a past weighted moving average, the EWMA method can smooth time series data and eliminate the effects of noise. The weighted moving average method allows closer data points to be weighted higher in the calculation, while farther data points have lower weights.
The Exponential Weighted Moving Average (EWMA) method makes the smoothing of data points more flexible by dynamically adjusting the weighting factors. A larger smoothing parameter a will result in faster response speed and less hysteresis, while a smaller a will result in a smoothing result that is more important to the past data.
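A minimal C++ sketch of the EWMA recursion of STEP6 (the data values and α = 0.3 are illustrative; initializing y_0 to x_0 is one common convention, not specified in the patent):

#include <cstdio>
#include <vector>

// y_t = alpha * x_t + (1 - alpha) * y_{t-1}, with y_0 initialized to x_0.
std::vector<double> ewma(const std::vector<double>& x, double alpha) {
    std::vector<double> y(x.size());
    if (x.empty()) return y;
    y[0] = x[0];
    for (size_t t = 1; t < x.size(); ++t)
        y[t] = alpha * x[t] + (1.0 - alpha) * y[t - 1];
    return y;
}

int main() {
    std::vector<double> noisy = {0.5, 0.9, 0.4, 1.8, 2.1, 2.6, 1.0};   // noisy hourly readings (hypothetical)
    std::vector<double> smooth = ewma(noisy, 0.3);                      // smoothing parameter alpha in (0, 1)
    for (double v : smooth) std::printf("%.3f ", v);
    std::printf("\n");
}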
In one of the above embodiments:
STEP7, differential data release:
STEP7.1, calculating the differential values of the noise data: Δx_t = x_(t+1) - x_t;
STEP7.2, issue differential data Δx_t instead of raw data.
In the above embodiment, in STEP7, differential data release comprises the following steps:
7.1. Calculating the differential values of the noise data: for each data point x_t, the difference Δx_t between it and the next data point x_(t+1) is calculated. The differential value represents the variation between adjacent data points.
7.2. The differential data Δx_t is published instead of the raw data: the calculated differential data deltax_t is taken as the published data instead of the original data x_t. The differential data reflects the amount of variation between adjacent data points, reducing the risk of privacy disclosure by distributing the differential data instead of the original data.
Only the differences between the data are published through differential data publication, and the original data is not directly published. Thus, the risk of disclosure of individual privacy can be effectively reduced, because the differential data is insufficient to restore the specific value of the original data.
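A minimal C++ sketch of STEP7 (illustrative; the input values are hypothetical): the published vector contains only the first differences and is one element shorter than the noisy series.

#include <cstdio>
#include <vector>

// delta_x[t] = x[t+1] - x[t]; the published vector is one element shorter than the input.
std::vector<double> firstDifferences(const std::vector<double>& x) {
    std::vector<double> d;
    for (size_t t = 0; t + 1 < x.size(); ++t)
        d.push_back(x[t + 1] - x[t]);
    return d;
}

int main() {
    std::vector<double> noisy = {0.52, 0.61, 0.43, 0.85, 2.04};   // noise-added, smoothed readings (hypothetical)
    for (double d : firstDifferences(noisy)) std::printf("%.3f ", d);
    std::printf("\n");
}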
Second aspect
A computer device comprising a processor and a memory coupled to the processor, the memory storing program instructions which, when executed by the processor, cause the processor to perform the encryption method described above.
Such a computer device is a device specifically designed to perform the encryption method described above. It comprises the following main components:
(1) A processor: computer devices are provided with processors for executing program instructions and processing data. The processor may be a general purpose Central Processing Unit (CPU) or a dedicated cryptographic processor, depending on the design and use of the device.
(2) A memory: the memory coupled to the processor stores program instructions and data. The memory may include Random Access Memory (RAM), Read-Only Memory (ROM), flash memory, etc., and holds the program instructions required by the encryption method as well as the input and output data.
(3) Program instructions: program instructions required to perform the encryption method are stored in the memory. These instructions may be software instructions written in a programming language or may be hardware instructions that are resident in the device.
When executing program instructions, the processor performs the encryption method described above according to the logic and operational requirements of the instructions. The processor can read data from the memory according to the requirements of the instructions, and perform operations such as data processing, encryption operation, noise addition and the like according to algorithms and logic. The purpose of such a computer device is to provide a dedicated hardware or software platform to efficiently perform the encryption method described above. The method can be applied to various scenes, such as intelligent electric meters of the Internet of things, data centers, cloud computing and the like, so as to ensure privacy protection and security of data.
Third aspect of the invention
A storage medium storing program instructions capable of implementing the encryption method described above.
Such a storage medium is a medium for storing program instructions capable of realizing the encryption method described above. It may be a storage device or medium of various types, such as a hard disk drive, a solid state drive, a flash memory drive, an optical disk, etc.
In such a storage medium, program instructions encoding the encryption method described above are contained. These instructions may be represented in binary form and stored in a particular file format in a storage medium. The program instructions may implement the described encryption method by reading data in a storage medium and executing it by a processor of a computer system.
The program instructions in the storage medium may include instructions required for the specific implementation of encryption algorithms, data preprocessing operations, cluster analysis, principal component analysis, probability distribution fitting, noise addition methods, exponentially weighted moving average methods, and the like. The instructions are organized in a particular logic and order of execution to implement a complete encryption methodology. Through such a storage medium, program instructions of the encryption method can be conveniently transmitted, distributed, and installed into a computer apparatus or system. It provides an efficient way for the user to perform the encryption method described above on an appropriate hardware or software platform to protect the privacy and security of the data. The storage medium has wide application range, and can be used for scenes such as intelligent electric meters of the Internet of things, cloud computing environments, data centers and the like so as to provide efficient and safe encryption functions. Meanwhile, the portability and easy updating of the storage medium make the implementation and updating of the encryption method more flexible and convenient.
Compared with the prior art, the invention has the following beneficial effects:
1. The usefulness and accuracy of the data are maintained: methods such as probability distribution fitting and the exponentially weighted moving average reduce the influence of the noise on the data to a minimum, so the data remain highly accurate and useful and still support statistical analysis and data processing to a certain extent.
2. Data quality and availability are balanced: the data preprocessing and noise-adjustment methods balance privacy protection against data quality and availability, so that the data retain good quality and usability while privacy is protected.
3. Suitability for time-series data: the technique smooths time-series data with the exponentially weighted moving average method and makes full use of the temporal characteristics of the data, so it adapts well to time-series data in applications such as the Internet of Things smart meter.
4. Efficient protection of individual privacy: the technique adopts a differential privacy protection method and protects individual privacy by introducing noise into the data, which minimizes the risk of privacy disclosure and ensures the confidentiality of individuals' sensitive information.
5. Flexibility and personalization: the technique adopts a personalized noise-addition method in which the noise parameters are determined from the privacy budget and the data sensitivity, so the noise addition can be adjusted to different data characteristics and privacy requirements, improving the flexibility of privacy protection.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are necessary for the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of the method of the present application;
FIG. 2 is a schematic diagram of a C++ control program (first portion) according to the present application;
FIG. 3 is a schematic diagram of a C++ control program (second portion) according to the present application;
FIG. 4 is a schematic diagram of a C++ control program (third portion) according to the present application.
Detailed Description
In order that the above objects, features and advantages of the application will be readily understood, a more particular description of the application will be rendered by reference to the appended drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. This application may be embodied in many other forms than described herein and similarly modified by those skilled in the art without departing from the spirit of the application, whereby the application is not limited to the specific embodiments disclosed below;
In the prior art, the conventional Internet of Things smart meter technology using differential privacy protection has the following disadvantages compared with the provided technology: data quality and usability problems, privacy exposure risk, noise distortion, and insufficient handling of the temporal characteristics of the data. Conventional methods may face data quality and usability problems during data preprocessing and noise addition, and the noise strength may not suit different data characteristics and privacy requirements. Furthermore, noise addition may carry a privacy disclosure risk, and the data may still reveal sensitive information. Conventional techniques may introduce large noise distortion, resulting in distorted data and reduced accuracy. For time-series data, conventional methods may not sufficiently consider the temporal characteristics of the data, resulting in insufficient time-series analysis and smoothing. For this reason, referring to FIG. 1, the present embodiment provides a technical solution to the above technical problems: an encryption method for an Internet of Things smart meter with differential privacy protection, comprising the following steps:
STEP1, data preprocessing: performing preprocessing operations to ensure the quality and usability of the data;
STEP2, clustering analysis: clustering the preprocessed data by using a K-means algorithm, and dividing the data into different groups;
STEP3, principal component analysis: applying a Principal Component Analysis (PCA) method to the data within each cluster group, reducing the data dimension and retaining the main features;
STEP4, probability distribution fitting: carrying out probability distribution fitting on the dimensionality reduced data in each cluster group so as to adjust the noise distribution;
STEP5, adding differential privacy noise: adding Laplacian or Gaussian noise to the data in each cluster group according to the fitted probability distribution and privacy budget;
STEP6, exponentially weighted moving average: combining the data added with noise with an exponential weighted moving average method, and smoothing the time sequence data;
STEP7, differential data release: calculating the differential values of the noise data and publishing the differential data instead of the original data, so as to reduce the risk of privacy leakage;
STEP8, data sharing and access control: designing a safe data sharing and access control mechanism, and ensuring that the privacy of a user is protected in the whole data processing flow;
STEP9, periodically evaluating and optimizing the designed differential privacy protection method, and checking the privacy protection degree and the data availability; the method can be adjusted to enhance the protection effect if necessary.
In this solution, an implementation of the encryption method for the Internet of Things smart meter with differential privacy protection comprises the following steps: in STEP1, data preprocessing operations such as data cleaning, denoising, and outlier processing are performed to ensure the quality and usability of the input data. Various conventional data preprocessing techniques may be used, such as data cleaning algorithms, filtering algorithms, and anomaly detection algorithms. These are conventional, common preprocessing techniques and are therefore not described in detail here.
In STEP2, the K-means algorithm is adopted to perform cluster analysis on the preprocessed data. The K-means algorithm divides the data into K different groups such that the similarity of data points within each group is maximized and the group-to-group similarity is minimized. The data can be divided into different cluster groups, and a basis is provided for the subsequent steps.
In STEP3, a Principal Component Analysis (PCA) method is applied to the data within each cluster group. PCA is a dimension-reduction technique that maps raw data to a low-dimensional space through linear transformation while preserving the main features. PCA can reduce data dimension, eliminate redundant information and extract main components of data.
In STEP4, probability distribution fitting is performed on the dimensionality reduced data within each cluster group. By applying a probability distribution model, such as a laplace distribution or a gaussian distribution, to the reduced-dimension data, the distribution of noise can be adjusted to better adapt to the characteristics of the data.
In STEP5, differential privacy noise is added to the data within each cluster group according to the probability distribution and privacy budget of the fit. The size of the noise may be determined from the privacy budget using either laplace noise or gaussian noise. The addition of noise makes it impossible to accurately infer the original data from the noise data, thereby protecting user privacy.
In STEP6, the data to which noise has been added is combined with an exponential weighted moving average method, and time-series data is smoothed. The exponential weighted moving average method considers the time dependence of the data, can smooth the data, reduce the influence of noise on the data, and improve the usability of the data.
In STEP7, the differential value of the noise data is calculated, and differential data is distributed instead of the original data. By publishing the differential data, the risk of revealing privacy can be further reduced, as an attacker cannot directly restore the original data from the differential data.
In STEP8, a secure data sharing and access control mechanism is designed. Ensuring that only authorized users can access the protected data and securely manage the sharing of the data. Encryption algorithms and access control policies may be employed to protect the confidentiality and integrity of data. The encryption algorithm can ensure that even if the data is acquired by an unauthorized user during the data transmission and storage process, the original data cannot be decrypted. Meanwhile, the access control strategy can limit the access right to the data, and only authorized users can acquire specific data, so that the security of the data is ensured.
In STEP9, the differential privacy preserving method is periodically evaluated and optimized. By evaluating the effects of the privacy preserving method, including the degree of privacy preservation and the availability of data, the actual performance of the method can be known and potential problems can be found. If necessary, the method can be adjusted and optimized to improve privacy protection effect and data availability so as to meet actual application requirements.
Specifically, the principle of the encryption method is based on the core idea of differential privacy, and the privacy of an individual is protected by adding noise and dimension reduction processing. The specific principle is as follows:
First, in the data preprocessing stage (STEP 1), the quality and usability of the input data are ensured by operations such as data cleaning, denoising, outlier processing, and the like. These preprocessing operations may improve the accuracy of subsequent cluster analysis and principal component analysis.
In the cluster analysis stage (STEP 2), the preprocessed data is clustered by adopting a K-means algorithm. The K-means algorithm divides the data into different cluster groups by minimizing the square error within the group and maximizing the distance between groups. This may divide the data into highly similar groups, providing a more accurate data set for subsequent processing.
Then, in the principal component analysis stage (STEP 3), the PCA method is applied to the data within each cluster group. PCA maps the original data to a low-dimensional space through linear transformation, and retains the main characteristics of the data. The dimension reduction process is helpful for reducing the dimension of the data, eliminating redundant information and extracting key features of the data.
In the probability distribution fitting stage (STEP 4), probability distribution fitting is performed on the dimensionality reduced data in each cluster group. By fitting a suitable probability distribution model, such as a laplacian distribution or a gaussian distribution, the distribution of noise can be adjusted to better adapt to the characteristics of the data.
Then, in the differential privacy noise addition stage (STEP 5), laplace or gaussian noise is added to the data within each cluster group according to the probability distribution and privacy budget of the fit. The addition of noise prevents an attacker from accurately deducing the original data, thereby protecting the privacy of the user. By adding an appropriate amount of noise, differential privacy ensures protection of individual data while maintaining the usefulness and usability of the data as much as possible.
In the stage of the exponential weighted moving average method (STEP 6), the data to which noise has been added is combined with the exponential weighted moving average method, and the time series data is smoothed. The exponential weighted moving average method considers the time dependence of the data, and can eliminate the influence of noise on the data by carrying out weighted average on the data, thereby improving the availability and the accuracy of the data.
In the differential data distribution stage (STEP 7), differential values of noise data are calculated and differential data, not raw data, is distributed. By publishing the differential data, an attacker cannot directly obtain the original data, so that the risk of revealing privacy is further reduced. Differential data protects individual privacy information while still allowing some statistical analysis and data processing.
In order to ensure the security of data, a data sharing and access control mechanism (STEP 8) is designed. These mechanisms employ encryption algorithms and access control policies to protect the confidentiality and integrity of data. The encryption algorithm ensures that during data transmission and storage, even if the data is acquired by an unauthorized user, the original data cannot be decrypted. The access control strategy limits the access right of the data, and only authorized users can acquire specific data, so that the safety and privacy protection of the data are ensured.
Finally, in the evaluation and optimization stage (STEP 9), the designed differential privacy protection method is periodically evaluated. By evaluating the degree of privacy protection and the availability of data, the actual performance of the method can be known and potential problems can be found. If necessary, the method can be adjusted and optimized to improve privacy protection effect and usability of data so as to meet actual application requirements. The aim at this stage is to continually improve the protection scheme, ensuring the effectiveness and applicability of privacy protection techniques.
In summary, the encryption method of the internet of things intelligent ammeter with differential privacy protection comprehensively utilizes various technical means to realize privacy protection and security of electric energy use data and ensure availability and quality of the data through the steps of data preprocessing, cluster analysis, principal component analysis, probability distribution fitting, noise addition, smoothing processing, differential data release, data sharing and access control, evaluation and optimization and the like.
In some embodiments of the application, for STEP2:
in STEP2, the K-means algorithm is used:
STEP2.1, initializing the cluster centers: randomly selecting K data points as the initial cluster centers c_1, c_2, ..., c_K;
STEP2.2, assign each data point to the nearest cluster center: calculating the distance between each data point and all the cluster centers, and distributing the data points to the cluster center closest to the data point;
STEP2.3, updating a clustering center: and recalculating the average value of each cluster as a new cluster center according to the distribution result.
STEP2.4, repeating STEP2.2 to STEP2.3 until the cluster centers no longer change.
Exemplary: the following two-dimensional data points are provided:
X={(1,1),(1,2),(2,2),(2,3),(10,10),(10,11),(11,11),(11,12)}
in this exemplary presentation, it is desirable to divide these data points into two clusters (k=2). The following is the execution of the K-means algorithm:
(1) Initializing a clustering center: two data points were randomly selected as initial cluster centers. For example, (1, 1) and (10, 10) are chosen as the initial cluster centers C1 and C2.
(2) Each data point is assigned to the nearest cluster center:
(3) The closest data point to (1, 1): (1,1), (1,2), (2,2), (2,3)
(4) Data points closest to (10, 10): (10,10), (10,11), (11,11), (11,12)
(5) Updating a clustering center: and recalculating the average value of each cluster as a new cluster center according to the distribution result.
(6) New cluster center C1: ((1+1+2+2)/4, (1+2+2+3)/4) = (1.5, 2)
(7) New cluster center C2: ((10+10+11+11)/4, (10+11+11+12)/4) = (10.5, 11)
The assignment and update steps are repeated until the cluster centers no longer change. In this example, the cluster centers have already converged, so no further iteration is needed.
So far, the data points have been divided into two clusters:
cluster 1: (1,1), (1,2), (2,2), (2,3)
Clustering 2: (10,10), (10,11), (11,11), (11,12)
The core principle of the K-means algorithm is to minimize the sum of the distances of each data point from its belonging cluster center by iteratively updating the cluster center. In each iteration, the algorithm assigns data points to the nearest cluster center, and then updates the cluster center according to the assignment result. This process continues until the cluster center is no longer changing, i.e., converging. Finally, the algorithm divides the data points into K clusters such that the data points within the clusters are similar to each other, while the data point differences between the clusters are large.
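The worked example above can be reproduced with the following minimal C++ sketch (an illustration only, not the C++ control program of FIGS. 2 to 4); it starts from the fixed initial centers (1,1) and (10,10) and converges to C1 = (1.5, 2) and C2 = (10.5, 11).

#include <cstdio>
#include <vector>

struct Pt { double x, y; };

double sqDist(const Pt& a, const Pt& b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

int main() {
    std::vector<Pt> data = {{1,1},{1,2},{2,2},{2,3},{10,10},{10,11},{11,11},{11,12}};
    std::vector<Pt> centers = {{1,1},{10,10}};          // initial cluster centers C1 and C2
    std::vector<int> label(data.size(), 0);

    for (int iter = 0; iter < 100; ++iter) {
        bool changed = false;
        // Assignment step: each point goes to the nearest center.
        for (size_t i = 0; i < data.size(); ++i) {
            int best = sqDist(data[i], centers[0]) <= sqDist(data[i], centers[1]) ? 0 : 1;
            if (best != label[i]) { label[i] = best; changed = true; }
        }
        // Update step: each center becomes the mean of its assigned points.
        for (int c = 0; c < 2; ++c) {
            double sx = 0, sy = 0; int n = 0;
            for (size_t i = 0; i < data.size(); ++i)
                if (label[i] == c) { sx += data[i].x; sy += data[i].y; ++n; }
            if (n > 0) centers[c] = {sx / n, sy / n};
        }
        if (!changed) break;                             // converged: assignments are stable
    }
    // Prints C1 = (1.5, 2.0) and C2 = (10.5, 11.0) for this data set.
    std::printf("C1 = (%.1f, %.1f), C2 = (%.1f, %.1f)\n",
                centers[0].x, centers[0].y, centers[1].x, centers[1].y);
}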
In this scheme, in STEP2, the implementation mode of performing cluster analysis by adopting the K-means algorithm comprises the following STEPs:
1.1. Initializing a clustering center: k data points were randomly selected as initial cluster centers. These initial cluster centers may be selected randomly from the dataset, or by other methods. The selection of the initial cluster center has an impact on the clustering results and thus requires careful selection.
1.2. Each data point is assigned to the nearest cluster center: for each data point, the distance between it and all cluster centers is calculated and the data point is assigned to the cluster center closest to it. Common distance measurement methods include euclidean distance, manhattan distance, and the like. Thus, each data point is assigned to a cluster center, forming an initial cluster result.
1.3. Updating a clustering center: and recalculating the average value of each cluster as a new cluster center according to the distribution result of the data points. The mean value of all data points in each cluster is calculated for each cluster, and a new cluster center is obtained. This process will update the location of the cluster center.
1.4. Repeating steps 1.2 and 1.3: steps 1.2 and 1.3 are repeated until the cluster centers no longer change or a predetermined number of iterations is reached. In each iteration, the clustering result is gradually optimized by reassigning the data points and updating the cluster centers, so that the cluster centers represent the data points in their clusters more accurately.
Through multiple iterations, the K-means algorithm continuously adjusts the position of the clustering center until the clustering center is stable. The finally obtained clustering result can be used as the division of a data set, and a foundation is provided for subsequent data processing and privacy protection.
Specifically, the K-means algorithm implements cluster analysis based on a distance metric between data points and an adjustment of cluster centers. The principle can be summarized as follows:
in the stage of initializing the cluster centers, each cluster is assigned a center point by randomly selecting K data points as the initial cluster centers.
In the stage of assigning each data point to the nearest cluster center, the distance between each data point and all cluster centers is calculated, the cluster center closest to the distance is selected, and the data point is assigned to the cluster center.
And in the stage of updating the cluster centers, the average value of the data points in each cluster is recalculated according to the distribution result to be used as a new cluster center. This process adjusts the location of the cluster center to better represent the data points within the cluster.
By repeatedly executing the assignment and update steps, the K-means algorithm iterates until the cluster centers no longer change or a predetermined number of iterations is reached. In the iterative process, the data points are reassigned according to their distance from the cluster centers, and the cluster centers update their positions according to the reassigned data points.
The core idea of the K-means algorithm is to perform cluster analysis by minimizing the intra-group squared error and maximizing the inter-group distance. The intra-group squared error is the sum of squared distances between each data point and its cluster center, and the goal is to bring the data points within a group closer to each other. The inter-group distance is the distance between the centers of different clusters, and the goal is to keep the data points of different clusters farther apart from each other.
The optimization objective of the K-means algorithm is to optimize the locations of the cluster centers by iterating so that the data points within a group are as close to each other as possible, while the data points between different clusters are as far apart from each other as possible. And gradually converging the clustering result to a local optimal solution by continuously repeating the process of updating the clustering center.
Finally, the clustering result generated by the K-means algorithm can be used for grouping data, and a foundation is provided for the following steps of principal component analysis, noise addition, privacy protection and the like. The clustering result can classify data points with higher similarity into one category, and provide more accurate and effective data sets for subsequent processing. Meanwhile, the clustering result also provides a foundation for subsequent privacy protection, and personalized privacy protection measures can be carried out aiming at different clustering groups.
In some embodiments of the present application,
in STEP3, principal Component Analysis (PCA):
STEP3.1, calculating a covariance matrix Cov (X) of the data matrix X;
STEP3.2, performing eigenvalue decomposition on the covariance matrix Cov(X) to obtain eigenvalues λ_1, λ_2, ..., λ_n and corresponding eigenvectors v_1, v_2, ..., v_n;
STEP3.3, selecting feature vectors corresponding to the first k feature values to form a projection matrix P;
STEP3.4, multiplying the original data matrix X by the projection matrix P to obtain a data matrix Y after dimension reduction.
In this aspect, in STEP3, an embodiment of Principal Component Analysis (PCA) includes the STEPs of:
3.1. calculating covariance matrix Cov (X) of data matrix X: the covariance matrix Cov (X) is calculated using the preprocessed data matrix X as input. The covariance matrix reflects the linear correlation between the data and is the basis for PCA analysis.
3.2. Eigenvalue decomposition is performed on covariance matrix Cov (X): eigenvalue decomposition is performed on covariance matrix Cov (X) to obtain eigenvalues λ_1, λ_2, ..., λ_n and corresponding eigenvectors v_1, v_2, ..., v_n. The eigenvalues represent the variance of the data over the individual principal components and the eigenvectors represent the directions of the principal components.
3.3. Selecting feature vectors corresponding to the first k feature values to form a projection matrix P: and selecting the eigenvectors corresponding to the first k largest eigenvalues according to the eigenvalue sizes to form a projection matrix P. The principal components corresponding to these eigenvectors represent the most important variance contributions in the data.
3.4. Multiplying the original data matrix X by the projection matrix P to obtain a dimensionality reduced data matrix Y: the original data matrix X is multiplied by the projection matrix P to obtain the reduced data matrix Y. Y = XP, where Y is the reduced-dimension data matrix, X is the original data matrix, and P is the projection matrix. The reduced data matrix Y has a lower dimensionality while retaining the main features, so the dimensionality of the data is decreased.
Through PCA dimension reduction, the raw data can be represented in a lower dimensional space for subsequent processing and analysis. The data after dimension reduction can better reflect the structure and the characteristics of the data, reduce redundant information and provide more effective data for subsequent probability distribution fitting and privacy protection.
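The following is a minimal sketch of STEP3.1 to STEP3.4 written with the Eigen library mentioned later in this description; the matrix layout (rows as samples, columns as features), the value of k, and the centering of the data before projection are assumptions of the sketch rather than requirements of the embodiment.

// PCA dimension-reduction sketch (STEP3) using the Eigen library.
// Layout (rows = samples, columns = features) and k are illustrative choices.
#include <Eigen/Dense>
#include <iostream>

Eigen::MatrixXd pcaReduce(const Eigen::MatrixXd& X, int k) {
    // 3.1 Center the data and compute the covariance matrix Cov(X).
    Eigen::MatrixXd centered = X.rowwise() - X.colwise().mean();
    Eigen::MatrixXd cov = (centered.transpose() * centered) / double(X.rows() - 1);

    // 3.2 Eigenvalue decomposition of the symmetric covariance matrix.
    Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(cov);

    // 3.3 Eigen returns eigenvalues in increasing order, so the last k columns are the
    //     eigenvectors of the k largest eigenvalues; they form the projection matrix P.
    Eigen::MatrixXd P = es.eigenvectors().rightCols(k);

    // 3.4 Project the data: Y = X_centered * P (the embodiment text multiplies the
    //     original X; projecting the centered data is the common convention assumed here).
    return centered * P;
}

int main() {
    Eigen::MatrixXd X(4, 3);              // 4 samples, 3 features (toy data)
    X << 1.0, 2.0, 3.0,
         2.0, 4.1, 6.2,
         3.0, 6.2, 8.9,
         4.0, 8.1, 12.1;
    std::cout << pcaReduce(X, 1) << "\n"; // keep a single principal component
    return 0;
}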
Exemplary: the intelligent electric meter system of the Internet of things is provided, and electricity consumption data of 10 users in 24 hours per hour are collected. For simplicity, this embodiment considers only data of two users (user a and user B). Raw data are as follows (unit: kWh):
UserA [0.5,0.6,0.4,0.8,2.0,2.5,0.9,1.2,1.0,0.7,0.3,0.2,0.1,0.6,0.8,1.5,2.0,2.2,1.2,0.8,0.4,0.3,0.2,0.1]
User B [0.6,0.7,0.5,0.9,2.2,2.8,1.1,1.4,1.2,0.8,0.4,0.3,0.2,0.7,0.9,1.6,2.1,2.4,1.4,0.9,0.5,0.4,0.3,0.2]
The following is a process for applying Principal Component Analysis (PCA) to these data:
(1) Constructing a raw data matrix X, wherein each row represents a user, and each column represents an hour of electricity consumption:
X=[[0.5,0.6,0.4,0.8,2.0,2.5,0.9,1.2,1.0,0.7,0.3,0.2,0.1,0.6,0.8,1.5,2.0,2.2,1.2,0.8,0.4,0.3,0.2,0.1],[0.6,0.7,0.5,0.9,2.2,2.8,1.1,1.4,1.2,0.8,0.4,0.3,0.2,0.7,0.9,1.6,2.1,2.4,1.4,0.9,0.5,0.4,0.3,0.2]]
(2) A covariance matrix Cov (X) of the data matrix X is calculated.
(3) Eigenvalue decomposition is carried out on the covariance matrix Cov (X) to obtain eigenvalues and corresponding eigenvectors. In this example there are only two users, so there are only two eigenvalues and eigenvectors.
(4) The main components are selected: in this example, the present embodiment may select the eigenvector corresponding to the largest eigenvalue as the main component.
(5) The original data matrix X is multiplied by the projection matrix P to obtain the dimensionality-reduced data matrix Y. In this example, the projection matrix P contains only one eigenvector. After multiplying by the projection matrix, the present embodiment reduces the 24-hour power consumption data to a scalar value. The present embodiment achieves data dimension reduction through Principal Component Analysis (PCA), i.e., compressing 24 hours of power consumption data into one representative value. This helps the present embodiment better observe and analyze the differences in power usage behavior between users.
It should be noted that the example here contains data of only two users, while the actual situation may involve more users; the greater the number of users, the more pronounced the benefit of PCA. In the internet of things smart meter scenario, PCA helps to better analyze user electricity consumption behavior and mine latent consumption patterns, thereby enabling intelligent power grid management and scheduling. At the same time, reducing the data dimension lowers the cost of data storage and transmission and improves data processing efficiency.
Specifically, Principal Component Analysis (PCA) is a commonly used dimension reduction technique that maps raw data to a lower dimensional space through linear transformation, while preserving the main features of the data. The principle can be summarized as follows:
in calculating the covariance matrix Cov (X) of the data matrix X, diagonal elements of the covariance matrix represent variances of the respective dimensions, and non-diagonal elements represent covariances between the different dimensions. The covariance matrix reflects the statistical relationship of the data, and correlation between the data can be revealed by analyzing the covariance matrix.
In the eigenvalue decomposition phase, eigenvalues λ_1, λ_2, ..., λ_n and corresponding eigenvectors v_1, v_2, ..., v_n are obtained by eigenvalue decomposition of the covariance matrix Cov (X). The eigenvalues represent the variance of the data in the direction of the corresponding eigenvector, i.e. the degree of contribution of the principal component. The eigenvector represents the direction of the principal component, i.e., the projection of the data on the dimension-reduced coordinate axis.
When the eigenvectors corresponding to the first k eigenvalues are selected to form a projection matrix P, the eigenvectors corresponding to the k largest eigenvalues are chosen according to the magnitude of the eigenvalues. The principal components corresponding to these eigenvectors represent the most significant variance contributions in the data. By selecting the larger eigenvalues, the principal components carrying the most information in the data are retained.
And finally, the original data matrix X is multiplied by the projection matrix P to obtain the dimensionality-reduced data matrix Y. This step projects the raw data onto the principal component directions to obtain the reduced-dimension data. The reduced data matrix Y has a lower dimensionality; by retaining the main data characteristics while reducing the number of dimensions, the representation and processing of the data are simplified.
Principal component analysis finds the principal modes of variation of data by analyzing the covariance structure of the data and represents it as a set of principal components that are orthogonal to each other. Therefore, the method can reduce the data dimension and simultaneously retain important information in the data, and provides a basis for subsequent data processing and privacy protection.
In some embodiments of the present application,
STEP5.1, determining a proportion parameter b (Laplace) of the Laplace noise or a standard deviation sigma (Gaussian) of the Gaussian noise according to the privacy budget epsilon and the data sensitivity;
STEP5.2, generating a corresponding amount of laplace or gaussian noise for the data within each cluster group;
STEP5.3, adding the generated noise to the data.
In this scheme, in STEP5, an embodiment of adding laplace or gaussian noise according to privacy budget and data sensitivity includes the STEPs of:
5.1. The scaling parameter b of the Laplace noise (Laplace) or the standard deviation σ of the Gaussian noise (Gaussian) is determined from the privacy budget epsilon and the data sensitivity: the scaling parameter b of the Laplace noise or the standard deviation σ of the Gaussian noise is determined by mathematical derivation or statistical analysis, depending on the given privacy budget epsilon and data sensitivity. The privacy budget epsilon is a parameter that controls the intensity of the noise and the degree of privacy protection, and the data sensitivity represents the maximum change that a single record can cause in the released result.
5.2. Generating a corresponding amount of laplace or gaussian noise for the data within each cluster group: based on the amount of data within each cluster group, a corresponding amount of Laplacian or Gaussian noise is generated for each data point. The generated noise is characteristic of a laplace or gaussian distribution and has appropriate scale parameters.
5.3. Adding the generated noise to the data: the generated laplace or gaussian noise is added to the data within each cluster group. Noise may be added to the corresponding data points in a point-by-point addition. Thus, each data point is added with corresponding noise, so that the aim of privacy protection is fulfilled.
By adding noise to the data, individual features in the data can be blurred so that the original data cannot be accurately inferred from the noisy data, thereby protecting the privacy of the user. The Laplace noise has better privacy protection effect, and the intensity of the noise can be controlled more accurately under the condition of privacy budget permission. Gaussian noise is also used in some cases, which has the characteristics of continuity and smoothness, and is suitable for certain data distribution situations.
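As a non-limiting illustration of STEP5.1 to STEP5.3, the following sketch derives the Laplace scale b = Δ/ε and adds noise to each reading point by point; the values ε = 0.1 and Δ = 1 mirror the example below, and both they and the toy data are illustrative only.

// STEP5 sketch: add Laplace noise with scale b = Δ/ε to each reading.
// The privacy budget ε, sensitivity Δ, seed and data are illustrative assumptions.
#include <cstdio>
#include <random>
#include <vector>

// Draw Laplace(0, b) noise: the difference of two Exponential(1/b) draws is Laplace(0, b).
double laplaceNoise(double b, std::mt19937& gen) {
    std::exponential_distribution<double> expo(1.0 / b);
    return expo(gen) - expo(gen);
}

int main() {
    const double epsilon = 0.1;              // privacy budget (assumed)
    const double delta   = 1.0;              // sensitivity: max change of one reading (assumed)
    const double b       = delta / epsilon;  // Laplace scale parameter, here 10

    std::vector<double> readings = {0.5, 0.6, 0.4, 0.8, 2.0, 2.5};  // toy cluster data
    std::mt19937 gen(42);
    for (double& r : readings) r += laplaceNoise(b, gen);           // point-wise addition

    for (double r : readings) std::printf("%.3f ", r);
    std::printf("\n");
    return 0;
}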
Exemplary: in this example, the present embodiment will simulate how laplace or gaussian noise is combined with differential data distribution to protect user privacy in an internet of things smart meter. This embodiment uses the following two users' power consumption per hour data (unit: kWh) over 24 hours:
UserA [0.5,0.6,0.4,0.8,2.0,2.5,0.9,1.2,1.0,0.7,0.3,0.2,0.1,0.6,0.8,1.5,2.0,2.2,1.2,0.8,0.4,0.3,0.2,0.1]
User B [0.6,0.7,0.5,0.9,2.2,2.8,1.1,1.4,1.2,0.8,0.4,0.3,0.2,0.7,0.9,1.6,2.1,2.4,1.4,0.9,0.5,0.4,0.3,0.2]
The following is a procedure using the differential data distribution method:
(1) Calculating the difference value of the power consumption data per hour of each user:
user A difference [0.1, -0.2,0.4,1.2,0.5, -1.6,0.3, -0.2, -0.3, -0.4, -0.1, -0.1,0.5,0.2,0.7,0.5,0.2, -1.0, -0.4, -0.4, -0.1, -0.1, -0.1]
User B difference [0.1, -0.2,0.4,1.3,0.6, -1.7,0.3, -0.2, -0.4, -0.4, -0.1, -0.1,0.5,0.2,0.7,0.5,0.3, -1.0, -0.5, -0.4, -0.1, -0.1, -0.1]
(2) A laplace or gaussian noise is added to each difference. For simplicity of presentation, this embodiment uses Laplacian noise.
Let the privacy budget parameter epsilon be 0.1, the sensitivity delta be 1 (the largest possible variation between meter readings), the noise scale factor b=delta/epsilon=1/0.1=10.
The difference after user A adds noise is [0.1+Lap (0, 10), -0.2+Lap (0, 10), 0.4+Lap (0, 10), ...]
The difference after user B adds noise is [0.1+Lap (0, 10), -0.2+Lap (0, 10), 0.4+Lap (0, 10), ...]
Wherein Lap (0, 10) represents a Laplacian noise with 0 as the mean and 10 as the scale factor. This is merely meant to indicate that in practice a random laplace noise needs to be generated for each difference.
(3) Reconstructing power consumption data from the difference after adding noise:
user A reconstructs the data [0.5, 0.6+Lap (0, 10), 0.4+Lap (0, 10)+Lap (0, 10), ...]
User B reconstructs the data [0.6, 0.7+Lap (0, 10), 0.5+Lap (0, 10)+Lap (0, 10), ...]
By combining laplace or gaussian noise with differential data distribution, electricity consumption data can be distributed while protecting user privacy. Because of the noise introduced, it is difficult for an attacker to accurately infer the original power usage data.
It should be noted that the example here contains data of only two users, and that the actual situation may involve more users. Furthermore, to better protect privacy, it may be considered to use different privacy budget parameters ε and sensitivity Δ values, as well as combine Laplacian noise with Gaussian noise.
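A minimal sketch of the differencing example above follows: the hour-to-hour differences are computed, each difference is perturbed with Laplace(0, Δ/ε) noise, and the series is reconstructed by cumulative summation. The parameter values, the truncated data and the random seed are illustrative assumptions.

// Sketch of the differencing example: difference, perturb, reconstruct.
// ε = 0.1 and Δ = 1 mirror the illustrative values in the text above.
#include <cstdio>
#include <random>
#include <vector>

double laplaceNoise(double b, std::mt19937& gen) {
    std::exponential_distribution<double> expo(1.0 / b);
    return expo(gen) - expo(gen);   // difference of two exponentials ~ Laplace(0, b)
}

int main() {
    std::vector<double> load = {0.5, 0.6, 0.4, 0.8, 2.0, 2.5};  // first hours of user A
    const double b = 1.0 / 0.1;                                  // Δ/ε = 10
    std::mt19937 gen(7);

    // (1) Hour-to-hour differences Δx_t = x_(t+1) - x_t, each perturbed with Laplace noise.
    std::vector<double> noisyDiff;
    for (size_t t = 0; t + 1 < load.size(); ++t)
        noisyDiff.push_back(load[t + 1] - load[t] + laplaceNoise(b, gen));

    // (2) Reconstruction from the published differences: cumulative sum from the first value.
    std::vector<double> rebuilt = {load[0]};
    for (double d : noisyDiff) rebuilt.push_back(rebuilt.back() + d);

    for (double v : rebuilt) std::printf("%.3f ", v);
    std::printf("\n");
    return 0;
}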
Specifically, the principle of adding laplace or gaussian noise according to privacy budget and data sensitivity is based on the concept of differential privacy and the principle of privacy protection algorithm.
The scaling parameter b of the laplace noise or the standard deviation sigma of the gaussian noise is determined from the privacy budget epsilon and the data sensitivity. The privacy budget epsilon is a parameter controlling the noise strength and the degree of privacy protection, and the data sensitivity indicates the sensitivity of the data. The scaling parameter b of the laplace noise or the standard deviation σ of the gaussian noise is determined by mathematical derivation or statistical analysis from a given privacy budget and data sensitivity. This allows the intensity and distribution of noise to be adjusted according to privacy requirements and data characteristics.
A corresponding amount of laplace or gaussian noise is generated for the data within each cluster group. Based on the amount of data within each cluster group, a corresponding amount of noise is generated for each data point. The generated noise is characteristic of a laplace or gaussian distribution and has appropriate scale parameters. This ensures that each data point has its corresponding noise value.
The generated noise is added to the data. The generated laplacian or gaussian noise is added to each data point by point-wise addition. In this way, each data point is added with corresponding noise, thereby achieving privacy protection of the data. The addition of noise makes it impossible to accurately infer original data from noise data, and the privacy of individuals is protected.
By adding Laplace or Gaussian noise, the privacy of the data can be protected to some extent and the identifiability of individual identities reduced. Laplace noise has a heavier-tailed distribution and is suitable for processing discrete data, while Gaussian noise has the characteristics of continuity and smoothness and is suitable for processing continuous data. The addition of noise is closely related to the privacy budget and the data sensitivity, and by controlling the strength of the noise, a suitable balance between privacy protection and data availability can be found.
In some embodiments of the application, STEP6: exponentially Weighted Moving Average (EWMA):
setting: smoothing parameter α (0 < α < 1);
for each data point x_t in the time series;
calculating a weighted moving average: y_t = α x_t + (1 - α) y_(t-1).
In this aspect, in STEP6, an embodiment of the Exponentially Weighted Moving Average (EWMA) includes the STEPs of:
6.1. Setting a smoothing parameter alpha: first, a smoothing parameter α is set, where 0< α <1. The smoothing parameter α determines the weight distribution between the current data point and the past weighted moving average. A larger alpha value indicates a higher weight to the current data point and a smaller alpha value indicates a higher weight to the past weighted moving average.
6.2. For each data point x_t in the time series: for each data point x_t in the time series, the following calculation steps are performed.
6.3. Calculating a weighted moving average: the weighted moving average is calculated by using the formula y_t = α x_t + (1 - α) y_(t-1). Where y_t represents the weighted moving average at the current data point x_t and y_(t-1) represents the weighted moving average at the previous data point.
By weighted averaging the current data point with the past weighted moving average, the EWMA method can smooth time series data and eliminate the effects of noise. The weighted moving average method gives closer data points a higher weight in the calculation, while farther data points have lower weights. The Exponentially Weighted Moving Average (EWMA) method makes the smoothing of data points more flexible by dynamically adjusting the weighting factors. A larger smoothing parameter α results in faster response and less lag, while a smaller α makes the smoothed result weight past data more heavily.
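A short sketch of the EWMA recursion follows; seeding the average with the first observation, and the value α = 0.3 (also used in Example 1 below), are choices made only for this sketch.

// EWMA smoothing sketch (STEP6): y_t = α*x_t + (1-α)*y_(t-1).
// α = 0.3 and the seeding with the first observation are illustrative choices.
#include <cstdio>
#include <vector>

std::vector<double> ewma(const std::vector<double>& x, double alpha) {
    std::vector<double> y;
    if (x.empty()) return y;
    y.push_back(x.front());                       // seed the recursion with the first value
    for (size_t t = 1; t < x.size(); ++t)
        y.push_back(alpha * x[t] + (1.0 - alpha) * y.back());
    return y;
}

int main() {
    std::vector<double> load = {0.5, 0.6, 0.4, 0.8, 2.0, 2.5, 0.9, 1.2};  // toy readings
    for (double v : ewma(load, 0.3)) std::printf("%.3f ", v);
    std::printf("\n");
    return 0;
}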
Exemplary: in this example, the present embodiment will simulate how to combine the laplace or gaussian noise with the K-means clustering algorithm to protect user privacy in the internet of things smart meter. This embodiment uses the following four users' power consumption per hour data (unit: kWh) over 24 hours:
UserA [0.5,0.6,0.4,0.8,2.0,2.5,0.9,1.2,1.0,0.7,0.3,0.2,0.1,0.6,0.8,1.5,2.0,2.2,1.2,0.8,0.4,0.3,0.2,0.1]
User B [0.6,0.7,0.5,0.9,2.2,2.8,1.1,1.4,1.2,0.8,0.4,0.3,0.2,0.7,0.9,1.6,2.1,2.4,1.4,0.9,0.5,0.4,0.3,0.2]
User C [1.0,1.1,0.8,1.6,4.0,5.0,1.8,2.4,2.0,1.4,0.6,0.4,0.2,1.2,1.6,3.0,4.0,4.4,2.4,1.6,0.8,0.6,0.4,0.2]
User D [1.2,1.4,1.0,1.8,4.4,5.6,2.2,2.8,2.4,1.6,0.8,0.6,0.4,1.4,1.8,3.2,4.2,4.8,2.8,1.8,1.0,0.8,0.6,0.4]
The following is a procedure using the K-means clustering algorithm:
(1) And putting the electricity consumption data of all the users into one data set. Each data point contains 24 dimensions, corresponding to 24 hours of power usage, respectively.
(2) Initializing a K-means clustering algorithm. In this example, the present embodiment assumes k=2, i.e., the present embodiment attempts to divide users into two groups. Two data points were chosen as the initial centroid. The present embodiment herein selects the power usage data for user a and user C as the initial centroid.
(3) Each data point (user) is assigned to the nearest centroid. In this example, the power usage data for user A and user B are closer, so they are assigned to centroid A; the power usage data of user C and user D are closer so they are assigned to centroid C.
(4) The centroid of each group is recalculated. In this example, the new centroid of group a is the average of the power usage data for user a and user B; the new centroid of group C is the average of the power usage data for user C and user D.
(5) Steps 3 and 4 are repeated until the centroids no longer change or a predetermined number of iterations is reached. In this example, the present embodiment may find that the centroids have stabilized and no longer change.
Now, this embodiment has divided users into two groups. Within each group, this embodiment may add laplace or gaussian noise, respectively, to protect user privacy. For example, for simplicity, this embodiment uses Laplacian noise. Let the privacy budget parameter epsilon be 0.1, the sensitivity delta be 1 (the largest possible variation between meter readings), the noise scale factor b=delta/epsilon=1/0.1=10. The present embodiment may add laplace noise to the power consumption data of each user within each group, respectively.
By combining Laplace or Gaussian noise with the K-means clustering algorithm, the specific embodiment can release electricity consumption data while protecting user privacy. Because of the noise introduced, it is difficult for an attacker to accurately infer the original power usage data. Meanwhile, noise is added in each group, so that the privacy can be protected, the statistical characteristics of the power consumption data can be reserved as much as possible, and valuable information is provided for intelligent power grid management.
It should be noted that the example here contains only four users' data, and that more users may be involved in practice. Furthermore, to better protect privacy, it may be considered to use different privacy budget parameters ε and sensitivity Δ values, as well as combine Laplacian noise with Gaussian noise. Finally, in practical applications, further processing of the data, such as data aggregation or dimension reduction, may be required to improve data processing efficiency and reduce data storage and transmission costs.
Specifically, an Exponential Weighted Moving Average (EWMA) is a time-series analysis method that performs a weighted average on data points to smooth time-series data. The principle can be summarized as follows:
In the stage of setting the smoothing parameter alpha, a proper smoothing parameter alpha is selected according to actual requirements and application scenes. This parameter determines the weight distribution between the current data point and the past weighted moving average. A larger alpha value indicates a higher weight to the current data point and a smaller alpha value indicates a higher weight to the past weighted moving average.
In the calculate weighted moving average phase, for each data point x_t in the time series, the weighted moving average y_t of the current data point is calculated by using the formula y_t = α x_t + (1 - α) y_(t-1).
In this formula, α x_t represents the weighted value of the current data point, and (1 - α) y_(t-1) represents the weighted value of the past weighted moving average. Adding the two together gives the weighted moving average of the current data point.
The exponentially weighted moving average method assigns higher weights to newer data points and lower weights to older data points when calculating the weighted moving average. This way of weight distribution makes the weighted moving average more sensitive to changes in newer data, thus realizing a smoothing process of the time-series data. A larger smoothing parameter α makes the weighted moving average respond faster to changes in the data, and a smaller α makes the weighted moving average weight past data more heavily, so the smoothing effect is more pronounced.
Short-term fluctuations and noise in the data can be removed by smoothing the time series data by an exponential weighted moving average method, and long-term trends and overall changes are highlighted. This helps to improve the readability of the data, reduce the effects of outliers, and provide a more reliable data basis for subsequent differential data distribution and privacy protection.
In some embodiments of the present application,
STEP7, differential data release:
STEP7.1, calculating a differential value of noise data: Δx_t = x_(t+1) - x_t;
STEP7.2, issue differential data Δx_t instead of raw data.
In this aspect, in STEP7, the implementation of differential data distribution includes the following STEPs:
7.1. calculating a differential value of the noise data: for each data point x_t, a difference value Δx_t between it and the next data point x_ (t+1) is calculated. The differential value represents the variation between adjacent data points.
7.2. The differential data Δx_t is published instead of the raw data: the calculated differential data deltax_t is taken as the published data instead of the original data x_t. The differential data reflects the amount of variation between adjacent data points, reducing the risk of privacy disclosure by distributing the differential data instead of the original data.
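The following short sketch illustrates STEP7.1 and STEP7.2: only the differences Δx_t of the noise-processed series are released, never the series itself; the numeric values are illustrative.

// STEP7 sketch: publish differences Δx_t = x_(t+1) - x_t instead of the raw series.
#include <cstdio>
#include <vector>

int main() {
    // Noise-processed readings from the previous steps (illustrative values).
    std::vector<double> x = {0.52, 0.61, 0.38, 0.83, 1.97, 2.54};

    std::vector<double> published;                       // 7.1 differential values
    for (size_t t = 0; t + 1 < x.size(); ++t)
        published.push_back(x[t + 1] - x[t]);

    for (double d : published) std::printf("%.3f ", d);  // 7.2 only Δx_t is released
    std::printf("\n");
    return 0;
}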
Exemplary:
in this example, the present embodiment will simulate how to reduce the dimensionality of user power usage data in an internet of things smart meter using Principal Component Analysis (PCA), thereby improving data processing efficiency and reducing data storage and transmission costs. Assume that this embodiment has the following four users per hour power usage data (unit: kWh) over 24 hours:
UserA [0.5,0.6,0.4,0.8,2.0,2.5,0.9,1.2,1.0,0.7,0.3,0.2,0.1,0.6,0.8,1.5,2.0,2.2,1.2,0.8,0.4,0.3,0.2,0.1]
User B [0.6,0.7,0.5,0.9,2.2,2.8,1.1,1.4,1.2,0.8,0.4,0.3,0.2,0.7,0.9,1.6,2.1,2.4,1.4,0.9,0.5,0.4,0.3,0.2]
User C [1.0,1.1,0.8,1.6,4.0,5.0,1.8,2.4,2.0,1.4,0.6,0.4,0.2,1.2,1.6,3.0,4.0,4.4,2.4,1.6,0.8,0.6,0.4,0.2]
User D [1.2,1.4,1.0,1.8,4.4,5.6,2.2,2.8,2.4,1.6,0.8,0.6,0.4,1.4,1.8,3.2,4.2,4.8,2.8,1.8,1.0,0.8,0.6,0.4]
The following is a procedure for dimension reduction using PCA:
(1) And putting the electricity consumption data of all the users into one data set. Each data point contains 24 dimensions, corresponding to 24 hours of power usage, respectively.
(2) And (5) carrying out centering treatment on the data. The average value for each dimension is calculated and the corresponding average value is subtracted. This is to ensure that the data has a mean value of 0 across the dimensions.
(3) A covariance matrix is calculated. Covariance matrices are used to represent the relationships between the dimensions. In this example, this embodiment will result in a covariance matrix of 24x 24.
(4) Eigenvalues and eigenvectors of the covariance matrix are calculated. The eigenvalues represent the variance of the data in a particular direction, while the eigenvectors represent the corresponding directions. In this example, this embodiment will result in 24 eigenvalues and corresponding 24 eigenvectors.
(5) And selecting the characteristic value and the characteristic vector according to the number of the principal components which are required to be reserved. In general, this embodiment will retain feature vectors corresponding to larger feature values because they contain more information in the data. For example, in this example, the present embodiment may choose to retain the first two larger eigenvalues and their corresponding eigenvectors to reduce the data from 24 dimensions to 2 dimensions.
(6) The original data is projected into a new low-dimensional space using the retained feature vectors. Multiplying the data after the centering processing by the selected eigenvector matrix to obtain the data after the dimension reduction.
Now, this embodiment has reduced the original data from 24-dimensional to 2-dimensional, can store the data with less memory space, and has higher efficiency in processing and transmission. However, the dimension reduction process may result in partial information loss, and thus a balance needs to be found between the degree of dimension reduction and the information retention. In practical applications, it may also be considered to add laplace or gaussian noise to the data before dimension reduction to protect user privacy. Thus, even if the reduced-size data is compromised, it is still difficult for an attacker to accurately infer the original data.
Only the differences between the data are published through differential data publication, and the original data is not directly published. Thus, the risk of disclosure of individual privacy can be effectively reduced, because the differential data is insufficient to restore the specific value of the original data.
Specifically, the principle of differential data distribution is based on the concept of differential privacy and the principle of privacy protection. In the phase of calculating the differential value of the noise data, differential data is obtained by calculating the differential value Δx_t = x_(t+1) - x_t between adjacent data points. The differential value reflects the variation between data points and does not relate to a specific raw data value.
In the differential data distribution stage, the calculated differential data deltax_t is taken as distributed data, and the original data x_t is not directly distributed. By only releasing the differential data, specific values of the original data are hidden, and the risk of disclosure of individual privacy is reduced.
The core idea of differential data distribution is to protect individual privacy by distributing the amount of change in the data, not the absolute value of the original data. Since the differential data cannot directly and reversely deduce the original data, an attacker cannot accurately restore sensitive information of an individual. At the same time, the differential data may still be subject to some statistical analysis and data processing, thereby preserving the usefulness and usability of the data.
By means of differential data release, leakage risk to individual privacy can be reduced to a certain extent, and a safer and privacy-protected data basis is provided for subsequent data sharing and access control.
In the above specific embodiment, the following technical matters may be further introduced:
STEP8, data sharing and access control:
STEP8.1, design a secure data sharing mechanism: based on the data released by the differential data, a safe data sharing mechanism is designed, and the safety of the data in the transmission and storage processes is ensured. This may include protecting the data using encryption techniques, transmitting the data using a secure communication protocol, and restricting access rights to the data using access control policies.
STEP8.2, design access control mechanism: and formulating an access control strategy to control the access authority of the data. This may be achieved by an authentication and authorization mechanism, ensuring that only authorized users or entities can access the data. Access control may be managed based on roles, permissions, or other access rules to ensure that data is shared only between legitimate and trusted users.
STEP8.3, ensure user privacy protection: when designing the data sharing and access control mechanism, the privacy of the user is ensured to be fully protected. This may be achieved by techniques and measures such as anonymization, data desensitization, access auditing, etc. The personal identity and sensitive information of the user should be properly protected from being acquired by malicious users or unauthorized entities.
By designing a safe data sharing mechanism and an access control mechanism, the safety and privacy protection of data in the sharing and access processes can be ensured. The encryption technology can protect confidentiality of data, and the safe communication protocol can prevent data from being stolen or tampered in the transmission process. The access control mechanism can limit the access rights of the data, and ensure that only legal users can acquire the data. Meanwhile, the privacy of the user is fully considered, and the personal identity and sensitive information are properly protected so as to prevent privacy disclosure and abuse.
STEP9, evaluation and optimization:
STEP9.1, effect of the periodic evaluation privacy preserving method: and periodically evaluating the designed differential privacy protection method, and checking the privacy protection degree and the data availability. This may be achieved by using methods of privacy metrics, privacy attack analysis, experimental assessment, etc.
STEP9.2, check privacy protection degree: and evaluating whether the adopted privacy protection method reaches the expected privacy protection degree. The degree of privacy protection can be assessed by measuring indicators of privacy exposure risk, privacy loss, information entropy, etc.
STEP9.3, optimizing privacy preserving method: and according to the evaluation result, the designed differential privacy protection method is adjusted and optimized to improve the protection effect. Depending on the outcome of the evaluation, different optimization strategies may be adopted, including adjusting the allocation of privacy budgets, improving the method of noise addition, optimizing data sharing and access control mechanisms, etc. Periodic evaluation and optimization is an important step in maintaining the effectiveness and adaptability of differential privacy preserving methods. With the continuous evolution of technology and attack means, privacy protection methods need to be constantly optimized and improved to deal with new privacy attacks and threats. By periodically evaluating the effect of the privacy preserving method, potential vulnerabilities and improvement spaces can be discovered and targeted optimization can be performed, so that the privacy preserving effect and the usability of data are improved.
In summary, by designing the secure data sharing mechanism and the access control mechanism, the security and privacy protection of the data in the sharing and access processes can be ensured. The effect of the privacy protection method is evaluated regularly, and the effect of privacy protection can be improved according to the evaluation result optimizing method. The steps and the measures together form the background technology of the intelligent electric meter of the Internet of things using differential privacy protection, so that the privacy of users is protected, and the availability and the safety of data are ensured.
Summarizing:
STEP1 data collection
Principle of: the intelligent electric meter of the Internet of things is connected to a power grid and the Internet to collect and transmit electricity utilization data of users in real time. In order to protect the privacy of the user, this embodiment needs to process these data to avoid revealing sensitive information.
STEP2 adding Laplacian or Gaussian noise
Principle of: to protect the privacy of the data, this embodiment may be implemented by adding laplace or gaussian noise to the data. In this way, even if an attacker acquires the processed data, it is difficult to extract the original data or the sensitive information of the user from it. This is a protection method based on differential privacy.
STEP3 Exponentially Weighted Moving Average (EWMA) smoothing
Principle of: EWMA is a common method for smoothing time series data, which can reduce short-term fluctuation in the data, so that the data is more stable. This can reduce the influence of noise data on the privacy preserving effect while reducing the recognizability of the processed data.
STEP 4K-means clustering grouping
Principle of: the K-means clustering algorithm groups similar data points together. Thus, the present embodiment can add noise to each packet separately to achieve more efficient privacy protection. The packets may also reduce the likelihood of an attacker extracting raw data or user-sensitive information from the processed data.
STEP5 Principal Component Analysis (PCA) dimension reduction
Principle of: PCA is a dimension reduction method that reduces the dimensionality of data by preserving the major components in the data. Reducing the data dimension helps to reduce the complexity of the data while increasing the privacy preserving effect to some extent because some sensitive information may be lost during the dimension reduction process.
STEP6 fitting probability distribution
Principle of: fitting the probability distribution of the data can find the probability distribution that best matches the original data distribution. According to the fitting result, the specific embodiment can adjust the distribution of noise so as to better protect the privacy of data. This can improve the reliability and confidentiality of data processing.
STEP7 differential data Release
Principle of: differential data distribution is one way to protect privacy by distributing data differences rather than raw data. Thus, even if an attacker acquires the processed data, the original data is difficult to restore, so that better privacy protection is realized.
STEP8 data processing
Principle of: in the above steps, the present embodiment processes the original data, including adding noise, smoothing, clustering, dimension reduction, fitting probability distribution, and differential data distribution. These processing methods aim to protect user privacy while preserving as much useful information of the data as possible. After the data is processed, the embodiment can use the processed data for further analysis and application, for example, for optimizing power grid dispatching, realizing intelligent home, and the like.
STEP9 authentication and assessment
Principle of: in order to ensure that the privacy preserving method implemented in this embodiment is effective, this embodiment requires verification and evaluation of the processed data. This includes assessing the availability of the data (e.g., whether the processed data is still able to meet the application requirements) and the degree of privacy protection (e.g., whether an attacker is still able to extract sensitive information from the processed data). The results of verification and evaluation can help the present embodiment further optimize and adjust the protection method to achieve better privacy protection.
Summarizing: according to the data processing method of the intelligent electric meter of the Internet of things based on differential privacy protection, the steps of adding noise, smoothing, clustering grouping, dimension reduction, fitting probability distribution, differential data release and the like are carried out, so that the user privacy is protected, and meanwhile useful information of data is reserved as far as possible. The method can effectively cope with potential attackers, and reduces the risk of sensitive information leakage.
Summarizing, aiming at the related problems in the prior art, the encryption method of the intelligent electric meter of the internet of things based on differential privacy protection provided by this embodiment adopts the following technical means or characteristics to solve them:
1. data quality and availability problems: by performing a preprocessing operation of quality and availability of the data in the data preprocessing stage, it is ensured that the data has good quality and availability before the encryption method is performed. This may include preprocessing steps such as data cleansing, outlier processing, data population, etc., to improve the accuracy and reliability of the data.
2. Risk of privacy disclosure: by using the principle and method of differential privacy, appropriate noise parameters are determined according to privacy budget and data sensitivity in the noise adding stage, and noise is adjusted according to probability distribution fitting so as to maximally protect individual privacy. The personalized noise adding strategy can reduce privacy disclosure risk and improve the privacy protection degree of data.
3. Noise distortion: the techniques of this embodiment can adjust and smooth noise based on the characteristics and trends of the data by probability distribution fitting and exponentially weighted moving average. The probability distribution fitting can be performed with noise adjustment according to the distribution characteristics of the data to reduce the distortion of the data. The exponential weighted moving average method can smooth time series data, preserve important characteristics of the data and reduce the influence of noise on the data.
4. Processing time sequence characteristics of data: the exponential weighted moving average method in the technique of the present embodiment takes into consideration the characteristics of time-series data, and performs smoothing processing on the data according to the principle of exponential weighted average. This allows the timing characteristics of the data to be fully utilized while maintaining the accuracy and usability of the data.
In general, the technology provided by the specific embodiment solves the defects of data quality and usability problems, privacy leakage risks, noise distortion, insufficient processing of time sequence characteristics of data and the like of the traditional technology in principle by combining methods of data preprocessing, differential privacy protection, probability distribution fitting, an exponential weighted moving average method and the like.
In some embodiments of the present application, please refer to fig. 2-4 in combination: the encryption method of the internet of things intelligent ammeter with differential privacy protection provided by the embodiment of the application is shown in the figures. In practical application, the encryption method is implemented by a program that is stored in a central control computer or a storage medium of the internet of things and executed there, and the principle is as follows:
The C++ program implements Laplace noise addition, K-means clustering, EWMA smoothing, PCA dimension reduction, release of the processed data, and the like. The program uses some external libraries: (1) Eigen, (2) OpenCV, and (3) Dlib to achieve some key functions. The following is an illustration of the principle of each key function:
(1) add_play_noise: the purpose of this function is to add laplace noise to the input data matrix. It receives a data matrix and two parameters (delta and epsilon), then uses the laplace distribution to generate noise, adds the noise to the input data, and finally returns the noise added data matrix. This helps to achieve privacy protection because the addition of noise can protect sensitive information in the original data.
(2) ewma_smoothening: this function is used to perform an Exponentially Weighted Moving Average (EWMA) smoothing process on the input data. It receives a data matrix and a smoothing factor alpha. EWMA smoothing is a commonly used time series smoothing technique that assigns different weights to past observations, such that the most recent observations are weighted more heavily and the older observations are weighted less heavily. This function returns the smoothed data matrix.
(3) pca: this function reduces the dimension of the input data based on a PCA (principal component analysis) algorithm. It receives a data matrix and a target dimension. By computing the covariance matrix, eigenvalues, and eigenvectors, PCA can find a low-dimensional space such that the projection of the data onto this space can preserve the maximum variance. This helps reduce the dimensionality of the data while retaining the primary information of the data. The function returns the reduced-dimension data matrix.
(4) kmeans_classification: this function groups the input data using a K-means clustering algorithm. It receives a data matrix and a number of clusters. K-means clustering is an unsupervised learning algorithm by iteratively assigning data to K clusters (groups) and updating the center of each cluster. Eventually, the algorithm will converge to a locally optimal solution. This helps group similar data points together to achieve better privacy protection inside the cluster. The function returns the grouped data matrix.
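The program itself is not reproduced in this description. The following is a minimal sketch of what a matrix-level noise routine in the spirit of function (1) could look like when written against Eigen; the function name, signature and parameter values are assumptions made for illustration and are not the actual interface of the program.

// Illustrative sketch of a matrix-level Laplace-noise routine like the one described in (1).
// The name, signature and parameter values are assumptions, not the program's actual API.
#include <Eigen/Dense>
#include <iostream>
#include <random>

Eigen::MatrixXd addLaplaceNoise(const Eigen::MatrixXd& data, double delta, double epsilon,
                                std::mt19937& gen) {
    const double b = delta / epsilon;                 // Laplace scale from budget and sensitivity
    std::exponential_distribution<double> expo(1.0 / b);
    Eigen::MatrixXd noisy = data;
    for (Eigen::Index i = 0; i < noisy.rows(); ++i)
        for (Eigen::Index j = 0; j < noisy.cols(); ++j)
            noisy(i, j) += expo(gen) - expo(gen);     // Laplace(0, b) noise per element
    return noisy;
}

int main() {
    Eigen::MatrixXd X(2, 4);
    X << 0.5, 0.6, 0.4, 0.8,
         0.6, 0.7, 0.5, 0.9;
    std::mt19937 gen(123);
    std::cout << addLaplaceNoise(X, 1.0, 0.1, gen) << "\n";
    return 0;
}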
The technical features of the above-described embodiments may be combined in any manner, and for brevity, all of the possible combinations of the technical features of the above-described embodiments may not be described, however, they should be considered as the scope of the present description as long as there is no contradiction between the combinations of the technical features.
Example 1
In order that the above-recited embodiments of the invention may be understood in detail, a more particular description of the invention, briefly summarized below, may be had by way of example. The present invention may be embodied in many other forms than described herein and similarly modified by those skilled in the art without departing from the spirit of the invention, so that the invention is not limited to the embodiments disclosed below.
In this embodiment, the structure and principles of the encryption method of the internet of things smart meter with differential privacy protection provided by the foregoing specific embodiments are adopted as the implementation, and an application scenario is presented to deduce and illustrate their application, where:
the embodiment is provided with the following users and electricity consumption information:
there are 4 users per hour power usage data (unit: kWh) over 24 hours:
UserA [0.5,0.6,0.4,0.8,2.0,2.5,0.9,1.2,1.0,0.7,0.3,0.2,0.1,0.6,0.8,1.5,2.0,2.2,1.2,0.8,0.4,0.3,0.2,0.1]
User B [0.6,0.7,0.5,0.9,2.2,2.8,1.1,1.4,1.2,0.8,0.4,0.3,0.2,0.7,0.9,1.6,2.1,2.4,1.4,0.9,0.5,0.4,0.3,0.2]
User C [1.0,1.1,0.8,1.6,4.0,5.0,1.8,2.4,2.0,1.4,0.6,0.4,0.2,1.2,1.6,3.0,4.0,4.4,2.4,1.6,0.8,0.6,0.4,0.2]
User D [1.2,1.4,1.0,1.8,4.4,5.6,2.2,2.8,2.4,1.6,0.8,0.6,0.4,1.4,1.8,3.2,4.2,4.8,2.8,1.8,1.0,0.8,0.6,0.4]
The present example now operates according to STEP1 to STEP9 provided in the detailed description:
STEP 1. First, an Exponentially Weighted Moving Average (EWMA) of the power consumption of each user is calculated. Here, the present embodiment selects the weight α=0.3. And after EWMA is calculated, smoothed electricity consumption data are obtained.
STEP2, adding Laplace or Gaussian noise to the smoothed power consumption data. In this example, the present embodiment chooses to add laplace noise. The privacy budget parameter epsilon is set to 0.1, the sensitivity delta is 1, and the noise scale factor b=delta/epsilon=1/0.1=10. Laplace noise is added to the power consumption data of each user.
STEP3, the data added with noise is subjected to grouping processing by using a K-means clustering algorithm. In this example, the present embodiment selects to divide user data into 2 groups. After using K-means clustering, this embodiment can obtain two clusters. For data within each cluster, the present embodiment may again add laplace or gaussian noise, where the present embodiment continues to use laplace noise. Similarly, the privacy budget parameter ε is set to 0.1, the sensitivity Δ is 1, and the noise scale factor b=Δ/ε=1/0.1=10. Laplace noise is added to the data within each cluster.
STEP4, probability distribution fitting is carried out on the data after noise addition. The present embodiment may be fitted here using a normal distribution or other distribution. Based on the fitting result, the distribution of noise can be adjusted so that the added noise more conforms to the distribution of actual data.
STEP5, the data added with noise is subjected to differential processing with the original data. And calculating the difference value between the data added with noise and the original data, and distributing the difference data. In this way, even if an attacker obtains differential data, it is difficult to accurately infer the original data.
STEP 6. The differential data is further smoothed using an Exponentially Weighted Moving Average (EWMA). Here, the present embodiment may select the weight α=0.3. And after EWMA is calculated, smoothed differential data are obtained.
STEP7, performing Principal Component Analysis (PCA) dimension reduction on the smoothed differential data. In this example, the present embodiment may choose to reduce the data from 24 dimensions to 2 dimensions. This will reduce the cost of data storage and transmission while improving data processing efficiency.
STEP8, classifying the difference data after dimension reduction by using a Support Vector Machine (SVM) or other classification algorithms. This may help the present embodiment to understand the power usage patterns of different users and provide useful information for smart grid management.
STEP9, evaluation and optimization of the classification result. The classification model can be evaluated by using indexes such as accuracy, recall rate and the like, and model parameters can be adjusted according to the evaluation result so as to improve the classification performance.
Through the steps, privacy protection processing of the data of the intelligent ammeter of the Internet of things is achieved. In the whole process, the embodiment protects the privacy of the user by means of adding Laplace noise, clustering, differential data release and the like. These methods make it difficult to accurately infer the original user power usage information even if the attacker obtains the processed data. The embodiment also improves the data processing efficiency through means of PCA dimension reduction, data classification and the like while protecting privacy, and provides useful information for intelligent power grid management.
The above examples merely illustrate embodiments of the invention that are specific and detailed for the relevant practical applications, but are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.
Example two
In order that the above-recited embodiments of the invention may be understood in detail, a more particular description of the invention, briefly summarized below, may be had by way of example. The present invention may be embodied in many other forms than described herein and similarly modified by those skilled in the art without departing from the spirit of the invention, so that the invention is not limited to the embodiments disclosed below.
In order to demonstrate the defensive effect of the encryption method provided by the present invention, the second embodiment continues to use the parameter settings and scene simulation of the first embodiment, and introduces an adversarial scenario:
it is assumed that there is a malicious attacker who tries to infer the original power consumption information of the user from the compromised processed data. The attacker obtains the data processed in STEP1 to STEP9, including the distributed differential data, the data after dimension reduction, the classification result, and the like.
A malicious attacker may attempt to recover the original data using the following method:
(1) Based on the differential data and the dimension reduced data, attempting to reconstruct the original data;
(2) Analyzing the electricity utilization behavior of the user by using the classification result and other public information such as weather, holidays and the like;
(3) Using the known power usage information for some users, attempts are made to infer the power usage of other users.
However, since the present embodiment employs various privacy protection means, it is difficult for an attacker to accurately infer the original data. The reasons are as follows:
(1) In adding laplace noise, the present embodiment adjusts the noise intensity with a privacy budget parameter ε of 0.1 and a noise scaling factor b=Δ/ε=1/0.1=10. This means that an attacker cannot accurately extract the original data from the noise data. Even if an attacker tries to recover the data using statistical methods, there will be a large error in the recovered result due to the presence of noise.
(2) By using K-means clustering and differential data distribution, the embodiment further protects user privacy. An attacker may try to infer the user's power usage behavior from the clustering result and the differential data, but it is difficult for the attacker to get accurate information because the data has been added with noise.
(3) The Principal Component Analysis (PCA) dimension reduction process also helps to preserve user privacy. The reduced-dimension data loses a part of information, making it difficult for an attacker to restore the original data through low-dimension data. Even if an attacker tries to make inferences in combination with other information sources, they cannot accurately obtain the original electricity usage information.
(4) Even if an attacker obtains the original electricity consumption information of part of users, the attacker still cannot accurately infer the electricity consumption of other users because the privacy protection method adopted by the embodiment can independently add noise to the data of each user.
The above examples merely illustrate embodiments of the invention that are specific and detailed for the relevant practical applications, but are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (10)

1. The encryption method of the intelligent electric meter of the Internet of things with differential privacy protection comprises an intelligent electric meter system of the Internet of things, and is characterized by comprising the following steps:
STEP1, preprocessing initial data of an intelligent ammeter system of the Internet of things;
STEP2, clustering analysis: clustering the preprocessed data, and dividing the data of the intelligent ammeter system of the Internet of things into different groups;
STEP3, principal component analysis: applying a Principal Component Analysis (PCA) method to data in each cluster group of the intelligent ammeter system of the Internet of things, and reserving main characteristics;
STEP4, probability distribution fitting: performing probability distribution fitting on the dimensionality reduced data in each cluster group of the intelligent electric meter system of the Internet of things;
STEP5, adding differential privacy noise: adding Laplacian or Gaussian noise to the data in each cluster group according to the fitted probability distribution and privacy budget;
STEP6, exponentially weighted moving average: combining the data added with noise with an exponential weighted moving average method, and smoothing the time sequence data;
STEP7, differential data release: and calculating the differential value of the noise data and issuing the differential data.
2. An encryption method according to claim 1, characterized in that: in STEP 2:
STEP2.1, initializing a clustering center: randomly selecting K data points as initial cluster centers c_1, c_2, ..., c_K;
STEP2.2, assign each data point to the nearest cluster center: calculating the distance between each data point and all the cluster centers, and distributing the data points to the cluster center closest to the data point;
STEP2.3, updating a clustering center: and recalculating the average value of each cluster as a new cluster center according to the distribution result.
3. An encryption method according to claim 2, characterized in that: STEP2.4, repeat STEP2.2 to STEP2.3 until no change occurs in the cluster center.
4. An encryption method according to claim 2, characterized in that: in STEP3, principal component analysis: STEP3.1, calculating a covariance matrix Cov (X) of the data matrix X;
STEP3.2, performing eigenvalue decomposition on the covariance matrix Cov (X) to obtain eigenvalues λ_1, λ_2, ..., λ_n and corresponding eigenvectors v_1, v_2, ..., v_n;
STEP3.3, selecting feature vectors corresponding to the first k feature values to form a projection matrix P;
STEP3.4, multiplying the original data matrix X by the projection matrix P to obtain a data matrix Y after dimension reduction.
5. The encryption method according to any one of claims 1 to 4, characterized in that: in STEP 5:
STEP5.1, determining a proportion parameter b of the Laplacian noise or a standard deviation sigma of the Gaussian noise according to the privacy budget epsilon and the data sensitivity;
STEP5.2, generating a corresponding amount of laplace or gaussian noise for the data within each cluster group;
STEP5.3, adding the generated noise to the data.
6. The encryption method according to claim 5, characterized in that: STEP6, exponentially weighted moving average:
setting: smoothing parameter α (0 < α < 1);
for each data point x_t in the time series;
calculating a weighted moving average: y_t = α x_t + (1 - α) y_(t-1).
7. The encryption method according to claim 5, characterized in that: STEP7, differential data release:
STEP7.1, calculating a differential value of noise data: Δx_t=x_ (t+1) -x_t;
STEP7.2, issue differential data Δx_t.
8. The encryption method according to claim 7, characterized in that: in STEP1, the collected initial data is subjected to washing, outlier processing, and missing value filling.
9. A computer device comprising a processor, a memory coupled to the processor, the memory having stored therein program instructions that, when executed by the processor, cause the processor to perform the encryption method of any one of claims 1-8.
10. A storage medium storing program instructions enabling the encryption method according to any one of claims 1 to 8.
CN202310563375.8A 2023-05-18 2023-05-18 Encryption method of intelligent electric meter of Internet of things with differential privacy protection Pending CN116595553A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310563375.8A CN116595553A (en) 2023-05-18 2023-05-18 Encryption method of intelligent electric meter of Internet of things with differential privacy protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310563375.8A CN116595553A (en) 2023-05-18 2023-05-18 Encryption method of intelligent electric meter of Internet of things with differential privacy protection

Publications (1)

Publication Number Publication Date
CN116595553A 2023-08-15

Family

ID=87593272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310563375.8A Pending CN116595553A (en) 2023-05-18 2023-05-18 Encryption method of intelligent electric meter of Internet of things with differential privacy protection

Country Status (1)

Country Link
CN (1) CN116595553A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117892357A (en) * 2024-03-15 2024-04-16 大连优冠网络科技有限责任公司 Energy big data sharing and distribution risk control method based on differential privacy protection
CN117892357B (en) * 2024-03-15 2024-05-31 国网河南省电力公司经济技术研究院 Energy big data sharing and distribution risk control method based on differential privacy protection

Similar Documents

Publication Publication Date Title
JP7376593B2 (en) Security system using artificial intelligence
JP7060619B2 (en) Biometric identification system and method
US11741132B2 (en) Cluster-based scheduling of security operations
CN108763954B (en) Linear regression model multidimensional Gaussian difference privacy protection method and information security system
KR20060023533A (en) Method and system for authentication of a physical object
Chen et al. Deep secure quantization: On secure biometric hashing against similarity-based attacks
US8601553B1 (en) Techniques of imposing access control policies
US10083194B2 (en) Process for obtaining candidate data from a remote storage server for comparison to a data to be identified
CN116595553A (en) Encryption method of intelligent electric meter of Internet of things with differential privacy protection
KR20160044425A (en) Secure data storage based on physically unclonable functions
CN113254988B (en) High-dimensional sensitive data privacy classified protection publishing method, system, medium and equipment
US11675921B2 (en) Device and method for secure private data aggregation
CN115913643A (en) Network intrusion detection method, system and medium based on countermeasure self-encoder
CN117521117A (en) Medical data application security and privacy protection method and system
CN117235770A (en) Power data sharing analysis system and method based on differential privacy
Bringer et al. Practical identification with encrypted biometric data using oblivious ram
CN112367396B (en) Method and device for determining sample characteristic quantile in distributed cluster
KR102258910B1 (en) Method and System for Effective Detection of Ransomware using Machine Learning based on Entropy of File in Backup System
Fu et al. A Blockchain-Based Federated Random Forest Approach for Power-Related Data Collaborative Analysis
Gao et al. Fragile Model Watermark for integrity protection: leveraging boundary volatility and sensitive sample-pairing
CN116821879B (en) Visual system role management system
CN117688613B (en) Differential privacy protection method for time sequence release of classified tasks
Huong et al. Anomaly detection enables cybersecurity with machine learning techniques
Hong et al. A Blockchain‐Integrated Divided‐Block Sparse Matrix Transformation Differential Privacy Data Publishing Model
CN114697463A (en) Encryption transmission method and image sensor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240220

Address after: 25 Guangming Road, Yinchuan (National) economic and Technological Development Zone, Ningxia Hui Autonomous Region

Applicant after: NINGXIA LONGJI NINGGUANG INSTRUMENT Co.,Ltd.

Country or region after: China

Address before: No. 035 (065), Community 1, Lihuazui Village, Zhangjiayuan Township, Tongxin County, Wuzhong City, Ningxia, 751100

Applicant before: Ma Xian

Country or region before: China