CN109495476B

CN109495476B - Data stream differential privacy protection method and system based on edge computing

Info

Publication number: CN109495476B
Application number: CN201811379012.4A
Authority: CN
Inventors: 张尧学; 刘峻丞; 任炬; 胥楚贵
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2018-11-19
Filing date: 2018-11-19
Publication date: 2020-11-20
Anticipated expiration: 2038-11-19
Also published as: CN109495476A

Abstract

The invention discloses a data stream differential privacy protection method and system based on edge computing. The method comprises the following steps: S1, an edge device receives feature data that a terminal device has collected and extracted with a preset encoder; S2, the feature data are aggregated and disturbance noise is added; S3, the noisy feature data are reconstructed by a preset decoder to obtain reconstructed data; the encoder and the decoder are obtained by training the same autoencoder. The method offers low service response latency, high quality of service, high system throughput, a light computing load on each edge device, a small volume of data transmitted between the user and the edge device, and strong privacy protection.

Description

Data stream differential privacy protection method and system based on edge computing

Technical Field

The invention relates to the field of edge computing, in particular to a data stream differential privacy protection method and system based on edge computing.

Background

With the advent of the information age, the information technology industry has developed rapidly. The Internet, one of its fastest-growing sectors, has become indispensable in many fields because it provides users with diversified services. However, as Internet terminal devices multiply in variety and number and users demand markedly higher and more diverse Quality of Service (QoS), the Internet faces many challenges. Three of the main ones are how to process massive amounts of data, how to guarantee real-time service, and how to ensure user security.

Cloud computing, an Internet-based computing model, provides on-demand, scalable services through centralized computing and storage. However, with the growth of terminal devices and data volume and the rapid development of ubiquitous networking, offloading computation to the cloud not only occupies a large amount of network bandwidth but also increases the latency of service requests and responses. For delay-sensitive applications in particular, centralized cloud computing can hardly meet the requirements of these technologies and applications, which has driven the rise of computing paradigms such as edge computing and fog computing. Edge computing and fog computing share a similar idea: bring computation closer to users by extending cloud computing from centralized large data centers to the network edge near the users, thereby overcoming the network bottlenecks and high latency of traditional cloud computing and improving request response speed and end-user experience. Technically, edge and fog computing relieve the computing pressure on the cloud by deploying dedicated servers or small and medium-sized computing centers at the network edge close to users, improving the QoS of user services. With edge computing, user requirements can be better met in scenarios with large data volumes and strict real-time demands.

In traditional cloud computing, data must first be stored in the cloud and then processed, which lengthens the response time of service requests. If edge computing also adopted this store-then-process mode, moving computation closer to the user would shorten the response time, but it would still be unsatisfactory in scenarios with stricter real-time requirements. If data can instead be processed while they are being transmitted, the service response time is greatly shortened; this is the idea behind real-time data stream processing based on edge computing. Kafka, a distributed streaming platform, can provide real-time data stream processing capability for edge devices. It has three key capabilities, all of which a streaming platform requires: (1) publishing and subscribing to streams of data; (2) storing streams of data reliably in a cluster with distributed, replicated, and fault-tolerant mechanisms; and (3) processing streams of data as they arrive. In Kafka, a topic is an abstraction of a group of messages, that is, a category of messages. In the typical producer-consumer model, a producer sends messages to a topic, the messages are stored on Kafka servers called brokers, and consumers subscribe to the topic and consume the messages from the brokers.
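The producer/consumer model described above can be illustrated with a toy, in-memory sketch. It is not Kafka's client API: `ToyBroker`, `ToyConsumer`, `send`, and `poll` are hypothetical names, and a real Kafka deployment adds partitions, replication, and consumer groups on top of this append-only-log idea.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: each topic is an append-only message log."""

    def __init__(self):
        self.logs = defaultdict(list)  # topic name -> ordered list of messages

    def send(self, topic, message):
        """Producer side: append a message to the end of the topic's log."""
        self.logs[topic].append(message)


class ToyConsumer:
    """Subscribes to one topic and remembers the offset of the next unread message."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0

    def poll(self):
        """Return every message published to the topic since the previous poll."""
        log = self.broker.logs[self.topic]
        new_messages = log[self.offset:]
        self.offset = len(log)
        return new_messages


broker = ToyBroker()
consumer = ToyConsumer(broker, "rides")
broker.send("rides", {"fare_amount": 12.5})
broker.send("rides", {"fare_amount": 8.0})
print(consumer.poll())  # [{'fare_amount': 12.5}, {'fare_amount': 8.0}]
print(consumer.poll())  # []
```

Because messages stay in the log after being read, several consumers can read the same topic independently, each with its own offset.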

Although edge computing and real-time data stream processing benefit data analysis, edge computing, like other traditional computing models, also faces serious security issues. In mobile applications, for example, many online services rely on personal data collected from users. Such data can enhance the utility of mobile applications and provide personalized services such as ad pushing and purchase preferences, but malicious attackers can also use them to infer sensitive information about users, for example through gender inference, location tracking, and speaker identification. Users want to expose as little private information as possible, that is, to have as little personal data collected as possible, whereas service providers want to collect more personal data to provide better services. There is an inherent conflict between the two, so balancing the utility of the collected information against the security of user privacy requires careful consideration.

The techniques used in existing schemes to protect user privacy mainly include anonymization, data transformation, data encryption, and differential privacy. Even with these techniques, current schemes still have the following defects:

1. Even though edge computing moves part of the computation from the cloud to edge devices near the user, it still cannot meet the needs of services with strict real-time requirements.

2. Edge computing faces security problems: the data handled by edge devices involve a conflict between data availability and privacy.

3. Most current privacy protection schemes rely on centralized data cleaning (privacy removal), which limits system throughput and cannot meet the needs of low-latency services.

4. There is a conflict between edge device computing power and security policies.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: in view of the above problems in the prior art, the invention provides a data stream differential privacy protection method and system based on edge computing, which offer low service response latency, high quality of service, high system throughput, a light computing load on each edge device, a small volume of data transmitted between the user and the edge device, and strong privacy protection.

To solve the above technical problems, the invention provides the following technical solution: a data stream differential privacy protection method based on edge computing, comprising the following steps:

S1, an edge device receives feature data that a terminal device has collected and extracted with a preset encoder;

S2, the feature data are aggregated and disturbance noise is added;

S3, feature reconstruction is performed on the noisy feature data by a preset decoder to obtain reconstructed data;

wherein the encoder and the decoder are obtained by training the same autoencoder.
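Steps S1 to S3 can be sketched end to end as follows. This is a minimal sketch, not the patented implementation: the pairwise-averaging `encode` and duplicating `decode` are hypothetical stand-ins for the trained encoder and decoder, and a single uniform budget `epsilon` per feature is assumed instead of the adaptive budget of equation (1).

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def encode(record):
    # Hypothetical encoder stand-in: average adjacent pairs (record length must be even).
    return [(record[i] + record[i + 1]) / 2 for i in range(0, len(record), 2)]


def decode(features):
    # Hypothetical decoder stand-in: expand each feature back into two slots.
    return [value for value in features for _ in range(2)]


def run_pipeline(records, epsilon, sensitivity, rng):
    # S1: each terminal encodes locally and sends only its feature vector.
    features = [encode(r) for r in records]
    # S2: the edge device aggregates the feature vectors and perturbs every value.
    scale = sensitivity / epsilon
    noisy = [[f + laplace_noise(scale, rng) for f in vec] for vec in features]
    # S3: the edge device reconstructs full-length records with the decoder.
    return [decode(vec) for vec in noisy]


rng = random.Random(42)
print(run_pipeline([[1.0, 3.0, 5.0, 7.0]], epsilon=1.0, sensitivity=1.0, rng=rng))
```

Note that only the shorter feature vectors travel from the terminal to the edge device, and the noise is added before reconstruction, so the decoder never sees raw feature values.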

Further, in step S1, the feature data are obtained by the terminal device collecting data in units of a preset acquisition time window and extracting features from the data of one acquisition window with the preset encoder.

Further, step S2 specifically includes: according to a preset first time window, an input layer node in the edge device aggregates the feature data that it receives from the terminal devices within the first time window, calculates a disturbance noise budget for each feature, and adds disturbance noise to the feature data according to that budget.

Further, the disturbance noise budget is calculated and determined according to equation (1):

In equation (1), $n$ is the total number of terminal devices, $\varepsilon$ is the preset total privacy budget, $n_k$ is the number of terminal devices connected to the current input layer node, $\varepsilon_k$ is the privacy budget of the current input layer node, $\beta_i$ is the share of each feature in the current input layer node's privacy budget within the current first time window, and $d$ is the feature dimension. The average correlation of the ith input feature within the current first time window of the current input layer node is obtained by taking that feature as the center point and calculating the mean Euclidean distance to its adjacent features; $f_j$ is the jth feature value within the current first time window of the current input layer node, and $\varepsilon_i$ is the privacy budget of the ith input feature within the current first time window of the current input layer node.
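The body of equation (1) is not reproduced in this text, so the sketch below is only one plausible allocation built from the symbols defined for it, and every functional form in it is an assumption: the node share $\varepsilon_k = \varepsilon n_k / n$, a one-dimensional "average correlation" taken as the mean absolute distance of $f_i$ to its immediate neighbours, $\beta_i$ as that distance normalised over the $d$ features, and $\varepsilon_i = \beta_i \varepsilon_k$. The function names are hypothetical.

```python
def node_budget(total_budget, n_total, n_k):
    # Assumption: a node's budget is proportional to the number of devices it serves.
    return total_budget * n_k / n_total


def average_correlation(features, i):
    # Mean 1-D Euclidean distance from feature i to its immediate neighbours.
    neighbours = [features[j] for j in (i - 1, i + 1) if 0 <= j < len(features)]
    if not neighbours:
        return 0.0
    return sum(abs(features[i] - f) for f in neighbours) / len(neighbours)


def feature_budgets(features, eps_k):
    # Assumption: beta_i = rho_i / sum(rho), so the eps_i values sum to eps_k.
    d = len(features)
    rho = [average_correlation(features, i) for i in range(d)]
    total = sum(rho)
    betas = [1 / d] * d if total == 0 else [r / total for r in rho]
    return [beta * eps_k for beta in betas]


eps_k = node_budget(total_budget=1.0, n_total=10, n_k=4)  # 0.4
print(feature_budgets([0.0, 1.0, 3.0], eps_k))
```

Under these assumptions, features that differ more from their neighbours receive a larger $\varepsilon_i$ and therefore less noise, while the per-feature budgets always sum to the node budget; whether that direction matches the patent's notion of a feature's "influence" is not stated in the text.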

Further, adding disturbance noise to the feature data according to equation (2):

$$f_i' = f_i + \mathrm{Lap}\left(\frac{\Delta h_0}{\varepsilon_i}\right) \qquad (2)$$

In equation (2), $f_i'$ is the feature value after the disturbance noise is added, $f_i$ is the feature value before the disturbance noise is added, $\Delta h_0$ denotes the global sensitivity, $\mathrm{Lap}(\cdot)$ denotes zero-mean Laplace noise with the given scale, and $\varepsilon_i$ is the privacy budget of the ith input feature within the current first time window of the current input layer node.
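Equation (2) is the standard Laplace mechanism applied per feature. A minimal sketch using only Python's standard library follows; the function names `laplace_sample` and `perturb` and the exponential-difference sampler are illustrative choices, not the patent's code.

```python
import random


def laplace_sample(scale, rng):
    # Difference of two i.i.d. exponentials with mean `scale` ~ Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def perturb(features, budgets, sensitivity, rng):
    """Return f_i' = f_i + Lap(sensitivity / eps_i) for every feature value f_i."""
    noisy = []
    for f, eps in zip(features, budgets):
        scale = sensitivity / eps  # smaller eps_i -> larger scale -> more noise
        noisy.append(f + laplace_sample(scale, rng))
    return noisy


rng = random.Random(7)
print(perturb([2.0, 6.0, 4.0], budgets=[0.1, 0.2, 0.1], sensitivity=1.0, rng=rng))
```

The scale $\Delta h_0 / \varepsilon_i$ is what links the budget to the noise level: halving a feature's budget doubles the expected magnitude of its noise.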

Further, step S3 specifically includes: according to a preset second time window, an output layer node in the edge device receives and aggregates the noisy feature data provided by the input layer nodes, and reconstructs the feature data received within the second time window with the preset decoder to obtain reconstructed data.

A data stream differential privacy protection system based on edge computing comprises an edge device configured to: receive feature data that a terminal device has collected and extracted with a preset encoder; aggregate the feature data and add disturbance noise; and reconstruct the noisy feature data with a preset decoder to obtain reconstructed data; wherein the encoder and the decoder are obtained by training the same autoencoder.

Further, the edge device includes an input layer node configured to aggregate, according to a preset first time window, the feature data received from the terminal devices within that window, calculate a disturbance noise budget for each feature, and add disturbance noise to the feature data according to that budget.

In equation (1), $n$ is the total number of terminal devices, $\varepsilon$ is the preset total privacy budget, $n_k$ is the number of terminal devices connected to the current input layer node, $\varepsilon_k$ is the privacy budget of the current input layer node, $\beta_i$ is the share of each feature in the current input layer node's privacy budget within the current first time window, and $d$ is the feature dimension.

$$f_i' = f_i + \mathrm{Lap}\left(\frac{\Delta h_0}{\varepsilon_i}\right) \qquad (2)$$

In equation (2), $f_i'$ is the feature value after the disturbance noise is added, $f_i$ is the feature value before the disturbance noise is added, $\Delta h_0$ denotes the global sensitivity, $\mathrm{Lap}(\cdot)$ denotes zero-mean Laplace noise with the given scale, and $\varepsilon_i$ is the privacy budget of the ith input feature within the current first time window of the current input layer node.

Further, the edge device includes an output layer node configured to receive and aggregate, according to a preset second time window, the noisy feature data provided by the input layer nodes, and to reconstruct the feature data received within the second time window with the preset decoder to obtain reconstructed data.

Further, the system also comprises a terminal device configured to collect data in units of a preset acquisition time window, extract features from the data in each acquisition window with the preset encoder to obtain feature data, and provide the feature data to the edge device.

Compared with the prior art, the invention has the following advantages:

1. In the invention, an acquisition time window is set on the terminal device, which collects data per acquisition window, extracts features, and transmits them to the edge device for further processing. The input layer node of the edge device receives, per first time window, the feature data sent by the terminal devices connected to it and adds disturbance noise to each feature value through an adaptive algorithm. The output layer node of the edge device receives, per second time window, the noisy feature data from the input layer nodes and reconstructs them with the decoder. The reconstructed data are provided to other systems, and the users' sensitive information cannot be obtained from them.

2. The edge device has multiple input layer nodes, each connected to multiple terminal devices and processing the feature data of the terminal devices connected to it.

3. The terminal device hash-aligns the collected data, extracts features with its encoder, and transmits only the feature data to the input layer node of the edge device, which reduces the volume of data transmitted between the user's terminal device and the edge device and saves network bandwidth. Moreover, the encoder and the decoder are two parts of the same autoencoder, and the encoder is trained in advance and then loaded onto the terminal device, so the terminal device does not need to train the encoder, which also lowers the processing requirements on the terminal device.

Drawings

FIG. 1 is a schematic flow chart of an embodiment of the present invention.

FIG. 2 is a system architecture diagram according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of an autoencoder architecture according to an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the drawings and specific preferred embodiments, without thereby limiting its scope of protection.

As shown in FIG. 1, the data stream differential privacy protection method based on edge computing of this embodiment comprises: S1, an edge device receives feature data that a terminal device has collected and extracted with a preset encoder; S2, the feature data are aggregated and disturbance noise is added; S3, the noisy feature data are reconstructed by a preset decoder to obtain reconstructed data; wherein the encoder and the decoder are obtained by training the same autoencoder.

In this embodiment, the feature data in step S1 are obtained by the terminal device collecting data in units of a preset acquisition time window and extracting features from the data of one acquisition window with the preset encoder. Step S2 specifically includes: according to a preset first time window, an input layer node in the edge device aggregates the feature data that it receives from the terminal devices within the first time window, calculates a disturbance noise budget for each feature, and adds disturbance noise to the feature data according to that budget.

In this embodiment, the disturbance noise budget is calculated according to equation (1).

In the present embodiment, disturbance noise is added to the feature data according to equation (2):

$$f_i' = f_i + \mathrm{Lap}\left(\frac{\Delta h_0}{\varepsilon_i}\right) \qquad (2)$$

In this embodiment, step S3 specifically includes: according to a preset second time window, the output layer node in the edge device receives and aggregates the noisy feature data provided by the input layer nodes, and reconstructs the feature data received within the second time window with the preset decoder to obtain reconstructed data.

In this embodiment, the edge device includes an input layer node configured to aggregate, according to a preset first time window, the feature data received from the terminal devices within that window, calculate a disturbance noise budget for each feature, and add disturbance noise to the feature data according to that budget.

Here, the average correlation of the ith input feature within the current first time window of the current input layer node is obtained by taking that feature as the center point and calculating the mean Euclidean distance to its adjacent features; $f_j$ is the jth feature value within the current first time window of the current input layer node, and $\varepsilon_i$ is the privacy budget of the ith input feature within the current first time window of the current input layer node.

$$f_i' = f_i + \mathrm{Lap}\left(\frac{\Delta h_0}{\varepsilon_i}\right) \qquad (2)$$

In this embodiment, the edge device includes an output layer node configured to receive and aggregate, according to a preset second time window, the noisy feature data provided by the input layer nodes, and to reconstruct the feature data received within the second time window with the preset decoder to obtain reconstructed data.

In this embodiment, the system further includes a terminal device configured to collect data in units of a preset acquisition time window, extract features from the data in each acquisition window with the preset encoder to obtain feature data, and provide the feature data to the edge device.

In this embodiment, 10,000 real records from a city taxi scenario are used as an example. The experimental data contain 17 fields, including medallion (MD5 hash of the vehicle identifier), hack_license (MD5 hash of the taxi driver's license identifier), pickup_datetime (passenger pick-up time), dropoff_datetime (passenger drop-off time), trip_time_in_secs (ride duration), trip_distance (distance driven), fare_amount (fare), surcharge (extra charge), mta_tax (MTA tax), tip_amount (tip), tolls_amount (tolls), and total_amount (total of all charges). The cloud's query goal is to compute the total taxi fare in each time window, so the fields related to time and fees must be retained: pickup_datetime, dropoff_datetime, fare_amount, surcharge, mta_tax, tip_amount, tolls_amount, and total_amount.
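The field selection and the cloud's windowed query can be sketched as below. The helper names `project` and `fare_sum_per_window`, the timestamp format, the epoch-aligned tumbling windows keyed by window start, and the choice of `total_amount` as the summed fee are all assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

RETAINED_FIELDS = ("pickup_datetime", "dropoff_datetime", "fare_amount", "surcharge",
                   "mta_tax", "tip_amount", "tolls_amount", "total_amount")
EPOCH = datetime(1970, 1, 1)


def project(record):
    # Keep only the time and fee fields that the cloud query needs.
    return {name: record[name] for name in RETAINED_FIELDS}


def fare_sum_per_window(records, window_minutes):
    # Sum total_amount over tumbling windows keyed by the window's start time.
    sums = defaultdict(float)
    for r in records:
        pickup = datetime.strptime(r["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
        minutes = int((pickup - EPOCH).total_seconds() // 60)
        start = EPOCH + timedelta(minutes=minutes - minutes % window_minutes)
        sums[start] += float(r["total_amount"])
    return dict(sums)
```

Dropping identifiers such as medallion and hack_license at the projection step means they never reach the encoder at all.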

In the application scenario of this embodiment, the system architecture, shown in FIG. 2, includes multiple terminal devices (smartphones) and an edge device composed of multiple PCs, which communicate with one another through a switch and a high-speed network. The edge device comprises multiple input layer nodes and one output layer node. Each input layer node is connected over the network to multiple terminal devices, receives the feature data they send, aggregates the data, and adds disturbance noise (differential privacy perturbation). The output layer node is connected to the input layer nodes; it receives the noisy data from the input layer, aggregates them, performs feature reconstruction, and outputs the reconstructed data to other devices and systems (such as the cloud). Terminal devices and input layer nodes are in a many-to-one relationship: each terminal device corresponds to one input layer node, and each input layer node serves multiple terminal devices. Data are transmitted as streams both between the terminal devices and the input layer nodes and between the input layer nodes and the output layer node.

In the application scenario of this embodiment, the terminal device implements data acquisition and feature extraction in software and transmits the feature data to the input layer node of the edge device by calling the API of the edge device platform. In software, the edge device uses Kafka to form a distributed computing framework: data are stored in Kafka brokers, each logical node of the edge device corresponds to a Kafka topic, and the corresponding task is executed as the data stream flows through that topic. The input layer nodes aggregate the data streams and adaptively add differential privacy perturbation, and the output layer node aggregates the data streams and performs feature reconstruction. Through this process, the data stream output by the edge device satisfies the definition of differential privacy, and sensitive information is hidden from cloud-side analysis.

In the application scenario of this embodiment, the autoencoder spans the terminal device and the edge device. To reduce the data volume and to facilitate adding differential privacy perturbation, an undercomplete autoencoder is preferably used. The encoder of an undercomplete autoencoder achieves an effect similar to principal component analysis (PCA), extracting the main features of the data. In an embodiment of the invention, the undercomplete autoencoder architecture shown in FIG. 3 is preferably used: the encoder has 4 layers of neurons (excluding the input layer) with (6, 5, 3, 3) neurons per layer, and the decoder has 4 layers of neurons (excluding the input layer) with (3, 4, 5, 8) neurons per layer. The autoencoder is trained offline, that is, it is trained in advance on a data set to obtain the trained undercomplete autoencoder.
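The layer sizes of FIG. 3 can be checked with the untrained forward-pass sketch below. Only the layer sizes come from the text; the input width of 8 (matching the decoder's last layer and the eight retained fields), fully connected layers, tanh hidden activations, and random weights are assumptions, and training is omitted.

```python
import numpy as np

ENCODER_SIZES = (6, 5, 3, 3)   # neurons per encoder layer, input layer excluded
DECODER_SIZES = (3, 4, 5, 8)   # neurons per decoder layer, input layer excluded
INPUT_DIM = DECODER_SIZES[-1]  # assumption: one input per retained field (8 fields)


def init_layers(in_dim, sizes, rng):
    """Create (weight, bias) pairs for a stack of fully connected layers."""
    layers = []
    for out_dim in sizes:
        layers.append((rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)))
        in_dim = out_dim
    return layers


def forward(x, layers):
    # Assumed activations: tanh on hidden layers, linear on the final layer.
    for idx, (w, b) in enumerate(layers):
        x = x @ w + b
        if idx < len(layers) - 1:
            x = np.tanh(x)
    return x


rng = np.random.default_rng(0)
encoder = init_layers(INPUT_DIM, ENCODER_SIZES, rng)
decoder = init_layers(ENCODER_SIZES[-1], DECODER_SIZES, rng)
batch = rng.random((4, INPUT_DIM))  # 4 aligned records
codes = forward(batch, encoder)     # shape (4, 3): what a terminal would transmit
recon = forward(codes, decoder)     # shape (4, 8): what the output layer node emits
print(codes.shape, recon.shape)     # (4, 3) (4, 8)
```

In this layout a terminal sends 3 values per record instead of 8, which is the data-volume reduction attributed to the undercomplete bottleneck.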

In the application scenario of this embodiment, the encoder neurons of the trained undercomplete autoencoder (the encoder) run on the terminal devices, and the decoder neurons (the decoder) run on the last logical node of the edge device (its output layer) shown in FIG. 2 to reconstruct the features. Splitting the encoder and decoder between the terminal devices and the edge device reduces the volume of transmitted data. To protect user data, the disturbance noise satisfying differential privacy is preferably added to the feature data on the edge device.

In the application scenario of this embodiment, once the retained fields are determined, the undercomplete autoencoder must be trained. So that the selected field data can be fed into the autoencoder, each field must be converted into a string of a fixed length of k bits; in this embodiment, each field is aligned with a hash algorithm into a k-bit string. The hash-aligned fields of each message in the data set form a new row record, the aligned records are combined into a message matrix by matrix operations, and the message matrices are in turn combined into the final training set matrix. Finally, the training set is fed into the undercomplete autoencoder for training with the loss function $L(x, g(f(z(x))))$, where $L(\cdot)$ is usually the mean squared error, $g(\cdot)$ is the decoder, $f(\cdot)$ is the encoder, and $z(\cdot)$ is the hash alignment operation. After training, the encoder runs on every terminal device, that is, each terminal device holds a copy of the encoder's neural network model, while the decoder runs on the edge device, which holds only one copy of it.
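The hash alignment $z(\cdot)$ and the reconstruction loss can be sketched as follows. The text does not name the hash algorithm or the value of k, so SHA-256 truncated to its top k bits, `K_BITS = 16`, and the scaling of each k-bit string to a number in [0, 1) before it enters the network are all assumptions; the helper names are hypothetical.

```python
import hashlib

K_BITS = 16  # hypothetical choice of k
FIELDS = ("pickup_datetime", "dropoff_datetime", "fare_amount", "surcharge",
          "mta_tax", "tip_amount", "tolls_amount", "total_amount")


def hash_align(value, k=K_BITS):
    """z(.) for one field: map any field value to a fixed-length k-bit string."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    top_bits = int.from_bytes(digest, "big") >> (256 - k)  # keep the top k of 256 bits
    return format(top_bits, f"0{k}b")


def align_record(record):
    # One new row record: the aligned strings of the retained fields, in a fixed order.
    return [hash_align(record[name]) for name in FIELDS]


def to_input_row(aligned_row):
    # Assumption: read each k-bit string as an integer scaled into [0, 1).
    return [int(bits, 2) / (1 << len(bits)) for bits in aligned_row]


def mse(x, y):
    # L(., .) as mean squared error between an input row and its reconstruction.
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
```

Stacking `to_input_row(align_record(r))` for every record yields the training matrix; the encoder and decoder would then be fitted to minimise `mse` between each row and its reconstruction.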

In the application scenario of this embodiment, as shown in FIG. 2, the terminal device is a smartphone, and data acquisition, feature extraction, and the related steps are all implemented in software. The smartphone performs data acquisition and feature extraction in units of a preset acquisition time window, and the acquisition time windows of different phones run asynchronously, so no communication or coordination between phones is required. Specifically, within an acquisition time window, a phone continuously collects and caches the required data at short intervals, caching only the fields that must be retained. Given the processing constraints of a phone, this embodiment preferably uses batch processing: whenever the cached data reach one batch size, that batch is immediately hash-aligned and fed into the encoder neural network for feature extraction. When the elapsed acquisition time reaches or exceeds the acquisition time window threshold, the remaining collected data are hash-aligned and encoded even if they do not fill a batch, and all feature data extracted in the current acquisition time window are then sent to the edge device.
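The terminal-side loop for one acquisition window can be sketched as below. `collect_window` and its parameters are hypothetical; `align` and `encode` stand for the hash alignment and the trained encoder, the samples are assumed to arrive in timestamp order, and sending the result to the edge device is left to the caller.

```python
def collect_window(samples, window_end, batch_size, align, encode):
    """Process time-ordered (timestamp, record) samples of one acquisition window.

    Each full batch is aligned and encoded immediately; when the window closes,
    any partial batch left in the cache is flushed the same way.
    """
    cache, features = [], []
    for timestamp, record in samples:
        if timestamp >= window_end:  # acquisition window threshold reached
            break
        cache.append(record)
        if len(cache) == batch_size:  # a full batch: align and encode it now
            features.extend(encode([align(r) for r in cache]))
            cache.clear()
    if cache:  # window closed: encode the remainder even if the batch is not full
        features.extend(encode([align(r) for r in cache]))
    return features  # all feature data of this window, to be sent to the edge device


batches_seen = []


def toy_encode(batch):
    # Toy encoder that records each batch size and scales every aligned value by 10.
    batches_seen.append(len(batch))
    return [x * 10 for x in batch]


result = collect_window([(0, 1), (1, 2), (2, 3), (5, 4)], window_end=4,
                        batch_size=2, align=lambda r: r, encode=toy_encode)
print(result, batches_seen)  # [10, 20, 30] [2, 1]
```

Encoding in fixed-size batches bounds the phone's memory and compute per step, while the final flush guarantees nothing collected in the window is lost.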

In the application scenario of this embodiment, the edge device is composed of multiple PCs. It receives feature data transmitted by different terminal devices, adds disturbance noise (differential privacy perturbation) to the feature data so that they satisfy differential privacy, and finally reconstructs the feature data to facilitate subsequent analysis in the cloud. Because the edge device is not a high-performance cloud computer, its performance and storage capacity are limited. In this embodiment, the edge device therefore adopts a distributed computing framework, namely the Kafka data stream processing framework deployed on multiple PCs. Kafka relies on ZooKeeper, a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. With ZooKeeper, Kafka achieves distributed storage and redundant backup of data with a configurable number of replicas, which overcomes the storage limit of a single device, and Kafka's distributed data stream processing mitigates the limits of device performance. As shown in FIG. 2, Kafka topics correspond one-to-one to the logical nodes of the edge device. A node that receives data streams from terminal devices is an input layer node, which aggregates the data streams and adds differential privacy perturbation. The node that outputs data is the output layer node, which aggregates the data streams output by the input layer nodes and reconstructs the feature data. As on the terminal devices, the input layer nodes and the output layer node each have their own time windows: all input layer nodes use the same first time window but run it asynchronously, there is only one output layer node, and its second time window is independent of the input layer nodes, that is, the first and second time windows are independent of each other. Note that the edge device in FIG. 2 is a logical rather than a physical architecture: physically, multiple PCs jointly form a distributed data stream processing platform without the hierarchy shown in FIG. 2.
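The relationship between the windows can be illustrated with tumbling windows. In this sketch, an illustration under stated assumptions rather than the patent's scheduler, the input layer nodes share one window length but start at different offsets (asynchronous), while the output layer node uses its own independent length; `window_index`, `group_by_window`, and the window lengths are hypothetical.

```python
from collections import defaultdict


def window_index(t, length, offset=0.0):
    """Index k of the tumbling window [offset + k*length, offset + (k+1)*length) holding t."""
    return int((t - offset) // length)


def group_by_window(timestamps, length, offset=0.0):
    # Bucket arrival times into the windows of one node.
    windows = defaultdict(list)
    for t in timestamps:
        windows[window_index(t, length, offset)].append(t)
    return dict(windows)


FIRST_WINDOW = 5.0   # shared by all input layer nodes (hypothetical seconds)
SECOND_WINDOW = 7.0  # output layer node, independent of the first window
t = 10.5
print(window_index(t, FIRST_WINDOW, offset=0.0))  # input node A -> 2
print(window_index(t, FIRST_WINDOW, offset=2.0))  # input node B -> 1
print(window_index(t, SECOND_WINDOW))             # output node  -> 1
```

Because each node only needs its own clock and window length, no cross-node synchronisation is required, which matches the asynchronous operation described above.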

In the application scenario of this embodiment, the terminal devices (smartphones) wirelessly send their extracted feature data to the topic of their input layer node through the Kafka producer API. Each logical node of the input layer continuously receives and caches the feature data streams sent by smartphones within its first time window, using the Kafka Streams API to extract, aggregate, and cache the feature data streams of different smartphones. When the elapsed time reaches or exceeds the first time window threshold, the node, to improve the usability of the subsequently reconstructed data, calculates the adaptive disturbance noise budget $\varepsilon_i$ for the feature values it has received according to equation (1), adds disturbance noise to the cached data, combines the noisy data into a new data stream, and sends it to the topic of the output layer node. Within its own second time window, the output layer node likewise aggregates the noisy feature data streams from the different input layer logical nodes, retrieving and caching the data in the streams with the Kafka consumer API. When the elapsed time reaches or exceeds the second time window threshold, it converts the data cached in that window into matrix form, feeds them into the decoder neural network of the previously trained model for feature reconstruction, and finally outputs the result to a remote cloud server for data analysis.
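The per-window work of the two kinds of logical node can be sketched as plain functions. Kafka I/O is omitted, the names are hypothetical, the budget, perturbation, and decoder functions are passed in as parameters, and computing the budgets per feature vector rather than over all values in the window is an assumption.

```python
def input_node_flush(cached_vectors, budget_fn, perturb_fn):
    """Work done when an input layer node's first time window closes."""
    noisy_stream = []
    for vec in cached_vectors:
        budgets = budget_fn(vec)  # adaptive per-feature budgets eps_i for this vector
        noisy_stream.append(perturb_fn(vec, budgets))
    return noisy_stream  # published as a new stream to the output layer node's topic


def output_node_flush(streams, decode_fn):
    """Work done when the output layer node's second time window closes."""
    matrix = [vec for stream in streams for vec in stream]  # aggregate all input nodes
    return [decode_fn(row) for row in matrix]  # reconstructed rows sent to the cloud


def even_split(vec):
    # Toy budget function: split a unit budget evenly across the features.
    return [1.0 / len(vec)] * len(vec)


def no_noise(vec, budgets):
    # Toy perturbation that leaves values unchanged, to make the data flow visible.
    return list(vec)


node_a = input_node_flush([[1.0, 2.0]], even_split, no_noise)
node_b = input_node_flush([[3.0, 4.0], [5.0, 6.0]], even_split, no_noise)
print(output_node_flush([node_a, node_b], decode_fn=list))
# [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
```

Keeping the perturbation on the input layer and the decoding on the single output node means every row the decoder sees has already been noised.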

In the application scenario of this embodiment, the way differential privacy disturbance noise is added affects both the security and the usability of the reconstructed feature data. In the prior art, the same disturbance is added to every feature value; in practice, however, different feature values do not contribute equally to the decoder output. This embodiment therefore adds disturbance noise with an adaptive algorithm: while guaranteeing security (with the total privacy budget fixed), it adds as much disturbance as possible to features that have little influence on the reconstructed data and as little disturbance as possible to features that have a large influence, thereby improving the usability of the reconstructed data. Adding disturbance noise to the feature data according to formulas (1) and (2) ensures the security of the data.
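The adaptive allocation idea can be illustrated with one simple rule: split a fixed total budget across features in proportion to each feature's influence on the decoder output, so that influential features receive a larger ε_i and hence a smaller noise scale Δh₀/ε_i. This is an assumed stand-in, not formula (1): the influence score used here (sum of absolute decoder weights attached to a feature) and the function names are hypothetical. Because the budgets sum to the fixed total, and assuming Δh₀ bounds the sensitivity of each feature, the sequential composition property of differential privacy keeps the overall privacy loss within that total.

```python
from typing import List, Sequence


def contribution_scores(decoder_weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical influence of feature i: sum of |w| over decoder row i."""
    return [sum(abs(w) for w in row) for row in decoder_weights]


def allocate_budgets(scores: Sequence[float], total_eps: float) -> List[float]:
    """Split total_eps in proportion to scores; the returned budgets sum to total_eps."""
    if any(c <= 0 for c in scores):
        # A zero budget would mean an infinite Laplace scale, so reject it.
        raise ValueError("every score must be positive")
    s = sum(scores)
    return [total_eps * c / s for c in scores]


def noise_scales(budgets: Sequence[float], delta_h0: float) -> List[float]:
    """Laplace scale per feature as in equation (2): delta_h0 / eps_i."""
    return [delta_h0 / eps for eps in budgets]


if __name__ == "__main__":
    W = [[1.0, -1.0], [0.5, 0.0], [0.25, 0.25]]  # made-up decoder rows, one per feature
    eps = allocate_budgets(contribution_scores(W), total_eps=1.5)
    print(eps, noise_scales(eps, delta_h0=1.0))
```

With the made-up weights above, the first feature dominates the decoder, so it receives the largest budget and the smallest noise scale, while the two weak features absorb more noise.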

The foregoing describes only preferred embodiments of the invention and does not limit the invention in any form. Although the invention has been described above with reference to preferred embodiments, it is not limited to them. Therefore, any simple modification, equivalent change, or variation made to the above embodiments in accordance with the technical spirit of the invention, without departing from the content of the technical solution of the invention, shall fall within the protection scope of the technical solution of the invention.

Claims

1. A data flow differential privacy protection method based on edge computing, characterized in that:

S2, aggregating the feature data and adding disturbance noise, which specifically comprises: according to a preset first time window, the input layer nodes in the edge device aggregate the feature data collected by the terminal devices and received within the first time window, calculate a disturbance noise budget for each item of feature data, and add disturbance noise to the feature data according to the disturbance noise budgets;

the encoder and the decoder are obtained by training the same autoencoder;

the disturbance noise budget is calculated and determined according to the formula (1):

2. The edge-computation-based data flow differential privacy protection method of claim 1, wherein:

in step S1, the terminal device acquires data in units of a preset acquisition time window, and the feature data is obtained by performing feature extraction, through a preset encoder, on the data acquired within one acquisition time window.

3. The edge-computation-based data flow differential privacy protection method of claim 2, wherein: adding disturbance noise to the feature data according to equation (2):

$$f_i' = f_i + \mathrm{Lap}(\Delta h_0 / \epsilon_i) \tag{2}$$
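Equation (2) is the Laplace mechanism applied feature by feature. The sketch below samples Laplace noise by inverting its CDF using only the Python standard library; `laplace` and `perturb` are hypothetical names, and Δh₀ is treated simply as a given positive constant. The mean absolute value of Laplace(0, b) noise equals b, which provides an easy sanity check.

```python
import math
import random
from typing import List, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale) by inverting its CDF."""
    u = rng.random() - 0.5          # uniform on [-0.5, 0.5)
    while u == -0.5:                # avoid log(0) at the boundary
        u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def perturb(features: Sequence[float], budgets: Sequence[float],
            delta_h0: float, rng: random.Random) -> List[float]:
    """Equation (2): f_i' = f_i + Lap(delta_h0 / eps_i), one budget per feature."""
    if len(features) != len(budgets):
        raise ValueError("need one budget per feature")
    return [f + laplace(delta_h0 / eps, rng) for f, eps in zip(features, budgets)]


if __name__ == "__main__":
    rng = random.Random(42)
    print(perturb([0.3, -1.2, 0.8], [1.0, 0.5, 0.25], delta_h0=1.0, rng=rng))
```

A smaller budget ε_i gives a larger scale Δh₀/ε_i and therefore a noisier value for that feature, which is why the budget allocation determines where the disturbance is concentrated.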

4. The edge-computation-based data flow differential privacy protection method of claim 3, wherein step S3 specifically includes: the output layer node in the edge device receives and aggregates, according to a preset second time window, the perturbed feature data provided by the input layer nodes, and performs feature reconstruction on the feature data received within the second time window through a preset decoder to obtain reconstructed data.

5. A data flow differential privacy protection system based on edge computation, characterized by comprising an edge device configured for: receiving feature data that is collected by terminal devices and has undergone feature extraction through a preset encoder; aggregating the feature data and adding disturbance noise, which specifically comprises: the edge device comprises input layer nodes which, according to a preset first time window, aggregate the feature data collected by the terminal devices and received within the first time window, calculate a disturbance noise budget for each item of feature data, and add disturbance noise to the feature data according to the disturbance noise budgets; and performing feature reconstruction on the perturbed feature data through a preset decoder to obtain reconstructed data; wherein the encoder and the decoder are obtained by training the same autoencoder;

6. The edge-computation-based data-flow differential privacy protection system of claim 5, wherein: adding disturbance noise to the feature data according to equation (2):

$$f_i' = f_i + \mathrm{Lap}(\Delta h_0 / \epsilon_i) \tag{2}$$

7. The edge-computation-based data-flow differential privacy protection system of claim 6, wherein the edge device comprises an output layer node configured to receive and aggregate, according to a preset second time window, the perturbed feature data provided by the input layer nodes, and to perform feature reconstruction on the feature data received within the second time window through a preset decoder to obtain reconstructed data.

8. The edge-computation-based data stream differential privacy protection system of any one of claims 5 to 7, wherein:

the terminal device is configured to acquire data in units of a preset acquisition time window, perform feature extraction on the data in each acquisition time window using a preset encoder to obtain feature data, and provide the feature data to the edge device.
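Claims 2 and 8 describe acquiring data in units of a preset acquisition time window and extracting features from each window with a preset encoder before the data leaves the terminal. A minimal sketch under stated assumptions: each window is summarized by the mean of its readings before encoding, and the encoder is a single hypothetical tanh layer with made-up weights. Neither choice, nor the names `window_means` and `encode`, comes from the source, where the encoder is the encoding half of the trained autoencoder.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, List[float]]  # (timestamp, raw sensor reading)


def window_means(samples: Sequence[Sample], length: float) -> Dict[int, List[float]]:
    """Group readings by acquisition window index and average each window (assumed summary)."""
    groups: Dict[int, List[List[float]]] = defaultdict(list)
    for t, reading in samples:
        groups[math.floor(t / length)].append(reading)
    return {k: [sum(col) / len(col) for col in zip(*rows)]
            for k, rows in sorted(groups.items())}


def encode(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Hypothetical preset encoder layer: tanh(W x + b), one output per row of W."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


if __name__ == "__main__":
    raw = [(0.1, [1.0, 2.0]), (0.4, [3.0, 4.0]), (1.2, [5.0, 6.0])]
    W = [[0.1, -0.1], [0.05, 0.05]]  # made-up 2 -> 2 encoder weights
    b = [0.0, 0.0]
    for k, mean in window_means(raw, 1.0).items():
        print(k, encode(mean, W, b))  # feature vector that would be sent to the edge device
```

Only the encoded feature vectors leave the terminal, which is what keeps the transmitted data volume small; the matching decoder runs on the edge device's output layer node.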