CN109495476A - Data stream differential privacy protection method and system based on edge computing
- Publication number: CN109495476A
- Application number: CN201811379012.4A
- Authority: CN (China)
- Legal status: Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/04—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
- H04L63/0428—Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/06—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
- H04L9/0643—Hash functions, e.g. MD5, SHA, HMAC or f9 MAC
Abstract
The invention discloses a data stream differential privacy protection method and system based on edge computing. The method comprises: S1. an edge device receives feature data that was collected by a terminal device and obtained through feature extraction by a preset encoder; S2. the feature data are aggregated and perturbation noise is added; S3. a preset decoder performs feature reconstruction on the feature data after the perturbation noise has been added, yielding reconstructed data. The encoder and the decoder are the encoder and decoder obtained by training one and the same autoencoder. The invention has the advantages of low service response delay, high quality of service, high system throughput, a low computational load on each edge device, a small volume of data transmitted between users and edge devices, and a high degree of privacy protection.
Description
Technical field
The present invention relates to the field of edge computing, and in particular to a data stream differential privacy protection method and system based on edge computing.
Background
With the arrival of the information age, the IT industry has developed rapidly. The Internet, as one of the fastest-growing information technology industries, provides users with diversified services and has become an indispensable part of every field. With the explosive growth in the types and number of Internet terminal devices, and the marked growth in user demand for quality of service (QoS) and service diversity, the Internet today faces many challenges. Among them, how to process massive data, how to guarantee real-time service, and how to ensure user security are three major challenges.
Cloud computing, as an Internet-based computing model, provides services that scale on demand through centralized computation and storage. However, with the growth of terminal devices and data volume and the rapid development of ubiquitous network technology, uploading computation to the cloud not only occupies a large amount of network bandwidth but also increases the delay of service requests and responses. Particularly in supporting delay-sensitive applications, cloud computing, characterized by centralized computation, can no longer meet the development needs of these technologies and applications. This has prompted the rise of computing models such as edge computing and fog computing. In principle, edge computing and fog computing share a similar idea: both aim to bring computation closer to the user, that is, to extend cloud computing from large centralized data centers to the network edge nearer the user, thereby overcoming the network bottlenecks and high latency of conventional cloud computing and improving the service request response speed and user experience of end users. Technically, fog computing and edge computing relieve the computing pressure on the cloud by deploying dedicated servers or small and medium-sized computing centers at the network edge closer to users, improving the QoS of user services. In scenarios with large data volumes and high real-time requirements, edge computing can better meet user needs.
In conventional cloud computing, data must first be stored in the cloud and then processed, which increases the service request response time. If edge computing also adopts this store-then-process mode, the response time can be shortened thanks to computation being closer to the user, but this is still not a good solution in scenarios with higher real-time requirements. If the data can instead be processed while in transit, the service request response time is greatly shortened; this mode is real-time data stream processing based on edge computing. Kafka, as a distributed streaming platform, can provide edge devices with real-time stream processing capability. It has three key characteristics: (1) it can publish and subscribe to streaming data; (2) it can store streaming data safely in a distributed, replicated, and fault-tolerant cluster; (3) it can process streaming data promptly as it arrives. These three characteristics are prerequisites for a stream processing platform. In Kafka, a topic is an abstraction of a group of messages, or in other words a category of messages. In the usual producer-consumer model, a Producer sends messages to a topic, the messages are stored in Kafka servers called brokers, and a Consumer can then subscribe to the topic and consume the messages from the brokers.
Although edge computing combined with real-time data stream processing benefits data analysis, edge computing, like other traditional computing models, also faces severe security problems. In mobile applications, for example, many online services rely on personal data collected from users. These data can enhance the practicality of mobile applications and provide users with personalized services such as advertisement pushing and purchase preferences, but they can equally be used by malicious attackers to infer users' sensitive information, for example through gender inference, location tracking, and speaker identification. From the user's point of view, the less privacy information exposed the better, that is, as little personal data as possible should be collected. From the service provider's point of view, collecting more personal data makes it possible to provide better service. There is clearly an essential contradiction between the two. How to trade off the availability of collected information against the security of user privacy is therefore a problem that needs careful consideration.
The techniques used in existing schemes for protecting user privacy mainly include anonymization, data transformation, data encryption, and differential privacy. Even with these techniques, current schemes still have the following deficiencies:
1. Even though edge computing moves part of the computation from the cloud to edge devices close to the user, it still cannot satisfy services with high real-time requirements.
2. Edge computing faces security problems: the data processed by edge devices involve a contradiction between data availability and privacy security.
3. Most current privacy protection approaches use centralized data cleansing (privacy removal), which limits system throughput and cannot meet the needs of low-delay services.
4. There is a contradiction between the computing capability of edge devices and the security policy.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides a data stream differential privacy protection method and system based on edge computing that offer low service response delay, high quality of service, high system throughput, a low computational load on each edge device, a small volume of data transmitted between users and edge devices, and a high degree of privacy protection.
To solve the above technical problem, the technical solution proposed by the present invention is a data stream differential privacy protection method based on edge computing, comprising:
S1. an edge device receives feature data that was collected by a terminal device and obtained through feature extraction by a preset encoder;
S2. the feature data are aggregated and perturbation noise is added;
S3. a preset decoder performs feature reconstruction on the feature data after the perturbation noise has been added, yielding reconstructed data;
wherein the encoder and the decoder are the encoder and decoder obtained by training one and the same autoencoder.
Further, the feature data in step S1 are feature data that the terminal device collects within one acquisition time window, taking a preset acquisition time window as the unit, and obtains through feature extraction by the preset encoder.
Further, step S2 specifically comprises: an input layer node in the edge device aggregates, according to a preset first time window, the feature data collected by each terminal device and received within the first time window, calculates the perturbation noise budget of each item of feature data, and adds perturbation noise to the feature data according to the perturbation noise budget.
Further, the perturbation noise budget is determined according to formula (1):
In formula (1), ε is the preset total privacy budget; n is the total number of terminal devices; n_k is the number of terminal devices connected to the current input layer node; ε_k is the privacy budget of the current input layer node; β_i is the proportion of the privacy budget of the current input layer node allotted to each feature in the current first time window; d is the feature dimension; the average correlation of the i-th input feature in the current first time window of the current input layer node is obtained by taking the current feature as the center point and computing the average Euclidean distance to its adjacent features; f_j is the j-th feature value in the current first time window of the current input layer node; and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input layer node.
Further, perturbation noise is added to the feature data according to formula (2):
f_i' = f_i + Lap(Δh_0 / ε_i)    (2)
In formula (2), f_i' is the feature value after the perturbation noise is added, f_i is the feature value before the perturbation noise is added, Δh_0 denotes the global sensitivity, Lap(·) is the Laplace distribution, and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input layer node.
Further, step S3 specifically comprises: an output layer node in the edge device receives and aggregates, according to a preset second time window, the feature data with added perturbation noise provided by the input layer nodes, and performs feature reconstruction, by means of the preset decoder, on the feature data received within the second time window, yielding reconstructed data.
A data stream differential privacy protection system based on edge computing comprises an edge computing device configured to: receive feature data that was collected by a terminal device and obtained through feature extraction by a preset encoder; aggregate the feature data and add perturbation noise; and perform feature reconstruction, by means of a preset decoder, on the feature data after the perturbation noise has been added, yielding reconstructed data; wherein the encoder and the decoder are the encoder and decoder obtained by training one and the same autoencoder.
Further, the edge device comprises an input layer node configured to aggregate, according to a preset first time window, the feature data collected by each terminal device and received within the first time window, to calculate the perturbation noise budget of each item of feature data, and to add perturbation noise to the feature data according to the perturbation noise budget.
Further, the perturbation noise budget is determined according to formula (1):
In formula (1), ε is the preset total privacy budget; n is the total number of terminal devices; n_k is the number of terminal devices connected to the current input layer node; ε_k is the privacy budget of the current input layer node; β_i is the proportion of the privacy budget of the current input layer node allotted to each feature in the current first time window; d is the feature dimension; the average correlation of the i-th input feature in the current first time window of the current input layer node is obtained by taking the current feature as the center point and computing the average Euclidean distance to its adjacent features; f_j is the j-th feature value in the current first time window of the current input layer node; and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input layer node.
Further, perturbation noise is added to the feature data according to formula (2):
f_i' = f_i + Lap(Δh_0 / ε_i)    (2)
In formula (2), f_i' is the feature value after the perturbation noise is added, f_i is the feature value before the perturbation noise is added, Δh_0 denotes the global sensitivity, Lap(·) is the Laplace distribution, and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input layer node.
Further, the edge device comprises an output layer node configured to receive and aggregate, according to a preset second time window, the feature data with added perturbation noise provided by the input layer nodes, and to perform feature reconstruction, by means of the preset decoder, on the feature data received within the second time window, yielding reconstructed data.
Further, the system also comprises a terminal device configured to collect data taking a preset acquisition time window as the unit, to perform feature extraction on the data within the acquisition time window by means of the preset encoder to obtain feature data, and to provide the feature data to the edge device.
Compared with the prior art, the advantages of the present invention are as follows:
1. In the present invention, an acquisition time window is set on the terminal device; the terminal device collects data according to the acquisition time window, performs feature extraction, and sends the result to the edge device for subsequent processing. An input layer node of the edge device receives, according to the first time window, the feature data sent by the terminal devices connected to that node, and adds perturbation noise to each item of feature data by means of an adaptive algorithm. The output layer node of the edge device receives, according to the second time window, the feature data to which the input layer nodes have added perturbation noise, and reconstructs them with the decoder to obtain reconstructed data, which are provided to other systems for use. The sensitive information of users cannot be obtained from the reconstructed data. In this way, the response delay of the edge device is effectively reduced, the quality of service is improved, and user privacy is effectively protected.
2. The edge device of the present invention has multiple input layer nodes, each connected to multiple terminal devices and processing the feature data of the terminal devices connected to it. This distributed processing approach improves the system throughput of the edge device, reduces the computational load on each input layer node, and helps keep the whole edge computing system stable.
3. The terminal device of the present invention aligns the collected data by hashing, performs feature extraction with its own encoder, and then sends the feature data to an input layer node of the edge device. This reduces the volume of data transmitted between the user's terminal device and the input layer node of the edge device and reduces wasted network bandwidth. Moreover, the encoder and the decoder are two parts of the same autoencoder, and the encoder is loaded onto the terminal device after being trained in advance, so the terminal device does not need to train the encoder, which also lowers the requirement on the processing capability of the terminal device.
Brief description of the drawings
Fig. 1 is a flow diagram of a specific embodiment of the invention.
Fig. 2 is a schematic diagram of the system architecture of a specific embodiment of the invention.
Fig. 3 is a schematic diagram of the autoencoder architecture of a specific embodiment of the invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific preferred embodiments, which do not thereby limit the scope of protection of the invention.
As shown in Fig. 1, the data stream differential privacy protection method based on edge computing of this embodiment comprises: S1. an edge device receives feature data that was collected by a terminal device and obtained through feature extraction by a preset encoder; S2. the feature data are aggregated and perturbation noise is added; S3. a preset decoder performs feature reconstruction on the feature data after the perturbation noise has been added, yielding reconstructed data. The encoder and the decoder are the encoder and decoder obtained by training one and the same autoencoder.
In this embodiment, the feature data in step S1 are feature data that the terminal device collects within one acquisition time window, taking a preset acquisition time window as the unit, and obtains through feature extraction by the preset encoder. Step S2 specifically comprises: an input layer node in the edge device aggregates, according to a preset first time window, the feature data collected by each terminal device and received within the first time window, calculates the perturbation noise budget of each item of feature data, and adds perturbation noise to the feature data according to the perturbation noise budget.
In this embodiment, the perturbation noise budget is determined according to formula (1):
In formula (1), ε is the preset total privacy budget; n is the total number of terminal devices; n_k is the number of terminal devices connected to the current input layer node; ε_k is the privacy budget of the current input layer node; β_i is the proportion of the privacy budget of the current input layer node allotted to each feature in the current first time window; d is the feature dimension; the average correlation of the i-th input feature in the current first time window of the current input layer node is obtained by taking the current feature as the center point and computing the average Euclidean distance to its adjacent features; f_j is the j-th feature value in the current first time window of the current input layer node; and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input layer node.
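The definitions above describe an average Euclidean distance around each feature and a per-feature share of the node's budget, but the text of formula (1) itself is not available here. The sketch below is therefore only one possible reading and rests on assumptions that are not taken from the patent: "adjacent features" is read as all other features in the window, β_i is taken proportional to the average distance, and ε_i = β_i · ε_k. The function names are hypothetical.

```python
import math


def average_distance(features, i):
    """Mean Euclidean distance from feature i to every other feature in the window."""
    others = [f for j, f in enumerate(features) if j != i]
    if not others:
        return 0.0
    return sum(math.dist(features[i], f) for f in others) / len(others)


def allocate_budget(features, eps_k):
    """Split the node budget eps_k across the features of one window.

    Assumed rule (not formula (1) itself): beta_i is proportional to the
    average distance of feature i, and eps_i = beta_i * eps_k.
    """
    scores = [average_distance(features, i) for i in range(len(features))]
    total = sum(scores)
    if total == 0:  # all features identical: fall back to an even split
        betas = [1 / len(features)] * len(features)
    else:
        betas = [s / total for s in scores]
    return [b * eps_k for b in betas]


window = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]  # three 2-dimensional features
budgets = allocate_budget(window, eps_k=0.5)
```

Whatever weighting is used, keeping the shares β_i summing to 1 ensures the per-feature budgets add up to the node budget ε_k.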
In this embodiment, perturbation noise is added to the feature data according to formula (2):
f_i' = f_i + Lap(Δh_0 / ε_i)    (2)
In formula (2), f_i' is the feature value after the perturbation noise is added, f_i is the feature value before the perturbation noise is added, Δh_0 denotes the global sensitivity, Lap(·) is the Laplace distribution, and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input layer node.
In this embodiment, step S3 specifically comprises: an output layer node in the edge device receives and aggregates, according to a preset second time window, the feature data with added perturbation noise provided by the input layer nodes, and performs feature reconstruction, by means of the preset decoder, on the feature data received within the second time window, yielding reconstructed data.
A data stream differential privacy protection system based on edge computing comprises an edge computing device configured to: receive feature data that was collected by a terminal device and obtained through feature extraction by a preset encoder; aggregate the feature data and add perturbation noise; and perform feature reconstruction, by means of a preset decoder, on the feature data after the perturbation noise has been added, yielding reconstructed data; wherein the encoder and the decoder are the encoder and decoder obtained by training one and the same autoencoder.
In this embodiment, the edge device comprises an input layer node configured to aggregate, according to a preset first time window, the feature data collected by each terminal device and received within the first time window, to calculate the perturbation noise budget of each item of feature data, and to add perturbation noise to the feature data according to the perturbation noise budget.
Further, the perturbation noise budget is determined according to formula (1):
In formula (1), ε is the preset total privacy budget; n is the total number of terminal devices; n_k is the number of terminal devices connected to the current input layer node; ε_k is the privacy budget of the current input layer node; β_i is the proportion of the privacy budget of the current input layer node allotted to each feature in the current first time window; d is the feature dimension; the average correlation of the i-th input feature in the current first time window of the current input layer node is obtained by taking the current feature as the center point and computing the average Euclidean distance to its adjacent features; f_j is the j-th feature value in the current first time window of the current input layer node; and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input layer node.
In this embodiment, perturbation noise is added to the feature data according to formula (2):
f_i' = f_i + Lap(Δh_0 / ε_i)    (2)
In formula (2), f_i' is the feature value after the perturbation noise is added, f_i is the feature value before the perturbation noise is added, Δh_0 denotes the global sensitivity, Lap(·) is the Laplace distribution, and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input layer node.
In this embodiment, the edge device comprises an output layer node configured to receive and aggregate, according to a preset second time window, the feature data with added perturbation noise provided by the input layer nodes, and to perform feature reconstruction, by means of the preset decoder, on the feature data received within the second time window, yielding reconstructed data.
In this embodiment, the system also comprises a terminal device configured to collect data taking a preset acquisition time window as the unit, to perform feature extraction on the data within the acquisition time window by means of the preset encoder to obtain feature data, and to provide the feature data to the edge device.
In this embodiment, 10,000 real data records from a city taxi-hailing application scenario are used as experimental data for illustration. The experimental data comprise 17 fields, including medallion (MD5 value of the identifier bound to the vehicle), hack_license (MD5 value of the identifier bound to the taxi driving license), pickup_datetime (passenger pick-up time), dropoff_datetime (passenger drop-off time), trip_time_in_secs (trip duration), trip_distance (trip distance), fare_amount (fare), surcharge (surcharge), mta_tax (tax), tip_amount (tip), tolls_amount (tolls), and total_amount (total cost). The purpose of the query in the cloud is to compute the sum of trip costs within each time window. Because the cloud needs to query the sum of trip costs, the fields related to time and cost must be retained: pickup_datetime, dropoff_datetime, fare_amount, surcharge, mta_tax, tip_amount, tolls_amount, and total_amount.
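As a rough illustration of the field selection and of the cloud-side query, the sketch below keeps only the eight retained fields and sums total_amount per time window. The field names come from the text; the timestamp format, the choice of pick-up time as the window key, the window length, and the sample records are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

RETAINED_FIELDS = (
    "pickup_datetime", "dropoff_datetime", "fare_amount", "surcharge",
    "mta_tax", "tip_amount", "tolls_amount", "total_amount",
)
EPOCH = datetime(1970, 1, 1)


def retain(record):
    """Drop every field that is not related to time or cost."""
    return {name: record[name] for name in RETAINED_FIELDS}


def total_per_window(records, window_seconds):
    """Sum total_amount per tumbling window, keyed by window start in seconds."""
    sums = defaultdict(float)
    for rec in records:
        # Assumed timestamp format; naive datetimes are compared without time zones.
        pickup = datetime.strptime(rec["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
        seconds = int((pickup - EPOCH).total_seconds())
        sums[seconds // window_seconds * window_seconds] += float(rec["total_amount"])
    return dict(sums)


def sample(pickup, total):
    """Build a hypothetical record with every retained field plus one extra field."""
    rec = {name: "0" for name in RETAINED_FIELDS}
    rec.update(pickup_datetime=pickup, dropoff_datetime=pickup,
               total_amount=total, medallion="hypothetical-md5")
    return rec


records = [retain(sample("2013-01-01 10:05:00", "10.5")),
           retain(sample("2013-01-01 10:40:00", "4.5")),
           retain(sample("2013-01-01 11:15:00", "7.0"))]
hourly = total_per_window(records, window_seconds=3600)
```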
In the application scenario of this embodiment, the system architecture is shown in Fig. 2. It includes multiple terminal devices (smartphones) and an edge device made up of multiple PCs, which communicate with one another through a switch and high-speed cables. The edge device includes multiple input layer nodes and one output layer node. Each input layer node is connected over the network to multiple terminal devices, receives the feature data they send, aggregates the feature data, and adds perturbation noise (differential perturbation, i.e. differential privacy perturbation). The output layer node is connected to the input layer nodes; it receives the data to which each connected input layer node has added perturbation noise, aggregates them, performs feature reconstruction, and outputs the reconstructed data for use by other devices and systems (such as the cloud). Terminal devices and input layer nodes of the edge device are in a many-to-one relationship: one terminal device corresponds to one input layer node, and one input layer node corresponds to multiple terminal devices. Data are transmitted as data streams both between the terminal devices and the input layer nodes of the edge device and between the input layer nodes and the output layer node of the edge device.
In the application scenario of this embodiment, the terminal device software provides data acquisition and feature extraction functions and sends feature data to an input layer node of the edge device by calling the API of the edge device platform. In software, the edge device uses Kafka to form a distributed computing framework, in which data are stored in Kafka brokers and each logical node of the edge device corresponds to a topic in Kafka. The corresponding task is executed once the data stream has flowed through the topic: an input layer node aggregates the data stream and adaptively adds differential privacy perturbation, and the output layer node aggregates the data stream and performs feature reconstruction. Through the above process, the data stream output by the edge device satisfies the definition of differential privacy, which ensures the transparency of sensitive information to analysis in the cloud.
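The aggregation task of a logical node can be pictured as grouping incoming messages into tumbling time windows before the node's own processing runs. The sketch below shows only that grouping step in plain Python; the message layout `(timestamp, device_id, features)` and the function name are assumptions, and no Kafka API is involved.

```python
from collections import defaultdict


def aggregate_by_window(messages, window_seconds):
    """Group (timestamp, device_id, features) messages into tumbling windows.

    Returns a dict mapping window index to the list of (device_id, features)
    pairs that arrived in that window.
    """
    windows = defaultdict(list)
    for timestamp, device_id, features in messages:
        windows[int(timestamp // window_seconds)].append((device_id, features))
    return dict(windows)


stream = [
    (0.5, "phone-1", [0.1, 0.2, 0.3]),
    (3.9, "phone-2", [0.4, 0.5, 0.6]),
    (5.1, "phone-1", [0.7, 0.8, 0.9]),
]
grouped = aggregate_by_window(stream, window_seconds=5)
```

An input layer node would then perturb the features of each window, and the output layer node would apply the same grouping with its own window length before decoding.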
In the application scenario of this embodiment, the autoencoder is used on both the terminal device and the edge device and bears on both the reduction of data volume and the addition of differential privacy perturbation; an undercomplete autoencoder is preferably used. The encoder of an undercomplete autoencoder can achieve an effect similar to principal component analysis (PCA), extracting the main features of the data. In the embodiment of the invention, the undercomplete autoencoder architecture shown in Fig. 3 is preferably used, in which the encoder has 4 layers of neurons (not counting the input layer) with (6, 5, 3, 3) neurons per layer, and the decoder has 4 layers of neurons (not counting the input layer) with (3, 4, 5, 8) neurons per layer. The autoencoder is trained offline, that is, it is trained in advance on a data set to obtain the trained undercomplete autoencoder.
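To make the layer sizes concrete, the following sketch builds an untrained encoder and decoder with the stated widths and runs one forward pass. Only the layer sizes come from the text; the input width of 8, the tanh activation, and the random initialization are assumptions, and no training is performed.

```python
import math
import random

ENCODER_SIZES = (6, 5, 3, 3)  # neurons per encoder layer, from the text
DECODER_SIZES = (3, 4, 5, 8)  # neurons per decoder layer, from the text
INPUT_DIM = 8                 # assumed input width for this example


def init_layers(input_dim, sizes, rng):
    """Create one (weights, biases) pair per layer with small random weights."""
    layers, fan_in = [], input_dim
    for fan_out in sizes:
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
        fan_in = fan_out
    return layers


def forward(layers, x):
    """Pass a vector through fully connected tanh layers (assumed activation)."""
    for weights, biases in layers:
        x = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


rng = random.Random(0)
encoder = init_layers(INPUT_DIM, ENCODER_SIZES, rng)
decoder = init_layers(ENCODER_SIZES[-1], DECODER_SIZES, rng)
code = forward(encoder, [0.1] * INPUT_DIM)  # 3 values would leave the terminal
reconstruction = forward(decoder, code)     # 8 values are rebuilt at the edge
```

The 3-value code is what makes the autoencoder undercomplete: the bottleneck is narrower than the input, so only the compressed representation has to be transmitted.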
In the application scenario of this embodiment, the encoder neurons (i.e. the encoder) of the trained undercomplete autoencoder run on the terminal device, and the decoder neurons (i.e. the decoder) run on the last logical node of the edge device in Fig. 2 (i.e. the output layer of the edge device), where they reconstruct the features. Separating the encoder and the decoder and placing them on the terminal device and the edge device respectively reduces the volume of data transmitted. To protect the security of user data, when adding perturbation noise that satisfies differential privacy, the present invention preferably adds the perturbation noise to the feature data on the edge device.
In the application scenario of this embodiment, once the retained fields have been determined, the undercomplete autoencoder must be trained. So that the data of the selected fields can be input into the autoencoder for training, each field must be converted into a string of fixed length k bits. In this embodiment, a hash algorithm is used to align each field to a k-bit string. Each record in the data set becomes a new record after hash alignment. The fields of each aligned record are combined by matrix operations into one message matrix, and the message matrices are in turn combined by matrix operations into the final training set matrix. Finally, the training set is input into the above undercomplete autoencoder for training. The loss function is L(x, g(f(z(x)))), where L(·) is usually the mean squared error function, g(·) is the decoder, f(·) is the encoder, and z(·) is the hash alignment operation. The encoder of the trained autoencoder runs on every terminal device, that is, each terminal device holds a copy of the encoder neural network model; the decoder runs on the edge device, that is, exactly one copy of the decoder runs on the edge device.
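The hash alignment z(·) and the mean squared error L(·) can be sketched as follows. The text does not name the hash algorithm or the value of k, so the use of MD5, the truncation to the first k bits, and the flattening of a record into one 0/1 row are assumptions; the helper names are hypothetical.

```python
import hashlib


def hash_align(value, k):
    """Map any field value to a fixed-length string of k bits (k <= 128 with MD5)."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    bits = "".join(f"{byte:08b}" for byte in digest)
    return bits[:k]


def record_to_row(record, fields, k):
    """Concatenate the aligned fields of one record into a flat 0/1 vector."""
    return [int(bit) for name in fields for bit in hash_align(record[name], k)]


def mse(x, y):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


record = {"fare_amount": "12.5", "tip_amount": "2.0"}  # hypothetical values
row = record_to_row(record, ("fare_amount", "tip_amount"), k=16)
training_set = [row]  # one row per record; rows stacked into a matrix
```

During training, each row would pass through the encoder and decoder, and `mse` would compare the network output with its target.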
In the application scenario of this embodiment, as shown in Fig. 2, the terminal devices are smartphones,
and steps such as data acquisition and feature extraction are implemented in software. The whole
acquisition and feature extraction process on a smartphone runs in units of a preset acquisition time
window. The acquisition time windows of different phones run asynchronously, i.e., the phones do not need
to communicate or cooperate with each other. Specifically, within one acquisition time window a phone
samples the required data at a short time interval and caches it, caching only the fields that need to be
retained. Given the limited processing capability of a phone, this embodiment preferably processes the
cached data in batches: as soon as the amount of cached data reaches one batch size, the batch is
hash-aligned and the aligned data is fed to the encoder neural network to extract features. When the
acquisition time equals or exceeds the acquisition time window threshold, the remaining cached data is
hash-aligned and passed through feature extraction regardless of whether it fills a batch. Finally, all
feature data extracted within the current acquisition time window is sent to the edge device.
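The batch-plus-window logic on the terminal device can be sketched as below. This is a minimal illustration, not the patented implementation: the class name `WindowedCollector`, its parameters, and the `extract` callback (which stands in for hash alignment followed by the encoder network) are hypothetical, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from typing import Callable, List, Optional


class WindowedCollector:
    """Cache samples, extract features per batch, emit them per time window."""

    def __init__(self, window_s: float, batch_size: int,
                 extract: Callable[[List[str]], List[float]]):
        self.window_s = window_s
        self.batch_size = batch_size
        self.extract = extract  # hypothetical: hash alignment + encoder
        self.window_start: Optional[float] = None
        self.buffer: List[str] = []
        self.features: List[List[float]] = []

    def _flush(self) -> None:
        # Run feature extraction on whatever is cached, if anything.
        if self.buffer:
            self.features.append(self.extract(self.buffer))
            self.buffer = []

    def add(self, sample: str, now: float) -> Optional[List[List[float]]]:
        """Cache one sample; return the window's features when the window closes."""
        if self.window_start is None:
            self.window_start = now
        self.buffer.append(sample)
        if len(self.buffer) >= self.batch_size:
            self._flush()  # a full batch is processed at once
        if now - self.window_start >= self.window_s:
            self._flush()  # window over: process the remainder, full batch or not
            out, self.features = self.features, []
            self.window_start = None
            return out  # the caller would send this to the edge device
        return None


# Toy extractor: the "feature" is just the batch length.
collector = WindowedCollector(window_s=10.0, batch_size=3,
                              extract=lambda batch: [float(len(batch))])
results = [collector.add(f"s{t}", now=float(t)) for t in (0, 1, 2, 3, 10)]
```

In this run the first three samples form a full batch, and the two samples left over when the window closes at t = 10 are still extracted as a partial batch.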
In the application scenario of this embodiment, the edge device consists of multiple PCs. Its role is to
receive the feature data sent by different terminal devices, add perturbation noise (differential privacy
perturbation) to the feature data so as to satisfy differential privacy, and finally reconstruct the
feature data for subsequent analysis in the cloud. Because the edge device is not a high-performance cloud
computer, it is limited in both performance and storage capacity. For this reason, the edge device in this
embodiment uses a distributed computing framework: the Kafka data stream processing framework is deployed
on the multiple PCs. Kafka relies on ZooKeeper, a centralized service for maintaining configuration
information, naming, providing distributed synchronization, and providing group services. With ZooKeeper,
Kafka can store data in a distributed manner with redundant backups, and settings such as the number of
redundant copies can be set through the ZooKeeper configuration file, which addresses the limited storage
capacity of a single machine, while distributed stream processing with Kafka addresses the limits on
per-machine performance. In this embodiment, the edge device uses the Kafka data stream framework for
distributed computing and stream processing. As shown in Fig. 2, Kafka topics correspond one-to-one to the
logical nodes of the edge device. The nodes of the edge device that receive the data streams from the
terminal devices are input-layer nodes; they aggregate the data streams and add the differential privacy
perturbation. The node from which the edge device outputs data is the output-layer node; it aggregates the
data streams output by the input-layer nodes and is responsible for reconstructing the feature data. As
with the terminal devices, the input-layer nodes and the output-layer node each have their own time
window. The input-layer nodes all use the same first time window but run it asynchronously. There is only
one output-layer node, and its second time window is unrelated to the first time window of the input
layer, i.e., the two windows are independent of each other. Note that Fig. 2 shows the edge device as a
logical architecture, not a physical one: physically, multiple PCs jointly form the distributed data
stream processing platform, and they do not have the layered structure shown in Fig. 2.
In the application scenario of this embodiment, multiple terminal devices (smartphones) use the Kafka
producer API to send the extracted feature data wirelessly to the topics corresponding to the input layer
of the edge device. Within one first time window, each logical node of the input layer continuously
receives and caches the feature data streams sent from the smartphones, and uses the Kafka Streams API to
aggregate and cache the feature data streams of the different smartphones. When the time spent on this
process equals or exceeds the first time window threshold, the node, in order to improve the usability of
the data reconstructed later, calculates by formula the adaptive perturbation noise budget $\varepsilon_i$
of each feature value it has received and adds perturbation noise to the cached data. The noise-added data
is combined into a new data stream and sent to the topic corresponding to the output-layer node. Likewise,
within its own second time window the output-layer node aggregates the perturbed feature data streams from
the different logical nodes of the input layer, and uses the Kafka consumer API to fetch and cache the
individual data items in the stream. When the time spent on this process equals or exceeds the second time
window threshold, the data cached in the current second time window is converted to matrix form and fed to
the decoder neural network of the previously trained model for feature reconstruction. Finally, the
reconstructed data is output to a remote cloud server for data analysis.
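The two-stage flow through the edge device can be illustrated without a running Kafka cluster. In the sketch below, a plain dictionary of lists stands in for the Kafka topics, and each function models what a logical node does at the moment its time window closes; the topic names, the function names, and the `perturb` and `decode` callbacks are all hypothetical placeholders rather than Kafka API calls.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Vector = List[float]

# Stand-in for Kafka topics: topic name -> cached feature vectors.
topics: Dict[str, List[Vector]] = defaultdict(list)


def input_node(in_topic: str, out_topic: str,
               perturb: Callable[[Vector], Vector]) -> None:
    """Close one first time window: drain the cache, perturb, forward."""
    cached, topics[in_topic] = topics[in_topic], []
    topics[out_topic].extend(perturb(v) for v in cached)


def output_node(out_topic: str,
                decode: Callable[[List[Vector]], List[Vector]]) -> List[Vector]:
    """Close one second time window: drain the cache and reconstruct."""
    matrix, topics[out_topic] = topics[out_topic], []
    return decode(matrix) if matrix else []


# Two input-layer nodes, each fed by its own terminal devices.
topics["in-0"].append([1.0, 2.0])
topics["in-1"].append([3.0, 4.0])

shift = lambda v: [x + 0.5 for x in v]   # placeholder for the noise step
identity = lambda m: m                   # placeholder for the decoder network

input_node("in-0", "out", shift)         # the windows of the input nodes
input_node("in-1", "out", shift)         # close independently of each other
reconstructed = output_node("out", identity)
```

Because each node only drains its own topic, the input-layer nodes and the output-layer node can close their windows at unrelated times, which mirrors the independence of the first and second time windows described above.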
In the application scenario of this embodiment, how the differential privacy perturbation noise is added
affects both the security and the usability of the reconstructed data. The common practice in the prior
art is to add the same perturbation to every feature value. In practice, however, the feature values do
not all contribute equally to the decoder output. This embodiment therefore adds perturbation noise with
an adaptive algorithm: with security guaranteed (the total privacy budget is fixed), features that
contribute little to the reconstructed data receive as much perturbation as possible, and features that
contribute a lot receive as little perturbation as possible, which improves the usability of the
reconstructed data. Adding perturbation noise to the feature data according to formula (1) and formula (2)
above preserves the security of the data.
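The noise step of formula (2) is the Laplace mechanism applied per feature, and can be sketched as follows. Several points here are assumptions and not taken from the source: the per-feature shares are passed in as given values because the way formula (1) computes them is not reproduced in this text, the relation "per-feature budget = share × node budget" is one reading of the symbol definitions, and the function names and numeric values are hypothetical.

```python
import math
import random
from typing import List, Sequence


def laplace_sample(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def split_budget(eps_k: float, shares: Sequence[float]) -> List[float]:
    """Assumed reading: eps_i = beta_i * eps_k, with shares summing to 1."""
    if any(b <= 0 for b in shares) or not math.isclose(sum(shares), 1.0):
        raise ValueError("shares must be positive and sum to 1")
    return [b * eps_k for b in shares]


def perturb(features: Sequence[float], eps: Sequence[float],
            sensitivity: float, rng: random.Random) -> List[float]:
    """Formula (2): f_i' = f_i + Lap(sensitivity / eps_i)."""
    return [f + laplace_sample(sensitivity / e, rng)
            for f, e in zip(features, eps)]


rng = random.Random(0)                        # fixed seed for repeatability
eps = split_budget(1.0, [0.1, 0.3, 0.6])      # hypothetical shares
noisy = perturb([0.2, 0.5, 0.9], eps, sensitivity=1.0, rng=rng)
```

A feature with a small share gets a small budget and therefore a large noise scale, while the per-feature budgets still add up to the fixed node budget, which matches the stated goal of perturbing low-contribution features more and high-contribution features less.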
The above is only a preferred embodiment of the present invention and does not limit the present invention
in any form. Although the present invention has been disclosed above by way of a preferred embodiment, the
embodiment is not intended to limit the invention. Therefore, any simple modification, equivalent change,
or adaptation made to the above embodiment in accordance with the technical essence of the present
invention, without departing from the content of its technical solution, shall fall within the scope of
protection of the technical solution of the present invention.
Claims (12)
1. A data stream differential privacy protection method based on edge computing, characterized by comprising:
S1. an edge device receives feature data that has been collected by a terminal device and obtained through
feature extraction by a preset encoder;
S2. the feature data is aggregated and perturbation noise is added;
S3. feature reconstruction is performed on the noise-added feature data by a preset decoder to obtain
reconstructed data;
wherein the encoder and the decoder are the encoder and the decoder obtained by training one and the same
autoencoder.
2. The data stream differential privacy protection method based on edge computing according to claim 1,
characterized in that: the feature data in step S1 is obtained by the terminal device collecting data in
units of a preset acquisition time window and performing feature extraction, by the preset encoder, on the
data collected within one acquisition time window.
3. The data stream differential privacy protection method based on edge computing according to claim 2,
characterized in that step S2 specifically comprises: an input-layer node in the edge device aggregates,
according to a preset first time window, the feature data collected by each terminal device and received
within the first time window, calculates a perturbation noise budget for each feature, and adds
perturbation noise to the feature data according to the perturbation noise budget.
4. The data stream differential privacy protection method based on edge computing according to claim 3,
characterized in that the perturbation noise budget is determined by calculation according to formula (1):
In formula (1), $\varepsilon$ is the preset total privacy budget, $n$ is the total number of terminal
devices, $n_k$ is the number of terminal devices connected to the current input-layer node,
$\varepsilon_k$ is the privacy budget of the current input-layer node, $\beta_i$ is the share of each
feature in the privacy budget of the current input-layer node within the current first time window, $d$ is
the dimension of the features, the average-correlation term is the average correlation of the i-th input
feature in the current first time window of the current input-layer node, i.e., the average Euclidean
distance between adjacent features computed with the current feature as the center point, $f_j$ is the
j-th feature value in the current first time window of the current input-layer node, and $\varepsilon_i$
is the privacy budget of the i-th input feature in the current first time window of the current
input-layer node.
5. The data stream differential privacy protection method based on edge computing according to claim 4,
characterized in that perturbation noise is added to the feature data according to formula (2):
$f_i' = f_i + \mathrm{Lap}(\Delta h_0 / \varepsilon_i)$ (2)
In formula (2), $f_i'$ is the feature value after the perturbation noise is added, $f_i$ is the feature
value before the perturbation noise is added, $\Delta h_0$ denotes the global sensitivity,
$\mathrm{Lap}(\cdot)$ is the Laplace distribution, and $\varepsilon_i$ is the privacy budget of the i-th
input feature in the current first time window of the current input-layer node.
6. The data stream differential privacy protection method based on edge computing according to claim 5,
characterized in that step S3 specifically comprises: an output-layer node in the edge device receives and
aggregates, according to a preset second time window, the noise-added feature data provided by the
input-layer nodes, and performs feature reconstruction, by the preset decoder, on the feature data received
within the second time window to obtain reconstructed data.
7. A data stream differential privacy protection system based on edge computing, characterized by
comprising an edge computing device configured to: receive feature data that has been collected by a
terminal device and obtained through feature extraction by a preset encoder; aggregate the feature data and
add perturbation noise; and perform feature reconstruction on the noise-added feature data by a preset
decoder to obtain reconstructed data; wherein the encoder and the decoder are the encoder and the decoder
obtained by training one and the same autoencoder.
8. The data stream differential privacy protection system based on edge computing according to claim 7,
characterized in that the edge device comprises an input-layer node, the input-layer node being configured
to aggregate, according to a preset first time window, the feature data collected by each terminal device
and received within the first time window, to calculate a perturbation noise budget for each feature, and
to add perturbation noise to the feature data according to the perturbation noise budget.
9. The data stream differential privacy protection system based on edge computing according to claim 8,
characterized in that the perturbation noise budget is determined by calculation according to formula (1):
In formula (1), $\varepsilon$ is the preset total privacy budget, $n$ is the total number of terminal
devices, $n_k$ is the number of terminal devices connected to the current input-layer node,
$\varepsilon_k$ is the privacy budget of the current input-layer node, $\beta_i$ is the share of each
feature in the privacy budget of the current input-layer node within the current first time window, $d$ is
the dimension of the features, the average-correlation term is the average correlation of the i-th input
feature in the current first time window of the current input-layer node, i.e., the average Euclidean
distance between adjacent features computed with the current feature as the center point, $f_j$ is the
j-th feature value in the current first time window of the current input-layer node, and $\varepsilon_i$
is the privacy budget of the i-th input feature in the current first time window of the current
input-layer node.
10. The data stream differential privacy protection system based on edge computing according to claim 9,
characterized in that perturbation noise is added to the feature data according to formula (2):
$f_i' = f_i + \mathrm{Lap}(\Delta h_0 / \varepsilon_i)$ (2)
In formula (2), $f_i'$ is the feature value after the perturbation noise is added, $f_i$ is the feature
value before the perturbation noise is added, $\Delta h_0$ denotes the global sensitivity,
$\mathrm{Lap}(\cdot)$ is the Laplace distribution, and $\varepsilon_i$ is the privacy budget of the i-th
input feature in the current first time window of the current input-layer node.
11. The data stream differential privacy protection system based on edge computing according to claim 10,
characterized in that the edge device comprises an output-layer node, the output-layer node being
configured to receive and aggregate, according to a preset second time window, the noise-added feature
data provided by the input-layer nodes, and to perform feature reconstruction, by the preset decoder, on
the feature data received within the second time window to obtain reconstructed data.
12. The data stream differential privacy protection system based on edge computing according to any one of
claims 7 to 11, characterized by further comprising a terminal device, the terminal device being
configured to collect data in units of a preset acquisition time window, to perform feature extraction on
the data within the acquisition time window with the preset encoder to obtain feature data, and to provide
the feature data to the edge device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811379012.4A CN109495476B (en) | 2018-11-19 | 2018-11-19 | Data stream differential privacy protection method and system based on edge calculation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811379012.4A CN109495476B (en) | 2018-11-19 | 2018-11-19 | Data stream differential privacy protection method and system based on edge calculation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109495476A true CN109495476A (en) | 2019-03-19 |
CN109495476B CN109495476B (en) | 2020-11-20 |
Family
ID=65696894
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811379012.4A Active CN109495476B (en) | 2018-11-19 | 2018-11-19 | Data stream differential privacy protection method and system based on edge calculation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109495476B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180189164A1 (en) * | 2017-01-05 | 2018-07-05 | Microsoft Technology Licensing, Llc | Collection of sensitive data--such as software usage data or other telemetry data--over repeated collection cycles in satisfaction of privacy guarantees |
US20180307854A1 (en) * | 2017-04-25 | 2018-10-25 | Sap Se | Tracking privacy budget with distributed ledger |
CN107358113A (en) * | 2017-06-01 | 2017-11-17 | 徐州医科大学 | Based on the anonymous difference method for secret protection of micro- aggregation |
CN108011948A (en) * | 2017-11-30 | 2018-05-08 | 成都航天科工大数据研究院有限公司 | A kind of industrial equipment integrated monitoring system based on edge calculations |
CN108093401A (en) * | 2017-12-13 | 2018-05-29 | 电子科技大学 | A kind of mobile intelligent terminal intimacy protection system and method based on edge calculations |
CN108234493A (en) * | 2018-01-03 | 2018-06-29 | 武汉大学 | The space-time crowdsourcing statistical data dissemination method of secret protection under insincere server |
CN108280491A (en) * | 2018-04-18 | 2018-07-13 | 南京邮电大学 | A kind of k means clustering methods towards difference secret protection |
CN108734217A (en) * | 2018-05-22 | 2018-11-02 | 齐鲁工业大学 | A kind of customer segmentation method and device based on clustering |
Non-Patent Citations (2)
Title |
---|
CHUGUI XU: "Distilling at the Edge:A Local Differential Privacy Obfuscation Framework for IoT Data Analytics", 《IEEE COMMUNICATIONS MAGAZINE》 * |
兰丽辉: "基于向量模型的加权社会网络发布隐私保护方法研究", 《中国博士学位论文全文数据库 信息科技辑》 * |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110300159A (en) * | 2019-06-10 | 2019-10-01 | 华侨大学 | A kind of sensing cloud data safety low cost storage method based on edge calculations |
CN110300159B (en) * | 2019-06-10 | 2021-08-31 | 华侨大学 | Sensing cloud data safe low-cost storage method based on edge computing |
CN110213036B (en) * | 2019-06-17 | 2021-07-06 | 西安电子科技大学 | Safe data storage and calculation method based on fog calculation-edge calculation of Internet of things |
CN110213036A (en) * | 2019-06-17 | 2019-09-06 | 西安电子科技大学 | Based on the storage of Internet of Things mist calculating-edge calculations secure data and calculation method |
CN110443063A (en) * | 2019-06-26 | 2019-11-12 | 电子科技大学 | The method of the federal deep learning of self adaptive protection privacy |
CN110443063B (en) * | 2019-06-26 | 2023-03-28 | 电子科技大学 | Adaptive privacy-protecting federal deep learning method |
CN111222532A (en) * | 2019-10-23 | 2020-06-02 | 西安交通大学 | Edge cloud collaborative deep learning model training method with classification precision maintenance and bandwidth protection |
CN111222532B (en) * | 2019-10-23 | 2024-04-02 | 西安交通大学 | Training method for edge cloud collaborative deep learning model with classification precision maintenance and bandwidth protection |
CN111082997A (en) * | 2019-12-30 | 2020-04-28 | 西安电子科技大学 | Network function arrangement method based on service identification in mobile edge computing platform |
CN111082997B (en) * | 2019-12-30 | 2021-05-14 | 西安电子科技大学 | Network function arrangement method based on service identification in mobile edge computing platform |
CN111401272A (en) * | 2020-03-19 | 2020-07-10 | 支付宝(杭州)信息技术有限公司 | Face feature extraction method, device and equipment |
CN111401272B (en) * | 2020-03-19 | 2021-08-24 | 支付宝(杭州)信息技术有限公司 | Face feature extraction method, device and equipment |
CN113657352A (en) * | 2020-03-19 | 2021-11-16 | 支付宝(杭州)信息技术有限公司 | Face feature extraction method, device and equipment |
CN111914285B (en) * | 2020-06-09 | 2022-06-17 | 深圳大学 | Geographic distributed graph calculation method and system based on differential privacy |
CN111914285A (en) * | 2020-06-09 | 2020-11-10 | 深圳大学 | Geographical distributed graph calculation method and system based on differential privacy |
CN114070950A (en) * | 2020-07-30 | 2022-02-18 | 北京市商汤科技开发有限公司 | Image processing method and related device and equipment |
CN112541574A (en) * | 2020-12-03 | 2021-03-23 | 支付宝(杭州)信息技术有限公司 | Privacy-protecting business prediction method and device |
CN112541593A (en) * | 2020-12-06 | 2021-03-23 | 支付宝(杭州)信息技术有限公司 | Method and device for jointly training business model based on privacy protection |
CN116049840A (en) * | 2022-07-25 | 2023-05-02 | 荣耀终端有限公司 | Data protection method, device, related equipment and system |
CN116049840B (en) * | 2022-07-25 | 2023-10-20 | 荣耀终端有限公司 | Data protection method, device, related equipment and system |
Also Published As
Publication number | Publication date |
---|---|
CN109495476B (en) | 2020-11-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||