CN109495476A - Data-stream differential privacy protection method and system based on edge computing - Google Patents

Data-stream differential privacy protection method and system based on edge computing

Info

Publication number
CN109495476A
Authority
CN
China
Prior art keywords
feature data
time window
current
preset
perturbation noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811379012.4A
Other languages
Chinese (zh)
Other versions
CN109495476B (en)
Inventor
张尧学
刘峻丞
任炬
胥楚贵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN201811379012.4A
Publication of CN109495476A
Application granted
Publication of CN109495476B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0643 Hash functions, e.g. MD5, SHA, HMAC or f9 MAC

Abstract

The invention discloses a data-stream differential privacy protection method and system based on edge computing. The method comprises: S1. an edge device receives feature data that was collected by a terminal device and obtained through feature extraction by a preset encoder; S2. the feature data is aggregated and perturbation noise is added; S3. a preset decoder performs feature reconstruction on the feature data after the perturbation noise has been added, yielding reconstructed data. The encoder and the decoder are obtained by training a single autoencoder. The invention offers low service response latency, high quality of service, high system throughput, a small computational load on each edge device, a small volume of data transmitted between users and edge devices, and a high degree of privacy protection.

Description

Data-stream differential privacy protection method and system based on edge computing
Technical field
The present invention relates to the field of edge computing, and in particular to a data-stream differential privacy protection method and system based on edge computing.
Background
With the arrival of the information age, the IT industry has developed rapidly. The Internet, one of the fastest-growing information technology industries, provides diverse services to users and has become indispensable in every field. With the explosive growth in the types and number of Internet terminal devices, and the marked growth in user demand for quality of service (QoS) and service diversity, the Internet now faces many challenges. Three of the most significant are how to process massive data, how to guarantee real-time service, and how to ensure user security.
Cloud computing is an Internet-based computing model that provides services scalable on demand through centralized computation and storage. However, with the growth of terminal devices and data volume and the rapid development of ubiquitous networking, uploading computation to the cloud not only consumes a large amount of network bandwidth but also increases the latency of service requests and responses. In supporting latency-sensitive applications in particular, cloud computing characterized by centralized computation can no longer meet the needs of these technologies and applications. This has driven the rise of computing models such as edge computing and fog computing. In principle, edge computing and fog computing share a similar idea: both aim to bring computation closer to the user, that is, to extend cloud computing from large centralized data centers to the network edge closer to the user, thereby overcoming the network bottlenecks and high latency of conventional cloud computing and improving the response speed of service requests and the user experience. Technically, fog computing and edge computing relieve the computational pressure on the cloud and improve service QoS by deploying dedicated servers or small and medium-sized computing centers at the network edge closer to the user. In scenarios with large data volumes and strict real-time requirements, edge computing can better meet user needs.
In conventional cloud computing, data must first be stored in the cloud and then processed, which increases the time for service requests and responses. If edge computing also stores data before processing it, then although computing closer to the user shortens the response time, this is still not a good solution for scenarios with stricter real-time requirements. If data can instead be processed while it is being transmitted, the service request response time is shortened considerably. This approach is the real-time stream solution based on edge computing. Kafka, as a distributed streaming platform, can provide edge devices with real-time stream processing capability. It has three key capabilities: (1) it can publish and subscribe to streams of data; (2) it can store streams of data safely in a distributed, replicated, fault-tolerant cluster; (3) it can process streams of data as they arrive. These three capabilities are prerequisites for a stream processing platform. In Kafka, a topic is an abstraction over a group of messages, or in other words a category of messages. In the common producer-consumer model, a producer sends messages to a topic, the messages are stored on Kafka servers called brokers, and a consumer then subscribes to the topic and consumes the messages from the brokers.
Although edge computing and real-time data stream processing benefit data analysis, edge computing faces serious security problems just as other traditional computing models do. In mobile applications, for example, many online services rely on personal data collected from users. This data makes mobile applications more useful and enables personalized services such as advertisement pushing and purchase preferences, but a malicious attacker can equally use it to infer sensitive user information, for example through gender inference, location tracking, or speaker identification. From the user's point of view, the less private information exposed the better, that is, as little personal data as possible should be collected. From the service provider's point of view, collecting more personal data makes it possible to provide better service. There is clearly an essential conflict between the two. How to balance the usability of collected information against the security of user privacy is therefore a problem that requires careful consideration.
The techniques used in existing schemes for protecting user privacy mainly include anonymization, data transformation, data encryption, and differential privacy. Even with these techniques, current schemes still have the following shortcomings:
1. Even though edge computing moves part of the computation from the cloud to edge devices close to the user, it still cannot satisfy services with strict real-time requirements.
2. Edge computing faces security problems: the data processed by edge devices is subject to the conflict between data usability and privacy.
3. Most current privacy protection schemes use centralized data cleansing (privacy removal), which limits system throughput and cannot meet the needs of low-latency services.
4. There is a conflict between the computing capability of edge devices and the security policy.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides a data-stream differential privacy protection method and system based on edge computing with low service response latency, high quality of service, high system throughput, a small computational load on each edge device, a small volume of data transmitted between users and edge devices, and a high degree of privacy protection.
To solve the above technical problem, the technical solution proposed by the present invention is a data-stream differential privacy protection method based on edge computing, comprising:
S1. an edge device receives feature data that was collected by a terminal device and obtained through feature extraction by a preset encoder;
S2. the feature data is aggregated and perturbation noise is added;
S3. a preset decoder performs feature reconstruction on the feature data after the perturbation noise has been added, yielding reconstructed data;
The encoder and the decoder are obtained by training a single autoencoder.
Further, the feature data in step S1 is data that the terminal device collects within one acquisition time window, using a preset acquisition time window as the unit, and then passes through the preset encoder for feature extraction.
Further, step S2 specifically comprises: an input-layer node in the edge device aggregates, according to a preset first time window, the feature data collected by each terminal device and received within the first time window, calculates the perturbation noise budget of each item of feature data, and adds perturbation noise to the feature data according to that budget.
Further, the perturbation noise budget is calculated according to formula (1):
In formula (1), ε is the preset total privacy budget; n is the total number of terminal devices; n_k is the number of terminal devices connected to the current input-layer node; ε_k is the privacy budget of the current input-layer node; β_i is the share of the current input-layer node's privacy budget allotted to the i-th feature in the current first time window; d is the dimension of the features; the average-correlation term is the average correlation of the i-th input feature in the current first time window of the current input-layer node, that is, the average Euclidean distance to adjacent features computed with the current feature as the center point; f_j is the j-th feature value in the current first time window of the current input-layer node; and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input-layer node.
Further, perturbation noise is added to the feature data according to formula (2):
$$f_i' = f_i + \mathrm{Lap}\left(\frac{\Delta h_0}{\varepsilon_i}\right) \tag{2}$$
In formula (2), f_i' is the feature value after perturbation noise is added, f_i is the feature value before perturbation noise is added, Δh_0 denotes the global sensitivity, Lap(·) is the Laplace distribution, and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input-layer node.
Further, step S3 specifically comprises: an output-layer node in the edge device receives and aggregates, according to a preset second time window, the feature data with perturbation noise added that the input-layer nodes provide, and uses the preset decoder to perform feature reconstruction on the feature data received within the second time window, yielding reconstructed data.
A data-stream differential privacy protection system based on edge computing comprises an edge device configured to: receive feature data that was collected by a terminal device and obtained through feature extraction by a preset encoder; aggregate the feature data and add perturbation noise; and perform feature reconstruction, through a preset decoder, on the feature data after the perturbation noise has been added, yielding reconstructed data. The encoder and the decoder are obtained by training a single autoencoder.
Further, the edge device comprises an input-layer node, which is configured to aggregate, according to a preset first time window, the feature data collected by each terminal device and received within the first time window, to calculate the perturbation noise budget of each item of feature data, and to add perturbation noise to the feature data according to that budget.
Further, the perturbation noise budget is calculated according to formula (1):
In formula (1), ε is the preset total privacy budget; n is the total number of terminal devices; n_k is the number of terminal devices connected to the current input-layer node; ε_k is the privacy budget of the current input-layer node; β_i is the share of the current input-layer node's privacy budget allotted to the i-th feature in the current first time window; d is the dimension of the features; the average-correlation term is the average correlation of the i-th input feature in the current first time window of the current input-layer node, that is, the average Euclidean distance to adjacent features computed with the current feature as the center point; f_j is the j-th feature value in the current first time window of the current input-layer node; and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input-layer node.
Further, perturbation noise is added to the feature data according to formula (2):
$$f_i' = f_i + \mathrm{Lap}\left(\frac{\Delta h_0}{\varepsilon_i}\right) \tag{2}$$
In formula (2), f_i' is the feature value after perturbation noise is added, f_i is the feature value before perturbation noise is added, Δh_0 denotes the global sensitivity, Lap(·) is the Laplace distribution, and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input-layer node.
Further, the edge device comprises an output-layer node, which is configured to receive and aggregate, according to a preset second time window, the feature data with perturbation noise added that the input-layer nodes provide, and to use the preset decoder to perform feature reconstruction on the feature data received within the second time window, yielding reconstructed data.
Further, the system also comprises a terminal device, which is configured to collect data using a preset acquisition time window as the unit, to perform feature extraction on the data within the acquisition time window with the preset encoder to obtain feature data, and to provide the feature data to the edge device.
Compared with the prior art, the advantages of the present invention are:
1. The present invention sets an acquisition time window on the terminal device, which collects data per acquisition time window, performs feature extraction, and sends the result to the edge device for further processing. The input-layer node of the edge device receives, per first time window, the feature data sent by the terminal devices connected to that node, and adds perturbation noise to each item of feature data through an adaptive algorithm. The output-layer node of the edge device receives, per second time window, the feature data after the input-layer nodes have added perturbation noise, and reconstructs it with the decoder. The reconstructed data is supplied to other systems, which cannot obtain the user's sensitive information from it. This effectively reduces the response latency of the edge device, improves quality of service, and protects user privacy.
2. The edge device of the present invention has multiple input-layer nodes, each connected to multiple terminal devices and each processing the feature data of the terminal devices connected to it. This distributed processing improves the system throughput of the edge device, reduces the computational load on each input-layer node, and keeps the whole edge computing system stable.
3. The terminal device of the present invention aligns the collected data by hashing, extracts features with its own encoder, and then sends the feature data to an input-layer node of the edge device. This reduces the volume of data transmitted between the user's terminal device and the input-layer node and reduces wasted network bandwidth. In addition, the encoder and decoder are two parts of the same autoencoder and are loaded onto the terminal device after being trained in advance, so the terminal device does not need to train the encoder, which lowers the processing capability required of the terminal device.
Brief description of the drawings
Fig. 1 is a flow diagram of a specific embodiment of the invention.
Fig. 2 is a schematic diagram of the system architecture of a specific embodiment of the invention.
Fig. 3 is a schematic diagram of the autoencoder architecture of a specific embodiment of the invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific preferred embodiments, which do not limit the scope of protection of the invention.
As shown in Fig. 1, the data-stream differential privacy protection method based on edge computing of this embodiment comprises: S1. an edge device receives feature data that was collected by a terminal device and obtained through feature extraction by a preset encoder; S2. the feature data is aggregated and perturbation noise is added; S3. a preset decoder performs feature reconstruction on the feature data after the perturbation noise has been added, yielding reconstructed data. The encoder and the decoder are obtained by training a single autoencoder.
In this embodiment, the feature data in step S1 is data that the terminal device collects within one acquisition time window, using a preset acquisition time window as the unit, and then passes through the preset encoder for feature extraction. Step S2 specifically comprises: an input-layer node in the edge device aggregates, according to a preset first time window, the feature data collected by each terminal device and received within the first time window, calculates the perturbation noise budget of each item of feature data, and adds perturbation noise to the feature data according to that budget.
In this embodiment, the perturbation noise budget is calculated according to formula (1):
In formula (1), ε is the preset total privacy budget; n is the total number of terminal devices; n_k is the number of terminal devices connected to the current input-layer node; ε_k is the privacy budget of the current input-layer node; β_i is the share of the current input-layer node's privacy budget allotted to the i-th feature in the current first time window; d is the dimension of the features; the average-correlation term is the average correlation of the i-th input feature in the current first time window of the current input-layer node, that is, the average Euclidean distance to adjacent features computed with the current feature as the center point; f_j is the j-th feature value in the current first time window of the current input-layer node; and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input-layer node.
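Formula (1) itself is not reproduced in this text, so the following is only a hedged sketch of one allocation that is consistent with the symbol definitions above. It assumes, without confirmation from the source, that the node budget is ε_k = ε·n_k/n, that the share β_i is proportional to the average Euclidean distance from feature i to the other features in the window, and that ε_i = β_i·ε_k. The function names are hypothetical.

```python
import math


def node_budget(total_eps, n_total, n_k):
    """Assumed split: a node's budget is proportional to its share of devices."""
    return total_eps * n_k / n_total


def mean_distances(features):
    """Average Euclidean distance from each feature vector to all the others."""
    result = []
    for i, fi in enumerate(features):
        dists = [math.dist(fi, fj) for j, fj in enumerate(features) if j != i]
        result.append(sum(dists) / len(dists) if dists else 0.0)
    return result


def feature_budgets(features, eps_k):
    """Assumed rule: eps_i = beta_i * eps_k, beta_i proportional to mean distance."""
    if not features:
        return []
    d_bar = mean_distances(features)
    total = sum(d_bar)
    if total == 0:
        # All features coincide: fall back to an even split of the node budget.
        return [eps_k / len(features)] * len(features)
    return [eps_k * d / total for d in d_bar]


eps_k = node_budget(total_eps=1.0, n_total=10, n_k=5)
budgets = feature_budgets([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]], eps_k)
```

Whatever the actual weighting in formula (1), a rule of this shape keeps the per-feature budgets summing to the node budget, which is what lets the node account for its total privacy spend per window.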
In this embodiment, perturbation noise is added to the feature data according to formula (2):
$$f_i' = f_i + \mathrm{Lap}\left(\frac{\Delta h_0}{\varepsilon_i}\right) \tag{2}$$
In formula (2), f_i' is the feature value after perturbation noise is added, f_i is the feature value before perturbation noise is added, Δh_0 denotes the global sensitivity, Lap(·) is the Laplace distribution, and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input-layer node.
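The perturbation in formula (2) can be sketched as a standard Laplace mechanism with scale Δh_0/ε_i. The sketch below is illustrative only: the function names are hypothetical, scalar feature values are assumed, and the Laplace sample is drawn as the difference of two exponential variates, which is one common way to obtain it from the standard library.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variates."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def perturb(features, budgets, sensitivity, rng=None):
    """Apply f_i' = f_i + Lap(sensitivity / eps_i) to each scalar feature value."""
    if len(features) != len(budgets):
        raise ValueError("each feature needs its own privacy budget")
    rng = rng or random.Random()
    return [f + laplace_noise(sensitivity / eps, rng)
            for f, eps in zip(features, budgets)]


# A smaller eps_i gives a larger noise scale and therefore stronger perturbation.
noisy = perturb([1.0, 2.0, 3.0], [0.5, 0.25, 0.25], sensitivity=1.0,
                rng=random.Random(0))
```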
In this embodiment, step S3 specifically comprises: an output-layer node in the edge device receives and aggregates, according to a preset second time window, the feature data with perturbation noise added that the input-layer nodes provide, and uses the preset decoder to perform feature reconstruction on the feature data received within the second time window, yielding reconstructed data.
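As a rough illustration of step S3, the sketch below groups timestamped feature vectors into fixed, non-overlapping windows and applies a decoder to each window. The window type (tumbling), the record layout `(timestamp, vector)`, and the `decoder` callable are assumptions made for this example; the source does not specify them.

```python
from collections import defaultdict


def tumbling_windows(records, window_seconds):
    """Group (timestamp, vector) pairs into consecutive fixed-length windows."""
    buckets = defaultdict(list)
    for timestamp, vector in records:
        buckets[int(timestamp // window_seconds)].append(vector)
    return [buckets[index] for index in sorted(buckets)]


def reconstruct(records, window_seconds, decoder):
    """Decode every perturbed feature vector, one window at a time."""
    return [[decoder(vector) for vector in window]
            for window in tumbling_windows(records, window_seconds)]


# Stand-in decoder for illustration; the real one is the trained decoder network.
double = lambda vector: [2 * x for x in vector]
stream = [(0.5, [1.0]), (1.2, [2.0]), (2.9, [3.0]), (3.1, [4.0])]
windows = reconstruct(stream, window_seconds=2, decoder=double)
```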
A data-stream differential privacy protection system based on edge computing comprises an edge device configured to: receive feature data that was collected by a terminal device and obtained through feature extraction by a preset encoder; aggregate the feature data and add perturbation noise; and perform feature reconstruction, through a preset decoder, on the feature data after the perturbation noise has been added, yielding reconstructed data. The encoder and the decoder are obtained by training a single autoencoder.
In this embodiment, the edge device comprises an input-layer node, which is configured to aggregate, according to a preset first time window, the feature data collected by each terminal device and received within the first time window, to calculate the perturbation noise budget of each item of feature data, and to add perturbation noise to the feature data according to that budget.
Further, the perturbation noise budget is calculated according to formula (1):
In formula (1), ε is the preset total privacy budget; n is the total number of terminal devices; n_k is the number of terminal devices connected to the current input-layer node; ε_k is the privacy budget of the current input-layer node; β_i is the share of the current input-layer node's privacy budget allotted to the i-th feature in the current first time window; d is the dimension of the features; the average-correlation term is the average correlation of the i-th input feature in the current first time window of the current input-layer node, that is, the average Euclidean distance to adjacent features computed with the current feature as the center point; f_j is the j-th feature value in the current first time window of the current input-layer node; and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input-layer node.
In this embodiment, perturbation noise is added to the feature data according to formula (2):
$$f_i' = f_i + \mathrm{Lap}\left(\frac{\Delta h_0}{\varepsilon_i}\right) \tag{2}$$
In formula (2), f_i' is the feature value after perturbation noise is added, f_i is the feature value before perturbation noise is added, Δh_0 denotes the global sensitivity, Lap(·) is the Laplace distribution, and ε_i is the privacy budget of the i-th input feature in the current first time window of the current input-layer node.
In this embodiment, the edge device comprises an output-layer node, which is configured to receive and aggregate, according to a preset second time window, the feature data with perturbation noise added that the input-layer nodes provide, and to use the preset decoder to perform feature reconstruction on the feature data received within the second time window, yielding reconstructed data.
In this embodiment, the system also comprises a terminal device, which is configured to collect data using a preset acquisition time window as the unit, to perform feature extraction on the data within the acquisition time window with the preset encoder to obtain feature data, and to provide the feature data to the edge device.
This embodiment is illustrated using 10,000 real records from an urban taxi-hailing scenario as experimental data. The experimental data includes 17 fields: medallion (MD5 value of the identifier bound to the vehicle), hack_license (MD5 value of the identifier bound to the taxi driver's license), pickup_datetime (passenger pick-up time), dropoff_datetime (passenger drop-off time), trip_time_in_secs (ride time), trip_distance (distance traveled), fare_amount (fare), surcharge (surcharge), mta_tax (tax), tip_amount (tip), tolls_amount (tolls), total_amount (total of all charges), and so on. The purpose of the cloud query is to compute the total ride cost within each time window. Because the cloud needs to query the total ride cost, the fields related to time and cost must be retained: pickup_datetime, dropoff_datetime, fare_amount, surcharge, mta_tax, tip_amount, tolls_amount, and total_amount.
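The field selection described above can be sketched as a simple projection of each record onto the retained fields. The field names come from the text; the record layout (a dictionary per ride) and the sample values are assumptions for illustration.

```python
RETAINED_FIELDS = (
    "pickup_datetime", "dropoff_datetime", "fare_amount", "surcharge",
    "mta_tax", "tip_amount", "tolls_amount", "total_amount",
)


def retain(record):
    """Keep only the time and cost fields needed by the cloud query."""
    return {field: record[field] for field in RETAINED_FIELDS}


# Made-up sample record; identifiers such as medallion are dropped on the device.
sample = {
    "medallion": "abc", "hack_license": "def",
    "pickup_datetime": "2013-01-01 00:00:00",
    "dropoff_datetime": "2013-01-01 00:10:00",
    "trip_time_in_secs": 600, "trip_distance": 2.1,
    "fare_amount": 9.5, "surcharge": 0.5, "mta_tax": 0.5,
    "tip_amount": 2.0, "tolls_amount": 0.0, "total_amount": 12.5,
}
kept = retain(sample)
```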
In the application scenario of this embodiment, the system architecture is shown in Fig. 2. It includes multiple terminal devices (smartphones) and an edge device made up of multiple PCs, which communicate with one another through a switch and high-speed cables. The edge device includes multiple input-layer nodes and one output-layer node. Each input-layer node is connected over the network to multiple terminal devices, receives the feature data they send, aggregates it, and adds perturbation noise (differential privacy perturbation). The output-layer node is connected to the input-layer nodes; it receives the data to which each input-layer node has added perturbation noise, aggregates it, performs feature reconstruction, and outputs the reconstructed data for use by other devices and systems such as the cloud. Terminal devices and input-layer nodes are in a many-to-one relationship: each terminal device corresponds to one input-layer node, and one input-layer node corresponds to multiple terminal devices. Data is transmitted as data streams both between the terminal devices and the input-layer nodes of the edge device and between the input-layer nodes and the output-layer node.
In the application scenario of this embodiment, the terminal device software provides data collection and feature extraction, and sends feature data to an input-layer node of the edge device by calling the API of the edge device platform. The edge device software forms a distributed computing framework on Kafka: data is stored in Kafka brokers, each logical node of the edge device corresponds to a topic in Kafka, and the corresponding task is executed once the data stream passes through the topic. The input-layer nodes perform data stream aggregation and adaptively add differential privacy perturbation; the output-layer node performs data stream aggregation and feature reconstruction. After this process, the data stream output by the edge device satisfies the definition of differential privacy, so sensitive information is not exposed to analysis in the cloud.
In the application scenario of this embodiment, the autoencoder is used on both the terminal device and the edge device, and it bears on both the reduction of data volume and the addition of differential privacy perturbation. An undercomplete autoencoder is preferred. The encoder of an undercomplete autoencoder achieves an effect similar to principal component analysis (PCA), extracting the main features of the data. In the embodiment of the invention, the undercomplete autoencoder architecture shown in Fig. 3 is preferred: the encoder has 4 layers of neurons (not counting the input layer) with (6, 5, 3, 3) neurons per layer, and the decoder has 4 layers of neurons (not counting the input layer) with (3, 4, 5, 8) neurons per layer. The autoencoder is trained offline, that is, it is trained in advance on a dataset to obtain the trained undercomplete autoencoder.
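To make the layer sizes concrete, the sketch below builds an untrained network with the stated shapes and runs one forward pass in plain Python. The input dimension of 8, the tanh activation, and the random initialization are assumptions for illustration; the source gives only the layer widths, and training is omitted.

```python
import math
import random

ENCODER_SIZES = (6, 5, 3, 3)  # neurons per encoder layer, from the text
DECODER_SIZES = (3, 4, 5, 8)  # neurons per decoder layer, from the text


def init_layers(input_dim, sizes, rng):
    """Create (weights, biases) per layer with small random weights."""
    layers, fan_in = [], input_dim
    for fan_out in sizes:
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
        fan_in = fan_out
    return layers


def forward(layers, x):
    """Fully connected forward pass with tanh activations (assumed)."""
    for weights, biases in layers:
        x = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


rng = random.Random(0)
encoder = init_layers(8, ENCODER_SIZES, rng)   # runs on the terminal device
decoder = init_layers(3, DECODER_SIZES, rng)   # runs on the edge output node
code = forward(encoder, [0.1] * 8)             # 3-value feature vector
restored = forward(decoder, code)              # 8-value reconstruction
```

With these widths, only the 3-value code needs to cross the network instead of the full input, which is the data-volume reduction the text describes.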
In the application scenario of this embodiment, the encoder neurons (the encoder) of the trained undercomplete autoencoder run on the terminal devices, and the decoder neurons (the decoder) run on the last logical node of the edge device in Fig. 2 (the output layer of the edge device), where they reconstruct the features. Separating the encoder and the decoder and placing them on the terminal device and the edge device respectively reduces the volume of data transmitted. To protect the security of user data when adding perturbation noise that satisfies differential privacy, the invention preferably adds the perturbation noise to the feature data on the edge device.
In the application scenario of this embodiment, once the retained fields have been determined, the undercomplete autoencoder must be trained. So that the selected field data can be fed into the autoencoder for training, each field must be converted into a fixed-length string of k bits. In this embodiment, a hash algorithm is used to align each field into a k-bit string. Each row of the dataset is one record. After hash alignment, the fields of each record are combined by matrix operations into a record matrix, and the record matrices are in turn combined by matrix operations into the final training-set matrix. The training set is then fed into the undercomplete autoencoder for training. The loss function is L(x, g(f(z(x)))), where L(·) is usually the mean squared error, g(·) is the decoder, f(·) is the encoder, and z(·) is the hash alignment operation. The encoder of the trained autoencoder runs on every terminal device, that is, each terminal device holds a copy of the encoder neural network model. The decoder runs on the edge device, that is, exactly one decoder copy runs on the edge device.
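The hash alignment z(·) and the mean squared error L(·) can be sketched as follows. The source does not name the hash function or the value of k, so SHA-256 and k = 32 are assumptions here, as are the function names and the choice to turn each bit into a float input.

```python
import hashlib


def hash_align(value, k=32):
    """Map any field value to a fixed-length string of k bits (assumed SHA-256)."""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    bits = "".join(f"{byte:08b}" for byte in digest)
    if k > len(bits):
        raise ValueError("k exceeds the 256 bits available from SHA-256")
    return bits[:k]


def record_to_vector(record, fields, k=32):
    """Concatenate the aligned fields of one record into a 0/1 input vector."""
    bits = "".join(hash_align(record[field], k) for field in fields)
    return [float(bit) for bit in bits]


def mse(x, y):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


row = {"fare_amount": 9.5, "tip_amount": 2.0}
vector = record_to_vector(row, ("fare_amount", "tip_amount"), k=16)
```

Stacking one such vector per record gives the training-set matrix, and training would minimize `mse` between each input and its encode-then-decode reconstruction.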
In the application scenario of this embodiment, as shown in Fig. 2, the terminal device is a smartphone, and data acquisition, feature extraction, and the related steps are implemented in software. The whole acquisition and feature extraction process on a smartphone is organized in units of a preset acquisition time window. Acquisition windows on different phones run asynchronously, so the phones need not communicate or coordinate with one another. Specifically, within one acquisition time window a phone samples the required data at short intervals and caches it, keeping only the fields that need to be retained. Given the limited processing power of a phone, this embodiment preferably processes the cached data in batches: as soon as the cached data reach one batch size, that batch is hash-aligned and the aligned data are fed to the encoder neural network for feature extraction. When the acquisition time reaches or exceeds the acquisition-window threshold, the remaining data are hash-aligned and passed through feature extraction whether or not they fill a batch. Finally, all feature data extracted in the current acquisition time window are sent to the edge device.
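The two triggers described above (a full batch, or the end of the acquisition window) can be sketched as a small buffer class. The class name, the `process` callback standing in for hash alignment plus the encoder, and the explicit `now` timestamps are assumptions introduced for this illustration. The sketch checks the window only when a record arrives.

```python
class WindowedBatcher:
    """Buffer records and process them per batch or when the window expires."""

    def __init__(self, batch_size, window_seconds, process):
        self.batch_size = batch_size
        self.window_seconds = window_seconds
        self.process = process      # stand-in for hash alignment + encoder
        self.buffer = []
        self.features = []
        self.window_start = None

    def add(self, record, now):
        """Cache one record; return the window's features once it closes."""
        if self.window_start is None:
            self.window_start = now
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self._flush()
        if now - self.window_start >= self.window_seconds:
            self._flush()           # a leftover partial batch is processed too
            done, self.features = self.features, []
            self.window_start = None
            return done
        return None

    def _flush(self):
        if self.buffer:
            self.features.extend(self.process(self.buffer))
            self.buffer = []


batcher = WindowedBatcher(batch_size=3, window_seconds=10,
                          process=lambda batch: [r.upper() for r in batch])
results = [batcher.add(rec, now) for now, rec in enumerate(["a", "b", "c", "d"])]
final = batcher.add("e", now=10)    # window expires with a partial batch
```

In this run the first three records form a full batch, while "d" and "e" are processed as a partial batch when the window closes, so nothing collected in the window is dropped.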
In the application scenario of this embodiment, the edge device consists of several PCs. Its role is to receive the feature data sent by the different terminal devices, add perturbation noise (the differential privacy perturbation) to the feature data so that differential privacy is satisfied, and finally reconstruct the feature data for subsequent analysis in the cloud. Because the edge device is not a high-performance cloud computer, it is limited in both performance and storage capacity. In this embodiment the edge device therefore uses a distributed computing framework: the Kafka data stream processing framework is deployed on the PCs. Kafka is built on ZooKeeper, a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. With ZooKeeper, Kafka can store data in a distributed manner with redundant backups, and settings such as the number of redundant copies can be set through the ZooKeeper and Kafka configuration files. This overcomes the storage limit of a single device, while distributed stream processing on Kafka addresses the limits on device performance. In this embodiment the edge device uses the Kafka data stream architecture for distributed computing and stream processing. As shown in Fig. 2, Kafka topics correspond one-to-one to the logical nodes of the edge device. The nodes that receive the data streams from the terminal devices are the input-layer nodes. They aggregate the data streams and add the differential privacy perturbation. The node from which the edge device outputs data is the output-layer node. It aggregates the data streams output by the input-layer nodes and reconstructs the feature data. As with the terminal devices, the input-layer nodes and the output-layer node each have their own time window. All input-layer nodes use the same first time window length but run asynchronously. There is only one output-layer node, and its second time window is unrelated to the input-layer nodes, that is, the first and second time windows are independent of each other. Note that Fig. 2 shows the logical architecture of the edge device rather than its physical structure: physically, several PCs jointly form the distributed data stream processing platform and do not have the layered structure shown in Fig. 2.
In the application scenario of this embodiment, multiple terminal devices (smartphones) wirelessly transmit the extracted feature data through the Kafka producer API to the topics of the input layer of the edge device. Within a first time window, each logical node of the input layer continuously receives and caches the feature data streams sent by the smartphones, using the Kafka Streams API to aggregate and cache the feature streams of the different smartphones. When the time spent reaches or exceeds the first-time-window threshold, the node computes by formula the adaptive perturbation noise budget $\varepsilon_i$ for the feature values it has received, which improves the usability of the data reconstructed later, and adds perturbation noise to the cached data. The noisy data are combined into a new data stream and sent to the topic of the output-layer node. The output-layer node likewise aggregates, within its own second time window, the perturbed feature streams from the different input-layer logical nodes, using the Kafka consumer API to obtain and cache the individual data items in the stream. When the time spent reaches or exceeds the second-time-window threshold, the data cached in the current second time window are converted to matrix form and fed to the decoder neural network of the previously trained model for feature reconstruction. Finally, the result is output to a remote cloud server for data analysis.
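The flow from phones through input-layer topics to the single output-layer topic can be sketched without a running broker. The following uses in-memory queues as a stand-in for Kafka topics. It does not use any Kafka API, and the topic names, function names, and placeholder perturbation and decoder are all hypothetical.

```python
from collections import defaultdict, deque

# In-memory stand-in for Kafka topics: topic name -> queue of messages.
topics = defaultdict(deque)


def phone_send(input_topic, phone_id, features):
    """Terminal side: publish one window of extracted feature vectors."""
    topics[input_topic].append({"phone": phone_id, "features": features})


def input_node_close_window(input_topic, perturb, output_topic="output"):
    """Input-layer node: aggregate everything cached in its first time window,
    perturb each feature vector, and forward the result to the output topic."""
    aggregated = []
    while topics[input_topic]:
        message = topics[input_topic].popleft()
        aggregated.extend(message["features"])
    noisy = [perturb(vector) for vector in aggregated]
    if noisy:
        topics[output_topic].append(noisy)
    return len(noisy)


def output_node_close_window(decode, output_topic="output"):
    """Output-layer node: merge the perturbed streams received in its second
    time window into one matrix (a list of rows) and reconstruct it."""
    matrix = []
    while topics[output_topic]:
        matrix.extend(topics[output_topic].popleft())
    return decode(matrix)


phone_send("input-0", "phone-a", [[0.1, 0.2, 0.3]])
phone_send("input-0", "phone-b", [[0.4, 0.5, 0.6]])
phone_send("input-1", "phone-c", [[0.7, 0.8, 0.9]])

identity = lambda vector: list(vector)   # placeholder for noise addition
for topic in ("input-0", "input-1"):     # each node closes its own window
    input_node_close_window(topic, identity)

reconstructed = output_node_close_window(lambda rows: rows)  # placeholder decoder
```

Because each input node drains only its own topic, the nodes can close their windows at different moments, which mirrors the asynchronous first time windows described above.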
In the application scenario of this embodiment, how the differential privacy perturbation noise is added affects both the security and the usability of the reconstructed data. The common practice in the prior art is to add the same perturbation to every feature value, but in practice the feature values do not all contribute equally to the decoder output. This embodiment therefore adds the perturbation noise with an adaptive algorithm: with security guaranteed (a fixed total privacy budget), features that contribute little to the reconstructed data receive as much perturbation as possible, and features with a large influence receive as little as possible, which improves the usability of the reconstructed data. Adding perturbation noise to the feature data according to formulas (1) and (2) above thus protects the data while preserving their usefulness.
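The idea of spending a fixed total budget unevenly across features can be sketched with the standard Laplace mechanism. Formula (1) is not reproduced here, so the proportional split below is an illustrative assumption and not the patent's allocation rule. The contribution scores, the sensitivity of 1.0, and the total budget of 1.0 are likewise hypothetical.

```python
import numpy as np


def allocate_budgets(total_epsilon, contribution):
    """Split a fixed budget across features in proportion to contribution.

    Proportional splitting is an illustrative assumption, not formula (1).
    """
    contribution = np.asarray(contribution, dtype=float)
    share = contribution / contribution.sum()
    return total_epsilon * share


def perturb(features, budgets, sensitivity, rng):
    """Laplace mechanism per feature: noise scale = sensitivity / budget."""
    scale = sensitivity / budgets       # small budget -> large noise
    return features + rng.laplace(loc=0.0, scale=scale, size=features.shape)


rng = np.random.default_rng(42)
features = np.array([0.8, 0.1, 0.5])
contribution = [0.6, 0.1, 0.3]          # hypothetical per-feature influence
budgets = allocate_budgets(1.0, contribution)
noisy = perturb(features, budgets, sensitivity=1.0, rng=rng)
```

The per-feature budgets sum to the fixed total, and the feature with the smallest contribution gets the smallest budget and therefore the largest noise scale, matching the qualitative behavior described above.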
The above are merely preferred embodiments of the present invention and do not limit the invention in any form. Although the invention has been disclosed above by way of preferred embodiments, these are not intended to limit it. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments in accordance with the technical essence of the invention, without departing from the content of its technical solution, shall fall within the scope of protection of the technical solution of the invention.

Claims (12)

1. A data stream differential privacy protection method based on edge computing, characterized in that:
S1. an edge device receives feature data that have been collected by a terminal device and obtained by feature extraction with a preset encoder;
S2. the feature data are aggregated and perturbation noise is added;
S3. a preset decoder performs feature reconstruction on the feature data with the added perturbation noise to obtain reconstructed data;
the encoder and the decoder are the encoder and decoder obtained by training the same autoencoder.
2. The data stream differential privacy protection method based on edge computing according to claim 1, characterized in that:
the feature data in step S1 are obtained by the terminal device collecting data in units of a preset acquisition time window and performing feature extraction, with the preset encoder, on the data collected within one acquisition time window.
3. The data stream differential privacy protection method based on edge computing according to claim 2, characterized in that step S2 specifically comprises: an input-layer node in the edge device aggregates, according to a preset first time window, the feature data collected by each terminal device and received within the first time window, calculates a perturbation noise budget for each feature datum, and adds perturbation noise to the feature data according to the perturbation noise budget.
4. The data stream differential privacy protection method based on edge computing according to claim 3, characterized in that the perturbation noise budget is calculated according to formula (1):
In formula (1), $\varepsilon$ is the preset total privacy budget, $n$ is the total number of terminal devices, $n_k$ is the number of terminal devices connected to the current input-layer node, $\varepsilon_k$ is the privacy budget of the current input-layer node, $\beta_i$ is the share of the current input-layer node's privacy budget taken by each feature in the current first time window, $d$ is the feature dimension, the average correlation of the i-th input feature in the current first time window of the current input-layer node is obtained by taking the current feature as the center point and computing the average Euclidean distance to the adjacent features, $f_j$ is the j-th feature value in the current first time window of the current input-layer node, and $\varepsilon_i$ is the privacy budget of the i-th input feature in the current first time window of the current input-layer node.
5. The data stream differential privacy protection method based on edge computing according to claim 4, characterized in that perturbation noise is added to the feature data according to formula (2):
$f_i' = f_i + \mathrm{Lap}(\Delta h_0 / \varepsilon_i)$ (2)
In formula (2), $f_i'$ is the feature value after the perturbation noise is added, $f_i$ is the feature value before the perturbation noise is added, $\Delta h_0$ denotes the global sensitivity, $\mathrm{Lap}(\cdot)$ is the Laplace distribution, and $\varepsilon_i$ is the privacy budget of the i-th input feature in the current first time window of the current input-layer node.
6. The data stream differential privacy protection method based on edge computing according to claim 5, characterized in that step S3 specifically comprises: an output-layer node in the edge device receives and aggregates, according to a preset second time window, the perturbed feature data provided by the input-layer nodes, and performs feature reconstruction with the preset decoder on the feature data received within the second time window to obtain reconstructed data.
7. A data stream differential privacy protection system based on edge computing, characterized in that it comprises an edge computing device configured to: receive feature data that have been collected by a terminal device and obtained by feature extraction with a preset encoder; aggregate the feature data and add perturbation noise; and perform feature reconstruction with a preset decoder on the feature data with the added perturbation noise to obtain reconstructed data; the encoder and the decoder are the encoder and decoder obtained by training the same autoencoder.
8. The data stream differential privacy protection system based on edge computing according to claim 7, characterized in that the edge device comprises an input-layer node configured to aggregate, according to a preset first time window, the feature data collected by each terminal device and received within the first time window, calculate a perturbation noise budget for each feature datum, and add perturbation noise to the feature data according to the perturbation noise budget.
9. The data stream differential privacy protection system based on edge computing according to claim 8, characterized in that the perturbation noise budget is calculated according to formula (1):
In formula (1), $\varepsilon$ is the preset total privacy budget, $n$ is the total number of terminal devices, $n_k$ is the number of terminal devices connected to the current input-layer node, $\varepsilon_k$ is the privacy budget of the current input-layer node, $\beta_i$ is the share of the current input-layer node's privacy budget taken by each feature in the current first time window, $d$ is the feature dimension, the average correlation of the i-th input feature in the current first time window of the current input-layer node is obtained by taking the current feature as the center point and computing the average Euclidean distance to the adjacent features, $f_j$ is the j-th feature value in the current first time window of the current input-layer node, and $\varepsilon_i$ is the privacy budget of the i-th input feature in the current first time window of the current input-layer node.
10. The data stream differential privacy protection system based on edge computing according to claim 9, characterized in that perturbation noise is added to the feature data according to formula (2):
$f_i' = f_i + \mathrm{Lap}(\Delta h_0 / \varepsilon_i)$ (2)
In formula (2), $f_i'$ is the feature value after the perturbation noise is added, $f_i$ is the feature value before the perturbation noise is added, $\Delta h_0$ denotes the global sensitivity, $\mathrm{Lap}(\cdot)$ is the Laplace distribution, and $\varepsilon_i$ is the privacy budget of the i-th input feature in the current first time window of the current input-layer node.
11. The data stream differential privacy protection system based on edge computing according to claim 10, characterized in that the edge device comprises an output-layer node configured to receive and aggregate, according to a preset second time window, the perturbed feature data provided by the input-layer nodes, and to perform feature reconstruction with the preset decoder on the feature data received within the second time window to obtain reconstructed data.
12. The data stream differential privacy protection system based on edge computing according to any one of claims 7 to 11, characterized in that:
it further comprises a terminal device configured to collect data in units of a preset acquisition time window, perform feature extraction with the preset encoder on the data within the acquisition time window to obtain feature data, and provide the feature data to the edge device.
CN201811379012.4A 2018-11-19 2018-11-19 Data stream differential privacy protection method and system based on edge calculation Active CN109495476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811379012.4A CN109495476B (en) 2018-11-19 2018-11-19 Data stream differential privacy protection method and system based on edge calculation

Publications (2)

Publication Number Publication Date
CN109495476A true CN109495476A (en) 2019-03-19
CN109495476B CN109495476B (en) 2020-11-20

Family

ID=65696894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811379012.4A Active CN109495476B (en) 2018-11-19 2018-11-19 Data stream differential privacy protection method and system based on edge calculation

Country Status (1)

Country Link
CN (1) CN109495476B (en)

Also Published As

Publication number Publication date
CN109495476B (en) 2020-11-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant