CN114817976B - Sensor data protection method, system, computer equipment and intelligent terminal - Google Patents
- Publication number
- CN114817976B (application CN202210253232.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- sensor
- distribution
- module
- action
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
The invention belongs to the technical field of information data security and discloses a sensor data protection method, system, computer equipment and intelligent terminal. The method combines a random walk algorithm with generative adversarial network training, so the user neither has to define a specific action sequence nor spend local computing resources on data synthesis. The user only defines the proportion of each action before use and sends this predefined data to a cloud server. The cloud server constructs the action sequence and generates multi-sensor simulation data, combines the simulation data with the action sequence into a simulation data set, and returns that set to the request initiator. The request initiator decomposes the simulation data set and uses a Hook method to replace the data at the local sensor interface, which achieves complete anonymization of the multiple sensors of the mobile device.
Description
Technical Field
The invention belongs to the technical field of information data security, and particularly relates to a sensor data protection method, a sensor data protection system, computer equipment and an intelligent terminal.
Background
With the development of the mobile internet, the integration of new technologies such as intelligent terminals and location services has driven unprecedented growth in mobile applications and services. Sensors embedded in personal smart devices, such as accelerometers, gyroscopes and magnetometers, give users a convenient experience in personalized mobile applications, and the data they generate can be used to monitor the user's physical activity, interactions and emotions. An application installed on a wearable device can obtain raw sensor data and make inferences for tasks such as gesture recognition or activity recognition. Existing research shows that motion sensors can serve as a medium for side-channel attacks that steal sensitive user input, reveal the user's motion state, and identify and track individual devices. More importantly, acquiring sensor data does not require the user to grant permission, so inferring private data from motion sensors is easy to carry out and extremely covert.
Existing sensor data privacy protection strategies supply applications with false random data, resampled data or other distorted data. This inevitably reduces the precision and accuracy of the sensor data in utility tasks such as action recognition and step counting, and the supplied data differs markedly from real data. Moreover, false random data is easily recognized by the service provider and may crash the application, so that service cannot be provided to the user. Existing defense strategies operate on the premise of ensuring availability and do not consider privacy protection over the whole life cycle; if the user also needs to protect background knowledge such as movement information, they cannot provide effective protection. Current simulation data generation methods rely solely on a generative adversarial network and cannot effectively solve the problem of a small generated data space: once the number of iterations reaches a certain level, highly similar simulation data appears. In addition, current methods generate data for a single sensor only; when an application requires joint judgment across multiple sensors, the sensors cannot be coordinated, so the simulation data of the target behavior is distorted.
Current defense measures therefore have several drawbacks. Supplying random false data or resampled distorted data to an application reduces precision and accuracy and leads to larger errors. Supplying obfuscated data processed by a model improves usability, but the model processing time is long, so the timeliness of the sensor data cannot be met and real usability is not achieved. For simulation data generation, existing schemes based on generative adversarial networks suffer from mode collapse and are feasible only in small batches and over a small range. The prior art does not consider complete anonymity, so when an attacker obtains some background knowledge, the protection of user privacy is weakened. For example, patent CN201810257632.4, "differential privacy-based android terminal sensor information protection method", adds specific Laplace noise to real data; compared with full obfuscation, this scheme still reveals part of the background information because real data is used. The model in patent CN202110312274.4 addresses only the generation problem: it can generate specific sensor data for specific actions and thus produce simulation data, but repeated use of the generative adversarial network model causes mode collapse, so data repeats at large scale, and the model generates data for a single sensor only and cannot coordinate multiple sensors.
Through the above analysis, the problems and defects existing in the prior art are as follows:
(1) Existing defense strategies supply applications with false random data, resampled data or other distorted data, which inevitably reduces the accuracy and precision of the application in utility recognition tasks. The supplied data differs substantially from real data, false random data is easily recognized by the server, and the application may crash. Existing defense strategies operate on the premise of ensuring availability and do not consider privacy protection over the whole life cycle; if the user also needs to protect background knowledge such as movement information, they cannot provide effective protection.
(2) Current simulation data generation methods rely solely on a generative adversarial network and cannot effectively solve the problem of a small generated data space: once the number of iterations reaches a certain level, highly similar simulation data appears. In addition, current methods generate data for a single sensor only; when an application requires joint judgment across multiple sensors, the sensors cannot be coordinated, so the simulation data of the target behavior is distorted.
(3) The prior art does not consider complete anonymity, so when an attacker obtains some background knowledge, the protection of user privacy is weakened.
The difficulty of solving the above problems and defects is as follows: replacing sensor data on an Android terminal requires replacing code in the Android system framework layer while the mobile device is running, which is difficult; and completely anonymous, full-life-cycle sensor data privacy protection requires generating, from the action proportions and transition probabilities specified by the user, an action sequence that conforms to the predefined distribution and transition probabilities, and then generating multi-sensor simulation data that conforms to the specified action classification.
The significance of solving the above problems and defects is as follows: the action sequence generation method based on the Monte Carlo method is suitable for constructing false behavior action sequences. The data generation method that combines a time-series generative adversarial network with filtering combination is suitable for constructing simulation data that conforms to the specified action classification. By replacing sensor data at all times and in all respects, the method achieves fully anonymous privacy protection. The security of Android terminal sensor information is greatly improved, which has important theoretical value and practical significance for the privacy protection of future mobile terminals.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides a sensor data protection method, a sensor data protection system, computer equipment and an intelligent terminal.
The invention is realized as follows. A sensor data protection method synthesizes data using a random walk algorithm and generative adversarial network training, with the proportion of each action defined before use. The predefined data is delivered to a cloud server, which completes action sequence construction and multi-sensor simulation data generation and delivers to the request initiator a simulation data set formed by combining the simulation data with the action sequence. The request initiator decomposes the simulation data set and replaces the local sensor interface data using a Hook method, so that the multiple sensors of the mobile device are completely anonymized.
Further, the sensor data protection method uses the sensor simulation data sequence to replace sensor data at all times and in all respects.
Further, the multi-sensor simulation data generation adopts a random walk algorithm based on the Markov chain Monte Carlo method; by introducing the transition probabilities between actions as the proposal distribution for constructing the Markov matrix, the acceptance distribution used in constructing the Markov chain is improved, so that a behavior action sequence conforming to the predefined distribution is generated when the random walk algorithm finishes;
The multi-sensor simulation data generation also adopts a filtering combination method based on a time-series generative adversarial network model and Bayesian optimization: sensor data conforming to the predefined classification is generated by the time-series generative adversarial network, a filtering combination method is introduced, and Bayesian optimization is used to search for filtering combination parameters that meet the requirements.
Further, the sensor data protection method comprises the following steps:
Step one, system initialization: the user inputs an action distribution, namely the proportions of the standing, walking, running, sitting, lying, and going upstairs and downstairs actions. A transition matrix is constructed from the predefined action transition probabilities and the user-input action distribution, and multiple iterations verify whether a stationary distribution is reached, which provides feasibility support for the subsequent generation of the action sequence. The state transition matrix P_ij = p(i,j), i,j ∈ S, is calculated by the formula p(x,x′)=q(x,x′)α(x,x′), where S represents the set of all behavior states. The vector λ_0 = {1,0,0,0,0,0} is initialized and substituted into the formula λ_t = λ_{t-1}P, where P represents the state transition matrix, to obtain the distribution after t rounds of iteration;
Step two, constructing the simulated action sequence: using the proposal distribution and acceptance distribution of the constructed transition matrix together with a random walk algorithm, a behavior action sequence conforming to the predefined action distribution is generated, which provides the arrangement rule for the subsequent sensor simulation data. The action sequence is generated with a random walk algorithm based on the Markov chain Monte Carlo method, in which the acceptance distribution is used directly:
α(x,x′)=min{1, p(x′)/p(x)};
wherein p(x′) represents the distribution of state x′, and p(x) represents the distribution of state x;
Step three, generating sensor simulation data: the generative adversarial network model is trained in advance on real data so that the accuracy of the simulation data it generates reaches more than 90% under an action recognition task; multiple groups of data are generated for each action and stored in a buffer, providing original data templates for the subsequent simulation data space expansion task;
Step four, expanding the simulation data space: the data of each action is taken out of the buffer and combined according to the filtering combination rule, and a Bayesian optimization algorithm is used to select several parameter sets that reach a local optimum;
Step five, combining and replacing data: the simulation data is filled in according to the generated behavior action sequence to form the sensor simulation data sequence, the sensor data distribution interface at the bottom layer of the mobile device is hooked, and the sensor data is replaced in batches, which protects the privacy of the sensor data from the data distribution link of the mobile terminal onward.
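The stationarity check in step one can be illustrated with a short sketch. It assumes a uniform (hence symmetric) proposal distribution and the standard Metropolis acceptance min(1, p(x′)/p(x)); the action names and proportions are hypothetical values chosen for illustration and are not taken from the patent.

```python
# Hypothetical action set and user-defined proportions (illustrative only).
ACTIONS = ["stand", "walk", "run", "sit", "lie", "stairs"]
TARGET = [0.30, 0.25, 0.05, 0.25, 0.10, 0.05]


def build_transition_matrix(target, proposal):
    """Return P with P[i][j] = q(i, j) * alpha(i, j) for i != j.

    alpha(i, j) = min(1, p(j) / p(i)) is the standard Metropolis acceptance
    for a symmetric proposal (an assumption of this sketch). Probability
    mass from rejected moves stays on the diagonal.
    """
    n = len(target)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                P[i][j] = proposal[i][j] * min(1.0, target[j] / target[i])
        P[i][i] = 1.0 - sum(P[i])  # diagonal is still 0 here, so this sums off-diagonals
    return P


def iterate_distribution(P, start, rounds):
    """Apply lambda_t = lambda_{t-1} P for the given number of rounds."""
    lam = list(start)
    n = len(lam)
    for _ in range(rounds):
        lam = [sum(lam[i] * P[i][j] for i in range(n)) for j in range(n)]
    return lam


n = len(ACTIONS)
uniform_proposal = [[1.0 / n] * n for _ in range(n)]  # symmetric q (assumption)
P = build_transition_matrix(TARGET, uniform_proposal)
lam = iterate_distribution(P, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], rounds=500)
max_gap = max(abs(a - b) for a, b in zip(lam, TARGET))
print([round(v, 3) for v in lam], "max gap:", max_gap)
```

If the iterated vector settles on the user-defined proportions, the matrix is usable for sequence generation in step two; a large remaining gap would suggest that the proposal distribution does not connect all states.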
Further, in the simulated action sequence generation of step two, a random walk algorithm based on the Markov chain Monte Carlo method is adopted, and a state transition matrix is constructed according to the action proportions preset by the user, so as to generate a false action sequence. Using the Monte Carlo method, the Markov transition matrix is constructed with the following transition kernel formula:
p(x,x′)=q(x,x′)α(x,x′);
where q(x,x′) is called the proposal distribution and α(x,x′) is called the acceptance distribution; the proposal distribution is symmetric, and the acceptance distribution is:
α(x,x′)=min{1, p(x′)/p(x)};
wherein p(x′) represents the proportion of state x′ and p(x) represents the proportion of state x; the proposal distribution is the transition probability from state x to state x′, which satisfies a condition over X (the formula is not preserved in the source text), where X represents the set of states adjacent to state x, including state x itself;
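A minimal sketch of the random walk itself follows. It assumes that every state is adjacent to every other state, so the proposal is a uniform draw over all states and is therefore symmetric, and it uses the standard Metropolis acceptance min(1, p(x′)/p(x)). The proportions, sequence length and seed are illustrative values, not values from the patent.

```python
import random
from collections import Counter

# Hypothetical user-defined proportions for six actions (illustrative only).
TARGET = [0.30, 0.25, 0.05, 0.25, 0.10, 0.05]


def random_walk_actions(target, length, start=0, seed=None):
    """Generate a sequence of action indices whose long-run frequencies
    approach `target`, using a Metropolis random walk."""
    rng = random.Random(seed)
    n = len(target)
    state = start
    sequence = []
    for _ in range(length):
        candidate = rng.randrange(n)  # symmetric uniform proposal (assumption)
        accept = min(1.0, target[candidate] / target[state])
        if rng.random() < accept:
            state = candidate  # move accepted; otherwise stay in place
        sequence.append(state)
    return sequence


sequence = random_walk_actions(TARGET, length=100_000, seed=7)
counts = Counter(sequence)
freq = [counts[i] / len(sequence) for i in range(len(TARGET))]
print([round(f, 3) for f in freq])
```

Because rejected proposals repeat the current state, the generated sequence naturally contains runs of the same action, which is closer to real behavior than independent sampling from the proportions would be.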
The cost function of the generative adversarial network model in step three is as follows:
min_G max_D V(D,G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1−D(G(z)))];
wherein the first term on the right side of the equals sign represents the expectation of the discriminator over real data represented in the high-dimensional latent space, and the second term represents the expectation of the discriminator over synthetic data produced by the generator in the high-dimensional latent space; G represents the generator network, D represents the discriminator network, E represents the expectation, x∼p_data(x) represents real data sampled from the real data set, log represents the logarithmic function, x represents the real data represented in the high-dimensional latent space, z∼p_z(z) represents a random noise vector sampled from a normal distribution, and z represents the random noise vector;
The embedding recovery loss calculation uses the following formula to compute the degree of difference between the original data and the data processed by the embedding function module and the recovery function module:
l_R = E[Σ_t ‖x_t − x̃_t‖];
where l_R denotes the degree of difference between the original data and the recovered data, E denotes the mathematical expectation, x_t denotes the original data, x̃_t denotes the data obtained by mapping the original data from the original space to the latent space and back from the latent space to the original space, and ‖·‖ denotes the norm.
The binary judgment module uses the following loss function during training to compute the difference between the real data and the synthetic data:
l_U = E[Σ_t log y_t] + E[Σ_t log(1 − ŷ_t)];
where l_U denotes the cross-entropy function of the real data and the synthetic data, y_t denotes the real data, and ŷ_t denotes the synthetic data.
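The two losses can be sketched for a single sequence as follows. This assumes forms commonly used for time-series generative adversarial networks: a sum over time steps of the Euclidean distance between original and recovered samples for l_R, and a sum of log terms over the discriminator outputs for l_U. Both forms, the function names and the `eps` clamp are assumptions of this sketch rather than details confirmed by the source.

```python
import math


def embedding_recovery_loss(original, recovered):
    """Sum over time steps of the Euclidean distance between each original
    sample x_t and its recovered counterpart (assumed form of l_R)."""
    return sum(math.dist(x, x_rec) for x, x_rec in zip(original, recovered))


def unsupervised_loss(real_scores, synthetic_scores, eps=1e-12):
    """Sum of log y_t over real inputs plus log(1 - y_hat_t) over synthetic
    inputs (assumed form of l_U). Scores are discriminator outputs in (0, 1).
    The negative of this value is the usual binary cross-entropy."""
    real_term = sum(math.log(max(y, eps)) for y in real_scores)
    fake_term = sum(math.log(max(1.0 - y_hat, eps)) for y_hat in synthetic_scores)
    return real_term + fake_term


# Two time steps of a 2-axis signal; the first step is recovered poorly.
l_r = embedding_recovery_loss([(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), (1.0, 1.0)])
# A discriminator that cannot tell real from synthetic outputs 0.5 everywhere.
l_u = unsupervised_loss([0.5], [0.5])
print(l_r, l_u)
```

In training, the embedding and recovery modules would be updated to reduce the first value, while the discriminator would be updated to increase the second and the generator to decrease it.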
Further, the simulation data space expansion in step four adopts a filtering combination method to expand the data space, and adopts a Bayesian optimization method to search for the parameters of the filtering combination;
In the filtering combination method, after the cut-off frequency and the combination proportions are set, the simulation data generated by the generative adversarial network (the original data) and its filtered version are combined according to the formula:
f1(x1,x2,x3)=x1*filter(x2,data)+x3*data;
Wherein, the first term on the right side of the equals sign represents a certain proportion of the filtered data, and the second term represents a certain proportion of the original data; x1 represents the proportion of the filtered data in the combined data, x2 represents the cut-off frequency of the filter, data represents the original data, filter(x2, data) represents the data after filtering, and x3 represents the proportion of the original data in the combined data; the left side of the equals sign represents the result of the filtering combination;
A Bayesian optimization algorithm is used to find the parameters (x1,x2,x3); it involves an optimization expression, a fitting model and an acquisition function;
The optimization expression is determined as follows, with a Gaussian process as the fitting model and the probability of improvement function as the acquisition function:
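The combination f1 can be sketched as below. The source does not specify the filter type, so this example assumes a first-order low-pass filter and a 50 Hz sampling rate; both are illustrative choices, and `low_pass` and `filter_combine` are hypothetical names.

```python
import math


def low_pass(cutoff_hz, data, sample_rate_hz=50.0):
    """First-order low-pass filter (an assumed filter type for this sketch)."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor derived from the cut-off frequency
    out = [data[0]]
    for value in data[1:]:
        out.append(out[-1] + alpha * (value - out[-1]))
    return out


def filter_combine(x1, x2, x3, data):
    """f1(x1, x2, x3) = x1 * filter(x2, data) + x3 * data, element by element."""
    filtered = low_pass(x2, data)
    return [x1 * f + x3 * d for f, d in zip(filtered, data)]


step = [0.0] * 5 + [1.0] * 5  # a synthetic step signal
combined = filter_combine(0.4, 5.0, 0.6, step)
print([round(v, 3) for v in combined])
```

With x1 + x3 = 1 the combined signal keeps the overall level of the original while its time-domain shape is altered by the filtered component, which is the property the expansion step relies on.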
f2(x1,x2,x3)=dtw(f1(x1,x2,x3),data);
Wherein, the right side of the equals sign represents the distance between the filter-combined data and the original data, and the left side represents the specific value of that distance; dtw denotes the dynamic time warping distance calculation function, f1(x1,x2,x3) denotes the filter-combined data, and data denotes the original data;
Intercepting and replacing the sensor data in step five: the sensor monitoring module is implemented through Hook and intercepts and replaces data at the bottom-layer sensor data transmission interface. In the Android 8.0 system source code, the module class android.hardware.SystemSensorManager, which controls the distribution of sensor data, is located, and within it the sensor processing subclass SensorEventQueue and its distribution function dispatchSensorEvent are located. The dispatchSensorEvent method under SystemSensorManager is hooked in the system service process, a pre-compiled replacement function module is loaded, and the sensor interface data is replaced with the synthetic data.
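The objective f2 can be sketched with a textbook dynamic time warping distance. The example below uses absolute difference as the local cost and takes the combination function as a parameter; `scale_only` is a hypothetical stand-in for f1 that ignores the cut-off frequency, used only to keep the example short. The Gaussian-process search itself is not shown.

```python
def dtw(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[n][m]


def objective(params, data, combine):
    """f2(x1, x2, x3) = dtw(f1(x1, x2, x3), data) for a given combine function."""
    x1, x2, x3 = params
    return dtw(combine(x1, x2, x3, data), data)


def scale_only(x1, x2, x3, data):
    """Hypothetical stand-in for f1: treats the filter as the identity."""
    return [x1 * v + x3 * v for v in data]


data = [1.0, 2.0, 3.0]
print(objective((0.5, 5.0, 0.5), data, scale_only))  # weights sum to 1: no change
print(objective((1.0, 5.0, 1.0), data, scale_only))  # signal doubled: distance grows
```

An optimizer would evaluate this objective for candidate (x1, x2, x3) and favor parameter sets whose output is far from the original in the time domain while still being classified as the same action.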
It is a further object of the invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the sensor data protection method.
Another object of the present invention is to provide an information data processing terminal for implementing the sensor data protection method.
Another object of the present invention is to provide a sensor data protection system implementing the sensor data protection method, the sensor data protection system comprising:
the system initialization module, which is used to receive the action distribution input by the user, construct a transition matrix from the predefined action transition probabilities and the user-input action distribution, and verify over multiple iterations whether a stationary distribution can be reached, providing feasibility support for the subsequent generation of the action sequence;
the simulated action sequence construction module, which is used to generate a behavior action sequence conforming to the predefined action distribution by using the proposal distribution and acceptance distribution of the constructed transition matrix together with a random walk algorithm, providing the arrangement rule for the subsequent sensor simulation data;
the sensor simulation data generation module, which is used to train the generative adversarial network model in advance on real data so that the accuracy of the simulation data generated by the model reaches more than 90% under an action recognition task, and to generate multiple groups of data for each action as a buffer, providing original data templates for the subsequent simulation data space expansion task;
the simulation data space expansion module, which is used to take the data of each action out of the buffer, combine it according to the filtering combination rule, and select several locally optimal parameter sets using a Bayesian optimization algorithm; because data can be generated that differs substantially from the original data yet is classified as the same action, the mode collapse problem that may exist in the generative adversarial network is mitigated;
the data combination and replacement module, which is used to fill the simulation data in according to the generated behavior action sequence to form the sensor simulation data sequence, hook the sensor data distribution interface at the bottom layer of the mobile device, and replace the sensor data in batches, protecting the privacy of the sensor data from the data distribution link of the mobile terminal onward.
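The data assembly performed before replacement can be sketched as follows. The sketch covers only how an action sequence and the per-action buffer might be merged into one sample stream; the hooking of the Android distribution interface is outside its scope. The buffer layout, the three-axis sample tuples and the function name are assumptions made for illustration.

```python
import random


def fill_action_sequence(action_sequence, buffer, seed=None):
    """For each action in the sequence, pick one buffered simulated window
    for that action and append its samples to the output stream."""
    rng = random.Random(seed)
    stream = []
    for action in action_sequence:
        window = rng.choice(buffer[action])  # one simulated window per action slot
        stream.extend((action, sample) for sample in window)
    return stream


# Hypothetical buffer: action label -> list of windows of (x, y, z) samples.
buffer = {
    "walk": [
        [(0.1, 9.6, 0.3), (0.2, 9.9, 0.1)],
        [(0.0, 9.7, 0.2), (0.3, 10.1, 0.0)],
    ],
    "sit": [[(0.0, 9.8, 0.0), (0.0, 9.8, 0.0)]],
}
stream = fill_action_sequence(["sit", "walk", "walk"], buffer, seed=1)
for action, sample in stream:
    print(action, sample)
```

On the device, a replacement function loaded by the hook would read from such a stream and hand one simulated sample to the application each time a real sensor event is dispatched.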
Further, the sensor data protection system also includes a generator and a discriminator;
the generator comprises an embedding function module, a recovery function module, an embedding recovery loss calculation module, a multi-scale recurrent module and a time-series function module;
The embedding function module is used to map data from a low dimension in the original space to a high dimension in the latent space; the recovery function module is connected to the embedding function module and is used to accurately recover data from the high-dimensional latent space to the low-dimensional real space; the embedding recovery loss calculation module is used to compute the difference between the original data and the real data processed by the embedding function module and the recovery function module, and to train the two modules repeatedly so that the original data can be accurately expressed in the high-dimensional space; the multi-scale recurrent module is used to learn the time-domain features of each dimension of the multiple sensors and the correlation of time-domain features between dimensions; the time-series function module is used so that the synthetic data output by the generator is better represented in the high-dimensional latent space during adversarial training;
the discriminator comprises a binary judgment function module and a similarity calculation module; the binary judgment function module is used to distinguish real data from synthetic data during adversarial training; the similarity calculation module is connected to the binary judgment function module and is used to calculate the cosine similarity between the synthetic data and the real data in the low-dimensional original space;
The embedding function module and the recovery function module each consist of a multi-scale recurrent neural network and a fully connected network layer, wherein the multi-scale recurrent neural network consists of one-dimensional recurrent neural network layers of different sizes, and the output of each node of its last layer serves as the input of the fully connected layer;
The time-series function module comprises a fully connected network and a GRU network;
the embedding recovery loss calculation module computes the degree of difference between the original data and the data processed by the embedding function module and the recovery function module using the formula l_R = E[Σ_t ‖x_t − x̃_t‖];
The binary judgment module computes the difference between the real data and the synthetic data during training using the loss function l_U = E[Σ_t log y_t] + E[Σ_t log(1 − ŷ_t)].
Taken together, the above technical solutions give the invention the following advantages and positive effects: by combining sensor data replacement on the Android platform with a simulated data generation method, the invention overcomes the poor real-time performance of existing schemes, protects the privacy of sensor data over its full life cycle starting from the data generation link on the mobile terminal, and at the same time prevents third parties from maliciously stealing and analyzing user privacy at the application server.
The invention generates a behavior action sequence conforming to a predefined distribution and predefined transition probabilities through a random walk algorithm based on the Markov chain Monte Carlo method. It generates simulated sensor data conforming to predefined classes through a sensor simulation data generation method based on a time-series generative adversarial network and a simulated data space expansion method based on filtering combination and Bayesian optimization, and the generated simulated data show obvious differences. By replacing the mobile terminal sensor data at all times and in all respects, fully anonymized user privacy protection over the whole life cycle can be achieved for mobile device sensors.
The invention combines the interception and replacement of Android platform sensor data with an Android terminal sensor data protection strategy based on multi-sensor simulated data replacement. The privacy of the sensor data is protected from the data distribution link of the mobile terminal, and at the same time an attacker's malicious inference of user privacy is effectively prevented at the server, so that user privacy cannot be stolen. By applying statistical learning and deep learning methods to sensor data privacy protection on the Android mobile terminal, the invention removes the attacker's ability to infer user privacy. Even if an attacker collects a user's sensor data over a long period, the security of the privacy protection is not affected. The invention provides a method for generating simulated data with a generative adversarial network, so that the simulated data achieve accuracy similar to that of real data under inference tasks such as action classification. The invention expands the simulated data space by means of filtering combination and a Bayesian optimization algorithm: while preserving the frequency-domain characteristics, the filtering combination increases the time-domain gap between the combined data and the original data, which makes it easier to enlarge the simulated data space.
The invention ensures user privacy at both the mobile terminal and the server. It reuses the generative adversarial network model and dynamically adjusts the filtering parameters, preserving usability (recognition accuracy) as far as possible while reducing the repetition rate, so that it achieves a better obfuscation effect at the server and is insensitive to the background information held by an attacker. By contrast, low-frequency resampling uses real data and therefore cannot guarantee data security on the mobile terminal; it reduces the accuracy of user privacy inference at the server but cannot achieve complete obfuscation. Intercepting and replacing the sensor data entirely with random data guarantees security on the mobile terminal, but the user is then easily identified by the server as abnormal and normal service is terminated.
Drawings
Fig. 1 is a flowchart of a sensor data protection method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a sensor data protection system according to an embodiment of the present invention.
Fig. 3 is a flowchart of an implementation of a method for protecting sensor data according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a sensor data protection system according to an embodiment of the present invention.
Fig. 5 is a flowchart of an Android terminal sensor data protection system based on multi-sensor simulated data replacement according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of the proportions of common actions and their transition probabilities according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of the convergence of the random walk algorithm to the stationary distribution according to an embodiment of the present invention.
Fig. 8 is a schematic diagram of the sensor simulation data generation model based on a generative adversarial network according to an embodiment of the present invention.
Fig. 9 is a comparison of real data and simulated data for the behavior class running, according to an embodiment of the present invention.
Fig. 10 is a comparison of real data and simulated data for the behavior class going downstairs, according to an embodiment of the present invention.
Fig. 11 is a comparison of real data and simulated data for the behavior class walking, according to an embodiment of the present invention.
Fig. 12 is a comparison of the low-frequency filtering combination with the original data according to an embodiment of the present invention; the behavior is going upstairs and the compared data are the accelerometer X-axis data.
Fig. 13 is a comparison of the high-frequency filtering combination with the original data according to an embodiment of the present invention; the behavior is going upstairs and the compared data are the accelerometer X-axis data.
In the figures: 1. system initialization module; 2. simulated action sequence construction module; 3. sensor simulation data generation module; 4. simulated data space expansion module; 5. data combination and replacement module; 100. subsystem; 101. generator; 102. discriminator; 1011. autoencoder (automatic codec); 1012. embedding recovery loss calculation module; 1013. multi-scale recurrent module; 1014. timing function module; 1021. binary function discriminator; 1022. similarity calculation module.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Aiming at the problems existing in the prior art, the invention provides a sensor data protection method, a system, a computer device and an intelligent terminal, and the invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the method for protecting sensor data provided by the invention comprises the following steps:
S101, system initialization: the user inputs an action distribution; a transition matrix is constructed from the predefined action transition probabilities and the action distribution input by the user, and multiple iterations verify whether a stationary distribution can be reached, providing feasibility support for the subsequent generation of the action sequence;
S102, constructing a simulated action sequence: using the proposal distribution and acceptance distribution from which the transition matrix was constructed, together with a random walk algorithm, a behavior action sequence conforming to the predefined action distribution is generated, providing the arrangement rule for the subsequent simulated sensor data;
S103, generating sensor simulation data: a generative adversarial network model is trained in advance on real data so that the accuracy of the simulated data generated by the model reaches more than 90% on an action recognition task; multiple groups of data are generated for each action as a buffer, providing original data templates for the subsequent simulated data space expansion task;
S104, expanding the simulated data space: the data of each action are taken from the buffer and combined according to the filtering combination rule, and a Bayesian optimization algorithm selects several parameter sets that reach a local optimum; because data that differ substantially from the original data yet fall into the same class can be generated, the mode collapse problem that the generative adversarial network may suffer from is mitigated;
S105, data combination and replacement: the simulated sensor data are filled into the generated behavior action sequence, and the sensor data are replaced in batches by hooking the sensor data distribution interface at the bottom layer of the mobile device, protecting the privacy of the sensor data from the data distribution link of the mobile terminal.
Those skilled in the art may also implement the sensor data protection method provided by the present invention with other steps; the method shown in Fig. 1 is merely one specific embodiment.
As shown in fig. 2 and 4, the sensor data protection system provided by the present invention includes:
The system initialization module 1 receives the action distribution input by the user, constructs a transition matrix from the predefined action transition probabilities and the action distribution input by the user, and verifies over multiple iterations whether a stationary distribution can be reached, providing feasibility support for the subsequent generation of the action sequence;
the simulated action sequence construction module 2 generates a behavior action sequence conforming to the predefined action distribution by using the proposal distribution and acceptance distribution from which the transition matrix was constructed, together with a random walk algorithm, providing the arrangement rule for the subsequent simulated sensor data;
The sensor simulation data generation module 3 trains a generative adversarial network model in advance on real data so that the accuracy of the simulated data generated by the model reaches more than 90% on an action recognition task, and generates multiple groups of data for each action as a buffer, providing original data templates for the subsequent simulated data space expansion task;
the simulated data space expansion module 4 takes the data of each action from the buffer, combines them according to the filtering combination rule, and uses a Bayesian optimization algorithm to select several parameter sets that reach a local optimum; because data that differ substantially from the original data yet fall into the same class can be generated, the mode collapse problem that the generative adversarial network may suffer from is mitigated;
The data combination and replacement module 5 fills the simulated sensor data into the generated behavior action sequence and replaces the sensor data in batches by hooking the sensor data distribution interface at the bottom layer of the mobile device, protecting the privacy of the sensor data from the data distribution link of the mobile terminal.
The technical scheme of the invention is further described below with reference to specific embodiments.
Example 1:
The sensor data protection method provided by the invention achieves full-life-cycle, fully anonymous privacy protection by replacing the sensor data in all respects with a simulated sensor data sequence. It improves both the sensor data privacy protection method based on multi-sensor simulated data replacement and the multi-sensor simulated data generation method.
The sensor data privacy protection method starts from an analysis of existing differential privacy protection methods. Because current schemes process real data and therefore risk leaking the user's background information, the invention instead replaces the mobile device sensor data at all times and in all respects with highly realistic simulated multi-sensor data. Throughout the process the server never obtains the user's real data; it can still run classification tasks such as action classification on the simulated input, but it cannot obtain any background information or private data of the user.
To generate an action sequence conforming to a predefined distribution, the multi-sensor simulated data generation method uses a random walk algorithm based on the Markov chain Monte Carlo method. The transition probabilities between actions are introduced as the proposal distribution for constructing the Markov matrix, and the acceptance distribution used in constructing the Markov chain is improved accordingly, so that the random walk algorithm produces a behavior action sequence conforming to the predefined distribution. To generate multi-sensor simulated data that perform well on action classification tasks and have a low repetition rate, a method based on a time-series generative adversarial network model and on filtering combination with Bayesian optimization is designed: the time-series generative adversarial network generates sensor data conforming to the predefined classes, the filtering combination method enlarges the simulated data space, and Bayesian optimization searches for filtering combination parameters that meet the requirements, which addresses the mode collapse problem of the generative adversarial network.
The above steps generate simulated data sequences consistent with the multiple sensors of a mobile device in a real scene.
Example 2:
The Android terminal sensor data protection method based on multi-sensor simulated data replacement combines sensor data replacement on the Android platform with the simulated data generation method. It protects the privacy of sensor data from the data generation link on the mobile terminal and at the same time prevents third parties from maliciously stealing and analyzing user privacy at the application server. The method specifically comprises the following steps:
Step one, generating an action sequence: a random walk algorithm based on the Markov chain Monte Carlo method is adopted, and a state transition matrix is constructed according to the action proportions preset by the user, so as to generate a fake action sequence. Using the Monte Carlo method, the Markov transition matrix is constructed with the following transition kernel:
p(x,x′)=q(x,x′)α(x,x′);
where q(x,x′) is called the proposal distribution and α(x,x′) is called the acceptance distribution. Assuming that the proposal distribution is symmetric, the acceptance distribution is:
$$\alpha(x,x') = \min\left\{1, \frac{p(x')}{p(x)}\right\}$$
where p(x′) represents the proportion of state x′ and p(x) represents the proportion of state x. The proposal distribution is the transition probability from state x to state x′, which satisfies $\sum_{x' \in X} q(x,x') = 1$, where X represents the set of states adjacent to state x together with state x itself.
Step two, preliminary generation of simulated data: realistic multi-sensor data are generated using a time-series generative adversarial network comprising a generator and a discriminator. The generator comprises an embedding function module, a recovery function module, an embedding recovery loss calculation module, a multi-scale recurrent module and a timing function module. The embedding function module maps data from the low-dimensional original space to a high-dimensional latent space; the recovery function module is connected to the embedding function module and accurately restores data from the high-dimensional latent space to the low-dimensional original space; the embedding recovery loss calculation module calculates the difference between the original data and the real data after processing by the embedding function module and the recovery function module, and is used to train these two modules repeatedly so that the original data can be accurately represented in the high-dimensional space; the multi-scale recurrent module learns the time-domain features of each dimension of the multi-sensor data and the correlations between the time-domain features of different dimensions; the timing function module is used so that the synthetic data output by the generator are better represented in the high-dimensional latent space during adversarial training;
The discriminator comprises a binary discrimination function module and a similarity calculation module; the binary discrimination function module distinguishes real data from synthetic data during adversarial training; the similarity calculation module is connected to the binary discrimination function module and calculates the cosine similarity between the synthetic data and the real data in the low-dimensional original space.
The cost function of the model is:
$$\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
where the first term on the right-hand side is the expectation of the discriminator over the real data represented in the high-dimensional latent space, and the second term is the expectation of the discriminator over the synthetic data produced by the generator in the high-dimensional latent space; G denotes the generator network, D denotes the discriminator network, $\mathbb{E}$ denotes expectation, $x \sim p_{data}(x)$ denotes real data sampled from the real dataset, log denotes the logarithm function, x denotes the real data as represented in the high-dimensional latent space, $z \sim p_z(z)$ denotes a random noise vector sampled from a normal distribution, and z denotes the random noise vector.
The embedding function module and the recovery function module each consist of a multi-scale recurrent neural network and a fully connected network layer; the multi-scale recurrent neural network consists of one-dimensional recurrent neural network layers of different sizes, and the output of each node of the last layer of the multi-scale recurrent neural network is used as the input of the fully connected layer.
The timing function module comprises a fully connected network and a GRU network.
The embedding recovery loss calculation module calculates the degree of difference between the original data and the data processed by the embedding function module and the recovery function module by adopting the following formula:
$$l_R = \mathbb{E}\left[\sum_t \lVert x_t - \tilde{x}_t \rVert_2\right]$$
where $l_R$ denotes the degree of difference between the original data and the recovered data, $\mathbb{E}$ denotes the mathematical expectation, $x_t$ denotes the original data, $\tilde{x}_t$ denotes the data obtained by mapping the original data to the latent space and then mapping it back to the original space, and $\lVert \cdot \rVert_2$ denotes the L2 norm.
The binary discrimination module calculates the difference between the real data and the synthetic data by adopting the following loss function in the training process:
$$l_U = \mathbb{E}\left[\sum_t \log y_t\right] + \mathbb{E}\left[\sum_t \log\left(1 - \hat{y}_t\right)\right]$$
where $l_U$ denotes the cross-entropy function of the real data and the synthetic data, $y_t$ denotes the real data, and $\hat{y}_t$ denotes the synthetic data.
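The two loss terms described above can be illustrated numerically. The following is a minimal sketch, assuming the recovery loss sums the per-time-step L2 norm and averages over the batch, and that the discriminator loss is a standard binary cross-entropy with real samples labeled 1 and synthetic samples labeled 0. The function names, array shapes and the `eps` guard are illustrative assumptions, not part of the patent.

```python
import numpy as np

def reconstruction_loss(x, x_rec):
    """Embedding/recovery loss: mean over the batch of the summed per-step L2 norm.

    x, x_rec: arrays of shape [batch, time, features] (assumed layout).
    """
    per_step = np.linalg.norm(x - x_rec, axis=-1)   # L2 norm at each time step
    return float(per_step.sum(axis=1).mean())       # sum over time, mean over batch

def discriminator_loss(y_real, y_fake, eps=1e-12):
    """Binary cross-entropy between discriminator outputs on real and synthetic data.

    y_real, y_fake: probabilities in (0, 1) of shape [batch, time] (assumed layout).
    """
    real_term = np.log(y_real + eps).sum(axis=1).mean()
    fake_term = np.log(1.0 - y_fake + eps).sum(axis=1).mean()
    return float(-(real_term + fake_term))          # lower is better for the discriminator

# Tiny worked example with hypothetical values.
x = np.zeros((2, 3, 4))
x_rec = np.ones((2, 3, 4))
loss_r = reconstruction_loss(x, x_rec)              # each step has norm sqrt(4) = 2
loss_u = discriminator_loss(np.full((2, 3), 0.5), np.full((2, 3), 0.5))
```

An undecided discriminator that outputs 0.5 everywhere gives a loss of 6·ln 2 for three time steps, which is the usual reference point when judging whether adversarial training has reached equilibrium.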
Step three, expanding the simulated data space: the data space is expanded by a filtering combination method, and the parameters of the filtering combination are searched by a Bayesian optimization method.
In the filtering combination method, a cut-off frequency and combination proportions are set for the simulated data generated by the generative adversarial network, and the original data and the filtered data are then combined according to the following formula:
f1(x1,x2,x3)=x1*filter(x2,data)+x3*data;
where the first term on the right-hand side represents a certain proportion of the filtered data and the second term represents a certain proportion of the original data; x1 represents the proportion of the filtered data in the combined data, x2 represents the cut-off frequency of the filter, data represents the original data, filter(x2, data) represents the data after filtering, and x3 represents the proportion of the original data in the combined data; the left-hand side represents the result of the filtering combination.
A Bayesian optimization algorithm is used to find the parameters (x1, x2, x3); it comprises an optimization expression, a fitting model and an acquisition function.
The following formula is taken as the optimization expression, a Gaussian process is used as the fitting model, and the probability of improvement is used as the acquisition function:
f2(x1,x2,x3)=dtw(f1(x1,x2,x3),data);
where the right-hand side represents the distance between the filter-combined data and the original data, and the left-hand side represents the specific value of that distance; dtw denotes the dynamic time warping distance function, f1(x1,x2,x3) denotes the filter-combined data, and data denotes the original data.
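The two expressions f1 and f2 can be sketched as follows. This is an illustrative sketch only: the patent does not specify the filter design, so an FFT-based ideal low-pass filter is assumed here, the sampling rate `fs` is a hypothetical value, and the dynamic time warping distance is the textbook dynamic-programming form with absolute difference as the local cost.

```python
import numpy as np

def lowpass(cutoff_hz, data, fs=50.0):
    """Assumed filter: zero every FFT bin above cutoff_hz (fs is a hypothetical sampling rate)."""
    spectrum = np.fft.rfft(data)
    freqs = np.fft.rfftfreq(len(data), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(data))

def f1(x1, x2, x3, data, fs=50.0):
    """Filtering combination: x1 * filter(x2, data) + x3 * data."""
    return x1 * lowpass(x2, data, fs) + x3 * data

def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def f2(x1, x2, x3, data, fs=50.0):
    """Optimization expression: DTW distance between the combined data and the original."""
    return dtw(f1(x1, x2, x3, data, fs), data)

# Hypothetical accelerometer-like signal: a 1 Hz component plus a weaker 10 Hz component.
t = np.arange(100) / 50.0
slow = np.sin(2 * np.pi * 1.0 * t)
data = slow + 0.3 * np.sin(2 * np.pi * 10.0 * t)
distance = f2(0.5, 5.0, 0.5, data)
```

With x1 = 0 and x3 = 1 the combination returns the original data and the distance is zero; increasing x1 moves the result away from the original in the time domain while the low-frequency content is preserved.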
Step four, sensor data interception and replacement: applications under the Android system are forked from the Zygote process. The executable corresponding to Zygote startup is app_process; by replacing the system's app_process executable and the virtual machine dynamic link library, Zygote is made to inject module code when it starts an application process. The sensor monitoring module is implemented through Hook and performs bottom-layer interception and replacement on the sensor data delivery interface. In the Android 8.0 system source code, the class android.hardware.SystemSensorManager is located, and within it the sensor processing subclass SensorEventQueue and its distribution function dispatchSensorEvent. The dispatchSensorEvent method under SystemSensorManager is hooked in the system service process, the pre-compiled replacement function module is loaded, and the data delivered through the sensor interface are replaced with the synthetic data.
In step one of the present invention, the system initialization specifically includes:
(1) The mobile terminal user inputs the proportions of several actions, including standing, walking, running, sitting, lying, going upstairs, going downstairs and the like; Fig. 6 illustrates the proportions and transition probabilities defined in the embodiment.
(2) From the action distribution proportions and the inter-action transition probabilities in Fig. 6, the state transition matrix $P_{ij} = p(i,j),\ i,j \in S$ is calculated by the equation p(x,x′)=q(x,x′)α(x,x′), where S represents the set of all behavior action states. The vector is initialized as $\lambda_0 = \{1,0,0,0,0,0\}$ and substituted into the formula $\lambda_t = \lambda_{t-1} P$, where P represents the state transition matrix, which gives the distribution at the t-th iteration; Fig. 7 shows the convergence of the distribution in the embodiment.
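The initialization check above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the action proportions are hypothetical values (not those of Fig. 6), the proposal distribution is taken to be uniform over the other states so that it is symmetric, and the probability mass of rejected moves is placed on the diagonal of P.

```python
import numpy as np

def build_transition_matrix(target, proposal):
    """P[i, j] = q(i, j) * alpha(i, j) for i != j; rejected mass stays on the diagonal."""
    target = np.asarray(target, dtype=float)
    proposal = np.asarray(proposal, dtype=float)
    n = len(target)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and proposal[i, j] > 0:
                alpha = min(1.0, target[j] / target[i])  # acceptance for a symmetric proposal
                P[i, j] = proposal[i, j] * alpha
        P[i, i] = 1.0 - P[i].sum()                       # probability of staying in state i
    return P

def iterate_distribution(P, steps, start=0):
    """Apply lambda_t = lambda_{t-1} P starting from a one-hot vector."""
    lam = np.zeros(P.shape[0])
    lam[start] = 1.0
    for _ in range(steps):
        lam = lam @ P
    return lam

# Hypothetical proportions for six actions (illustrative only).
target = np.array([0.30, 0.25, 0.10, 0.20, 0.10, 0.05])
n = len(target)
proposal = (np.ones((n, n)) - np.eye(n)) / (n - 1)       # symmetric, uniform over other states
P = build_transition_matrix(target, proposal)
lam = iterate_distribution(P, steps=500)                 # should approach `target`
```

If the iterated vector does not approach the requested proportions, the chosen proportions and transition probabilities do not admit the intended stationary distribution and the user input should be rejected before any sequence is generated.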
In the second step of the present invention, the modeling of the action sequence specifically includes:
An action sequence is generated by a random walk algorithm based on the Markov chain Monte Carlo method, in which the acceptance distribution given in step one is used directly:
$$\alpha(x,x') = \min\left\{1, \frac{p(x')}{p(x)}\right\}$$
where p(x′) represents the distribution of state x′ and p(x) represents the distribution of state x.
The generation of the method is described in detail below in the form of pseudo code.
The procedure of the random walk algorithm is described in detail above.
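A random walk of this kind can be sketched as a Metropolis-style sampler. The sketch below is not the patent's pseudocode; it assumes a symmetric proposal that picks uniformly among the states listed as neighbors (here every state, including the current one), and the action names and proportions are hypothetical.

```python
import random
from collections import Counter

def generate_action_sequence(target, neighbors, length, start, seed=0):
    """Random walk over action states using the acceptance rule min(1, p(x') / p(x))."""
    rng = random.Random(seed)
    state = start
    sequence = []
    for _ in range(length):
        candidate = rng.choice(neighbors[state])           # symmetric proposal (assumption)
        accept = min(1.0, target[candidate] / target[state])
        if rng.random() < accept:
            state = candidate                              # move accepted
        sequence.append(state)                             # otherwise stay in place
    return sequence

# Hypothetical action proportions (illustrative only).
target = {"standing": 0.30, "walking": 0.25, "running": 0.10,
          "sitting": 0.20, "lying": 0.10, "stairs": 0.05}
actions = list(target)
neighbors = {a: actions for a in actions}                  # every state reachable from every state
sequence = generate_action_sequence(target, neighbors, 100000, "standing", seed=1)
freq = Counter(sequence)
```

Over a long run the empirical frequencies in `freq` approach the requested proportions, which is the property the system initialization step verifies analytically.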
In step three of the present invention, Fig. 8 shows the system configuration according to an embodiment of the present invention. As can be seen from the figure, the subsystem 100 of the present invention includes a generator 101 and a discriminator 102. The objective of the generator 101 is to make full use of the latent time-domain and frequency-domain characteristics of the sensor data in order to learn the distribution of the real sensor data, and thereby to generate simulated sensor data closer to the real distribution. The objective of the discriminator 102 is to perform binary classification on the real data together with the synthetic data, strengthening the classifier during adversarial training and measuring the quality of the generator. The sensor simulation data generation specifically comprises the following steps:
(1) Min-max normalization is applied to the data in the real dataset, and the minimum and maximum values of the real dataset are stored so that the simulated data generated by the model can later be restored to the original scale;
(2) In the model training process, the autoencoder 1011 under the generator 101 shown in Fig. 8 is trained first; its purpose is to map data accurately from the low-dimensional original space to the high-dimensional latent space and to restore data accurately from the high-dimensional latent space to the low-dimensional original space. The real data used for training are fed into the embedding function module, which maps the real data of the low-dimensional original space to the high-dimensional latent space, and the high-dimensional form of the real data is fed into the recovery function module to obtain data of the original dimension. The loss function of the embedding recovery loss calculation module 1012 under the generator 101 is:
$$l_R = \sum_t \lVert X_t - \tilde{X}_t \rVert_2$$
This formula is the loss function for training the autoencoder; $X_t$ represents the original data of the t-th batch, $\tilde{X}_t$ represents the recovered data of the t-th batch, $\lVert \cdot \rVert_2$ is the L2 norm, and Σ represents summation.
(3) The purpose of the timing function module under the generator 101 shown in Fig. 8 is to capture the characteristics of the real data in the high-dimensional latent space. The real data are processed by the embedding function module under the autoencoder 1011, fed into the timing function module 1014, and output; a binary cross-entropy operation is performed between the output and the high-dimensional space result, where the loss function of the timing function module 1014 is:
where $h_t$ represents the representation of the real data at time t in the high-dimensional latent space, $g_X$ represents the function of the timing function module, $h_{t-1}$ represents the representation of the real data at time t-1 in the high-dimensional latent space, and $z_t$ represents the random data at time t. According to one embodiment of the invention, the input-output dimensions of the multi-scale recurrent module 1013 are as follows:
Time domain recurrent neural network input dimension (three-dimensional): [64, 128, 9];
Time domain recurrent neural network output dimension (three-dimensional): [64, 128, 64];
Time domain feature fully connected network input dimension (three-dimensional): [64, 128, 64];
Time domain feature fully connected network output dimension (three-dimensional): [64, 128, 64];
(4) The purpose of the binary function discriminator 1021 under the discriminator 102 shown in Fig. 8 is to distinguish, in the high-dimensional space, the real data from the synthetic data generated by the generator; the discriminator's result on the real data and its result on the synthetic data need to satisfy the unsupervised loss function formula:
where $y_t$ denotes the result of the discriminator processing the real data and $\hat{y}_t$ denotes the result of processing the synthetic data. According to one embodiment of the invention, the input-output dimensions of the multi-scale recurrent module 1013 are as follows:
Time domain recurrent neural network input dimension (three-dimensional): [64, 128, 64];
Time domain recurrent neural network output dimension (three-dimensional): [64, 128, 64];
Classifying full connection layer input dimension (three-dimensional): [64, 128, 64];
classification full connection layer output dimension (three-dimensional): [64, 128,1].
(5) The similarity calculation module 1022 under the discriminator 102 shown in Fig. 8 verifies the distributions of the synthetic data and the original data. It calculates the similarity among the real data and the similarity between the real data and the simulated data; if the two values are close, the distribution of the simulated data is close to that of the real data. It also calculates the similarity among the simulated data, to ensure that the simulated samples differ from one another. Finally, it calculates the maximum similarity between the real data and the synthetic data, which shows how similar the generated data are to the real data in the most similar case and is valuable for ensuring that user privacy is protected. If the maximum similarity between some of the simulated data and the original data is higher than 80%, the data must be further processed using step five. Table 1 shows the cosine similarity of each action under each index when 50 groups of data are synthesized.
TABLE 1
Activity | Similarity of real data to real data | Similarity of synthetic data to synthetic data | Similarity of real data to synthetic data | Maximum similarity of synthetic data to real data |
Going downstairs | 0.6790 | 0.2918 | 0.3011 | 0.7998 |
Going upstairs | 0.3711 | 0.1326 | 0.1997 | 0.7997 |
Walking | 0.9150 | 0.2230 | 0.1237 | 0.7997 |
Running | 0.2829 | 0.1067 | 0.0627 | 0.7801 |
Standing | 0.4280 | 0.3459 | 0.3898 | 0.7991 |
Average | 0.5352 | 0.2200 | 0.2154 | 0.7957 |
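The similarity indices in Table 1 rest on cosine similarity between flattened sensor windows. The following is a small sketch of the maximum-similarity check against the 80% threshold mentioned above; the function names and the sample vectors are hypothetical, and the patent does not state how windows are flattened or paired.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two sensor windows, flattened to vectors."""
    a = np.ravel(np.asarray(a, dtype=float))
    b = np.ravel(np.asarray(b, dtype=float))
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def max_similarity_to_real(synthetic, real_set):
    """Highest similarity between one synthetic window and any real window."""
    return max(cosine_similarity(synthetic, r) for r in real_set)

def needs_further_processing(synthetic, real_set, threshold=0.8):
    """True when the synthetic window is too close to some real window."""
    return max_similarity_to_real(synthetic, real_set) > threshold

# Hypothetical two-sample "real" set and two synthetic candidates.
real_set = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
too_close = np.array([0.9, 0.1, 0.0])
far_enough = np.array([0.0, 0.0, 1.0])
flag_a = needs_further_processing(too_close, real_set)
flag_b = needs_further_processing(far_enough, real_set)
```

Only windows flagged by this check would be passed on for additional transformation, which keeps the extra processing cost proportional to the number of near-duplicates.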
The training process is described in detail below in the form of pseudocode.
The above describes the adversarial training process in detail.
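Sub-step (1) of the procedure above stores the minimum and maximum of the real dataset so that generated data can be returned to the original scale. A minimal sketch of that normalization round trip follows, assuming per-feature scaling over arrays laid out as [samples, time, features]; the layout, the function names and the small `eps` guard are assumptions.

```python
import numpy as np

def fit_min_max(data):
    """Per-feature minimum and maximum over all samples and time steps."""
    flat = data.reshape(-1, data.shape[-1])
    return flat.min(axis=0), flat.max(axis=0)

def normalize(data, lo, hi, eps=1e-12):
    """Scale each feature into [0, 1] using the stored minimum and maximum."""
    return (data - lo) / (hi - lo + eps)

def denormalize(scaled, lo, hi, eps=1e-12):
    """Map model output back to the original sensor scale."""
    return scaled * (hi - lo + eps) + lo

# Hypothetical dataset: 4 windows, 128 time steps, 9 sensor channels.
rng = np.random.default_rng(0)
real = rng.normal(loc=2.0, scale=3.0, size=(4, 128, 9))
lo, hi = fit_min_max(real)
scaled = normalize(real, lo, hi)
restored = denormalize(scaled, lo, hi)
```

The stored `lo` and `hi` must come from the real dataset only; applying them to generated windows is what places the simulated data on the same physical scale as genuine sensor readings.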
In the fourth step, expanding the simulated data space specifically includes:
(1) The formula for synthesizing the simulated data is as follows:
xo(n)=ra*x(n)+rb*filter(x(n),ft);
where the first term on the right-hand side represents a certain proportion of original data in the combined data and the second term represents a certain proportion of filtered data; r_a denotes the proportion of the original data, x(n) denotes the original data, r_b denotes the proportion of the filtered data, filter(x(n), f_t) denotes the filtered data, and f_t denotes the cut-off frequency; the left-hand side is the result of the filtering combination.
(2) The bayesian optimization objective function is:
f1(x1,x2,x3)=x1*filter(x2,data)+x3*data;
f2(x1,x2,x3)=dtw(f1(x1,x2,x3),data);
where x1 represents the proportion of the filtered data, x2 represents the cut-off frequency, x3 represents the proportion of the original data, and data represents the original data. dtw is the dynamic time warping distance function, which measures the similarity of two time series and is particularly suitable for time series of different lengths and rhythms; it is used here as the index of the difference between the filter-combined data and the original data. Figs. 12 and 13 compare the low-pass and high-pass filtered combinations of the accelerometer X-axis data with the original data for the going-upstairs action.
The training process of step four is described in detail below in the form of a pseudo code.
The training process of step four is described in detail above.
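The filter combination and the objective functions f1 and f2 of step four can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the text does not specify the filter design, so a first-order IIR low-pass filter stands in for `filter`; the sampling rate, the test trace and the parameter values are hypothetical; and the Bayesian optimizer that would search over (x1, x2, x3) is not shown.

```python
import math


def low_pass(data, cutoff_hz, sample_rate_hz):
    """First-order IIR low-pass filter (an assumed stand-in for `filter`)."""
    if not data:
        return []
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor in (0, 1)
    out = [data[0]]
    for sample in data[1:]:
        out.append(out[-1] + alpha * (sample - out[-1]))
    return out


def f1(x1, x2, x3, data, sample_rate_hz=50.0):
    """Filter combination: x1 * filter(x2, data) + x3 * data."""
    filtered = low_pass(data, x2, sample_rate_hz)
    return [x1 * f + x3 * d for f, d in zip(filtered, data)]


def dtw(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def f2(x1, x2, x3, data, sample_rate_hz=50.0):
    """Objective: DTW distance between the combined data and the original."""
    return dtw(f1(x1, x2, x3, data, sample_rate_hz), data)


# Hypothetical 2-second accelerometer trace sampled at 50 Hz.
FS = 50.0
trace = [
    math.sin(2 * math.pi * 1.0 * n / FS) + 0.3 * math.sin(2 * math.pi * 15.0 * n / FS)
    for n in range(100)
]
combined = f1(0.6, 3.0, 0.4, trace, FS)   # 60% filtered data, 40% original data
distance = f2(0.6, 3.0, 0.4, trace, FS)   # value an optimizer would evaluate
```

Setting x1 = 0 and x3 = 1 returns the original data unchanged, so f2 is zero there; any nonzero filtered share moves the combined signal away from the original and yields a positive distance.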
In the fifth step, the data combination and replacement specifically includes:
(1) After setting the start and stop times and the proportion of each behavior action, the mobile terminal user obtains the fully spliced sensor data set from the algorithm model, which then awaits real-time replacement.
(2) The invention uses Hook to implement the interception module and replace sensor data in real time. Starting from the Zygote process, it monitors the class android.hardware.SystemSensorManager, which distributes sensor data in the system, together with its sensor data processing subclass SensorEventQueue and the distribution method dispatchSensorEvent, and waits for the data replacement module to run.
(3) Using the combined simulation data, the data replacement module is implemented with a Java iterator that wraps the sensor data distribution interface of the Android system, so that one group of simulation data is consumed each time the system calls the interface.
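The iterator-based replacement in item (3) can be illustrated with a language-neutral sketch, written here in Python rather than as Android hook code. Everything in it is an assumption made for illustration: the class `SimulatedDataFeed`, the wrapper `make_hooked_dispatch`, the four-argument dispatch signature, and the choice to hold the last simulated group once the feed is exhausted are hypothetical and are not taken from the patent or from the Android source.

```python
class SimulatedDataFeed:
    """Hands out one group of simulated sensor values per dispatch call."""

    def __init__(self, groups):
        self._groups = iter(groups)
        self._last = None

    def next_group(self, real_values):
        try:
            self._last = next(self._groups)
        except StopIteration:
            # Assumption: once exhausted, keep repeating the last simulated
            # group; an empty feed passes the real values through unchanged.
            if self._last is None:
                return real_values
        return self._last


def make_hooked_dispatch(original_dispatch, feed):
    """Wrap a dispatch function so callers receive simulated values."""

    def hooked_dispatch(handle, values, accuracy, timestamp):
        return original_dispatch(handle, feed.next_group(values), accuracy, timestamp)

    return hooked_dispatch


# Stand-in for the system's distribution method: records what apps would see.
delivered = []


def original_dispatch(handle, values, accuracy, timestamp):
    delivered.append(list(values))


feed = SimulatedDataFeed([[0.1, 9.7, 0.2], [0.0, 9.8, 0.1]])
dispatch = make_hooked_dispatch(original_dispatch, feed)
for t in range(3):
    dispatch(1, [5.0, 5.0, 5.0], 3, t)  # real readings never reach `delivered`
```

Each call consumes exactly one simulated group, which mirrors the one-group-per-interface-call behavior described above; a real deployment would still need a policy for refilling the feed before it runs out.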
It should be noted that the embodiments of the present invention can be realized in hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or special purpose design hardware. Those of ordinary skill in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such as provided on a carrier medium such as a magnetic disk, CD or DVD-ROM, a programmable memory such as read only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The device of the present invention and its modules may be implemented by hardware circuitry, such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, etc., or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., as well as software executed by various types of processors, or by a combination of the above hardware circuitry and software, such as firmware.
The foregoing is merely illustrative of specific embodiments of the present invention, and the scope of protection of the invention is not limited thereto; any modification, equivalent substitution or improvement made by those skilled in the art within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (9)
1. A method of protecting sensor data, the method comprising the steps of:
Firstly, initializing the system: the user inputs an action distribution, comprising the proportions of the standing, walking, running, sitting, lying, going-upstairs and going-downstairs actions; a transition matrix is constructed from the predefined action transition probabilities and the action distribution input by the user, and multiple iterations verify whether a stationary distribution can be reached, providing feasibility support for the subsequent generation of the action sequence; the state transition matrix $P$ is calculated by a formula (not reproduced here), where $S$ represents all behavior action states; the initial vector $\pi_0$ is substituted into the formula $\pi_n = \pi_0 P^n$, where $P$ represents the state transition matrix, to obtain the distribution $\pi_n$ at the $n$-th round of iteration;
Secondly, constructing a simulated action sequence: using the proposal distribution and acceptance distribution of the constructed transition matrix together with a random walk algorithm, a behavior action sequence conforming to the predefined action distribution is generated, providing data support for the subsequent arrangement rule of the sensor simulation data; the action sequence is generated by a random walk algorithm based on the Markov chain Monte Carlo method, in which the acceptance distribution is used directly:
$$\alpha(i, j) = \min\left\{1, \frac{\pi(j)}{\pi(i)}\right\};$$
where $\pi(i)$ represents the distribution of state $i$, and $\pi(j)$ represents the distribution of state $j$;
thirdly, generating sensor simulation data: a generative adversarial network model is trained in advance on real data so that the accuracy of the simulation data it generates reaches more than 90% on an action recognition task, and multiple groups of data are generated for each action as a buffer, providing original data templates for the subsequent simulation data space expansion task;
Fourthly, expanding the simulation data space: the data of each action is taken out of the buffer and combined according to the filter combination rule, and a Bayesian optimization algorithm is used to select multiple sets of parameters that reach a local optimum;
Fifthly, combining and replacing data: the sensor simulation data is filled in according to the generated behavior action sequence, and the sensor data is replaced in batches by hooking the sensor data distribution interface at the bottom layer of the mobile device, protecting the privacy of the sensor data at the data distribution link of the mobile terminal.
2. The sensor data protection method of claim 1, wherein the method replaces the sensor data comprehensively and at all times with a sequence of sensor simulation data.
3. The sensor data protection method of claim 1, wherein the multi-sensor simulation data generation adopts a random walk algorithm based on the Markov chain Monte Carlo method, and, by introducing the transition probabilities between actions as the proposal distribution for constructing the Markov matrix, improves the acceptance distribution used in constructing the Markov chain, so that a behavior action sequence conforming to the predefined distribution is obtained when the random walk algorithm finishes;
The multi-sensor simulation data generation adopts a filter combination method based on a time-series generative adversarial network model and Bayesian optimization: sensor data conforming to the predefined classes is generated by the time-series generative adversarial network, the filter combination method is introduced, and Bayesian optimization is used to search for filter combination parameters that meet the requirements.
4. The method for protecting sensor data according to claim 1, wherein the simulated action sequence of the second step is generated as follows: a random walk algorithm based on the Markov chain Monte Carlo method is adopted, and a state transition matrix is constructed according to the action proportions preset by the user, so as to generate a false action sequence; using the Monte Carlo method, the Markov transition matrix is constructed with the following transition kernel:
$$p(i, j) = q(i, j)\,\alpha(i, j);$$
where $q(i, j)$ is called the proposal distribution and $\alpha(i, j)$ is called the acceptance distribution; the proposal distribution is symmetric, and the acceptance distribution is:
$$\alpha(i, j) = \min\left\{1, \frac{\pi(j)}{\pi(i)}\right\};$$
where $\pi(i)$ represents the proportion of state $i$, and $\pi(j)$ represents the proportion of state $j$; the proposal distribution $q(i, j)$ is the transition probability from state $i$ to state $j$ and satisfies a condition (formula not reproduced here) defined over $N(i)$, where $N(i)$ represents the set of states adjacent to state $i$, including state $i$ itself;
The cost function of the generative adversarial network model in the third step is as follows:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(h_x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right];$$
where the first term on the right-hand side represents the expectation of the discriminator trained on real data in the high-dimensional latent-space representation, and the second term represents the expectation of the discriminator trained on synthetic data produced by the generator in the high-dimensional latent space; $G$ represents the generator network, $D$ represents the discriminator network, $\mathbb{E}$ represents the expectation, $x \sim p_{data}(x)$ represents real data sampled from the real data set, $\log$ represents the logarithmic function, $x$ represents the real data, $h_x$ represents the real data in the high-dimensional latent-space representation, $z \sim p_z(z)$ represents a random noise vector sampled from a normal distribution, and $z$ represents the random noise vector;
The embedding recovery loss calculation uses the following formula to calculate the degree of difference between the original data and the data processed by the embedding functional module and the recovery functional module:
$$L_R = \mathbb{E}\left[\lVert x - \tilde{x} \rVert_2\right];$$
where $L_R$ represents the degree of difference between the original data and the recovered data, $\mathbb{E}$ represents the mathematical expectation, $x$ represents the original data, $\tilde{x}$ represents the data obtained by mapping the original data from the original space to the latent space and then from the latent space back to the original space, and $\lVert \cdot \rVert_2$ represents the L2 norm;
the binary judgment module calculates the difference between the real data and the synthetic data during training with the following loss function:
$$L_{CE}(x, \hat{x});$$
where $L_{CE}$ represents the cross-entropy function of the real data and the synthetic data, $x$ represents the real data, and $\hat{x}$ represents the synthetic data.
5. The method for protecting sensor data according to claim 1, wherein expanding the simulation data space in the fourth step uses a filter combination method to expand the data space and a Bayesian optimization method to search the parameters of the filter combination;
For the simulation data generated by the generative adversarial network, the filter combination method combines the original data and the filtered data according to the following formula once the cut-off frequency and the combination proportions are set:
$$x_o(n) = r_a\,x(n) + r_b\,\mathrm{filter}(x(n), f_t);$$
where the first term on the right-hand side represents a certain proportion of original data, and the second term represents a certain proportion of filtered data; $r_b$ represents the proportion of filtered data in the combined data, $f_t$ represents the cut-off frequency, $x(n)$ represents the original data, $\mathrm{filter}(x(n), f_t)$ represents the filtered data, and $r_a$ represents the proportion of original data in the combined data; the left-hand side, $x_o(n)$, is the result of the filter combination;
searching the parameters $r_a$, $r_b$ and $f_t$ with the Bayesian optimization algorithm involves determining an optimization expression, a fitting model and an acquisition function;
The optimization expression is determined as follows, with a Gaussian process as the fitting model and the probability of improvement function as the acquisition function:
$$d = \mathrm{dtw}(x_o(n), x(n));$$
where the right-hand side represents the distance between the filter-combined data and the original data, and the left-hand side, $d$, is the specific value of that distance; $\mathrm{dtw}$ represents the dynamic time warping distance function, $x_o(n)$ represents the filter-combined data, and $x(n)$ represents the original data;
Intercepting and replacing the sensor data in the fifth step: the sensor monitoring module is implemented through Hook, intercepting and replacing data at the bottom layer of the sensor data transmission interface; the module class android.hardware.SystemSensorManager, which controls the distribution of sensor data, is located in the Android 8.0 system source code, together with its specific sensor processing subclass SensorEventQueue and the distribution function dispatchSensorEvent; the dispatchSensorEvent method under SystemSensorManager is hooked in the system service process, a pre-compiled replacement function module is loaded, and the data at the sensor interface is replaced with the synthetic data.
6. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the sensor data protection method of any one of claims 1 to 5.
7. An information data processing terminal, characterized in that the information data processing terminal is configured to implement the sensor data protection method according to any one of claims 1 to 5.
8. A sensor data protection system for implementing the sensor data protection method of any one of claims 1 to 5, wherein the sensor data protection system comprises:
the system initialization module is used for receiving the action distribution input by the user, constructing a transition matrix from the predefined action transition probabilities and the action distribution input by the user, and verifying over multiple iterations whether a stationary distribution can be reached, providing feasibility support for the subsequent generation of the action sequence;
The simulated action sequence construction module is used for generating a behavior action sequence conforming to the predefined action distribution, using the proposal distribution and acceptance distribution of the constructed transition matrix together with a random walk algorithm, providing data support for the subsequent arrangement rule of the sensor simulation data;
The sensor simulation data generation module is used for training a generative adversarial network model in advance on real data, so that the accuracy of the simulation data generated by the model reaches more than 90% on an action recognition task, and for generating multiple groups of data for each action as a buffer, providing original data templates for the subsequent simulation data space expansion task;
the simulation data space expansion module is used for taking the data of each action out of the buffer, combining it according to the filter combination rule, and selecting multiple sets of parameters that reach a local optimum using a Bayesian optimization algorithm; because data can be generated that differs substantially from the original data yet is classified as the same action, the mode collapse problem that may exist in the generative adversarial network is appropriately addressed;
The data combination and replacement module is used for filling in the sensor simulation data according to the generated behavior action sequence, replacing the sensor data in batches by hooking the sensor data distribution interface at the bottom layer of the mobile device, and protecting the privacy of the sensor data at the data distribution link of the mobile terminal.
9. The sensor data protection system of claim 8, wherein the sensor data protection system further comprises: a generator and a discriminator;
the generator comprises an embedding functional module, a recovery functional module, an embedding recovery loss calculation module, a multi-scale recurrent module and a time-series functional module;
The embedding functional module is used for mapping data from the low-dimensional original space to the high-dimensional latent space; the recovery functional module is connected with the embedding functional module and is used for accurately recovering data from the high-dimensional latent space to the low-dimensional real space; the embedding recovery loss calculation module is used for calculating the difference between the original data and the real data processed by the embedding functional module and the recovery functional module, and for repeatedly training the two modules so that the original data can be accurately expressed in the high-dimensional space; the multi-scale recurrent module is used for learning the time-domain characteristics of each dimension of the multi-sensor data and the correlation of time-domain characteristics between dimensions; the time-series functional module is used for better representing, in the high-dimensional latent space, the synthetic data output by the generator during adversarial training;
the discriminator comprises a binary judgment functional module and a similarity calculation module; the binary judgment functional module is used for distinguishing real data from synthetic data during adversarial training; the similarity calculation module is connected with the binary judgment functional module and is used for calculating the cosine similarity between the synthetic data and the real data in the low-dimensional original space;
The embedding functional module and the recovery functional module each consist of a multi-scale recurrent neural network and a fully connected network layer, wherein the multi-scale recurrent neural network consists of one-dimensional recurrent neural network layers of different sizes, and the output of each node of its last layer serves as the input of the fully connected layer;
The time-series functional module comprises a fully connected network and a GRU network;
the embedding recovery loss calculation module calculates the degree of difference between the original data and the data processed by the embedding functional module and the recovery functional module by adopting the following formula;
The binary judgment module calculates the difference between the real data and the synthetic data by adopting the following loss function in the training process.
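As an illustration of the action-sequence generation recited in claims 1 and 4 (not of the loss functions in claim 9), the following Python sketch builds a Metropolis transition matrix for a target action distribution, checks by repeated iteration that the distribution converges to the target, and then runs the random walk. It is a sketch under stated assumptions: the target proportions are hypothetical, and the proposal is taken to be uniform over all states, which is symmetric but is a simplification; the patent's proposal is derived from predefined transition probabilities between adjacent actions, which are not reproduced in this text.

```python
import random

ACTIONS = ["standing", "walking", "running", "sitting", "lying", "upstairs", "downstairs"]
# Hypothetical user-supplied action proportions (must be positive, sum to 1).
TARGET = [0.20, 0.30, 0.05, 0.20, 0.15, 0.05, 0.05]


def metropolis_transition_matrix(target):
    """Transition kernel p(i, j) = q(i, j) * min(1, pi(j) / pi(i)) for i != j."""
    k = len(target)
    q = 1.0 / k  # assumed symmetric proposal: uniform over all states
    matrix = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i != j:
                matrix[i][j] = q * min(1.0, target[j] / target[i])
        matrix[i][i] = 1.0 - sum(matrix[i])  # rejected proposals stay in state i
    return matrix


def iterate_distribution(pi0, matrix, rounds):
    """Apply pi <- pi * P for the given number of rounds."""
    pi = list(pi0)
    k = len(pi)
    for _ in range(rounds):
        pi = [sum(pi[i] * matrix[i][j] for i in range(k)) for j in range(k)]
    return pi


def random_walk(target, length, seed=0):
    """Generate a state sequence whose long-run frequencies follow `target`."""
    rng = random.Random(seed)
    k = len(target)
    state = rng.randrange(k)
    sequence = []
    for _ in range(length):
        candidate = rng.randrange(k)  # draw from the uniform proposal
        if rng.random() < min(1.0, target[candidate] / target[state]):
            state = candidate  # accept the move
        sequence.append(state)
    return sequence


P = metropolis_transition_matrix(TARGET)
pi_n = iterate_distribution([1.0 / len(TARGET)] * len(TARGET), P, rounds=200)
sequence = random_walk(TARGET, length=20000)
action_names = [ACTIONS[s] for s in sequence[:10]]
```

Comparing `pi_n` with `TARGET` corresponds to the multi-round verification of a stationary distribution; the empirical frequencies of `sequence` approach the same proportions as the walk gets longer.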
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210253232.2A CN114817976B (en) | 2022-03-15 | 2022-03-15 | Sensor data protection method, system, computer equipment and intelligent terminal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114817976A (en) | 2022-07-29 |
CN114817976B (en) | 2024-07-23 |
Family
ID=82529651
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210253232.2A Active CN114817976B (en) | 2022-03-15 | 2022-03-15 | Sensor data protection method, system, computer equipment and intelligent terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114817976B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112835709A (en) * | 2020-12-17 | 2021-05-25 | 华南理工大学 | Method, system and medium for generating cloud load time sequence data based on generation countermeasure network |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2305249A1 (en) * | 2000-04-14 | 2001-10-14 | Branko Sarcanin | Virtual safe |
US9128823B1 (en) * | 2012-09-12 | 2015-09-08 | Emc Corporation | Synthetic data generation for backups of block-based storage |
KR101736007B1 (en) * | 2015-09-16 | 2017-05-15 | 한양대학교 에리카산학협력단 | Method and apparatus for verifying location and time of in-vehicle dashcam videos under owners' anonymity |
Also Published As
Publication number | Publication date |
---|---|
CN114817976A (en) | 2022-07-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||