CN114817976B

CN114817976B - Sensor data protection method, system, computer equipment and intelligent terminal

Info

Publication number: CN114817976B
Application number: CN202210253232.2A
Authority: CN
Inventors: 朱辉; 文浩斌; 李晖; 王枫为; 薛行策; 张璇
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2022-03-15
Filing date: 2022-03-15
Publication date: 2024-07-23
Anticipated expiration: 2042-03-15
Also published as: CN114817976A

Abstract

The invention belongs to the technical field of information data security and discloses a sensor data protection method, system, computer equipment and intelligent terminal. The method uses a random walk algorithm and a generative adversarial network training method. The user does not need to define a specific action sequence or spend local computing resources on data synthesis; the user only defines the proportion of each action before use, and the predefined data is then transmitted to a cloud server. The cloud server constructs the action sequence and generates the multi-sensor simulated data, and the simulated data set formed by combining the simulated data with the action sequence is returned to the request initiator. The request initiator decomposes the simulated data set and uses a Hook method to replace the data of the local sensor interface, thereby achieving complete anonymization across the multiple sensors of a mobile device.

Description

Sensor data protection method, system, computer equipment and intelligent terminal

Technical Field

The present invention belongs to the technical field of information data security, and in particular relates to a sensor data protection method, system, computer equipment and intelligent terminal.

Background Art

With the development of the mobile Internet, the convergence of new technologies such as smart terminals and location services has driven unprecedented growth in mobile applications and services. Sensors embedded in personal smart devices give users a convenient experience in personalized mobile applications. Data generated by sensors such as accelerometers, gyroscopes and magnetometers can be used to monitor a user's physical activities, interactions and emotions. Applications installed on wearable devices can obtain raw sensor data and make inferences for tasks such as gesture recognition or activity recognition. Existing research shows that motion sensors can be exploited as a medium for side-channel attacks to steal a user's sensitive input, obtain the user's motion state, and identify and track specific devices. More importantly, reading sensor data does not require the user to grant any permission, which makes privacy inference based on motion sensors both easy to carry out and highly covert.

Current sensor data privacy protection strategies feed applications distorted data such as false random data or resampled data. This inevitably reduces the precision and accuracy of the sensor data in utility tasks such as action recognition and step counting, and makes the supplied data differ markedly from the real data. False random data is also easily identified by service providers, which may cause the application to crash and stop serving the user. Moreover, current defense strategies all operate on the premise of preserving usability and do not consider privacy protection over the full life cycle; if the user also needs to protect background knowledge such as motion information, they cannot provide effective protection. Current simulated data generation methods are limited to using generative adversarial networks to solve the generation problem and cannot effectively address the small space of generated data: once the number of iterations reaches a certain level, highly similar simulated data appears. In addition, these methods generate data for a single sensor only; when an application needs a joint judgment across multiple sensors, they cannot coordinate the sensors, so the simulated data for the target behavior is distorted.

Current defense measures have several drawbacks. Providing random false data or resampled, distorted data to applications reduces precision and accuracy and introduces large errors. Providing model-processed, obfuscated data to applications improves usability, but the model processing time is long, so the timeliness required of sensor data cannot be met and the data is not truly usable. For simulated data generation, existing schemes based on generative adversarial networks suffer from mode collapse and are feasible only in small batches and over small ranges. The prior art does not consider complete anonymity; when an attacker obtains certain background knowledge, the protection of user privacy is weakened to some extent. Specifically, the patent "Android terminal sensor information protection method based on differential privacy" (patent number CN201810257632.4) adds specific Laplace noise to the real data; because real data is used, this scheme still leaks some background information compared with a full obfuscation method. The patent "A sensor data generation model and method based on conditional generative adversarial network" (patent number CN202110312274.4) solves only the generation problem: it can generate specific sensor data for specific actions and thus achieve simulated data generation, but it cannot solve the mode collapse caused by repeated use of the generative adversarial network model, it produces large amounts of duplicated data, and its model generates data for a single sensor only and cannot coordinate multiple sensors.

Based on the above analysis, the problems and defects of the prior art are as follows:

(1) Current defense strategies all supply applications with distorted data such as false random data or resampled data, which inevitably reduces precision and accuracy in utility tasks and makes the supplied data differ considerably from the real data. False random data is also easily identified by service providers and may cause the application to crash. Moreover, current defense strategies all operate on the premise of preserving usability and do not consider privacy protection over the full life cycle; if the user also needs to protect background knowledge such as motion information, they cannot provide effective protection.

(2) Current simulated data generation methods are limited to using generative adversarial networks to solve the generation problem and cannot effectively address the small space of generated data: once the number of iterations reaches a certain level, highly similar simulated data appears. In addition, these methods generate data for a single sensor only; when an application needs a joint judgment across multiple sensors, they cannot coordinate the sensors, so the simulated data for the target behavior is distorted.

(3) The prior art does not consider complete anonymity; when an attacker obtains certain background knowledge, the protection of user privacy is weakened to some extent.

The difficulty of solving the above problems and defects is as follows: replacing sensor data on an Android terminal requires replacing code in the Android system framework layer while the mobile device is running, which is difficult; for full-life-cycle sensor data privacy protection with complete anonymity, the challenges are to generate, from the action proportions and transition probabilities specified by the user, an action sequence that conforms to the predefined distribution and transition probabilities, and to combine multiple sensors to generate simulated data that conforms to the specified action classification.

The significance of solving the above problems and defects is as follows: the action sequence generation method based on the Monte Carlo method is suitable for constructing false behavior action sequences. The data generation method based on a time-series generative adversarial network and filter combination is suitable for constructing simulated data that conforms to the specified action classification. The method achieves fully anonymous privacy protection by replacing sensor data at all times and across all sensors. It greatly improves the security of sensor information on Android terminals and has important theoretical value and practical significance for the privacy protection of future mobile terminals.

Summary of the Invention

In view of the problems in the prior art, the present invention provides a sensor data protection method, system, computer equipment and intelligent terminal.

The present invention is implemented as follows: a sensor data protection method uses a random walk algorithm and a generative adversarial network training method to perform data synthesis, with the proportion of each action defined before use; the predefined data is handed to a cloud server, which constructs the action sequence and generates the multi-sensor simulated data, and the simulated data set formed by combining the simulated data with the action sequence is handed to the request initiator; the request initiator decomposes the simulated data set and uses a Hook method to replace the data of the local sensor interface, thereby achieving complete anonymization across the multiple sensors of a mobile device.

Further, the sensor data protection method replaces the sensor data with the simulated sensor data sequence at all times and across all sensors.

Further, the multi-sensor simulated data generation uses a random walk algorithm based on the Markov chain Monte Carlo method. By introducing the transition probabilities between actions as the proposal distribution for constructing the Markov matrix, the acceptance distribution used in constructing the Markov chain is improved, so that when the random walk algorithm finishes, a behavior action sequence conforming to the predefined distribution can be generated;

The multi-sensor simulated data generation uses a time-series generative adversarial network model combined with a filter combination method based on Bayesian optimization. The time-series generative adversarial network produces sensor data that conforms to the predefined classification, and the filter combination method is introduced with Bayesian optimization used to search for filter combination parameters that meet the requirements.

Further, the sensor data protection method comprises the following steps:

In the first step, the system is initialized. The user inputs the action distribution, that is, the proportions of the standing, walking, running, sitting, lying, and going up and down stairs actions. A transition matrix is constructed from the predefined action transition probabilities and the user-input action distribution, and multiple rounds of iteration are performed to verify whether a stationary distribution is reached, which provides feasibility support for the subsequent generation of action sequences. The state transition matrix P_ij = p(i, j), i, j ∈ S, is calculated by the formula p(x, x′) = q(x, x′)α(x, x′), where S denotes the set of all behavior action states. The initial vector λ₀ = {1, 0, 0, 0, 0, 0} is substituted into the formula λ_t = λ_{t-1}P, where P denotes the state transition matrix, to obtain the distribution after t rounds of iteration;

In the second step, the simulated action sequence is constructed. The proposal distribution and acceptance distribution used to construct the transition matrix are combined with the random walk algorithm to generate a behavior action sequence that conforms to the predefined action distribution, providing data support for the subsequent arrangement of the simulated sensor data. The action sequence is generated with a random walk algorithm based on the Markov chain Monte Carlo method, and the acceptance distribution is used directly in the random walk algorithm:

α(x, x′) = min{1, p(x′)/p(x)};

where p(x′) denotes the distribution of state x′ and p(x) denotes the distribution of state x;

In the third step, the simulated sensor data is generated. A generative adversarial network model is trained in advance on real data so that the simulated data produced by the model reaches an accuracy of more than 90% in the action recognition task. For each action, multiple groups of data are generated as a buffer, providing the original data templates for the subsequent simulated data space expansion task;

In the fourth step, the simulated data space is expanded. The data for each action is taken from the buffer and combined according to the filter combination rule, and a Bayesian optimization algorithm is used to select multiple parameter sets that reach local optima;

In the fifth step, data is combined and replaced. The simulated data is filled in according to the behavior action sequence, the sensor data distribution interface is hooked at the bottom layer of the mobile device, and the sensor data is replaced in batches, so that the privacy of sensor data is protected starting from the data release stage on the mobile terminal.
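The first and second steps can be illustrated with a minimal Python sketch. It assumes the standard Metropolis acceptance min{1, p(x′)/p(x)} for a symmetric proposal; the action labels, the proportions in `TARGET` and the uniform `PROPOSAL` matrix are hypothetical values chosen only for illustration and are not taken from the patent.

```python
import random

ACTIONS = ["stand", "walk", "run", "sit", "lie", "stairs"]


def build_transition_matrix(target, proposal):
    """Return P with P[i][j] = q(i, j) * alpha(i, j) for i != j."""
    n = len(target)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Assumed Metropolis acceptance for a symmetric proposal.
                alpha = min(1.0, target[j] / target[i])
                matrix[i][j] = proposal[i][j] * alpha
        # Rejected proposals keep the chain in state i.
        matrix[i][i] = 1.0 - sum(matrix[i])
    return matrix


def iterate_distribution(matrix, rounds):
    """Apply lambda_t = lambda_{t-1} P starting from lambda_0 = {1, 0, ..., 0}."""
    n = len(matrix)
    lam = [1.0] + [0.0] * (n - 1)
    for _ in range(rounds):
        lam = [sum(lam[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return lam


def random_walk(target, proposal, steps, seed=0):
    """Generate an action sequence whose action frequencies approach target."""
    rng = random.Random(seed)
    n = len(target)
    state = 0
    sequence = []
    for _ in range(steps):
        candidate = rng.choices(range(n), weights=proposal[state])[0]
        if rng.random() < min(1.0, target[candidate] / target[state]):
            state = candidate
        sequence.append(ACTIONS[state])
    return sequence


# Hypothetical user-defined action proportions and a uniform symmetric proposal.
TARGET = [0.30, 0.25, 0.05, 0.20, 0.15, 0.05]
PROPOSAL = [[1.0 / 6.0] * 6 for _ in range(6)]

P = build_transition_matrix(TARGET, PROPOSAL)
stationary = iterate_distribution(P, rounds=200)
sequence = random_walk(TARGET, PROPOSAL, steps=60000)
```

Under these assumptions, `stationary` converges to `TARGET`, which is the stationarity check of the first step, and the action frequencies in `sequence` approach the same proportions as the walk lengthens.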

Further, in the simulated action sequence generation of the second step, a random walk algorithm based on the Markov chain Monte Carlo method is used, and a state transition matrix is constructed from the action proportions preset by the user to generate a false behavior action sequence; using the Monte Carlo method, the Markov transition matrix is constructed with the transition kernel formula:

p(x, x′) = q(x, x′)α(x, x′);

where q(x, x′) is called the proposal distribution and α(x, x′) is called the acceptance distribution; the proposal distribution is symmetric, and the acceptance distribution is:

α(x, x′) = min{1, p(x′)/p(x)};

where p(x′) denotes the proportion of state x′ and p(x) denotes the proportion of state x; the proposal distribution is the transition probability from state x to state x′ and satisfies Σ_{x′∈X} q(x, x′) = 1, where X denotes the set of states adjacent to state x, including state x itself;

The value function of the adversarial network model in the third step is:

[formula not reproduced in the source text]

where the first term on the right side of the equal sign denotes the expectation of the discriminator trained on real data represented in the high-dimensional latent space, and the second term denotes the expectation of the discriminator trained on synthetic data in the high-dimensional latent space produced by the generator; G denotes the generator network, D denotes the discriminator network, E denotes the expectation, x~p_data(x) denotes real data sampled from the real data set, log denotes the logarithm function, x denotes the real data, X denotes the real data represented in the high-dimensional latent space, z~p_z(z) denotes a random noise vector sampled from a normal distribution, and z denotes the random noise vector;

The embedding recovery loss is calculated with the following formula, which measures the degree of difference between the original data and the data processed by the embedding function module and the recovery function module:

[formula not reproduced in the source text]

where l_R denotes the degree of difference between the original data and the recovered data, E denotes the mathematical expectation, x_t denotes the original data, the recovered term denotes the data obtained by mapping the original data from the original space to the latent space and back to the original space, and ||...||₂ denotes the L2 norm;

During training, the binary judgment module uses the following loss function to calculate the difference between real data and synthetic data:

[formula not reproduced in the source text]

where l_U denotes the cross-entropy function between real data and synthetic data, y_t denotes the real data, and the second term denotes the synthetic data.
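As an illustration of the two losses described above, the following Python sketch uses conventional forms: the mean L2 norm between each original sample and its reconstruction for l_R, and binary cross-entropy over discriminator scores for l_U. These forms are assumptions based on the verbal definitions; the patent's exact expressions may differ. The function names and the `eps` clipping parameter are hypothetical.

```python
import math


def recovery_loss(original, recovered):
    """Mean over time steps of the L2 norm between x_t and its reconstruction."""
    total = 0.0
    for x_t, r_t in zip(original, recovered):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(x_t, r_t)))
    return total / len(original)


def discriminator_loss(real_scores, synthetic_scores, eps=1e-12):
    """Binary cross-entropy: real samples labelled 1, synthetic samples labelled 0.

    Scores are discriminator outputs in (0, 1); eps guards against log(0).
    """
    real_term = sum(math.log(max(y, eps)) for y in real_scores) / len(real_scores)
    fake_term = sum(
        math.log(max(1.0 - y, eps)) for y in synthetic_scores
    ) / len(synthetic_scores)
    return -(real_term + fake_term)
```

A perfect reconstruction gives a recovery loss of zero, and the discriminator loss shrinks as real scores approach 1 and synthetic scores approach 0.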

Further, the simulated data space expansion of the fourth step uses a filter combination method to expand the data space, and uses Bayesian optimization to find the parameters of the filter combination;

In the filter combination method, starting from the simulated data produced by the generative adversarial network, the cutoff frequency and the combination proportions are chosen, and the original data and the filtered data are then combined according to the formula:

f₁(x₁, x₂, x₃) = x₁*filter(x₂, data) + x₃*data;

where the first term on the right side of the equal sign denotes a certain proportion of the filtered data and the second term denotes a certain proportion of the original data; x₁ denotes the proportion of filtered data in the combined data, x₂ denotes the cutoff frequency of the filter, data denotes the original data, filter(x₂, data) denotes the filtered data, and x₃ denotes the proportion of original data in the combined data; the left side of the equal sign denotes the result of the filter combination;

A Bayesian optimization algorithm is used to search for the parameters (x₁, x₂, x₃); it comprises an optimization expression, a fitting model and an acquisition function;

The optimization expression is determined as follows, with a Gaussian process as the fitting model and the probability of improvement function as the acquisition function:

f₂(x₁, x₂, x₃) = dtw(f₁(x₁, x₂, x₃), data);

where the right side of the equal sign denotes the distance between the filter-combined data and the original data, and the left side denotes the specific value of that distance; dtw denotes the dynamic time warping distance function, f₁(x₁, x₂, x₃) denotes the filter-combined data, and data denotes the original data;
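The expressions f₁ and f₂ and the probability of improvement acquisition function can be sketched in Python as follows. The patent does not name the filter type, so a first-order low-pass filter with a hypothetical 50 Hz sampling rate is assumed here; the acquisition function is written for maximization, which is also an assumption. The Gaussian process fit itself is omitted.

```python
import math


def low_pass(cutoff_hz, data, sample_rate_hz=50.0):
    """First-order IIR low-pass filter (an assumed filter type)."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = dt / (rc + dt)
    out = [data[0]]
    for value in data[1:]:
        out.append(out[-1] + a * (value - out[-1]))
    return out


def f1(x1, x2, x3, data):
    """Filter combination: x1 * filter(x2, data) + x3 * data."""
    filtered = low_pass(x2, data)
    return [x1 * f + x3 * d for f, d in zip(filtered, data)]


def dtw(a, b):
    """Dynamic time warping distance with absolute difference as local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def f2(x1, x2, x3, data):
    """Optimization expression: DTW distance between combined and original data."""
    return dtw(f1(x1, x2, x3, data), data)


def probability_of_improvement(mean, std, best, xi=0.0):
    """PI acquisition value from a surrogate's mean and standard deviation."""
    if std <= 0.0:
        return 1.0 if mean > best + xi else 0.0
    z = (mean - best - xi) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical single-axis signal: a 10 Hz sine sampled at 50 Hz.
signal = [math.sin(2.0 * math.pi * 10.0 * k / 50.0) for k in range(100)]
distance = f2(0.5, 2.0, 0.5, signal)
```

Setting x₁ = 0 and x₃ = 1 returns the original data and a distance of zero; any nonzero share of filtered data moves the combined signal away from the original in the time domain.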

In the interception and replacement of sensor data in the fifth step, a sensor monitoring module is implemented through Hook, and the interface through which sensor data is delivered is intercepted and replaced at the bottom layer. In the Android 8.0 system source code, the class that controls the distribution of sensor data, android.hardware.SystemSensorManager, is located; within that class, the specific sensor-processing class SensorEventQueue and its distribution function dispatchSensorEvent are located. The dispatchSensorEvent method under SystemSensorManager in the system service process is hooked, a pre-compiled replacement function module is loaded, and the sensor interface data is replaced with synthetic data.

Another object of the present invention is to provide computer equipment comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to execute the steps of the sensor data protection method.

Another object of the present invention is to provide an information data processing terminal, which is used to implement the sensor data protection method.

Another object of the present invention is to provide a sensor data protection system that implements the sensor data protection method, the sensor data protection system comprising:

A system initialization module, used to receive the action distribution input by the user, construct a transition matrix from the predefined action transition probabilities and the user-input action distribution, and perform multiple rounds of iteration to verify whether a stationary distribution can be reached, providing feasibility support for the subsequent generation of action sequences;

A simulated action sequence construction module, used to combine the proposal distribution and acceptance distribution of the transition matrix with the random walk algorithm to generate a behavior action sequence that conforms to the predefined action distribution, providing data support for the subsequent arrangement of the simulated sensor data;

A sensor simulated data generation module, used to train the generative adversarial network model in advance on real data so that the simulated data produced by the model reaches an accuracy of more than 90% in the action recognition task, and to generate multiple groups of data for each action as a buffer, providing the original data templates for the subsequent simulated data space expansion task;

A simulated data space expansion module, used to take the data for each action from the buffer, combine it according to the filter combination rule, and use a Bayesian optimization algorithm to select multiple parameter sets that reach local optima; because this produces data that differs considerably from the original data while keeping the same classification, it mitigates the mode collapse that may occur in the generative adversarial network;

A data combination and replacement module, used to fill in the simulated data according to the behavior action sequence, hook the sensor data distribution interface at the bottom layer of the mobile device, and replace the sensor data in batches, so that the privacy of sensor data is protected starting from the data release stage on the mobile terminal.
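The filling step of the data combination and replacement module can be sketched in Python as follows: for each action in the generated sequence, one buffered group of simulated samples is chosen and appended to the output stream. The buffer layout, the tri-axial sample tuples and the function name are hypothetical and are meant only to illustrate the idea.

```python
import random


def assemble_stream(action_sequence, buffers, seed=0):
    """Concatenate one buffered segment per action, in sequence order."""
    rng = random.Random(seed)
    stream = []
    for action in action_sequence:
        segment = rng.choice(buffers[action])  # pick one buffered group
        stream.extend((action, sample) for sample in segment)
    return stream


# Hypothetical buffers: per action, several groups of (x, y, z) samples.
BUFFERS = {
    "walk": [
        [(0.1, 9.7, 0.3), (0.2, 9.9, 0.1)],
        [(0.0, 9.8, 0.2), (0.3, 9.6, 0.4)],
    ],
    "sit": [
        [(0.0, 9.8, 0.0), (0.0, 9.8, 0.0)],
    ],
}

stream = assemble_stream(["sit", "walk", "sit"], BUFFERS)
```

The resulting stream is what the hooked distribution interface would hand to applications in place of the real readings.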

Further, the sensor data protection system also comprises a generator and a discriminator;

The generator comprises an embedding function module, a recovery function module, an embedding recovery loss calculation module, a multi-scale recurrent module and a temporal function module;

The embedding function module is used to map data from the low dimension of the original space to the high dimension of the latent space; the recovery function module is connected to the embedding function module and is used to accurately restore data from the high-dimensional latent space to the low-dimensional real space; the embedding recovery loss calculation module is used to calculate the difference between the original data and the real data after processing by the embedding function module and the recovery function module, and this difference is used to repeatedly train the embedding function module and the recovery function module so that the original data can be accurately expressed in the high-dimensional space; the multi-scale recurrent module is used to learn the time-domain characteristics of each dimension of the multiple sensors and the correlations between the time-domain features of the dimensions; the temporal function module is used to better represent, in the high-dimensional latent space, the synthetic data output by the generator during adversarial training;

The discriminator comprises a binary judgment function module and a similarity calculation module; the binary judgment function module is used to distinguish real data from synthetic data during adversarial training; the similarity calculation module is connected to the binary judgment function module and is used to calculate the cosine similarity between synthetic data and real data in the low-dimensional original space;

The embedding function module and the recovery function module each consist of a multi-scale recurrent neural network and a fully connected network layer; the multi-scale recurrent neural network consists of one-dimensional recurrent neural network layers of different sizes, and the output of each node in its last layer serves as the input of the fully connected layer;

The temporal function module comprises a fully connected network and a GRU network;

The embedding recovery loss calculation module uses the following formula to calculate the degree of difference between the original data and the data processed by the embedding function module and the recovery function module;

[formula not reproduced in the source text]

During training, the binary judgment module uses the following loss function to calculate the difference between real data and synthetic data.

[formula not reproduced in the source text]
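The cosine similarity computed by the similarity calculation module can be sketched in Python as follows, assuming each real or synthetic sample has been flattened into a vector of equal length. Returning 0.0 for a zero vector is a convention chosen here, not something the patent specifies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for degenerate input
    return dot / (norm_u * norm_v)
```

Values near 1 indicate that a synthetic sample points in nearly the same direction as the real one in the original space, and values near 0 indicate little alignment.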

结合上述的所有技术方案，本发明所具备的优点及积极效果为：本发明的Android平台的传感器数据替换与拟真数据生成方法相结合，改进了现有方案实时性较差的缺陷，并且从移动终端的数据产生环节开始保护传感器数据的全生命周期隐私安全，同时在应用服务端防止第三方对用户隐私的恶意窃取与分析。Combining all the above-mentioned technical solutions, the advantages and positive effects of the present invention are as follows: the sensor data replacement of the Android platform of the present invention is combined with the simulation data generation method, which improves the defect of poor real-time performance of the existing solution, and protects the privacy security of the sensor data throughout the life cycle from the data generation link of the mobile terminal, while preventing malicious theft and analysis of user privacy by third parties on the application server side.

本发明通过基于马尔科夫链蒙特卡洛法的随机游走算法生成符合预定义分布与预定义转移概率的行为动作序列；通过基于时序生成对抗网络的传感器拟真数据生成方法与基于滤波组合与贝叶斯优化的拟真数据空间扩充方法，生成符合预定义分类的传感器拟真数据，且拟真数据之间有明显差异；通过全时刻全方位的替换移动终端传感器数据，可以针对移动设备传感器，达到在全生命周期且完全匿名化的用户隐私保护效果。The present invention generates a behavioral action sequence that conforms to a predefined distribution and a predefined transition probability through a random walk algorithm based on the Markov chain Monte Carlo method; generates sensor simulated data that conforms to predefined classifications through a sensor simulated data generation method based on a time series generative adversarial network and a simulated data space expansion method based on filter combination and Bayesian optimization, and there are obvious differences between the simulated data; and by replacing mobile terminal sensor data at all times and in all directions, a user privacy protection effect that is completely anonymized throughout the entire life cycle can be achieved for mobile device sensors.

The present invention combines interception and replacement of sensor data on the Android platform with an Android terminal sensor data protection strategy based on multi-sensor simulated data replacement. It protects the privacy of sensor data from the point at which the mobile terminal releases data, and it prevents an attacker on the server side from maliciously inferring, and thereby stealing, user privacy. By applying statistical learning and deep learning methods to sensor data privacy protection on Android mobile terminals, the invention removes the attacker's ability to infer user privacy; even if the attacker collects the user's sensor data over a long period, the security of the privacy protection is not affected. The invention uses a generative adversarial network to produce simulated data, so that under inference tasks such as action classification the simulated data retains accuracy similar to that of real data. The invention uses filter combination and a Bayesian optimization algorithm to expand the simulated data space: filter combination preserves the frequency-domain characteristics while increasing the time-domain difference from the original data, which makes it easier to enlarge the space of simulated data.

The present invention protects user privacy on both the mobile terminal and the server. By reusing the generative adversarial network model and dynamically adjusting the filter parameters, it keeps the recognition accuracy needed for usability as high as possible while lowering the repetition rate, achieves a good obfuscation effect against the server, and is insensitive to the background information held by an attacker. By comparison, low-frequency resampling uses real data and therefore cannot secure data on the mobile terminal; it reduces the accuracy of the server's inference of user privacy but cannot achieve complete obfuscation. Intercepting and replacing sensor data with purely random data secures the mobile terminal, but the server can easily identify the user as abnormal and terminate normal service.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart of the sensor data protection method provided by an embodiment of the present invention.

FIG. 2 is a schematic structural diagram of the sensor data protection system provided by an embodiment of the present invention.

FIG. 3 is an implementation flow chart of the sensor data protection method provided by an embodiment of the present invention.

FIG. 4 is a schematic diagram of the principle of the sensor data protection system provided by an embodiment of the present invention.

FIG. 5 is a flow chart of the Android terminal sensor data protection system based on multi-sensor simulated data replacement provided by an embodiment of the present invention.

FIG. 6 is a schematic diagram of common action proportions and transition probabilities provided by an embodiment of the present invention.

FIG. 7 is a schematic diagram of the convergence of the random walk algorithm to the stationary distribution provided by an embodiment of the present invention.

FIG. 8 is a schematic diagram of the architecture of the generative adversarial network model for sensor simulated data generation provided by an embodiment of the present invention.

FIG. 9 is a schematic diagram of the curves of real data and simulated data provided by an embodiment of the present invention, for the running behavior category.

FIG. 10 is a schematic diagram of the curves of real data and simulated data provided by an embodiment of the present invention, for the going-downstairs behavior category.

FIG. 11 is a schematic diagram of the curves of real data and simulated data provided by an embodiment of the present invention, for the walking behavior category.

FIG. 12 is a schematic diagram comparing the low-frequency filter combination provided by an embodiment of the present invention with the original data; the behavior is climbing stairs and the compared data is the accelerometer X-axis data.

FIG. 13 is a schematic diagram comparing the high-frequency filter combination provided by an embodiment of the present invention with the original data; the behavior is climbing stairs and the compared data is the accelerometer X-axis data.

图中：1、系统初始化模块；2、拟真动作序列构建模块；3、传感器拟真数据生成模块；4、扩充拟真数据空间模块；5、数据结合与替换模块；100、子系统；101、生成器；102、判别器；1011、自动编解码器；1012、嵌入恢复损失计算模块；1013、多尺度循环模块；1014、时序功能模块；1021、二元功能判别器；1022、相似度计算模块。In the figure: 1. System initialization module; 2. Simulated action sequence construction module; 3. Sensor simulated data generation module; 4. Expanded simulated data space module; 5. Data combination and replacement module; 100. Subsystem; 101. Generator; 102. Discriminator; 1011. Automatic encoder and decoder; 1012. Embedding recovery loss calculation module; 1013. Multi-scale cycle module; 1014. Timing function module; 1021. Binary function discriminator; 1022. Similarity calculation module.

具体实施方式Detailed Description of the Embodiments

为了使本发明的目的、技术方案及优点更加清楚明白，以下结合实施例，对本发明进行进一步详细说明。应当理解，此处所描述的具体实施例仅仅用以解释本发明，并不用于限定本发明。In order to make the purpose, technical solution and advantages of the present invention more clearly understood, the present invention is further described in detail below in conjunction with the embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not used to limit the present invention.

针对现有技术存在的问题，本发明提供了一种传感器数据保护方法、系统、计算机设备及智能终端，下面结合附图对本发明作详细的描述。In view of the problems existing in the prior art, the present invention provides a sensor data protection method, system, computer device and intelligent terminal. The present invention is described in detail below with reference to the accompanying drawings.

如图1所示，本发明提供的传感器数据保护方法包括以下步骤：As shown in FIG1 , the sensor data protection method provided by the present invention comprises the following steps:

S101, system initialization: the user inputs an action distribution; a transition matrix is constructed from the predefined action transition probabilities and the user-supplied action distribution, and multiple rounds of iteration verify whether a stationary distribution can be reached, which establishes the feasibility of the subsequent action sequence generation;

S102, simulated action sequence construction: using the proposal distribution and acceptance distribution from which the transition matrix was constructed, a random walk algorithm generates a behavioral action sequence that conforms to the predefined action distribution, which determines how the sensor simulated data is subsequently arranged;

S103, sensor simulated data generation: a generative adversarial network model is trained in advance on real data so that the simulated data it produces reaches an accuracy of more than 90% on the action recognition task; multiple groups of data are generated for each action as a buffer, providing the original data templates for the subsequent simulated data space expansion;

S104, simulated data space expansion: the data of each action is taken from the buffer and combined according to the filter combination rule, and a Bayesian optimization algorithm selects multiple parameter sets that reach local optima; because this yields data that differs substantially from the original data while keeping the same classification, it mitigates the mode collapse problem that the generative adversarial network may exhibit;

S105, data combination and replacement: the simulated data is filled in according to the behavioral action sequence, the sensor data distribution interface is hooked at the bottom layer of the mobile device, and the sensor data is replaced in batches, protecting the privacy of sensor data from the point at which the mobile terminal releases data.

本发明提供的传感器数据保护方法业内的普通技术人员还可以采用其他的步骤实施，图1的本发明提供的传感器数据保护方法仅仅是一个具体实施例而已。The sensor data protection method provided by the present invention may be implemented by ordinary technicians in the industry using other steps. The sensor data protection method provided by the present invention in FIG. 1 is only a specific embodiment.

如图2与图4所示，本发明提供的传感器数据保护系统包括：As shown in FIG. 2 and FIG. 4 , the sensor data protection system provided by the present invention includes:

System initialization module 1, configured to receive the action distribution input by the user, construct a transition matrix from the predefined action transition probabilities and the user-supplied action distribution, and run multiple rounds of iteration to verify whether a stationary distribution can be reached, which establishes the feasibility of the subsequent action sequence generation;

Simulated action sequence construction module 2, configured to use the proposal distribution and acceptance distribution from which the transition matrix was constructed, together with a random walk algorithm, to generate a behavioral action sequence that conforms to the predefined action distribution, which determines how the sensor simulated data is subsequently arranged;

Sensor simulated data generation module 3, configured to train a generative adversarial network model in advance on real data so that the simulated data it produces reaches an accuracy of more than 90% on the action recognition task, and to generate multiple groups of data for each action as a buffer, providing the original data templates for the subsequent simulated data space expansion;

Expanded simulated data space module 4, configured to take the data of each action from the buffer, combine it according to the filter combination rule, and use a Bayesian optimization algorithm to select multiple parameter sets that reach local optima; because this yields data that differs substantially from the original data while keeping the same classification, it mitigates the mode collapse problem that the generative adversarial network may exhibit;

Data combination and replacement module 5, configured to fill in the simulated data according to the behavioral action sequence, hook the sensor data distribution interface at the bottom layer of the mobile device, and replace the sensor data in batches, protecting the privacy of sensor data from the point at which the mobile terminal releases data.

下面结合具体实施例对本发明的技术方案作进一步的描述。The technical solution of the present invention is further described below in conjunction with specific embodiments.

实施例1：Embodiment 1:

本发明提供的传感器数据保护方法通过全时刻全方位使用传感器拟真数据序列替换传感器数据，达到在全生命周期且全匿名的隐私保护效果。针对基于多传感器拟真数据替换的传感器数据隐私保护方法与多传感器拟真数据生成方法进行改进。The sensor data protection method provided by the present invention replaces sensor data with a sensor simulated data sequence at all times and in all directions, thereby achieving a privacy protection effect in the entire life cycle and with full anonymity. Improvements are made to the sensor data privacy protection method based on multi-sensor simulated data replacement and the multi-sensor simulated data generation method.

本发明传感器数据隐私保护方法通过对现有的隐私保护的差分隐私保护方法的分析，针对当前方案中使用真实数据处理导致可能的用户背景泄露风险的局限，使用高度拟真的多传感器数据对移动设备传感器进行全时刻、全方位的替换。在整个过程中服务端无法获取用户真实数据，服务端可以对传入的拟真数据进行动作分类等分类任务，但无法获取用户的任何背景信息与隐私数据。The sensor data privacy protection method of the present invention analyzes the differential privacy protection method of existing privacy protection, and uses highly simulated multi-sensor data to replace the mobile device sensors at all times and in all directions, targeting the limitation of the current scheme that uses real data processing to cause possible user background leakage risks. During the whole process, the server cannot obtain the user's real data. The server can perform classification tasks such as action classification on the incoming simulated data, but cannot obtain any background information and privacy data of the user.

To generate an action sequence that conforms to a predefined distribution, the multi-sensor simulated data generation method of the present invention uses a random walk algorithm based on the Markov chain Monte Carlo method. The transition probabilities between actions serve as the proposal distribution for constructing the Markov matrix, and the acceptance distribution used in constructing the Markov chain is modified, so that the random walk yields a behavioral action sequence that conforms to the predefined distribution. To generate multi-sensor simulated data that performs well in action classification tasks and has a low repetition rate, a time-series generative adversarial network model is combined with a filter combination method based on Bayesian optimization: the time-series generative adversarial network produces sensor data that conforms to the predefined classification, filter combination enlarges the space of simulated data, and Bayesian optimization searches for filter combination parameters that meet the requirements, which also addresses the mode collapse problem of the generative adversarial network.

使用以上步骤生成符合真实场景下的移动设备多传感器的拟真数据序列。Use the above steps to generate realistic data sequences of multiple sensors of mobile devices in real scenarios.

实施例2：Embodiment 2:

The Android terminal sensor data protection method of the present invention, based on multi-sensor simulated data replacement, combines sensor data replacement on the Android platform with the simulated data generation method. It protects the privacy of sensor data from the point at which the mobile terminal generates data, and it prevents third parties from maliciously stealing and analyzing user privacy at the application server. It specifically includes the following steps:

Step 1, action sequence generation: a random walk algorithm based on the Markov chain Monte Carlo method constructs a state transition matrix from the action proportions preset by the user and generates a fake behavioral action sequence. Using the Monte Carlo method, the Markov transition matrix is constructed with the transition kernel:

p(x, x′) = q(x, x′)α(x, x′);

where q(x, x′) is called the proposal distribution and α(x, x′) is called the acceptance distribution. Assuming that the proposal distribution is symmetric, the acceptance distribution is:

α(x, x′) = min{1, p(x′)/p(x)};

where p(x′) denotes the proportion of state x′ and p(x) denotes the proportion of state x. The proposal distribution is the transition probability from state x to state x′ and satisfies Σ_{x′∈X} q(x, x′) = 1, where X denotes the set of states adjacent to state x, including state x itself.
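
As a minimal sketch of how such a transition matrix could be assembled, the following Python code assumes the standard Metropolis acceptance rule for a symmetric proposal, α(x, x′) = min(1, p(x′)/p(x)), and assigns the probability mass of rejected proposals to the self-transition so that each row sums to 1. The three actions, their proportions, and the proposal matrix are hypothetical values chosen for illustration; they are not the values of FIG. 6.

```python
def metropolis_transition_matrix(target, proposal):
    """Build P[i][j] = q(i, j) * alpha(i, j) for i != j, with
    alpha(i, j) = min(1, p(j) / p(i)) (assumed Metropolis rule).
    Rejected proposals stay in state i, so the diagonal absorbs
    the remaining probability mass."""
    n = len(target)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                alpha = min(1.0, target[j] / target[i])
                P[i][j] = proposal[i][j] * alpha
        # P[i][i] is still 0.0 here, so sum(P[i]) is the off-diagonal mass.
        P[i][i] = 1.0 - sum(P[i])
    return P


# Hypothetical action proportions: standing, walking, running.
target = [0.5, 0.3, 0.2]
# Hypothetical symmetric proposal (transition probabilities between actions).
proposal = [
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
]
P = metropolis_transition_matrix(target, proposal)
```

With a symmetric proposal, the resulting matrix satisfies detailed balance with respect to the target proportions, which is what allows the chain to converge to the user-defined action distribution.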

步骤二，初步生成拟真数据：采用时序生成对抗网络来生成多传感器拟真数据，包括：生成器、判别器，其中，所述生成器包括嵌入功能模块、恢复功能模块、嵌入恢复损失计算模块、多尺度循环模块、时序功能模块；所述嵌入功能模块用于将数据从原始空间下的低维度映射到潜在空间下的高维度；所述恢复功能模块与所述嵌入功能模块相连接，用于将数据从高维潜在空间精确地恢复到低维的真实空间；所述嵌入恢复损失计算模块用于计算真实数据经过所述嵌入功能模块与所述恢复功能模块处理后，与原始数据的差异，用于重复训练所述嵌入功能模块与恢复功能模块，使原始数据能精准地在高维空间表达；所述多尺度循环模块用于学习多传感器各个维度的时域特性以及各个维度之间时域特征的相关性；所述时序功能模块，用于在对抗训练过程中更好地在高维潜在空间表示生成器输出的合成数据；Step 2, preliminarily generate simulated data: use a time series generative adversarial network to generate multi-sensor simulated data, including: a generator, a discriminator, wherein the generator includes an embedding function module, a recovery function module, an embedding recovery loss calculation module, a multi-scale cycle module, and a time series function module; the embedding function module is used to map data from a low dimension in the original space to a high dimension in the latent space; the recovery function module is connected to the embedding function module, and is used to accurately restore data from a high-dimensional latent space to a low-dimensional real space; the embedding recovery loss calculation module is used to calculate the difference between the real data after being processed by the embedding function module and the recovery function module and the original data, and is used to repeatedly train the embedding function module and the recovery function module so that the original data can be accurately expressed in a high-dimensional space; the multi-scale cycle module is used to learn the time domain characteristics of each dimension of the multi-sensor and the correlation between the time domain characteristics of each dimension; the time series function module is used to better represent the synthetic data output by the generator in a high-dimensional latent space during adversarial training;

以及所述判别器包括二元判断功能模块、相似度计算模块；所述二元判断功能模块用于在对抗训练过程中区分真实数据与合成数据；所述相似度计算模块与所述二元判断功能模块相连接，用于计算在低维原始空间合成数据与真实数据之间的余弦相似度。The discriminator includes a binary judgment function module and a similarity calculation module; the binary judgment function module is used to distinguish real data from synthetic data during adversarial training; the similarity calculation module is connected to the binary judgment function module and is used to calculate the cosine similarity between synthetic data and real data in a low-dimensional original space.

The value function of the model is:

min_G max_D V(D, G) = E_{x～p_data(x)}[log D(X)] + E_{z～p_z(z)}[log(1 − D(G(z)))];

where the first term on the right-hand side is the expectation of the discriminator over real data represented in the high-dimensional latent space, and the second term is the expectation of the discriminator over synthetic data produced by the generator in the high-dimensional latent space; G denotes the generator network, D the discriminator network, E the expectation, x～p_data(x) real data sampled from the real data set, log the logarithm function, x the real data, X the real data represented in the high-dimensional latent space, z～p_z(z) a random noise vector sampled from a normal distribution, and z the random noise vector.
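
For illustration, the two expectation terms described above can be estimated from batches of discriminator outputs. The sketch below assumes the standard generative adversarial objective E[log D(X)] + E[log(1 − D(G(z)))] and that discriminator outputs are probabilities in (0, 1); the function name `gan_value` and the clipping constant `eps` are hypothetical.

```python
import math


def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of the value function from discriminator outputs.

    d_real: discriminator outputs on latent representations of real data.
    d_fake: discriminator outputs on generator samples.
    Values are clipped at eps to avoid log(0)."""
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# An undecided discriminator (0.5 everywhere) scores lower than a confident one.
undecided = gan_value([0.5, 0.5], [0.5, 0.5])
confident = gan_value([0.9, 0.8], [0.1, 0.2])
```

The discriminator is trained to increase this quantity, while the generator is trained to decrease it.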

嵌入功能模块、恢复功能模块均由多尺度循环神经网络和全连接网络层构成，所述多尺度循环神经网络由不同大小的一维循环神经网络层构成，所述多尺度循环神经网络最后一层每个节点的输出作为全连接层的输入。The embedding function module and the recovery function module are both composed of a multi-scale recurrent neural network and a fully connected network layer. The multi-scale recurrent neural network is composed of one-dimensional recurrent neural network layers of different sizes. The output of each node in the last layer of the multi-scale recurrent neural network is used as the input of the fully connected layer.

时序功能模块包括全连接网络和GRU网络。The timing function module includes a fully connected network and a GRU network.

The embedding recovery loss calculation module uses the following formula to calculate the difference between the original data and the data processed by the embedding function module and the recovery function module:

l_R = E[ ||x_t − x̃_t||₂ ];

where l_R denotes the difference between the original data and the recovered data, E denotes the mathematical expectation, x_t denotes the original data, x̃_t denotes the data obtained by mapping the original data from the original space to the latent space and back to the original space, and ||...||₂ denotes the L2 norm.

步骤三，扩充拟真数据空间：采用滤波组合方法来实现数据空间的扩充，同时采用贝叶斯优化的方法来寻找滤波组合的各个参数。Step 3: Expand the simulated data space: Use the filter combination method to expand the data space, and use the Bayesian optimization method to find the parameters of the filter combination.

The filter combination method takes the simulated data produced by the generative adversarial network and, once the cutoff frequency and the combination proportions are set, combines the original data with the filtered data according to the following formula:

f₁(x₁, x₂, x₃) = x₁*filter(x₂, data) + x₃*data;

where the first term on the right-hand side is a proportion of the filtered data and the second term is a proportion of the original data; x₁ denotes the proportion of filtered data in the combined data, x₂ denotes the cutoff frequency of the filter, data denotes the original data, filter(x₂, data) denotes the data after filtering, and x₃ denotes the proportion of original data in the combined data; the left-hand side is the result of the filter combination.
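
The combination formula can be sketched in a few lines of Python. The filter type is not specified in the text, so the example assumes a first-order (RC) low-pass filter as a stand-in for filter(x₂, data); the function names and the 50 Hz sampling rate are likewise hypothetical.

```python
import math


def low_pass(cutoff_hz, data, sample_rate_hz=50.0):
    """First-order RC low-pass filter (an assumed stand-in for filter(x2, data))."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = dt / (rc + dt)  # smoothing factor in (0, 1)
    out = [data[0]]
    for sample in data[1:]:
        out.append(out[-1] + a * (sample - out[-1]))
    return out


def filter_combination(x1, x2, x3, data, sample_rate_hz=50.0):
    """f1(x1, x2, x3) = x1 * filter(x2, data) + x3 * data, element by element."""
    filtered = low_pass(x2, data, sample_rate_hz)
    return [x1 * f + x3 * d for f, d in zip(filtered, data)]


# Hypothetical accelerometer X-axis samples for one action window.
window = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
combined = filter_combination(0.7, 5.0, 0.3, window)
```

Setting x₁ = 0 and x₃ = 1 returns the original data unchanged, which gives a quick sanity check when trying other filters.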

A Bayesian optimization algorithm is used to find the parameters (x₁, x₂, x₃); it comprises an optimization expression, a fitting model, and an acquisition function.

The following formula is taken as the optimization expression, with a Gaussian process as the fitting model and the probability of improvement function as the acquisition function:

f₂(x₁, x₂, x₃) = dtw(f₁(x₁, x₂, x₃), data);

where the right-hand side is the distance between the filter-combined data and the original data, and the left-hand side is the value of that distance; dtw denotes the dynamic time warping distance function, f₁(x₁, x₂, x₃) denotes the filter-combined data, and data denotes the original data.
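
The distance used in the optimization expression can be illustrated with a small dynamic time warping routine. This is a minimal sketch for one-dimensional sequences that assumes the absolute difference as the local cost; the text does not state which local cost or step pattern is used.

```python
def dtw(a, b):
    """Dynamic time warping distance between two 1-D sequences,
    using |a_i - b_j| as the local cost (an assumption)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# A time-stretched copy has zero DTW distance to the original sequence.
same_shape = dtw([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
shifted = dtw([0.0, 0.0], [1.0, 1.0])
```

Because warping absorbs pure time stretching, the distance grows only when the combined data differs in shape or amplitude from the original, which is the kind of time-domain difference the parameter search is meant to measure.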

Step 4, sensor data interception and replacement: on Android, every application process is forked from the Zygote process, and the executable that starts Zygote is app_process. Replacing the system's app_process executable and the virtual machine's dynamic link library lets Zygote inject the module code whenever it starts an application process. The sensor monitoring module is implemented through hooking and intercepts and replaces the sensor data delivery interface at a low level. In the Android 8.0 system source code, the class that controls the distribution of sensor data is android.hardware.SystemSensorManager; within it, the sensor-handling subclass SensorEventQueue contains the dispatch function dispatchSensorEvent. The dispatchSensorEvent method under SystemSensorManager in the system service process is hooked, the precompiled replacement function module is loaded, and the data at the sensor interface is replaced with synthetic data.

在本发明的步骤一中，系统初始化具体包括：In step 1 of the present invention, system initialization specifically includes:

(1) The mobile terminal user inputs the proportions of several actions, including standing, walking, running, sitting, lying, and going up and down stairs. FIG. 6 shows the proportions and transition probabilities defined in the embodiment.

(2) From the action distribution proportions and the transition probabilities between actions in FIG. 6, the state transition matrix P_ij = p(i, j), i, j ∈ S, is calculated with the formula p(x, x′) = q(x, x′)α(x, x′), where S denotes the set of all behavioral action states. Starting from the initial vector λ₀ = {1, 0, 0, 0, 0, 0} and substituting into the formula λ_t = λ_{t−1}P, where P denotes the state transition matrix, gives the distribution after t rounds of iteration. FIG. 7 shows the convergence of the distribution in the embodiment.
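
The convergence check λ_t = λ_{t−1}P can be sketched as follows. For brevity the example uses a hypothetical 3-state transition matrix rather than the six actions of the embodiment; its entries are illustrative values chosen so that the stationary distribution is (0.5, 0.3, 0.2), and are not taken from FIG. 6 or FIG. 7.

```python
def step(dist, P):
    """One iteration of lambda_t = lambda_{t-1} P (row vector times matrix)."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def iterate_distribution(P, start, rounds):
    """Apply the transition matrix `rounds` times to the start distribution."""
    dist = list(start)
    for _ in range(rounds):
        dist = step(dist, P)
    return dist


# Hypothetical 3-state transition matrix (rows sum to 1).
P = [
    [0.78, 0.18, 0.04],
    [0.30, 0.50, 0.20],
    [0.10, 0.30, 0.60],
]
start = [1.0, 0.0, 0.0]  # all probability mass on the first action
after_200 = iterate_distribution(P, start, 200)
```

If the iterated distribution settles on the user-defined proportions regardless of the starting vector, the matrix is suitable for generating action sequences.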

在本发明的步骤二中，拟真动作序列构建具体包括：In step 2 of the present invention, the construction of the simulated action sequence specifically includes:

The action sequence is generated with a random walk algorithm based on the Markov chain Monte Carlo method, which directly uses the acceptance distribution given in step 1:

α(x, x′) = min{1, p(x′)/p(x)};

where p(x′) denotes the distribution of state x′ and p(x) denotes the distribution of state x.

以下用伪代码的形式详细说明方法的生成过程。The following is a pseudo-code description of the method generation process.

以上详细介绍了随机游走算法的过程。The above introduces the process of random walk algorithm in detail.
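
As a minimal sketch of the random walk described above, the following Python code proposes the next action from the proposal distribution of the current action and accepts it with probability min(1, p(x′)/p(x)), which is the Metropolis rule for a symmetric proposal and is assumed here. The function name, the three actions, and all numeric values are hypothetical.

```python
import random


def random_walk(target, proposal, length, start=0, seed=None):
    """Generate an action sequence of the given length.

    target:   desired proportion of each action.
    proposal: symmetric matrix of transition probabilities between actions.
    A rejected proposal repeats the current action."""
    rng = random.Random(seed)
    states = list(range(len(target)))
    current = start
    sequence = [current]
    while len(sequence) < length:
        candidate = rng.choices(states, weights=proposal[current])[0]
        accept = min(1.0, target[candidate] / target[current])
        if rng.random() < accept:
            current = candidate
        sequence.append(current)
    return sequence


# Hypothetical proportions and proposal for standing, walking, running.
target = [0.5, 0.3, 0.2]
proposal = [
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.3, 0.6],
]
sequence = random_walk(target, proposal, length=20, seed=0)
```

Over a long sequence, the share of each action approaches the target proportions, while consecutive actions still follow the transition structure of the proposal.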

在本发明的步骤三中，图8示出了根据本发明一个实施例的系统结构。由图中可见，本发明的子系统100包括生成器101和判别器102，生成器101的目标是充分利用传感器数据自身潜在的时域频域特性来学习传感器真实数据的分布特征，从而能够生成更加接近真实分布的传感器拟真数据；判别器102的目标是结合真实数据与合成数据进行二元分类，在对抗训练过程中强化分类器的效果，衡量生成器效果。传感器拟真数据生成具体包括：In step three of the present invention, FIG8 shows the system structure according to an embodiment of the present invention. As can be seen from the figure, the subsystem 100 of the present invention includes a generator 101 and a discriminator 102. The goal of the generator 101 is to make full use of the potential time-domain and frequency-domain characteristics of the sensor data itself to learn the distribution characteristics of the sensor's real data, so as to generate sensor simulation data that is closer to the real distribution; the goal of the discriminator 102 is to combine the real data with the synthetic data for binary classification, strengthen the effect of the classifier during the adversarial training process, and measure the effect of the generator. Sensor simulation data generation specifically includes:

(1)将真实数据集中的数据进行min-max归一化，并保存真实数据集的最小值与最大值，为模型产生拟真数据后还原为原始尺度做好数据准备；(1) Perform min-max normalization on the data in the real dataset and save the minimum and maximum values of the real dataset to prepare data for the model to generate simulated data and then restore it to the original scale;

(2) During model training, the automatic encoder-decoder 1011 under the generator 101 shown in FIG. 8 is trained first. Its purpose is to map data accurately from the low-dimensional original space to the high-dimensional latent space and to recover data accurately from the high-dimensional latent space to the low-dimensional original space. The real data used for training is fed into the embedding function module, which maps it from the low-dimensional original space to the high-dimensional latent space; the high-dimensional form of the real data is then fed into the recovery function module to obtain data in the original dimension. The loss function of the embedding recovery loss calculation module 1012 under the generator 101 is:

l_R = Σ_t ||X_t − X̃_t||₂;

This formula is the loss function for training the automatic encoder-decoder, where X_t denotes the original data of batch t, X̃_t denotes the recovered data of batch t, ||...||₂ computes the L2 norm, and Σ denotes summation.

(3)图8所示生成器101下的时序功能模块的目的是捕获真实数据在高维潜在空间的特征，使用真实数据通过所述自动编解码器1011下的嵌入功能模块处理，再带入时序功能模块1014并输出，将输出结果与高维空间结果进行二元交叉熵运算，时序功能模块1014的损失函数为：(3) The purpose of the time series function module under the generator 101 shown in FIG8 is to capture the characteristics of the real data in the high-dimensional latent space. The real data is processed by the embedding function module under the automatic codec 1011, and then brought into the time series function module 1014 and output. The output result is subjected to a binary cross entropy operation with the high-dimensional space result. The loss function of the time series function module 1014 is:

where h_t denotes the representation of the real data in the high-dimensional latent space at time t, g_X denotes the function of the timing function module, h_{t−1} denotes the representation of the real data in the high-dimensional latent space at time t−1, and z_t denotes the random data at time t. According to one embodiment of the present invention, the input and output dimensions of the multi-scale cycle module 1013 are as follows:

时域循环神经网络输入维度(三维)：[64，128，9]；Time domain recurrent neural network input dimension (3D): [64, 128, 9];

时域循环神经网络输出维度(三维)：[64，128，64]；Time domain recurrent neural network output dimension (three-dimensional): [64, 128, 64];

时域特征全连接网络输入维度(三维)：[64，128，64]；Time domain feature fully connected network input dimension (three-dimensional): [64, 128, 64];

时域特征全连接网络输出维度(三维)：[64，128，64]；Time domain feature fully connected network output dimension (three-dimensional): [64, 128, 64];

(4) The purpose of the binary function discriminator 1021 under the discriminator 102 shown in FIG. 8 is to distinguish, in the high-dimensional space, the real data from the synthetic data produced by the generator. The discriminator's outputs for the real data and for the synthetic data must satisfy the unsupervised loss function:

where $y_t$ denotes the discriminator's output for the real data and $\hat{y}_t$ denotes its output for the synthetic data. According to one embodiment of the present invention, the input and output dimensions of the multi-scale recurrent module 1013 are as follows:

Time-domain recurrent neural network input dimensions (3D): [64, 128, 64];

Classification fully connected layer input dimensions (3D): [64, 128, 64];

Classification fully connected layer output dimensions (3D): [64, 128, 1].
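The unsupervised loss in item (4) can be illustrated with a short sketch. It assumes the usual binary cross-entropy form used for GAN discriminators, in which outputs for real data are scored against label 1 and outputs for synthetic data against label 0; the text does not reproduce the exact formula, so this form, the averaging, and the name `discriminator_loss` are assumptions.

```python
import math


def discriminator_loss(y_real, y_fake, eps=1e-12):
    """Binary cross-entropy over discriminator outputs in (0, 1).

    y_real: outputs for real data (target label 1).
    y_fake: outputs for synthetic data (target label 0).
    eps guards against log(0); it is an implementation detail of this sketch.
    """
    real_term = -sum(math.log(max(y, eps)) for y in y_real) / len(y_real)
    fake_term = -sum(math.log(max(1.0 - y, eps)) for y in y_fake) / len(y_fake)
    return real_term + fake_term


undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

A discriminator that outputs 0.5 everywhere cannot tell the two sources apart, and its loss equals 2 ln 2; a discriminator that separates them perfectly has a loss of zero.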

(5) The purpose of the similarity calculation module 1022 under the discriminator 102 shown in FIG. 8 is to verify the distribution of the synthetic data against that of the original data. The similarity between real data and real data and the similarity between real data and simulated data are calculated; if the two values are close, the distribution of the simulated data is close to that of the real data. The similarity between simulated data and simulated data is also calculated to ensure that the simulated samples differ from one another. Finally, the maximum similarity between real data and synthetic data is calculated; it shows how close the generated data is to the real data in the most similar case, which is valuable for ensuring that the user's privacy is protected. If the maximum similarity between some simulated data and the original data is higher than 80%, step five is required for data processing. Table 1 shows the cosine similarity of each action under each metric when 50 groups of data are synthesized.

Table 1

| Activity | Real-to-real similarity | Synthetic-to-synthetic similarity | Real-to-synthetic similarity | Maximum real-to-synthetic similarity |
| --- | --- | --- | --- | --- |
| Going down stairs | 0.6790 | 0.2918 | 0.3011 | 0.7998 |
| Going up stairs | 0.3711 | 0.1326 | 0.1997 | 0.7997 |
| Walking | 0.9150 | 0.2230 | 0.1237 | 0.7997 |
| Running | 0.2829 | 0.1067 | 0.0627 | 0.7801 |
| Standing | 0.4280 | 0.3459 | 0.3898 | 0.7991 |
| Average | 0.5352 | 0.2200 | 0.2154 | 0.7957 |
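The similarity check in item (5) can be sketched as follows. The sketch assumes each sample has been flattened to a one-dimensional list of floats and uses the 80% threshold stated above; the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_a * norm_b)


def exceeds_max_similarity(real_samples, synthetic_sample, threshold=0.8):
    """True if the synthetic sample is too close to any real sample."""
    best = max(cosine_similarity(r, synthetic_sample) for r in real_samples)
    return best > threshold


flagged = exceeds_max_similarity([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.1])
```

In Table 1 every maximum real-to-synthetic similarity is below 0.8, so none of the reported activities would be flagged by such a check.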

The training process is described in detail below in the form of pseudocode.

The above describes the adversarial training process in detail.

In step four, expanding the simulated data space specifically includes:

(1) The formula for the synthesis processing of the simulated data is:

$x_o(n) = r_a \cdot x(n) + r_b \cdot \mathrm{filter}(x(n), f_t)$;

where the first term on the right-hand side is the original data in a certain proportion and the second term is the filtered data in a certain proportion; $r_a$ denotes the proportion of the original data, $x(n)$ denotes the original data, $r_b$ denotes the proportion of the filtered data, $\mathrm{filter}(x(n), f_t)$ denotes the filtered data, and $f_t$ denotes the cutoff frequency; the left-hand side is the result of the filter combination.
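The filter combination formula can be sketched in a few lines. The text does not specify the filter design, so a first-order RC low-pass filter stands in for $\mathrm{filter}(x(n), f_t)$; that choice, the 50 Hz default sampling rate, and the function names are assumptions made for illustration only.

```python
import math


def low_pass(x, cutoff_hz, sample_rate_hz):
    """First-order RC low-pass filter (assumed stand-in for filter(x(n), f_t)).

    x must be a non-empty list of samples.
    """
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    y = [x[0]]
    for sample in x[1:]:
        # Move the output a fraction alpha toward the new sample.
        y.append(y[-1] + alpha * (sample - y[-1]))
    return y


def filter_combine(x, r_a, r_b, cutoff_hz, sample_rate_hz=50.0):
    """x_o(n) = r_a * x(n) + r_b * filter(x(n), f_t)."""
    filtered = low_pass(x, cutoff_hz, sample_rate_hz)
    return [r_a * xi + r_b * fi for xi, fi in zip(x, filtered)]


combined = filter_combine([0.0, 1.0, 0.0, 1.0], r_a=0.6, r_b=0.4, cutoff_hz=5.0)
```

Varying `r_a`, `r_b` and `cutoff_hz` yields different variants of the same simulated sample, which is how the combination enlarges the data space.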

(2) The Bayesian optimization objective function is:

where $x_1$ denotes the proportion of filtered data, $x_2$ denotes the cutoff frequency, $x_3$ denotes the proportion of original data, data denotes the original data, and dtw is the dynamic time warping distance function. Dynamic time warping measures the similarity of two time series and is particularly suited to series of different lengths and rhythms; here it serves as the metric for the difference between the filter-combined data and the original data. FIG. 12 compares the low-pass and high-pass filtered data with the original data for the accelerometer x-axis during the going-up-stairs action.
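A minimal sketch of the distance used by the objective follows. It implements the classic dynamic time warping recurrence with an absolute-difference cost, and it assumes the objective is the DTW distance between the combined data $x_3 \cdot \mathrm{data} + x_1 \cdot \mathrm{filter}(\mathrm{data}, x_2)$ and the original data; the exact objective formula is not reproduced in the text. The Gaussian-process surrogate and acquisition function of the Bayesian optimizer are not shown, and `filter_fn` is a hypothetical callable supplied by the caller.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def objective(x1, x2, x3, data, filter_fn):
    """Assumed objective: DTW distance between combined and original data."""
    filtered = filter_fn(data, x2)
    combined = [x3 * d + x1 * f for d, f in zip(data, filtered)]
    return dtw_distance(combined, data)


stretched = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
```

Because DTW aligns samples before comparing them, a series and a time-stretched copy of it have distance zero, which is why it suits sequences of different lengths and rhythms.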

The training process of step four is described in detail below in the form of pseudocode.

The above describes the training process of step four in detail.

In step five, data combination and replacement specifically include:

(1) After setting the start and end times and the action proportions, the mobile user obtains the fully spliced sensor data set from the algorithm model, ready for real-time replacement.

(2) The present invention uses Hook to implement the interception module and replace real-time sensor data. Starting from the Zygote process, it monitors the class android.hardware.SystemSensorManager, which distributes sensor data in the system, together with its sensor data processing subclass SensorEventQueue and the dispatch method dispatchSensorEvent, and waits for the data replacement module to operate.

(3) Using the acquired simulated data, the data replacement module is implemented as a Java iterator that wraps the sensor data distribution interface of the Android system, so that each call to the interface consumes one group of simulated data.
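The consume-one-group-per-call behaviour in item (3) can be illustrated with a language-neutral sketch. The embodiment itself uses a Java iterator inside a hooked Android method; the Python below only models the control flow, and the class name, the function names, and the fallback to real values once the simulated data runs out are all assumptions.

```python
class SimulatedSensorFeed:
    """Hands out one group of simulated readings per dispatch call."""

    def __init__(self, simulated_groups):
        self._iterator = iter(simulated_groups)

    def replace(self, real_values):
        # Consume the next simulated group. Falling back to the real values
        # when the data set is exhausted is an assumed policy, not the patent's.
        return next(self._iterator, real_values)


def hooked_dispatch(feed, real_values, deliver):
    """Stand-in for a hooked dispatchSensorEvent: swap values, then deliver."""
    deliver(feed.replace(real_values))


received = []
feed = SimulatedSensorFeed([[0.1, 9.8, 0.2], [0.0, 9.7, 0.3]])
for _ in range(3):
    hooked_dispatch(feed, [5.0, 5.0, 5.0], received.append)
```

The listening application only ever sees what `deliver` receives, so in this model the first two events carry simulated readings and the third carries the fallback values.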

It should be noted that the embodiments of the present invention can be implemented in hardware, in software, or in a combination of the two. The hardware part can be implemented using dedicated logic; the software part can be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. A person of ordinary skill in the art will understand that the above devices and methods can be implemented using computer-executable instructions and/or processor control code, such code being provided, for example, on a carrier medium such as a disk, CD or DVD-ROM, on a programmable memory such as a read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The device of the present invention and its modules can be implemented by hardware circuits such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices; they can also be implemented by software executed by various types of processors, or by a combination of such hardware circuits and software, for example firmware.

The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any modification, equivalent substitution or improvement made by a person skilled in the art within the technical scope disclosed by the present invention, and within the spirit and principles of the present invention, shall fall within the scope of protection of the present invention.

Claims

1. A sensor data protection method, characterized in that the sensor data protection method comprises the following steps:

The first step is system initialization. The user inputs the action distribution, which includes the proportions of standing, walking, running, sitting, lying, and going up and down stairs. The transition matrix is constructed from the predefined action transition probabilities and the action distribution input by the user, and multiple rounds of iteration are performed to verify whether a stable distribution is reached, so as to provide feasibility support for the subsequent generation of action sequences. The state transition matrix $P = (p_{ij})_{i,j \in S}$ is calculated, where $S$ denotes the set of all behavior action states; the initialization vector $\pi_0$ is substituted into the formula $\pi_n = \pi_0 P^n$, where $P$ denotes the state transition matrix, to obtain the distribution after $n$ rounds of iteration;

The second step is to construct a simulated action sequence. The proposal distribution and the acceptance distribution of the constructed transition matrix are used together with the random walk algorithm to generate a behavior action sequence that conforms to the predefined action distribution, providing data support for the subsequent arrangement of the sensor simulated data; the action sequence is generated with a random walk algorithm based on the Markov chain Monte Carlo method, and the acceptance distribution is used directly in the random walk algorithm:

$\alpha(x, x') = \min\left\{1, \dfrac{\pi(x')}{\pi(x)}\right\}$;

where $\pi(x)$ denotes the distribution of state $x$ and $\pi(x')$ denotes the distribution of state $x'$;

The third step is to generate sensor simulated data. The adversarial network model is trained with real data in advance, so that the accuracy of the simulated data generated by the model in the action recognition task reaches more than 90%. For each action, multiple sets of data are generated as a buffer to provide the original data template for the subsequent simulated data space expansion task.

The fourth step is to expand the simulated data space, take out the data of each action in the buffer, combine them according to the filter combination rules, and use the Bayesian optimization algorithm to select multiple parameters that can achieve local optimality;

The fifth step is data combination and replacement. The simulated data is filled in according to the behavioral action sequence generated by the sensor simulated data. The sensor data distribution interface is hooked at the bottom layer of the mobile device to replace the batch of sensor data. The privacy and security of sensor data are protected from the mobile terminal data release link.
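The first two steps of claim 1 can be sketched under stated assumptions: the proposal distribution is symmetric, the acceptance probability is the Metropolis ratio min(1, π(x')/π(x)), and every action proportion is strictly positive. State indices stand in for actions, and all function names are hypothetical.

```python
import random


def build_transition_matrix(target, proposal):
    """Metropolis kernel: p(i, j) = q(i, j) * min(1, pi_j / pi_i) for j != i."""
    n = len(target)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                matrix[i][j] = proposal[i][j] * min(1.0, target[j] / target[i])
        # Rejected proposals keep the chain in state i.
        matrix[i][i] = 1.0 - sum(matrix[i])
    return matrix


def iterate_distribution(start, matrix, rounds):
    """Apply pi <- pi * P repeatedly to check for a stable distribution."""
    dist = list(start)
    n = len(dist)
    for _ in range(rounds):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return dist


def random_walk(target, proposal, length, seed=0):
    """Generate a state sequence whose long-run frequencies follow target."""
    rng = random.Random(seed)
    n = len(target)
    state = rng.randrange(n)
    sequence = []
    for _ in range(length):
        candidate = rng.choices(range(n), weights=proposal[state])[0]
        if rng.random() < min(1.0, target[candidate] / target[state]):
            state = candidate
        sequence.append(state)
    return sequence


target = [0.5, 0.3, 0.2]                      # user-defined action proportions
proposal = [[1.0 / 3.0] * 3 for _ in range(3)]  # symmetric proposal (assumed)
P = build_transition_matrix(target, proposal)
stable = iterate_distribution([1.0, 0.0, 0.0], P, 200)
actions = random_walk(target, proposal, 1000)
```

If the iterated distribution settles on the user-defined proportions, the chain has the required stationary distribution, and a sufficiently long walk visits each action at roughly those proportions.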

2. The sensor data protection method according to claim 1 is characterized in that the sensor data protection method replaces the sensor data by using the sensor simulated data sequence at all times and in all directions.

3. The sensor data protection method according to claim 1 is characterized in that the multi-sensor simulation data generation adopts a random walk algorithm based on the Markov chain Monte Carlo method, and by introducing the transition probability between actions as the proposal distribution for constructing the Markov matrix, the acceptance distribution in the process of constructing the Markov chain is improved, so that after the random walk algorithm is completed, a behavior action sequence that conforms to the predefined distribution can be generated;

The multi-sensor simulated data generation adopts a time-series generative adversarial network model combined with a filter combination method based on Bayesian optimization. The time-series generative adversarial network generates sensor data that conforms to a predefined classification; the filter combination method is then applied, with Bayesian optimization used to search for filter combination parameters that meet the requirements.

4. The sensor data protection method according to claim 1 is characterized in that the second step of generating a simulated action sequence is: using a random walk algorithm based on the Markov chain Monte Carlo method, constructing a state transition matrix according to the action proportions preset by the user, and thereby generating a false behavior action sequence; using the Monte Carlo method, the Markov transition matrix is constructed with the transition kernel:

$p(x, x') = q(x, x')\,\alpha(x, x')$;

where $q(x, x')$ is called the proposal distribution and $\alpha(x, x')$ is called the acceptance distribution; the proposal distribution is symmetric, and the acceptance distribution is:

$\alpha(x, x') = \min\left\{1, \dfrac{\pi(x')}{\pi(x)}\right\}$;

where $\pi(x)$ denotes the proportion of state $x$ and $\pi(x')$ denotes the proportion of state $x'$; the proposal distribution $q(x, x')$ is the transition probability from state $x$ to state $x'$ and satisfies $\sum_{x' \in N(x)} q(x, x') = 1$, where $N(x)$ denotes the set of states adjacent to state $x$, including the state $x$ itself;

The value function of the generative adversarial network model in the third step is:

$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(h)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$;

where the first term on the right-hand side is the expectation of the discriminator over real data represented in the high-dimensional latent space, and the second term is the expectation of the discriminator over synthetic data in the high-dimensional latent space produced by the generator; $G$ denotes the generator network, $D$ denotes the discriminator network, $\mathbb{E}$ denotes expectation, $x \sim p_{data}(x)$ denotes real data sampled from the real data set, $\log$ denotes the logarithmic function, $x$ denotes the real data, $h$ denotes the real data represented in the high-dimensional latent space, $z \sim p_z(z)$ denotes a random noise vector sampled from a normal distribution, and $z$ denotes the random noise vector;

The embedding recovery loss calculation uses the following formula to calculate the difference between the original data and the data processed by the embedding function module and the recovery function module:

$\mathcal{L}_R = \mathbb{E}\left[\sum_t \lVert X_t - \tilde{X}_t \rVert_2\right]$;

where $\mathcal{L}_R$ denotes the difference between the original data and the restored data, $\mathbb{E}$ denotes the mathematical expectation, $X_t$ denotes the original data, $\tilde{X}_t$ denotes the data mapped from the original space to the latent space and back from the latent space to the original space, and $\lVert \cdot \rVert_2$ denotes the L2 norm;

The binary judgment module uses the following loss function to calculate the difference between real data and synthetic data during training:

;

where $\mathrm{CE}$ denotes the cross-entropy function between real data and synthetic data, $y$ denotes the real data, and $\hat{y}$ denotes the synthetic data.

5. The sensor data protection method according to claim 1, characterized in that the fourth step of expanding the simulated data space adopts a filter combination method to achieve the expansion of the data space, and at the same time adopts a Bayesian optimization method to find each parameter of the filter combination;

wherein the filter combination method takes the simulated data generated by the generative adversarial network and, once the cutoff frequency and the combination ratios are set, combines the original data and the filtered data according to the formula:

$x_o(n) = r_a \cdot x(n) + r_b \cdot \mathrm{filter}(x(n), f_t)$;

where the first term on the right-hand side is the original data in a certain proportion and the second term is the filtered data in a certain proportion; $r_b$ denotes the proportion of filtered data in the combined data, $f_t$ denotes the cutoff frequency of the filter, $x(n)$ denotes the original data, $\mathrm{filter}(x(n), f_t)$ denotes the filtered data, and $r_a$ denotes the proportion of original data in the combined data; the left-hand side is the result of the filter combination;

Bayesian optimization is used to find the parameters $r_a$, $r_b$ and $f_t$, which involves an optimization expression, a fitting model, and an acquisition function;

The optimization expression is determined, a Gaussian process is used as the fitting model, and the probability of improvement function is used as the acquisition function:

;

where the right-hand side is the distance between the filter-combined data and the original data, and the left-hand side is the value of that distance; $\mathrm{dtw}$ denotes the dynamic time warping distance function, $x_o$ denotes the filter-combined data, and $\mathrm{data}$ denotes the original data;

The interception and replacement of sensor data in the fifth step comprises: implementing the sensor monitoring module through Hook, and performing bottom-level interception and replacement on the sensor data transmission interface; locating, in the Android 8.0 system source code, the module class android.hardware.SystemSensorManager that controls the distribution of sensor data, the sensor processing subclass SensorEventQueue within that class, and its dispatch function dispatchSensorEvent; hooking the dispatchSensorEvent method under SystemSensorManager in the system service process, loading the pre-compiled replacement function module, and replacing the data of the sensor interface with synthetic data.

6. A computer device, characterized in that the computer device comprises a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the sensor data protection method according to any one of claims 1 to 5.

7. An information data processing terminal, characterized in that the information data processing terminal is used to implement the sensor data protection method described in any one of claims 1 to 5.

8. A sensor data protection system for implementing the sensor data protection method according to any one of claims 1 to 5, characterized in that the sensor data protection system comprises:

The system initialization module is used to realize the user input action distribution, build a transfer matrix through the predefined action transition probability and the user input action distribution, and perform multiple rounds of iterations to verify whether a stable distribution can be achieved, providing feasibility support for the subsequent generation of action sequences;

The simulated action sequence construction module is used to use the proposed distribution and the accepted distribution of the constructed transfer matrix, combined with the random walk algorithm to generate a behavior action sequence that conforms to the predefined action distribution, providing data support for the subsequent sensor simulated data arrangement rules;

The sensor simulation data generation module is used to pre-train the generative adversarial network model with real data, so that the accuracy of the simulation data generated by the model in the action recognition task reaches more than 90%. For each action, multiple sets of data are generated as a buffer to provide the original data template for the subsequent simulation data space expansion task;

The simulated data space expansion module is used to take out the data of each action in the buffer, combine it according to the filter combination rules, and use the Bayesian optimization algorithm to select multiple parameter sets that achieve a local optimum. Because data that differs substantially from the original data but keeps the same classification can be generated, this mitigates the mode collapse problem that may exist in the generative adversarial network;

The data combination and replacement module is used to generate simulated data according to the behavioral action sequence generated by the sensor simulated data, hook the sensor data distribution interface at the bottom layer of the mobile device, replace the batch of sensor data, and protect the privacy and security of sensor data from the mobile terminal data release link.

9. The sensor data protection system according to claim 8, characterized in that the sensor data protection system further comprises: a generator, a discriminator;

The generator includes an embedding function module, a recovery function module, an embedding recovery loss calculation module, a multi-scale recurrent module, and a timing function module;

The embedding function module is used to map data from the low-dimensional original space to the high-dimensional latent space; the recovery function module is connected to the embedding function module and is used to accurately restore data from the high-dimensional latent space to the low-dimensional original space; the embedding recovery loss calculation module is used to calculate the difference between the original data and the data obtained after processing by the embedding function module and the recovery function module, and is used to repeatedly train the embedding function module and the recovery function module so that the original data can be accurately expressed in the high-dimensional space; the multi-scale recurrent module is used to learn the time-domain characteristics of each dimension of the multi-sensor data and the correlation between the time-domain characteristics of the dimensions; the timing function module is used to better represent the synthetic data output by the generator in the high-dimensional latent space during adversarial training;

The discriminator includes a binary judgment function module and a similarity calculation module; the binary judgment function module is used to distinguish real data from synthetic data during adversarial training; the similarity calculation module is connected to the binary judgment function module and is used to calculate the cosine similarity between the synthetic data and the real data in the low-dimensional original space;

The embedding function module and the restoration function module are both composed of a multi-scale recurrent neural network and a fully connected network layer. The multi-scale recurrent neural network is composed of one-dimensional recurrent neural network layers of different sizes. The output of each node in the last layer of the multi-scale recurrent neural network is used as the input of the fully connected layer.

The timing function module includes a fully connected network and a GRU network;

The embedding recovery loss calculation module uses the following formula to calculate the difference between the original data and the data processed by the embedding function module and the recovery function module;

The binary judgment module uses the following loss function to calculate the difference between real data and synthetic data during training.