CN115168900A - Track data privacy protection method and system for intelligent traffic system - Google Patents
Track data privacy protection method and system for intelligent traffic system
- Publication number
- CN115168900A (CN202210866318.2A)
- Authority
- CN
- China
- Prior art keywords
- trajectory
- synthetic
- data
- centroid
- trajectory data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
Description
Technical Field
The invention relates to the technical field of data security protection, and in particular to a trajectory data privacy protection method and system for an intelligent transportation system.
Background
An intelligent transportation system effectively integrates advanced technologies such as information technology, computer technology, data communication, sensors, electronic control, and artificial intelligence into transportation, service control, and vehicle manufacturing. It strengthens the links among vehicles, roads, and users, forming a comprehensive transportation system that guarantees safety, improves efficiency, improves the environment, and saves energy. Trajectory data is important private data in such a system. For example, in an urban transportation system, statistical analysis of vehicle trajectories helps planners formulate more reasonable traffic schemes to avoid congestion, and navigation software can infer a driver's preferences from vehicle trajectories in order to recommend the best route.
However, the convenience of trajectory data comes with privacy and security problems. Once collected, trajectory data is usually opened to the public so that researchers can mine and analyze it. During collection and use, an attacker may track a user's movements, or further infer private information such as the user's workplace, home address, social relationships, physical condition, and hobbies, and sell it to third parties for profit, leaking the user's privacy. How to keep trajectory data useful while protecting user privacy is therefore particularly important.
At present, the anonymity algorithm k-anonymity is commonly used to protect data privacy. k-anonymity requires that each user be indistinguishable from at least k-1 other users within a given time and space range, so that an attacker cannot single out the target among at least k users and infer its exact location. However, because trajectory coordinate points are relatively discrete, the conditions for k-anonymity are hard to satisfy. Studies have also shown that applying k-anonymity privacy protection algorithms to real data still fails to resist privacy-stealing techniques such as composition attacks, exact attacks, and background knowledge attacks. Algorithms of this type also need to run clustering frequently, which substantially affects trajectory accuracy. Moreover, when the dataset changes, such models must be rebuilt, so they are not reusable, which greatly increases the time overhead.
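The k-anonymity requirement described above can be illustrated with a minimal sketch. It assumes, for illustration only, that the "time and space range" is a discrete (time slot, grid cell) pair; the function name `is_k_anonymous` and the sample data are hypothetical and do not come from the patent.

```python
def is_k_anonymous(observations, k):
    """Return True if every occupied (time_slot, cell) group holds at least k distinct users.

    observations: iterable of (user_id, time_slot, cell_id) tuples. Using a grid
    cell as the spatial range is an assumption of this sketch.
    """
    users_per_group = {}
    for user_id, time_slot, cell_id in observations:
        users_per_group.setdefault((time_slot, cell_id), set()).add(user_id)
    return all(len(users) >= k for users in users_per_group.values())


# Hypothetical observations: three users share cell "A" at slot 0,
# but user u3 is alone in cell "C" at slot 1.
observations = [
    ("u1", 0, "A"), ("u2", 0, "A"), ("u3", 0, "A"),
    ("u1", 1, "B"), ("u2", 1, "B"), ("u3", 1, "C"),
]
```

With these observations the slot-0 group satisfies k = 3, while the whole set fails k = 2 because one group contains a single user. This shows how sparse, discrete trajectory points make the condition hard to meet.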
Summary of the Invention
The purpose of the invention is to overcome the deficiencies of the prior art by providing a trajectory data privacy protection method and system for an intelligent transportation system. It addresses the technical problems that, because existing methods run clustering frequently on real datasets and are not reusable, trajectory dataset publishing is insufficiently accurate and efficient and does not adequately protect trajectory data privacy.
To solve the above technical problems, the invention adopts the following technical solutions:
In a first aspect, the invention provides a trajectory data privacy protection method for an intelligent transportation system, the method comprising:
acquiring a user's real-time real trajectory dataset;
loading the real trajectory dataset into a pre-built and trained end-to-end deep learning model to generate a synthetic trajectory dataset;
using the k-means clustering algorithm to cluster, based on Euclidean distance, the trajectory points of the synthetic trajectory dataset at each timestamp, and randomly combining the resulting cluster centroids from different timestamps to generate generalized trajectories;
adding Laplace noise and consistency constraints to the count matrix of the generalized trajectories to obtain a differentially private count matrix with a bounded amount of noise, and publishing it.
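The clustering and random-combination step can be sketched as follows. This is a plain Lloyd-style k-means written for illustration, not the patent's implementation; the function names, the fixed iteration count, the seeds, and the sample points are all assumptions.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on 2-D points using Euclidean distance; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = []
        for old, cluster in zip(centroids, clusters):
            if cluster:
                new_centroids.append((sum(x for x, _ in cluster) / len(cluster),
                                      sum(y for _, y in cluster) / len(cluster)))
            else:
                new_centroids.append(old)  # keep the old centroid for an empty cluster
        centroids = new_centroids
    return centroids


def generalized_trajectories(points_by_timestamp, k, count, seed=0):
    """Cluster each timestamp separately, then pick one random centroid per timestamp."""
    rng = random.Random(seed)
    centroids_by_t = [kmeans(pts, k, seed=seed) for pts in points_by_timestamp]
    return [[rng.choice(cs) for cs in centroids_by_t] for _ in range(count)]


# Hypothetical synthetic trajectory points, grouped by timestamp.
points_by_timestamp = [
    [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)],
    [(1.0, 1.0), (1.0, 1.2), (6.0, 6.0), (6.0, 6.2)],
]
```

Each generalized trajectory is a sequence of centroids, one per timestamp, so no published point coincides with an individual trajectory point unless a cluster contains a single point.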
With reference to the first aspect, preferably, the training process of the end-to-end deep learning model includes the following steps:
collecting historical real trajectory data of different users as the original trajectory data for training the model;
applying centroid normalization to the original trajectory data to obtain the centroid deviation coordinates of each trajectory point, yielding the encoded standard original trajectory;
processing each centroid deviation coordinate into a 64-dimensional vector with the rectified linear unit (ReLU) function;
using 64 LSTM Cell (long short-term memory) units to perform time-series prediction on the 64-dimensional vectors to obtain synthetic trajectory data, and decoding the centroid deviation coordinates of the synthetic trajectory data with the tanh (hyperbolic tangent) function to obtain the decoded synthetic trajectory;
computing the trajectory similarity loss value of the synthetic trajectory with a trajectory loss function, in combination with the standard original trajectory;
performing binary classification on the trajectory similarity loss value with the sigmoid activation function to obtain the discrimination result for the synthetic trajectory;
if the discrimination result is false, feeding the original trajectory data back into the model for further training and updating the model's network parameters with the backpropagation algorithm, until the discrimination result is true; training then stops, and the synthetic trajectories judged true are output to form the synthetic trajectory dataset.
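The centroid normalization step in the training process can be sketched as below. The sketch assumes the centroid is the arithmetic mean of all trajectory points in the dataset and that the deviation is a plain subtraction; the patent text does not spell out either detail, and the function names are hypothetical.

```python
def centroid_normalize(trajectories):
    """Return (centroid, deviations) for a list of [(lat, lon), ...] trajectories.

    Assumption: the centroid is the mean of every point in the dataset.
    """
    points = [p for traj in trajectories for p in traj]
    lat_c = sum(lat for lat, _ in points) / len(points)
    lon_c = sum(lon for _, lon in points) / len(points)
    deviations = [[(lat - lat_c, lon - lon_c) for lat, lon in traj]
                  for traj in trajectories]
    return (lat_c, lon_c), deviations


def centroid_denormalize(centroid, deviations):
    """Invert centroid_normalize: add the centroid back to every deviation."""
    lat_c, lon_c = centroid
    return [[(dlat + lat_c, dlon + lon_c) for dlat, dlon in traj]
            for traj in deviations]


# Hypothetical trajectories of different lengths.
trajectories = [[(32.0, 118.0), (32.2, 118.4)], [(32.4, 118.2)]]
centroid, deviations = centroid_normalize(trajectories)
```

Working with deviations rather than raw coordinates keeps the inputs small and centered, and the inverse function recovers absolute coordinates after decoding.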
With reference to the first aspect, preferably, after each centroid deviation coordinate is processed into a 64-dimensional vector with the ReLU function, the method further includes: adding random noise to the vectors so that every group of vectors of the original trajectories has the same length.
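The embodiment later describes this step as filling empty trajectory points into each trajectory until it matches the longest one. A minimal sketch of that padding follows; the zero-valued pad point and the accompanying mask are assumptions added for illustration.

```python
def pad_trajectories(trajectories, pad_point=(0.0, 0.0)):
    """Pad every trajectory to the length of the longest one.

    Returns the padded trajectories and a 0/1 mask marking real points (1)
    versus padding (0). The mask is an assumption of this sketch.
    """
    max_length = max(len(t) for t in trajectories)
    padded = [list(t) + [pad_point] * (max_length - len(t)) for t in trajectories]
    mask = [[1] * len(t) + [0] * (max_length - len(t)) for t in trajectories]
    return padded, mask


# Hypothetical deviation trajectories of unequal length.
padded, mask = pad_trajectories([[(0.1, 0.2)], [(0.3, 0.4), (0.5, 0.6), (0.7, 0.8)]])
```

Equal-length sequences can be stacked into a single batch, which is the usual reason such padding speeds up training.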
With reference to the first aspect, preferably, the step of computing the trajectory similarity loss of the synthetic trajectory with the trajectory loss function, in combination with the standard original trajectory, includes:
applying centroid normalization to the synthetic trajectory to obtain the corresponding centroid deviation coordinates of each trajectory point, yielding the encoded standard synthetic trajectory;
processing the centroid deviation coordinates of each standard original trajectory and each standard synthetic trajectory into corresponding 64-dimensional vectors with the ReLU function;
combining the two groups of corresponding 64-dimensional vectors with the trajectory loss function, and computing the trajectory similarity loss value TLoss of the synthetic trajectory by formula (1):
$TLoss = \alpha L_{BCE}(l_t, l_p) + \beta L_{GPS}(t_t, t_p)$ (1)
where $l_t$ and $l_p$ denote the discrimination labels of the standard original trajectory and the standard synthetic trajectory, respectively; $t_t$ and $t_p$ denote the 64-dimensional vectors of the standard original trajectory and the corresponding standard synthetic trajectory, respectively; $L_{BCE}$ denotes the binary cross-entropy loss function; $L_{GPS}$ denotes a loss function that uses the least-squares error to measure the similarity between two trajectories; and $\alpha$ and $\beta$ denote the weights of $L_{BCE}$ and $L_{GPS}$, respectively.
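Formula (1) can be evaluated numerically as in the sketch below. It assumes that $L_{BCE}$ is the mean binary cross-entropy over the labels and that $L_{GPS}$ is the mean squared difference over all vector components; the default weights and the function names are illustrative only.

```python
import math


def binary_cross_entropy(labels, predictions, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def least_squares_error(true_vectors, pred_vectors):
    """Mean squared difference over every component of every vector."""
    squared = [(a - b) ** 2
               for tv, pv in zip(true_vectors, pred_vectors)
               for a, b in zip(tv, pv)]
    return sum(squared) / len(squared)


def t_loss(l_t, l_p, t_t, t_p, alpha=1.0, beta=1.0):
    """TLoss = alpha * L_BCE(l_t, l_p) + beta * L_GPS(t_t, t_p)."""
    return (alpha * binary_cross_entropy(l_t, l_p)
            + beta * least_squares_error(t_t, t_p))
```

For example, labels `[1, 0]` with predictions `[0.5, 0.5]` give a cross-entropy of ln 2, and vectors `[[0, 0]]` versus `[[1, 1]]` give a squared error of 1, so with α = 1 and β = 2 the loss is ln 2 + 2.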
With reference to the first aspect, preferably, each centroid deviation coordinate is processed into a 64-dimensional vector with the ReLU function according to the following formula:
$t_i = f_{GPS}(\Delta lat_i, \Delta lon_i; W_{GPS})$ (2)
where $t_i$ denotes the 64-dimensional vector of trajectory point $i$; $\Delta lat_i$ and $\Delta lon_i$ denote the latitude deviation and longitude deviation of trajectory point $i$, respectively; $f_{GPS}$ denotes the ReLU function; and $W_{GPS}$ denotes the vector weights applied to the centroid deviation coordinates $(\Delta lat_i, \Delta lon_i)$ of trajectory point $i$.
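The embedding of a deviation pair into a 64-dimensional vector can be sketched as a single dense layer followed by ReLU. The random weight initialization and the bias term are assumptions of this sketch; the patent names only the weights $W_{GPS}$, which would be learned during training.

```python
import random


def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def make_embedding_weights(dim=64, seed=0):
    """Randomly initialized 2-to-dim weights; a trained model would learn these."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(dim)]
    biases = [0.0] * dim  # the bias term is an assumption of this sketch
    return weights, biases


def embed_point(dlat, dlon, weights, biases):
    """Map one (dlat, dlon) deviation pair to a dim-dimensional ReLU embedding."""
    return [relu(w[0] * dlat + w[1] * dlon + b) for w, b in zip(weights, biases)]


weights, biases = make_embedding_weights()
t_i = embed_point(0.01, -0.02, weights, biases)
```

Every component of the resulting vector is non-negative because of the ReLU, and a zero deviation with zero bias maps to the zero vector.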
With reference to the first aspect, preferably, the synthetic trajectory data is obtained by time-series prediction on the 64-dimensional vectors with 64 LSTM Cell units according to the following formula:
$O = LSTM(T, W_{lstm})$ (3)
where $T = \{t_1, t_2, \ldots, t_i, \ldots, t_{maxlength}\}$ denotes the coordinate features of all trajectory points in the original trajectory data, $t_i$ denotes the 64-dimensional vector of trajectory point $i$, and $maxlength$ denotes the length of the longest single trajectory in the original trajectory data; $W_{lstm}$ denotes the weight matrix for the input data; $O = \{o_1, o_2, \ldots, o_i, \ldots, o_{maxlength}\}$ denotes the synthetic trajectory data, where $o_i$ denotes the synthesized coordinate value output for $t_i$ after LSTM processing.
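The mapping $O = LSTM(T, W_{lstm})$ can be sketched with a from-scratch forward pass. The sketch reads "64 LSTM Cell units" as a hidden size of 64, which is an assumption, and it uses random untrained weights and the standard LSTM gate equations rather than anything specified in the patent.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_dim, hidden_dim, seed=0):
    """Random weights and zero biases for the input, forget, candidate and output gates."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {g: (matrix(hidden_dim, input_dim + hidden_dim), [0.0] * hidden_dim)
            for g in "ifgo"}


def lstm_step(x, h, c, params):
    """One standard LSTM step: returns the new hidden state and cell state."""
    z = x + h  # concatenate the input with the previous hidden state

    def gate(name, activation):
        w, b = params[name]
        return [activation(sum(wi * zi for wi, zi in zip(row, z)) + bi)
                for row, bi in zip(w, b)]

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("g", math.tanh)
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def lstm(T, params, hidden_dim):
    """Run the LSTM over the sequence T and return one output vector per point."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    outputs = []
    for t_i in T:
        h, c = lstm_step(t_i, h, c, params)
        outputs.append(h)
    return outputs


# Three hypothetical 64-dimensional point embeddings.
T = [[0.1] * 64 for _ in range(3)]
O = lstm(T, init_lstm(64, 64), 64)
```

Each output $o_i$ depends on $t_i$ and on all earlier points through the hidden and cell states, which is what lets the model capture the temporal order of a trajectory.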
With reference to the first aspect, preferably, the centroid deviation coordinates of the synthetic trajectory data are decoded with the tanh function according to the following formula:
$(\Delta lat_i', \Delta lon_i') = D_{GPS}(o_i; W_{dGPS})$ (4)
where $(\Delta lat_i', \Delta lon_i')$ denotes the decoded centroid deviation coordinates of trajectory point $i$ in the synthetic trajectory data, with $\Delta lat_i'$ and $\Delta lon_i'$ denoting the latitude deviation and longitude deviation of trajectory point $i$, respectively; $W_{dGPS}$ denotes the decoding matrix weights for the coordinate vector; and $D_{GPS}$ is the tanh function.
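The decoding step can be sketched as a dense layer with a tanh activation that maps each LSTM output back to a deviation pair. The random weights, the bias term, and the function names are assumptions made for illustration.

```python
import math
import random


def init_decoder(hidden_dim=64, seed=0):
    """Random hidden_dim-to-2 weights; a trained model would learn these."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)] for _ in range(2)]
    biases = [0.0, 0.0]  # the bias term is an assumption of this sketch
    return weights, biases


def decode_point(o_i, weights, biases):
    """Map one LSTM output vector to a (dlat, dlon) pair through tanh."""
    return tuple(math.tanh(sum(w * v for w, v in zip(row, o_i)) + b)
                 for row, b in zip(weights, biases))


dec_weights, dec_biases = init_decoder()
dlat, dlon = decode_point([0.05] * 64, dec_weights, dec_biases)
```

Because tanh is bounded, the decoded deviations fall within [-1, 1]. If real deviations can exceed that range, some scaling before encoding and after decoding would be needed, which the text here does not describe.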
In a second aspect, the invention provides a trajectory data privacy protection system for an intelligent transportation system, the system comprising:
an acquisition module, for acquiring a user's real-time real trajectory dataset;
a synthetic trajectory module, for loading the real trajectory dataset into a pre-built and trained end-to-end deep learning model to generate a synthetic trajectory dataset;
a clustering module, for using the k-means clustering algorithm to cluster, based on Euclidean distance, the trajectory points of the synthetic trajectory dataset at each timestamp, and for randomly combining the resulting cluster centroids from different timestamps to generate generalized trajectories;
a noise-adding and publishing module, for adding Laplace noise and consistency constraints to the count matrix of the generalized trajectories to obtain a differentially private count matrix with a bounded amount of noise, and for publishing it.
With reference to the second aspect, preferably, the synthetic trajectory module includes a collection unit; the end-to-end deep learning model includes a trajectory generator and a trajectory discriminator; the trajectory generator includes a first input layer, a first embedding layer, a first LSTM modeling layer, and a first output layer; the trajectory discriminator includes a second input layer, a second embedding layer, a second LSTM modeling layer, and a second output layer; wherein:
the collection unit is used to collect historical real trajectory data of different users as the original trajectory data for training the model;
the first input layer is used to apply centroid normalization to the original trajectory data to obtain the centroid deviation coordinates of each trajectory point, yielding the encoded standard original trajectory;
the first embedding layer is used to process each centroid deviation coordinate into a 64-dimensional vector using a multilayer perceptron (MLP) with the ReLU function, and to add random noise to the vectors so that every group of vectors in the original trajectories has the same length as the longest trajectory;
the first LSTM modeling layer is used to perform time-series prediction on the 64-dimensional vectors with 64 LSTM Cell units to obtain synthetic trajectory data;
the first output layer is used to decode the latitude and longitude deviations of the synthetic trajectory data with the tanh function using two dense layers (Den), obtaining the decoded synthetic trajectory;
the second input layer is used to take the standard original trajectory and the synthetic trajectory as the input data of the trajectory discriminator, and to apply centroid normalization to the synthetic trajectory to obtain the corresponding centroid deviation coordinates of each trajectory point, yielding the encoded standard synthetic trajectory;
the second embedding layer is used to process the centroid deviation coordinates of each standard original trajectory and each standard synthetic trajectory into corresponding 64-dimensional vectors using a multilayer perceptron (MLP) with the ReLU function;
the second LSTM modeling layer is used to compute the trajectory similarity loss value through the trajectory loss function with 64 LSTM Cell units, using the two groups of corresponding 64-dimensional vectors;
the second output layer is used to perform binary classification on the trajectory similarity loss value with the sigmoid activation function using one dense layer (Den), obtaining the discrimination result for the synthetic trajectory; if the discrimination result is false, the original trajectory data is fed back into the trajectory generator for training, and the network parameters of the first LSTM modeling layer in the trajectory generator are updated with the backpropagation algorithm, until the discrimination result is true; training then stops, and the synthetic trajectories judged true are output to form the synthetic trajectory dataset.
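The binary decision made by the second output layer can be sketched as one dense unit followed by a sigmoid and a threshold. The 0.5 threshold, the feature vector, and the weights below are assumptions for illustration; the text says only that a sigmoid activation yields a true or false discrimination result.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminate(features, weights, bias=0.0, threshold=0.5):
    """Return (score, is_real) for one trajectory representation.

    score is the sigmoid of a weighted sum; is_real applies the threshold.
    """
    score = sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)
    return score, score >= threshold


# Hypothetical feature vector and weights.
score, is_real = discriminate([1.0, 2.0], [0.5, 0.5])
```

In the training loop described above, a false result would send the generator through another round of updates, and training would stop once the discriminator returns true.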
In a third aspect, the invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the trajectory data privacy protection method for an intelligent transportation system according to any one of the first aspect.
Compared with the prior art, the invention achieves the following beneficial effects:
The invention uses an end-to-end deep learning model that, once the trajectory loss function judges the trajectory similarity loss to meet the standard, generates synthetic trajectories to replace the real ones, avoiding privacy leakage when the real trajectory dataset is encrypted and published. When the real trajectory dataset is updated, the model does not need to be retrained; only the input needs to be replaced to obtain synthetic trajectories, which greatly reduces the time spent on data protection. In addition, during model training random noise is added to the vectors so that every group of vectors of the original trajectories has the same length, which speeds up training and improves computational efficiency. Finally, after a trusted third party obtains the synthetic trajectories, the k-means algorithm clusters the trajectory points within each unit of time based on Euclidean distance, the cluster centroids from different timestamps are randomly combined into generalized trajectories, and the final trajectory counts are published with Laplace noise and consistency constraints, satisfying the differential privacy requirement. The invention thus preserves the privacy of trajectory data while improving the usefulness of the published data.
Description of the Drawings
FIG. 1 is a flowchart of the trajectory data privacy protection method for an intelligent transportation system provided by an embodiment of the invention;
FIG. 2 is a flowchart of differential privacy publishing in the trajectory data privacy protection method for an intelligent transportation system provided by an embodiment of the invention;
FIG. 3 is a structural block diagram of the trajectory data privacy protection system for an intelligent transportation system provided by an embodiment of the invention;
FIG. 4 is a structural block diagram of the training of the end-to-end deep learning model provided by an embodiment of the invention.
Detailed Description
The technical solutions of the invention are described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments of this application and the specific features in them are detailed descriptions of the technical solutions of this application, not limitations on them; where no conflict arises, the embodiments and the technical features in them may be combined with one another.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
Embodiment 1:
As shown in FIG. 1, this embodiment describes a trajectory data privacy protection method for an intelligent transportation system, which specifically includes the following steps:
Step 1: acquire a user's real-time real trajectory dataset;
Step 2: load the real trajectory dataset into a pre-built and trained end-to-end deep learning model to generate a synthetic trajectory dataset;
Step 3: use the k-means clustering algorithm to cluster, based on Euclidean distance, the trajectory points of the synthetic trajectory dataset at each timestamp, and randomly combine the resulting cluster centroids from different timestamps to generate generalized trajectories;
Step 4: add Laplace noise and consistency constraints to the count matrix of the generalized trajectories to obtain a differentially private count matrix with a bounded amount of noise, and publish it.
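The noise-adding part of Step 4 can be sketched with the standard Laplace mechanism. The privacy budget `epsilon`, the sensitivity of 1, and the choice to enforce consistency by rounding to non-negative integers are all assumptions of this sketch; the text here does not define the consistency constraints.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def publish_counts(count_matrix, epsilon, sensitivity=1.0, seed=None):
    """Add Laplace noise with scale sensitivity/epsilon to every count.

    Consistency is approximated here by rounding to integers and clamping at
    zero, which is an assumption rather than the patent's stated constraint.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [[max(0, round(c + laplace_noise(scale, rng))) for c in row]
            for row in count_matrix]


# Hypothetical count matrix of generalized trajectories.
counts = [[3, 0], [1, 5]]
published = publish_counts(counts, epsilon=1.0, seed=42)
```

A smaller `epsilon` gives a larger noise scale and stronger privacy at the cost of less accurate counts. Rounding and clamping are post-processing steps, so they do not weaken the differential privacy guarantee of the noisy counts.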
The training process of the end-to-end deep learning model involved in Step 2 of this embodiment includes the following steps:
Step 1: collect historical real trajectory data of different users as the original trajectory data for training the model;
Step 2: apply centroid normalization to the original trajectory data to obtain the centroid deviation coordinates of each trajectory point, yielding the encoded standard original trajectory;
Step 3: process each centroid deviation coordinate into a 64-dimensional vector with the ReLU function, according to the following formula:
$t_i = f_{GPS}(\Delta lat_i, \Delta lon_i; W_{GPS})$ (1)
where $t_i$ denotes the 64-dimensional vector of trajectory point $i$; $\Delta lat_i$ and $\Delta lon_i$ denote the latitude deviation and longitude deviation of trajectory point $i$, respectively; $f_{GPS}$ denotes the ReLU function; and $W_{GPS}$ denotes the vector weights applied to the centroid deviation coordinates $(\Delta lat_i, \Delta lon_i)$ of trajectory point $i$.
As an embodiment of the invention, random noise is added to the vectors in this step, that is, empty trajectory points are filled into each trajectory so that every group of vectors in the original trajectories has the same length as the longest trajectory; this speeds up model training and improves computational efficiency;
Step 4: use 64 LSTM Cell units to perform time-series prediction on the 64-dimensional vectors to obtain synthetic trajectory data, and decode the centroid deviation coordinates of the synthetic trajectory data with the tanh function to obtain the decoded synthetic trajectory;
The synthetic trajectory data is obtained by time-series prediction on the 64-dimensional vectors with 64 LSTM Cell units according to the following formula:
$O = LSTM(T, W_{lstm})$ (2)
where $T = \{t_1, t_2, \ldots, t_i, \ldots, t_{maxlength}\}$ denotes the coordinate features of all trajectory points in the original trajectory data, $t_i$ denotes the 64-dimensional vector of trajectory point $i$, and $maxlength$ denotes the length of the longest single trajectory in the original trajectory data; $W_{lstm}$ denotes the weight matrix for the input data; $O = \{o_1, o_2, \ldots, o_i, \ldots, o_{maxlength}\}$ denotes the synthetic trajectory data, where $o_i$ denotes the synthesized coordinate value output for $t_i$ after LSTM processing.
The centroid deviation coordinates of the synthetic trajectory data are decoded with the tanh function according to the following formula:
$(\Delta lat_i', \Delta lon_i') = D_{GPS}(o_i; W_{dGPS})$ (3)
where $(\Delta lat_i', \Delta lon_i')$ denotes the decoded centroid deviation coordinates of trajectory point $i$ in the synthetic trajectory data, with $\Delta lat_i'$ and $\Delta lon_i'$ denoting the latitude deviation and longitude deviation of trajectory point $i$, respectively; $W_{dGPS}$ denotes the decoding matrix weights for the coordinate vector; and $D_{GPS}$ is the tanh function.
Step 5: Using the standard original trajectory as reference, calculate the trajectory similarity loss value of the synthetic trajectory with the trajectory loss function;
As an embodiment of the present invention, calculating the trajectory similarity loss of the synthetic trajectory in step 5 includes:
Step 5.1: Perform centroid normalization on the synthetic trajectory to obtain the corresponding centroid deviation coordinates of each trajectory point, and obtain the encoded standard synthetic trajectory;
Step 5.2: Use a rectified linear (ReLU) function to process the centroid deviation coordinates of each standard original trajectory and each standard synthetic trajectory into corresponding 64-dimensional vectors;
Step 5.3: Combine the two sets of corresponding 64-dimensional vectors with the trajectory loss function, and compute through training the trajectory similarity loss value TLoss of the synthetic trajectory according to formula (4):
$TLoss = \alpha L_{BCE}(l_t, l_p) + \beta L_{GPS}(t_t, t_p) \quad (4)$
In the formula, $l_t$ and $l_p$ denote the discrimination labels of the standard original trajectory and the standard synthetic trajectory, respectively; $t_t$ and $t_p$ denote the 64-dimensional vectors of the standard original trajectory and the corresponding standard synthetic trajectory, respectively; $L_{BCE}$ denotes the binary cross-entropy loss function; $L_{GPS}$ denotes a loss function that uses the least-squares error to measure the similarity between two trajectories; $\alpha$ and $\beta$ denote the weights of $L_{BCE}$ and $L_{GPS}$, respectively;
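Formula (4) can be illustrated with a short Python sketch. It assumes that $L_{GPS}$ is the mean squared error between the two vectors and that $L_{BCE}$ is averaged over the labels; the text only says "least-squares error" and "binary cross-entropy", so both normalizations, and the helper names `bce`, `mse`, and `t_loss`, are assumptions.

```python
import math


def bce(labels_true, labels_pred, eps=1e-12):
    """Binary cross-entropy averaged over the labels; predictions are clipped for stability."""
    total = 0.0
    for y, p in zip(labels_true, labels_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels_true)


def mse(t_true, t_pred):
    """Mean squared error between two equally long vectors."""
    return sum((a - b) ** 2 for a, b in zip(t_true, t_pred)) / len(t_true)


def t_loss(l_t, l_p, t_t, t_p, alpha=1.0, beta=1.0):
    """TLoss = alpha * L_BCE(l_t, l_p) + beta * L_GPS(t_t, t_p), as in formula (4)."""
    return alpha * bce(l_t, l_p) + beta * mse(t_t, t_p)


# Toy inputs: one label predicted at 0.5, and two-element vectors that differ by 1 everywhere.
loss = t_loss([1], [0.5], [0.0, 0.0], [1.0, 1.0], alpha=2.0, beta=3.0)
```

Raising $\beta$ relative to $\alpha$ makes the loss more sensitive to coordinate similarity, while raising $\alpha$ emphasizes the discriminator's label.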
It should be further noted that the present invention uses the trajectory loss function to evaluate the trajectory similarity loss value of the synthetic trajectory that the end-to-end deep learning model produces from the original trajectory data. Different weight parameters can be set for different scenarios to change the criterion applied to the similarity loss value, which broadens the scope of application;
Step 6: Apply the sigmoid activation function to the trajectory similarity loss value to perform binary classification and obtain the discrimination result for the synthetic trajectory;
If the discrimination result is false, the original trajectory data is fed into the model again for further training, and the backpropagation algorithm is used to solve the model optimization problem and update the network parameters of the model. Training stops when the discrimination result is true, and the synthetic trajectories whose discrimination result is true are output to form the synthetic trajectory data set.
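The binary decision in step 6 can be sketched as a sigmoid followed by a threshold. The threshold of 0.5 and the function name `discriminate` are assumptions; the text states only that a sigmoid activation produces a true or false result.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminate(score, threshold=0.5):
    """Return True (accept as real) when sigmoid(score) reaches the assumed threshold."""
    return sigmoid(score) >= threshold


accepted = discriminate(2.0)   # sigmoid(2.0) is above 0.5
rejected = discriminate(-2.0)  # sigmoid(-2.0) is below 0.5
```

In a training loop, a false result would trigger another generator update by backpropagation, and a true result would release the synthetic trajectory into the output data set.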
The method provided by the present invention is further illustrated below with several groups of experimental data. First, centroid normalization is applied to the original trajectory data to obtain the centroid deviation coordinates $(\Delta lat_i, \Delta lon_i)$ of each trajectory point, where $\Delta lat_i$ and $\Delta lon_i$ denote the latitude deviation and the longitude deviation of the trajectory point numbered $i$, respectively, and the encoded standard original trajectories are obtained. The data is as follows:
The centroid deviation coordinates of the three trajectory points in the standard original trajectory T1' are: {(0.07202433, 0.02669937), (0.07295694, 0.02329249), (0.07202433, 0.02669937)};
The centroid deviation coordinates of the three trajectory points in the standard original trajectory T2' are: {(-0.07308236, -0.01193083), (-0.01476175, -0.01333096), (-0.00440719, 0.01410267)};
The centroid deviation coordinates of the three trajectory points in the standard original trajectory T3' are: {(0.07202433, 0.02669937), (0.06020328, 0.0145211), (-0.04868138, 0.18410108)};
The centroid deviation coordinates of the three trajectory points in the standard original trajectory T4' are: {(-0.07105364, -0.01321791), (-0.05255251, -0.02247193), (0.06672003, 0.04290826)};
The centroid deviation coordinates of the three trajectory points in the standard original trajectory T5' are: {(0.03229337, 0.0198541), (0.03229337, 0.0198541), (0.01348513, 0.00932372)};
The centroid deviation coordinates of the three trajectory points in the standard original trajectory T6' are: {(0.04715935, 0.02345322), (0.029829, 0.02150825), (0.03253291, 0.01968764)}.
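Centroid normalization can be sketched as follows. The sketch assumes that the centroid is the mean of all points in the data set and that each deviation is the point minus that centroid; this section does not define the computation, so both choices are assumptions, and the raw coordinates below are hypothetical rather than taken from the experiment.

```python
def centroid_normalize(trajectories):
    """Subtract the data-set centroid from every (lat, lon) point."""
    points = [p for traj in trajectories for p in traj]
    lat_c = sum(p[0] for p in points) / len(points)
    lon_c = sum(p[1] for p in points) / len(points)
    normalized = [
        [(lat - lat_c, lon - lon_c) for lat, lon in traj]
        for traj in trajectories
    ]
    return (lat_c, lon_c), normalized


# Hypothetical raw (lat, lon) points for two short trajectories.
raw = [
    [(39.90, 116.40), (39.91, 116.41)],
    [(39.92, 116.39), (39.93, 116.40)],
]
centroid, standard = centroid_normalize(raw)
```

Under this definition the deviations sum to zero over the data set, and the original coordinates can be recovered by adding the centroid back.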
Next, a rectified linear (ReLU) function processes each centroid deviation coordinate of the six standard trajectories above into a 64-dimensional vector. The 64 LSTM cells then perform time-series prediction on the 64-dimensional vectors to obtain synthetic trajectory data, and the tanh function decodes the centroid deviation coordinates of the synthetic trajectory data into the corresponding decoded coordinates, which form the decoded synthetic trajectories. The results are as follows:
The decoded coordinates of the corresponding three trajectory points in the synthetic trajectory T1 are: {(0.07200033, 0.02660037), (0.07295114, 0.02322249), (0.07201233, 0.02661237)};
The decoded coordinates of the corresponding three trajectory points in the synthetic trajectory T2 are: {(-0.07308326, -0.01193803), (-0.01471675, -0.01330396), (-0.00447019, 0.01412067)};
The decoded coordinates of the corresponding three trajectory points in the synthetic trajectory T3 are: {(0.07204233, 0.02666937), (0.06023028, 0.0142511), (-0.04886138, 0.18411008)};
The decoded coordinates of the corresponding three trajectory points in the synthetic trajectory T4 are: {(-0.07103564, -0.01312791), (-0.05252551, -0.02274193), (0.06670203, 0.04209826)};
The decoded coordinates of the corresponding three trajectory points in the synthetic trajectory T5 are: {(0.03223937, 0.0195841), (0.03223937, 0.0195841), (0.01345813, 0.00933272)};
The decoded coordinates of the corresponding three trajectory points in the synthetic trajectory T6 are: {(0.04719535, 0.02354322), (0.029289, 0.02158025), (0.03235291, 0.01986764)}.
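As a quick check on how closely a synthetic trajectory follows its source, the listed values for T1' and T1 can be compared directly. The helper names `max_abs_deviation` and `mean_squared_error` are illustrative; the coordinates are copied from the lists above.

```python
T1_standard = [(0.07202433, 0.02669937), (0.07295694, 0.02329249), (0.07202433, 0.02669937)]
T1_synthetic = [(0.07200033, 0.02660037), (0.07295114, 0.02322249), (0.07201233, 0.02661237)]


def max_abs_deviation(a, b):
    """Largest absolute coordinate difference between two aligned trajectories."""
    return max(abs(x - y) for p, q in zip(a, b) for x, y in zip(p, q))


def mean_squared_error(a, b):
    """Mean squared coordinate difference between two aligned trajectories."""
    diffs = [(x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q)]
    return sum(diffs) / len(diffs)


gap = max_abs_deviation(T1_standard, T1_synthetic)
err = mean_squared_error(T1_standard, T1_synthetic)
```

For this pair every coordinate differs by less than 1e-4, so the synthetic trajectory stays close to the original while not reproducing it exactly.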
Step 5 of the method is then applied to obtain the corresponding similarity loss values, and in step 6 the sigmoid activation function performs binary classification on the similarity loss values of the six trajectories to obtain the discrimination results for the six synthetic trajectories. A discrimination result of 0 marks a fake trajectory, for which the model must be retrained and the prediction repeated; a discrimination result of 1 marks a real trajectory. The synthetic trajectories whose discrimination result is 1 are output as the synthetic trajectory data set.
Next, the k-means clustering algorithm is applied to the six output synthetic trajectories: based on Euclidean distance, the trajectory points of the synthetic trajectory data set are clustered at each timestamp, and the cluster centers obtained at different timestamps are randomly combined to generate generalized trajectories. Randomly combining cluster centers generates new trajectories until the number of trajectories in the data set reaches n. The occurrences of the generalized trajectories are then counted to produce the count matrix $C = \{c_k \mid c_1, c_2, \ldots, c_n\}$, where $c_k$ denotes the count of the k-th generalized trajectory in the count matrix; the count matrix C is shown in the following table:
Laplace noise with parameter ε is added to the count matrix C to obtain a differentially private count matrix, and a consistency constraint is then applied to that matrix; each of its entries is the noisy count of the corresponding k-th generalized trajectory. The differentially private count matrix is sorted to obtain the sorted count matrix $S_k = \{s_k \mid s_1, s_2, \ldots, s_n\}$, and the intermediate variables $L_k$ and $Q_k$ are computed from $S_k$ to obtain the trajectory data result set.
In the formula, $k \in [1, n]$, and the algebraic symbols m, j, and z are all natural numbers;
The elements of the intermediate-variable matrix $\{Q_k \mid Q_1, Q_2, \ldots, Q_n\}$ are sorted, and the elements of the matrix $\{L_k \mid L_1, L_2, \ldots, L_n\}$ are arranged so that they keep the same order as the corresponding $s_k$, which yields the trajectory data result set; each entry of the result set is the count of the k-th generalized trajectory in the noise-limited differentially private count matrix. The trajectory data result set is shown in the table below:
The differentially private release process of the method provided by the present invention is shown in Figure 2. Take the generalized trajectory T11->T21->T31 as an example: it is obtained by clustering the synthetic trajectories T1 and T2. Because T1 and T2 are synthetic trajectories output by the end-to-end deep learning model, their specific coordinate information is protected against reverse engineering. The trajectory release mechanism then applies noise-limited differential privacy to the statistical count of the generalized trajectory T11->T21->T31, which further protects the statistical privacy of the trajectory data and better resists privacy attacks that target the statistical information of the trajectory data set.
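The noise-and-constraint step can be sketched as follows. The Laplace part follows the text (noise with parameter ε, here with an assumed sensitivity of 1). The consistency part is a stand-in: it uses isotonic regression by pool-adjacent-violators, a common way to restore the ordering of sorted counts after noise is added, and it is not claimed to match the $L_k$ and $Q_k$ formulas of the patent, which are not reproduced in this text. The counts and all function names are hypothetical.

```python
import random


def add_laplace(counts, epsilon, rng, sensitivity=1.0):
    """Add Laplace(0, sensitivity / epsilon) noise to each count.

    The difference of two i.i.d. exponential draws is Laplace distributed.
    """
    rate = epsilon / sensitivity
    return [c + rng.expovariate(rate) - rng.expovariate(rate) for c in counts]


def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares closest non-decreasing sequence."""
    blocks = []  # each block stores [sum, length]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block mean exceeds the current block mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return fitted


rng = random.Random(42)
C = [1, 1, 2, 2, 3, 5]  # hypothetical counts, sorted in ascending order
noisy = add_laplace(C, epsilon=1.0, rng=rng)
constrained = isotonic_fit(noisy)  # noisy counts forced back into ascending order
```

Since the constraint is pure post-processing of the noisy counts, it does not weaken the differential privacy guarantee provided by the Laplace step.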
In summary, the trajectory data privacy protection method for an intelligent transportation system provided by the embodiment of the present invention replaces real trajectories with synthetic trajectories output by an end-to-end deep learning model. These synthetic trajectories can stand in for the real trajectories that a trusted third party would otherwise need for privacy protection processing, and can be used for data sharing and data release. The method also exploits the black-box nature of machine learning to mitigate, to a certain extent, the drawback that existing techniques can be reverse-engineered. In addition, running a clustering algorithm over the trajectory data when the trajectory data set is finally released preserves the privacy of the trajectory data while keeping the data set highly usable. The model is also reusable: when the trajectory data set is updated, the model does not need to be retrained, and synthetic trajectory data can be obtained simply by changing the input, which greatly reduces the time spent on data protection and improves computational efficiency. Furthermore, the method adds Laplace noise and a consistency constraint to the final trajectory counts before release, which satisfies the differential privacy requirement and improves the usefulness of the released trajectory data while preserving its privacy.
Embodiment 2:
Referring to FIG. 3 and FIG. 4, an embodiment of the present invention provides a trajectory data privacy protection system for an intelligent transportation system, which can be used to implement the method described in Embodiment 1 and specifically includes:
an acquisition module, configured to acquire the user's real-time real trajectory data set;
a synthetic trajectory module, configured to load the real trajectory data set into a pre-built and trained end-to-end deep learning model to generate a synthetic trajectory data set;
a clustering module, configured to use the k-means clustering algorithm to cluster, based on Euclidean distance, the trajectory points of the synthetic trajectory data set at each timestamp, and to generate generalized trajectories by randomly combining the cluster centers obtained at different timestamps;
a noise-adding release module, configured to add Laplace noise and a consistency constraint to the count matrix of the generalized trajectories to obtain a noise-limited differentially private count matrix, and to release it.
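The clustering module can be sketched in plain Python as follows. This is a minimal illustration under assumptions: the number of clusters `k`, the iteration count, the seeding, and the representation of a generalized trajectory as a tuple of cluster indices (one per timestamp) are all choices made for the example, and the function names are hypothetical. The input uses the decoded coordinates of the synthetic trajectories T1 to T4 listed in Embodiment 1.

```python
import random
from collections import Counter


def kmeans(points, k, rng, iterations=20):
    """Lloyd's algorithm with squared Euclidean distance on 2-D points."""
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            clusters[idx].append(p)
        # Recompute each center; keep the old one if its cluster is empty.
        centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers


def generalize(trajectories, k, n, rng):
    """Cluster the points at each timestamp, then draw n random center combinations."""
    steps = len(trajectories[0])
    centers_per_step = [kmeans([t[s] for t in trajectories], k, rng) for s in range(steps)]
    combos = [tuple(rng.randrange(k) for _ in range(steps)) for _ in range(n)]
    return centers_per_step, Counter(combos)  # the Counter plays the role of the count matrix C


synthetic = [
    [(0.07200033, 0.02660037), (0.07295114, 0.02322249), (0.07201233, 0.02661237)],
    [(-0.07308326, -0.01193803), (-0.01471675, -0.01330396), (-0.00447019, 0.01412067)],
    [(0.07204233, 0.02666937), (0.06023028, 0.0142511), (-0.04886138, 0.18411008)],
    [(-0.07103564, -0.01312791), (-0.05252551, -0.02274193), (0.06670203, 0.04209826)],
]
rng = random.Random(7)
centers_per_step, counts = generalize(synthetic, k=2, n=6, rng=rng)
```

A key such as `(0, 1, 0)` in `counts` names the generalized trajectory that passes through center 0 at the first timestamp, center 1 at the second, and center 0 at the third; the noise-adding release module would then perturb these counts.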
As an embodiment of the present invention, the synthetic trajectory module includes a collection unit. As shown in FIG. 4, the end-to-end deep learning model includes a trajectory generator and a trajectory discriminator; the trajectory generator includes a first input layer, a first embedding layer, a first LSTM modeling layer, and a first output layer; the trajectory discriminator includes a second input layer, a second embedding layer, a second LSTM modeling layer, and a second output layer; wherein:
the collection unit is configured to collect the historical real trajectory data of different users as the original trajectory data for training the model;
the first input layer is configured to apply centroid normalization to the original trajectory data to obtain the centroid deviation coordinates of each trajectory point, and to obtain the encoded standard original trajectory;
the first embedding layer is configured to use a multilayer perceptron (MLP) with a rectified linear (ReLU) function to process each centroid deviation coordinate into a 64-dimensional vector, and to add random noise to the vectors so that every group of vectors in the original trajectories has the same length as the longest trajectory;
the first LSTM modeling layer is configured to perform time-series prediction on the 64-dimensional vectors with 64 LSTM cells to obtain synthetic trajectory data;
the first output layer is configured to use two dense layers with the tanh function to decode the latitude and longitude deviations of the synthetic trajectory data and obtain the decoded synthetic trajectory;
the second input layer is configured to take the standard original trajectory and the synthetic trajectory as the input data of the trajectory discriminator, to apply centroid normalization to the synthetic trajectory to obtain the corresponding centroid deviation coordinates of each trajectory point, and to obtain the encoded standard synthetic trajectory;
the second embedding layer is configured to use a multilayer perceptron (MLP) with a rectified linear (ReLU) function to process the centroid deviation coordinates of each standard original trajectory and each standard synthetic trajectory into corresponding 64-dimensional vectors;
the second LSTM modeling layer is configured to use 64 LSTM cells to calculate the trajectory similarity loss value from the two sets of corresponding 64-dimensional vectors with the trajectory loss function;
the second output layer is configured to use one dense layer with the sigmoid activation function to perform binary classification on the trajectory similarity loss value and obtain the discrimination result for the synthetic trajectory. If the discrimination result is false, the original trajectory data is fed into the trajectory generator again for training, and the backpropagation algorithm is used to solve the model optimization problem and update the network parameters of the first LSTM modeling layer in the trajectory generator. Training stops when the discrimination result is true, and the synthetic trajectories whose discrimination result is true are output to form the synthetic trajectory data set.
The trajectory data privacy protection system for an intelligent transportation system provided by this embodiment is based on the same technical concept as the trajectory data privacy protection method provided in Embodiment 1 and achieves the beneficial effects described in Embodiment 1. For content not described in detail in this embodiment, refer to Embodiment 1.
Embodiment 3:
An embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, it implements the steps of any of the methods of Embodiment 1.
Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means, and the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make further improvements and modifications without departing from the technical principles of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210866318.2A CN115168900A (en) | 2022-07-22 | 2022-07-22 | Track data privacy protection method and system for intelligent traffic system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210866318.2A CN115168900A (en) | 2022-07-22 | 2022-07-22 | Track data privacy protection method and system for intelligent traffic system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115168900A true CN115168900A (en) | 2022-10-11 |
Family
ID=83496677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210866318.2A Pending CN115168900A (en) | 2022-07-22 | 2022-07-22 | Track data privacy protection method and system for intelligent traffic system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115168900A (en) |
-
2022
- 2022-07-22 CN CN202210866318.2A patent/CN115168900A/en active Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115952364A (en) * | 2023-03-07 | 2023-04-11 | 之江实验室 | Route recommendation method and device, storage medium and electronic equipment |
CN116595254A (en) * | 2023-05-18 | 2023-08-15 | 杭州绿城信息技术有限公司 | Data privacy and service recommendation method in smart city |
CN116595254B (en) * | 2023-05-18 | 2023-12-12 | 杭州绿城信息技术有限公司 | Data privacy and service recommendation method in smart city |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Jia et al. | Blockchain-enabled federated learning data protection aggregation scheme with differential privacy and homomorphic encryption in IIoT | |
Lv et al. | Deep-learning-enabled security issues in the internet of things | |
CN107977734B (en) | A prediction method based on moving Markov model under spatiotemporal big data | |
Xu et al. | Public bicycle traffic flow prediction based on a hybrid model | |
CN105760780B (en) | Track data method for secret protection based on road network | |
CN115168900A (en) | Track data privacy protection method and system for intelligent traffic system | |
CN111143838B (en) | A method for detecting abnormal behavior of database users | |
Xu et al. | Secure and reliable transfer learning framework for 6G-enabled Internet of Vehicles | |
CN110674858B (en) | A traffic public opinion detection method based on spatio-temporal correlation and big data mining | |
CN107832631A (en) | The method for secret protection and system of a kind of data publication | |
CN104836805A (en) | Network intrusion detection method based on fuzzy immune theory | |
Lu et al. | Federated clustering for recognizing driving styles from private trajectories | |
CN110162997A (en) | Anonymous method for secret protection based on interpolation point | |
Yang et al. | A method of intrusion detection based on Attention-LSTM neural network | |
Wang et al. | Improved KNN algorithm based on preprocessing of center in smart cities | |
Zhang | Financial data anomaly detection method based on decision tree and random forest algorithm | |
Yang et al. | Edge intelligence based digital twins for internet of autonomous unmanned vehicles | |
CN114092729A (en) | Heterogeneous electricity consumption data publishing method based on cluster anonymization and differential privacy protection | |
Zhang et al. | LGAN-DP: A novel differential private publication mechanism of trajectory data | |
Zhang et al. | Vertical federated learning across heterogeneous regions for industry 4.0 | |
Han et al. | Research on abnormal transaction detection method for blockchain | |
Sun et al. | Graph community infomax | |
Zhu et al. | Efficient gaussian kernel microcluster real-time clustering method for industrial internet of things (iiot) streams | |
CN117614693A (en) | Cloud internal security threat detection method based on behavior traffic | |
CN116862667A (en) | Fraud detection and credit assessment method based on comparison learning and graph neural decoupling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||