CN115168900A - Track data privacy protection method and system for intelligent traffic system - Google Patents

Info

Publication number
CN115168900A
Authority
CN
China
Prior art keywords
track
data
trajectory
synthetic
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210866318.2A
Other languages
Chinese (zh)
Inventor
徐小龙
张梓铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202210866318.2A
Publication of CN115168900A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a trajectory data privacy protection method and system for an intelligent traffic system. The method comprises the following steps: acquiring a real-time real trajectory data set of a user; loading the real trajectory data set into a pre-constructed and trained end-to-end deep learning model to generate a synthetic trajectory data set; clustering the trajectory points of the synthetic trajectory data set under each timestamp by using a k-means clustering algorithm based on Euclidean distance, and, after clustering, generating generalized trajectories by randomly combining the trajectory cluster centers under different timestamps; and adding Laplace noise and a consistency constraint to the count matrix of the generalized trajectories to obtain a differential privacy count matrix with a bounded amount of noise, and publishing the differential privacy count matrix. The synthetic trajectory data set balances data availability against privacy, the published trajectory data satisfy the differential privacy requirement, and a stronger privacy guarantee is provided.

Description

Track data privacy protection method and system for intelligent traffic system
Technical Field
The invention relates to the technical field of data security protection, in particular to a track data privacy protection method and system for an intelligent traffic system.
Background
An intelligent transportation system applies advanced technologies such as information technology, computer technology, data communication, sensors, electronic control, and artificial intelligence to transportation, service control, and vehicle manufacturing, and strengthens the connection among vehicles, roads, and users, thereby forming a comprehensive transportation system that ensures safety, improves efficiency, improves the environment, and saves energy. Trajectory data is private data that plays an important role in the intelligent traffic system. For example, in an urban traffic system, the movement trajectories of vehicles can be analyzed statistically to formulate a more reasonable urban traffic scheme that avoids congestion, and navigation software can estimate the driving preferences of a vehicle owner by analyzing the vehicle's trajectory so as to recommend an optimal driving route.
However, the convenience of trajectory data comes with corresponding privacy and security problems. Trajectory data is usually opened to the public after being collected so that researchers can mine and analyze it; in the process of collecting and using the trajectory data, an attacker may track the movement of a user, or further infer private information of the user, such as workplace, home address, social relationships, physical condition, and hobbies, and sell that information to a third party for economic benefit, so that the privacy of the user is disclosed. The problem of how to make trajectory data useful while protecting user privacy is therefore particularly important.
At present, the anonymization algorithm k-anonymity is generally adopted to protect data privacy. k-anonymity requires that each user be indistinguishable from at least k-1 other users within a certain range of time and space, so that an attacker cannot single out the attack target among at least k users and then deduce its exact position. However, because trajectory coordinate points are relatively discrete, the conditions for using k-anonymity are difficult to satisfy, and studies show that k-anonymity, which applies the privacy protection algorithm directly to real data, cannot resist privacy-stealing attacks such as combination attacks, exact attacks, or background knowledge attacks. In addition, the algorithm has to invoke a clustering algorithm frequently, which greatly affects trajectory precision. Moreover, the model has to be rebuilt whenever the data set changes, so it is not reusable, which greatly increases the time overhead.
Disclosure of Invention
The invention aims to overcome the defects in the prior art by providing a trajectory data privacy protection method and system for an intelligent traffic system, and solves the technical problems that prior-art methods for publishing trajectory data sets, because they apply a clustering algorithm frequently to the real data set and are not reusable, have insufficient accuracy, insufficient efficiency, and inadequate protection of trajectory data privacy.
In order to solve the technical problems, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a trajectory data privacy protection method for an intelligent transportation system, the method including:
acquiring a real-time real track data set of a user;
loading the real track data set into a pre-constructed and trained end-to-end deep learning model to generate a synthetic track data set;
clustering the track points of the synthetic track data set under each timestamp by using a k-means clustering algorithm based on Euclidean distance, and generating generalized tracks by randomly combining track cluster centers under different timestamps after clustering;
and adding Laplace noise and a consistency constraint to the count matrix of the generalized trajectories to obtain a differential privacy count matrix with a bounded amount of noise, and publishing the differential privacy count matrix.
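The noise-adding step can be illustrated with a minimal sketch. The function names, the sensitivity of 1, the example counts, and the reading of the consistency constraint as "published counts are non-negative integers" are all assumptions made for illustration; the text above does not fix them.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_counts(counts, epsilon, sensitivity=1.0, seed=None):
    """Add Laplace noise to each count, then apply an assumed consistency step."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # standard Laplace-mechanism scale
    noisy = [c + laplace_noise(scale, rng) for c in counts]
    # Assumed consistency constraint: counts must be non-negative integers.
    return [max(0, round(v)) for v in noisy]


# Hypothetical count matrix of eight generalized trajectories.
true_counts = [3, 2, 1, 2, 0, 1, 0, 1]
private_counts = privatize_counts(true_counts, epsilon=1.0, seed=7)
print(private_counts)
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of less accurate counts.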
With reference to the first aspect, preferably, the training process of the end-to-end deep learning model includes the following steps:
acquiring historical real track data of different users as original track data of a training model;
carrying out centroid standardization on the original trajectory data to obtain a centroid deviation coordinate of each trajectory point, and obtaining a standard original trajectory after coding;
processing each of the centroid deviation coordinates into a 64-dimensional vector through a linear rectification function (ReLU);
performing time-series prediction on the 64-dimensional vectors by using 64 long short-term memory (LSTM) cells to obtain synthetic trajectory data, and decoding the centroid deviation coordinates of the synthetic trajectory data by using the tanh hyperbolic tangent function to obtain a decoded synthetic trajectory;
calculating a trajectory similarity loss value of the synthetic trajectory through a trajectory loss function in combination with the standard original trajectory;
performing binary classification on the trajectory similarity loss value through a sigmoid activation function to obtain a discrimination result for the synthetic trajectory;
if the discrimination result is false, inputting the original trajectory data into the model again for repeated training and solving the model optimization problem with a back propagation algorithm to update the network parameters of the model; stopping training once the discrimination result is true, and outputting the synthetic trajectories whose discrimination result is true to form the synthetic trajectory data set.
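As a minimal sketch of the centroid standardization step in the list above, the centroid can be taken as the mean latitude and longitude of the points, and each point replaced by its deviation from that centroid. Whether the centroid is computed per trajectory or over the whole data set is not stated here; the sketch assumes a single set of points, and the coordinates are hypothetical.

```python
def centroid_normalize(points):
    """Return the centroid and the (d_lat, d_lon) deviation of each point from it."""
    n = len(points)
    c_lat = sum(lat for lat, _ in points) / n
    c_lon = sum(lon for _, lon in points) / n
    deviations = [(lat - c_lat, lon - c_lon) for lat, lon in points]
    return (c_lat, c_lon), deviations


# Hypothetical (latitude, longitude) points.
points = [(32.05, 118.78), (32.07, 118.80), (32.06, 118.79)]
centroid, deviations = centroid_normalize(points)
print(centroid)
print(deviations)
```

Adding the centroid back to a decoded deviation recovers an absolute coordinate, which is why the centroid is returned together with the deviations.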
With reference to the first aspect, preferably, after each centroid deviation coordinate is processed into a 64-dimensional vector through the linear rectification function, the method further includes: adding random noise to the vectors so that each group of vectors of the original trajectories keeps the same length.
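The equal-length requirement can be sketched as padding. The embodiment later describes this step as filling empty trajectory points into each trajectory; the zero-valued padding point and the mask that marks real points are assumptions of this sketch, not details given in the text.

```python
def pad_trajectories(trajectories, pad_point=(0.0, 0.0)):
    """Pad every trajectory to the length of the longest one.

    Returns the padded trajectories and a 0/1 mask marking the real points.
    """
    max_length = max(len(t) for t in trajectories)
    padded = [list(t) + [pad_point] * (max_length - len(t)) for t in trajectories]
    mask = [[1] * len(t) + [0] * (max_length - len(t)) for t in trajectories]
    return padded, mask


# Hypothetical centroid deviation coordinates of two trajectories.
trajectories = [
    [(0.1, 0.2), (0.3, 0.1), (0.0, -0.1)],
    [(0.05, 0.02)],
]
padded, mask = pad_trajectories(trajectories)
print(padded)
print(mask)
```

Equal-length sequences can be processed in batches, which is consistent with the stated goal of accelerating training.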
With reference to the first aspect, preferably, the step of calculating the track similarity loss of the synthesized track through the track loss function in combination with the standard original track includes:
carrying out centroid standardization processing on the synthesized track to obtain corresponding centroid deviation coordinates of each track point, and obtaining a coded standard synthesized track;
respectively processing the centroid deviation coordinates of each standard original track and each standard synthesized track into corresponding 64-dimensional vectors through a linear rectification function;
and combining the two groups of corresponding 64-dimensional vectors with the trajectory loss function according to formula (1) to obtain the trajectory similarity loss value TLoss of the synthetic trajectory:
$$\mathrm{TLoss} = \alpha L_{BCE}(l_t, l_p) + \beta L_{GPS}(t_t, t_p) \qquad (1)$$
where $l_t$ and $l_p$ denote the discrimination labels of the standard original trajectory and the standard synthetic trajectory, respectively; $t_t$ and $t_p$ denote the 64-dimensional vectors of the standard original trajectory and of the corresponding standard synthetic trajectory, respectively; $L_{BCE}$ denotes the binary cross-entropy loss function; $L_{GPS}$ denotes a loss function that measures the similarity between two trajectories using the least-squares error; and $\alpha$ and $\beta$ denote the weights of $L_{BCE}$ and $L_{GPS}$, respectively.
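The weighted loss can be sketched in a few lines. The helper names, the default weights, and the use of a mean (rather than a sum) of squared errors are assumptions; the text only states that the loss combines a binary cross-entropy term and a least-squares term with weights α and β.

```python
import math


def bce(label, prob, eps=1e-12):
    """Binary cross-entropy between a 0/1 label and a predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def gps_mse(original, synthetic):
    """Mean squared error over all components of two equal-length vector lists."""
    squared = [
        (a - b) ** 2
        for p, q in zip(original, synthetic)
        for a, b in zip(p, q)
    ]
    return sum(squared) / len(squared)


def t_loss(label, prob, original, synthetic, alpha=1.0, beta=1.0):
    """TLoss = alpha * L_BCE + beta * L_GPS (weights are scenario-dependent)."""
    return alpha * bce(label, prob) + beta * gps_mse(original, synthetic)


# Hypothetical inputs: label 1, predicted probability 0.5, one 2-D point each.
loss = t_loss(1, 0.5, [(0.0, 0.0)], [(1.0, 1.0)], alpha=1.0, beta=2.0)
print(loss)
```

Raising β relative to α makes the loss favor spatial closeness to the original trajectory over fooling the discriminator, which is one way the weights could be adapted to a scenario.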
With reference to the first aspect, preferably, the calculation formula for processing each centroid deviation coordinate into a 64-dimensional vector through a linear rectification function is as follows:
$$t_i = f_{GPS}(\Delta lat_i, \Delta lon_i; W_{GPS}) \qquad (2)$$
where $t_i$ denotes the 64-dimensional vector of the trajectory point numbered $i$; $\Delta lat_i$ and $\Delta lon_i$ denote the latitude deviation and the longitude deviation of the trajectory point numbered $i$, respectively; $f_{GPS}$ denotes the linear rectification function; and $W_{GPS}$ denotes the vector weight applied to the centroid deviation coordinate $(\Delta lat_i, \Delta lon_i)$ of trajectory point $i$.
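A minimal sketch of this embedding, assuming it is a single dense layer followed by the linear rectification function. The random weights stand in for learned parameters, and the bias term is an assumption not mentioned in the text.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def embed_point(d_lat, d_lon, weights, bias):
    """Map one centroid deviation coordinate to a vector, one entry per weight row."""
    return [
        relu(w_lat * d_lat + w_lon * d_lon + b)
        for (w_lat, w_lon), b in zip(weights, bias)
    ]


rng = random.Random(0)
# Hypothetical 64 x 2 weight matrix and zero bias standing in for trained values.
W_GPS = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(64)]
b_GPS = [0.0] * 64

t_i = embed_point(0.07202433, 0.02669937, W_GPS, b_GPS)
print(len(t_i))
```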
With reference to the first aspect, preferably, the calculation formula for obtaining the synthetic trajectory data by performing time-series prediction on the 64-dimensional vectors with 64 LSTM cells is as follows:
$$O = \mathrm{LSTM}(T, W_{lstm}) \qquad (3)$$
where $T = \{t_1, t_2, \ldots, t_i, \ldots, t_{maxlength}\}$ denotes the coordinate features of all trajectory points in the original trajectory data, $t_i$ denotes the 64-dimensional vector of the trajectory point numbered $i$, and $maxlength$ denotes the length of the longest single trajectory in the original trajectory data; $W_{lstm}$ denotes the weight matrix of the input data; $O$ denotes the synthetic trajectory data, with $O = \{o_1, o_2, \ldots, o_i, \ldots, o_{maxlength}\}$, where $o_i$ denotes the output obtained after $t_i$ is processed by the LSTM.
With reference to the first aspect, preferably, the calculation formula for decoding the centroid deviation coordinate of the synthesized trajectory data by using the tanh hyperbolic tangent function is as follows:
$$(\Delta\widehat{lat}_i, \Delta\widehat{lon}_i) = D_{GPS}(o_i; W_{dGPS}) \qquad (4)$$
where $(\Delta\widehat{lat}_i, \Delta\widehat{lon}_i)$ denotes the decoded coordinate of the centroid deviation coordinate of trajectory point $i$ in the synthetic trajectory data; $\Delta\widehat{lat}_i$ and $\Delta\widehat{lon}_i$ denote the latitude deviation and the longitude deviation of trajectory point $i$, respectively; $W_{dGPS}$ denotes the decoding matrix weight of the coordinate vector; and $D_{GPS}$ is the tanh hyperbolic tangent function.
In a second aspect, the present invention provides a trajectory data privacy protection system for an intelligent transportation system, the system comprising:
the acquisition module is used for acquiring a real-time real track data set of a user;
the synthetic track module is used for loading the real track data set into a pre-constructed and trained end-to-end deep learning model to generate a synthetic track data set;
the clustering module is used for clustering the trajectory points of the synthetic trajectory data set under each timestamp by using a k-means clustering algorithm based on Euclidean distance and, after clustering, generating generalized trajectories by randomly combining the trajectory cluster centers under different timestamps;
and the noise adding and publishing module is used for adding Laplace noise and a consistency constraint to the count matrix of the generalized trajectories to obtain a differential privacy count matrix with a bounded amount of noise, and publishing the differential privacy count matrix.
With reference to the second aspect, preferably, the synthetic track module includes an acquisition unit; the end-to-end deep learning model comprises a trajectory generator and a trajectory discriminator; the track generator comprises a first input layer, a first embedding layer, a first LSTM modeling layer and a first output layer; the track discriminator comprises a second input layer, a second embedding layer, a second LSTM modeling layer and a second output layer; wherein:
the acquisition unit is used for acquiring historical real track data of different users as original track data of the training model;
the first input layer is used for carrying out centroid standardization on the original track data to obtain a centroid deviation coordinate of each track point and obtain a standard original track after coding;
the first embedding layer is used for processing each centroid deviation coordinate into a 64-dimensional vector through a linear rectification function by utilizing a multilayer perceptron MLP; adding random noise to the vectors to enable each group of vectors in the original track to keep the same length as the longest track;
the first LSTM modeling layer is used for performing time-series prediction on the 64-dimensional vectors through 64 LSTM cells to obtain synthetic trajectory data;
the first output layer is used for decoding the latitude and longitude deviations of the synthetic trajectory data through the tanh hyperbolic tangent function by using two dense layers to obtain a decoded synthetic trajectory;
the second input layer is used for taking the standard original track and the synthesized track as input data of the track discriminator, carrying out centroid standardization processing on the synthesized track to obtain a corresponding centroid deviation coordinate of each track point and obtaining a coded standard synthesized track;
the second embedding layer is used for processing the centroid deviation coordinates of each standard original track and each standard synthesized track into corresponding 64-dimensional vectors through a linear rectification function by utilizing a multilayer perceptron MLP;
the second LSTM modeling layer is used for calculating a track similarity loss value through a track loss function by using the two groups of corresponding 64-dimensional vectors through 64 LSTM cells;
the second output layer is used for performing binary classification on the trajectory similarity loss value through a sigmoid activation function by using a dense layer to obtain a discrimination result for the synthetic trajectory; if the discrimination result is false, the original trajectory data is input into the trajectory generator again for training, and the model optimization problem is solved with a back propagation algorithm to update the network parameters in the first LSTM modeling layer of the trajectory generator; training stops once the discrimination result is true, and the synthetic trajectories whose discrimination result is true are output to form the synthetic trajectory data set.
In a third aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the trajectory data privacy protection method for an intelligent transportation system according to any one of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
According to the method, the end-to-end deep learning model generates synthetic trajectories that replace the real trajectories once the trajectory loss function judges that the trajectory similarity loss meets the standard, so that privacy disclosure is avoided when the real trajectory data set is encrypted and released; when the real trajectory data set is updated, the model does not need to be retrained, and synthetic trajectories are obtained simply by replacing the input, which greatly reduces the time spent on data protection. In addition, random noise is added to the vectors during model training so that each group of vectors of the original trajectories keeps the same length, which accelerates training and improves computational efficiency. Finally, after a trusted third party acquires the synthetic trajectories, the trajectory points in each unit of time are clustered by using a k-means algorithm based on Euclidean distance, generalized trajectories are generated by randomly combining the trajectory cluster centers under different timestamps, and the final trajectory counts are published with Laplace noise and a consistency constraint added so as to satisfy differential privacy.
Drawings
Fig. 1 is a flowchart of a track data privacy protection method for an intelligent transportation system according to an embodiment of the present invention;
fig. 2 is a flowchart of differential privacy publishing in a track data privacy protection method for an intelligent transportation system according to an embodiment of the present invention;
fig. 3 is a schematic block diagram of a track data privacy protection system for an intelligent transportation system according to an embodiment of the present invention;
fig. 4 is a structural schematic block diagram of end-to-end deep learning model training provided by the embodiment of the present invention.
Detailed Description
The technical solutions of the present invention are described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the embodiments and the specific features therein are detailed descriptions of the technical solutions of the present invention rather than limitations on them, and that the technical features of the embodiments may be combined with each other where no conflict arises.
The term "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
The first embodiment is as follows:
as shown in fig. 1, the present embodiment introduces a track data privacy protection method for an intelligent transportation system, which specifically includes the following steps:
step one: acquiring a real-time real trajectory data set of a user;
step two: loading the real track data set into a pre-constructed and trained end-to-end deep learning model to generate a synthetic track data set;
step three: clustering track points of the synthetic track data set under each timestamp by using a k-means clustering algorithm based on Euclidean distance, and generating generalized tracks by randomly combining track cluster centers under different timestamps after clustering;
step four: and adding Laplace noise and consistency constraint to the counting matrix of the generalized track to obtain a differential privacy counting matrix with limited noise quantity and issuing the differential privacy counting matrix.
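Step three can be sketched as follows: k-means is run on the points that share a timestamp, and generalized trajectories are then formed by picking one cluster center per timestamp at random. The plain Lloyd iteration, the fixed iteration count, the seeds, and the example deviations are assumptions of this sketch.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means with Euclidean distance; returns the k cluster centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Keep the old center when a cluster ends up empty.
        centers = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers


def generalized_trajectories(trajectories, k, n, seed=0):
    """Cluster per timestamp, then randomly combine one center per timestamp."""
    rng = random.Random(seed)
    steps = len(trajectories[0])
    centers_per_step = [
        kmeans([t[s] for t in trajectories], k, seed=seed + s)
        for s in range(steps)
    ]
    return [tuple(rng.choice(cs) for cs in centers_per_step) for _ in range(n)]


# Hypothetical synthetic trajectories: four trajectories, three timestamps each.
synthetic = [
    [(0.07, 0.03), (0.07, 0.02), (0.07, 0.03)],
    [(-0.07, -0.01), (-0.01, -0.01), (0.00, 0.01)],
    [(0.07, 0.03), (0.06, 0.01), (-0.05, 0.18)],
    [(-0.07, -0.01), (-0.05, -0.02), (0.07, 0.04)],
]
generalized = generalized_trajectories(synthetic, k=2, n=5)
print(generalized)
```

With k centers at each of m timestamps there are at most k^m distinct generalized trajectories, which bounds the size of the count matrix that is later perturbed.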
The training process of the end-to-end deep learning model used in step two of this embodiment of the invention comprises the following steps:
step 1: acquiring historical real track data of different users as original track data of a training model;
step 2: performing centroid standardization on the original trajectory data to obtain the centroid deviation coordinate of each trajectory point, and obtaining a standard original trajectory after encoding;
step 3: processing each centroid deviation coordinate into a 64-dimensional vector through a linear rectification function (ReLU); the calculation formula is as follows:
$$t_i = f_{GPS}(\Delta lat_i, \Delta lon_i; W_{GPS}) \qquad (1)$$
where $t_i$ denotes the 64-dimensional vector of the trajectory point numbered $i$; $\Delta lat_i$ and $\Delta lon_i$ denote the latitude deviation and the longitude deviation of the trajectory point numbered $i$, respectively; $f_{GPS}$ denotes the linear rectification function; and $W_{GPS}$ denotes the vector weight applied to the centroid deviation coordinate $(\Delta lat_i, \Delta lon_i)$ of trajectory point $i$.
As an embodiment of the present invention, in this step, random noise is added to the vectors, that is, empty track points are filled into each track, so that each group of vectors in the original track keeps the same length as the longest track; the model training process can be accelerated, and the calculation efficiency is improved;
step 4: performing time-series prediction on the 64-dimensional vectors by using 64 long short-term memory (LSTM) cells to obtain synthetic trajectory data, and decoding the centroid deviation coordinates of the synthetic trajectory data by using the tanh hyperbolic tangent function to obtain a decoded synthetic trajectory;
the calculation formula for obtaining the synthetic trajectory data by performing time-series prediction on the 64-dimensional vectors with 64 LSTM cells is as follows:
$$O = \mathrm{LSTM}(T, W_{lstm}) \qquad (2)$$
where $T = \{t_1, t_2, \ldots, t_i, \ldots, t_{maxlength}\}$ denotes the coordinate features of all trajectory points in the original trajectory data, $t_i$ denotes the 64-dimensional vector of the trajectory point numbered $i$, and $maxlength$ denotes the length of the longest single trajectory in the original trajectory data; $W_{lstm}$ denotes the weight matrix of the input data; $O$ denotes the synthetic trajectory data, with $O = \{o_1, o_2, \ldots, o_i, \ldots, o_{maxlength}\}$, where $o_i$ denotes the output obtained after $t_i$ is processed by the LSTM.
The calculation formula for decoding the centroid deviation coordinate of the synthesized track data by adopting the tanh hyperbolic tangent function is as follows:
$$(\Delta\widehat{lat}_i, \Delta\widehat{lon}_i) = D_{GPS}(o_i; W_{dGPS}) \qquad (3)$$
where $(\Delta\widehat{lat}_i, \Delta\widehat{lon}_i)$ denotes the decoded coordinate of the centroid deviation coordinate of trajectory point $i$ in the synthetic trajectory data; $\Delta\widehat{lat}_i$ and $\Delta\widehat{lon}_i$ denote the latitude deviation and the longitude deviation of trajectory point $i$, respectively; $W_{dGPS}$ denotes the decoding matrix weight of the coordinate vector; and $D_{GPS}$ is the tanh hyperbolic tangent function.
step 5: calculating a trajectory similarity loss value of the synthetic trajectory through a trajectory loss function in combination with the standard original trajectory;
as an embodiment of the present invention, the step of calculating the trajectory similarity loss of the synthesized trajectory by the trajectory loss function in combination with the standard original trajectory in step 5 includes:
step 5.1: carrying out centroid standardization processing on the synthesized track to obtain corresponding centroid deviation coordinates of each track point, and obtaining a coded standard synthesized track;
step 5.2: processing the centroid deviation coordinates of each standard original trajectory and each standard synthetic trajectory into corresponding 64-dimensional vectors through the linear rectification function;
step 5.3: combining the two groups of corresponding 64-dimensional vectors with the trajectory loss function according to formula (4) to obtain the trajectory similarity loss value TLoss of the synthetic trajectory:
$$\mathrm{TLoss} = \alpha L_{BCE}(l_t, l_p) + \beta L_{GPS}(t_t, t_p) \qquad (4)$$
where $l_t$ and $l_p$ denote the discrimination labels of the standard original trajectory and the standard synthetic trajectory, respectively; $t_t$ and $t_p$ denote the 64-dimensional vectors of the standard original trajectory and of the corresponding standard synthetic trajectory, respectively; $L_{BCE}$ denotes the binary cross-entropy loss function; $L_{GPS}$ denotes a loss function that measures the similarity between two trajectories using the least-squares error; and $\alpha$ and $\beta$ denote the weights of $L_{BCE}$ and $L_{GPS}$, respectively;
further, the method uses the trajectory loss function to evaluate the trajectory similarity loss value of the synthetic trajectories that the end-to-end deep learning model produces from the original trajectory data; different weight parameters can be set for different scenarios to change the criterion applied to the similarity loss value, which widens the range of application;
step 6: performing binary classification on the trajectory similarity loss value through a sigmoid activation function to obtain a discrimination result for the synthetic trajectory;
if the discrimination result is false, inputting the original trajectory data into the model again for repeated training and solving the model optimization problem with a back propagation algorithm to update the network parameters of the model; stopping training once the discrimination result is true, and outputting the synthetic trajectories whose discrimination result is true to form the synthetic trajectory data set.
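The binary decision in step 6 can be sketched as a sigmoid followed by a threshold. The threshold of 0.5 and the idea of feeding a raw score to the sigmoid are assumptions; the text only states that a sigmoid activation yields a true or false result.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def discriminate(score, threshold=0.5):
    """Return 1 (judged real) if sigmoid(score) reaches the threshold, else 0."""
    return 1 if sigmoid(score) >= threshold else 0


# Hypothetical discriminator scores for three synthetic trajectories.
scores = [3.0, -2.0, 0.0]
results = [discriminate(s) for s in scores]
print(results)
```

Under this reading, trajectories with result 0 would trigger another round of training, and only those with result 1 would enter the synthetic trajectory data set.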
The method provided by the invention is further illustrated with test data. First, the original trajectory data is processed by centroid standardization to obtain the centroid deviation coordinate $(\Delta lat_i, \Delta lon_i)$ of each trajectory point, where $\Delta lat_i$ and $\Delta lon_i$ denote the latitude deviation and the longitude deviation of the trajectory point numbered $i$, respectively; the standard original trajectories obtained after encoding are as follows:
the centroid deviation coordinates of the three trajectory points in standard original trajectory T1' are { (0.07202433, 0.02669937), (0.07295694, 0.02329249), (0.07202433, 0.02669937) };
the centroid deviation coordinates of the three trajectory points in standard original trajectory T2' are { (-0.07308236, -0.01193083), (-0.01476175, -0.01333096), (-0.40004719, 0.01410267) };
the centroid deviation coordinates of the three trajectory points in standard original trajectory T3' are { (0.07202433, 0.02669937), (0.06020328, 0.0145211), (-0.04868138, 0.18410108) };
the centroid deviation coordinates of the three trajectory points in standard original trajectory T4' are { (-0.07105364, -0.01321791), (-0.05255251, -0.02247193), (0.06672003, 0.04290826) };
the centroid deviation coordinates of the three trajectory points in standard original trajectory T5' are { (0.03229337, 0.0198541), (0.01348513, 0.00932372), … };
the centroid deviation coordinates of the three trajectory points in standard original trajectory T6' are { (0.04715935, 0.02345322), (0.029829, 0.02150825), (0.03253291, 0.01968764) }.
Then, each centroid deviation coordinate of the six standard trajectories is processed into a 64-dimensional vector through the linear rectification function; 64 LSTM cells are then used to perform time-series prediction on the 64-dimensional vectors to obtain synthetic trajectory data, and the tanh hyperbolic tangent function is used to decode the centroid deviation coordinates of the synthetic trajectory data to obtain the corresponding decoded coordinates, which make up the decoded synthetic trajectories, with the following results:
the decoding coordinates of the corresponding three track points in the synthesized track T1 are { (0.07200033, 0.02660037), (0.07295114, 0.02322249), (0.07201233, 0.02661237) },
the corresponding three track point decoding coordinates in the synthetic track T2 are { (-0.07308326, -0.01193803), (-0.01471675, -0.01330396), (-0.00447019, 0.01412067) },
the decoding coordinates of the corresponding three track points in the synthesized track T3 are { (0.07204233, 0.02666937), (0.06023028, 0.0142511), (-0.04886138, 0.18411008) },
the corresponding three track point decoding coordinates in the synthetic track T4 are { (-0.07103564, -0.01312791), (-0.05252551, -0.02274193), (0.06670203, 0.04209826) },
the corresponding three track point decoding coordinates in the synthetic track T5 are { (0.03223937, 0.0195841), (0.01345813, 0.00933272),
the decoded coordinates of the corresponding three track points in the synthetic track T6 are { (0.04719535, 0.02354322), (0.029289, 0.02158025), (0.03235291, 0.01986764) }.
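As an illustration only, the forward pass described above (linear rectification embedding, 64 LSTM cells, tanh decoding) may be sketched in Python as follows. The weights are randomly initialised stand-ins for trained parameters, and the helper names (matvec, rand_matrix, lstm_step, generate) and the gate ordering are assumptions of this sketch rather than part of the method. Its output therefore does not reproduce the decoded coordinates listed above.

```python
import math
import random

DIM = 64  # embedding size and number of LSTM units, as stated in the text


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical random initialisation standing in for trained weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(x, h, c, W, U, b):
    """One LSTM step; gates are stacked as [input, forget, cell, output]."""
    z = [a + r + bias for a, r, bias in zip(matvec(W, x), matvec(U, h), b)]
    i = [sigmoid(v) for v in z[0:DIM]]
    f = [sigmoid(v) for v in z[DIM:2 * DIM]]
    g = [math.tanh(v) for v in z[2 * DIM:3 * DIM]]
    o = [sigmoid(v) for v in z[3 * DIM:4 * DIM]]
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new


def generate(track, rng):
    """Map centroid deviation coordinates to decoded synthetic coordinates."""
    W_gps = rand_matrix(DIM, 2, rng)       # embedding weights (ReLU layer)
    W = rand_matrix(4 * DIM, DIM, rng)     # LSTM input weights
    U = rand_matrix(4 * DIM, DIM, rng)     # LSTM recurrent weights
    b = [0.0] * (4 * DIM)                  # LSTM biases
    W_dec = rand_matrix(2, DIM, rng)       # decoding weights (tanh layer)
    h, c = [0.0] * DIM, [0.0] * DIM
    decoded = []
    for point in track:
        e = [max(0.0, v) for v in matvec(W_gps, list(point))]  # ReLU embedding
        h, c = lstm_step(e, h, c, W, U, b)
        decoded.append(tuple(math.tanh(v) for v in matvec(W_dec, h)))
    return decoded


# Centroid deviation coordinates of the standard original track T3' given above.
t3 = [(0.07202433, 0.02669937), (0.06020328, 0.0145211), (-0.04868138, 0.18410108)]
synthetic = generate(t3, random.Random(0))
```

Because tanh is the final activation, every decoded value lies strictly between -1 and 1, which is compatible with the small centroid deviations used in this example.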
After step 5 of the method provided by the invention is performed, the corresponding similarity loss values are obtained, and the six track similarity loss values are subjected to binary classification through a sigmoid activation function to obtain the discrimination results of the six synthetic tracks. A discrimination result of 0 marks a false track, in which case the model needs to be retrained and the prediction repeated; a discrimination result of 1 marks a real track. The synthetic track data set whose discrimination result is 1 is output.
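For illustration, the track loss of the method, TLoss = αL_BCE(l_t, l_p) + βL_GPS(t_t, t_p), and the sigmoid-based binary decision may be sketched as follows. The weights alpha and beta, the 0.5 threshold, the use of a mean squared error for L_GPS, and the function names are assumptions of this sketch; the argument named score stands for a hypothetical dense-layer output.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def bce(label, prob, eps=1e-12):
    """Binary cross-entropy for one label/probability pair."""
    prob = min(max(prob, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def mse(a, b):
    """Mean squared error between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def t_loss(l_t, l_p, t_t, t_p, alpha=1.0, beta=1.0):
    """TLoss = alpha * L_BCE(l_t, l_p) + beta * L_GPS(t_t, t_p)."""
    return alpha * bce(l_t, l_p) + beta * mse(t_t, t_p)


def discriminate(score, threshold=0.5):
    """Return 1 (real track) or 0 (false track) from a hypothetical score."""
    return 1 if sigmoid(score) >= threshold else 0
```

A synthetic track for which discriminate returns 0 would be sent back for further training, while a result of 1 would be kept in the output data set.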
Next, for the six synthetic track data sets output above, the track points of the synthetic track data set under each timestamp are clustered using a k-means clustering algorithm based on Euclidean distance, and after clustering, generalized tracks are generated by randomly combining the track cluster centers under different timestamps. Let n be the number of tracks in the new complete track data set generated by randomly combining the cluster centers; the generalized tracks are counted to generate the count matrix C = {c_k | c_1, c_2, …, c_n}, where c_k denotes the statistic corresponding to the k-th generalized track in the count matrix. The count matrix C is shown in the following table:
Generalized track | Synthetic track | True count
T11->T21->T31 | T1, T2 | 3
T12->T21->T31 | null | 2
T11->T21->T32 | T3 | 1
T12->T21->T32 | T6 | 2
T11->T22->T31 | null | 0
T12->T22->T31 | T5 | 1
T11->T22->T32 | null | 0
T12->T22->T32 | T4 | 1
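The clustering and counting step may be sketched as follows, as a non-authoritative illustration. The toy tracks are hypothetical, the k-means routine is a plain Lloyd iteration seeded with the first k points, and every combination of one cluster center per timestamp is enumerated (a simplification of the random combination described above), so that two clusters over three timestamps yield eight generalized tracks.

```python
import itertools
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Lloyd's k-means with Euclidean distance; the first k points seed the centers."""
    centers = [points[i] for i in range(k)]

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centers[c]))

    for _ in range(iters):
        labels = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, [nearest(p) for p in points]


# Hypothetical toy data: 4 tracks with 3 timestamps each.
tracks = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    [(5.0, 5.0), (6.0, 6.0), (7.0, 7.0)],
    [(0.1, 0.0), (1.1, 1.0), (7.1, 7.0)],
    [(5.1, 5.0), (1.0, 1.1), (2.0, 2.1)],
]
K = 2
n_steps = len(tracks[0])

# Cluster the track points separately under each timestamp.
labels_per_step = []
for t in range(n_steps):
    _, labels = kmeans([trk[t] for trk in tracks], K)
    labels_per_step.append(labels)

# Each combination of one cluster center per timestamp is a generalized track.
counts = Counter({combo: 0 for combo in itertools.product(range(K), repeat=n_steps)})
for i in range(len(tracks)):
    counts[tuple(labels[i] for labels in labels_per_step)] += 1
```

Here counts plays the role of the count matrix C: each key identifies a generalized track by its cluster index at every timestamp, and each value is the number of tracks mapped to it.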
Laplace noise with parameter ε is added to the count matrix C to obtain the differential privacy count matrix
Figure BDA0003759324360000131
and a consistency constraint is added to
Figure BDA0003759324360000132
where
Figure BDA0003759324360000133
denotes the statistic corresponding to the k-th generalized track in the count matrix. The differential privacy count matrix
Figure BDA0003759324360000134
is sorted to obtain the sorted count matrix S_k = {s_k | s_1, s_2, …, s_n}, and the intermediate variables L_k and Q_k are calculated from S_k to obtain the track data result set
Figure BDA0003759324360000135
Figure BDA0003759324360000136
In the formula: k ∈ [1, n], and the symbols m, j and z are natural numbers;
the intermediate variable matrix {Q_k | Q_1, Q_2, …, Q_n} yields the elements
Figure BDA0003759324360000137
Let the elements of the matrix {L_k | L_1, L_2, …, L_n} and of the matrix
Figure BDA0003759324360000138
correspond to s_k to obtain the track data result set
Figure BDA0003759324360000139
where
Figure BDA00037593243600001310
denotes the statistic of the k-th generalized track in the noise-limited differential privacy count matrix. The track data result set
Figure BDA0003759324360000141
is shown in the following table:
Generalized track | Synthetic track | Differential privacy count
T11->T21->T31 | T1, T2 | 3
T12->T21->T31 | null | 1
T11->T21->T32 | T3 | 2
T12->T21->T32 | T6 | 1
T11->T22->T31 | null | 1
T12->T22->T31 | T5 | 0
T11->T22->T32 | null | 0
T12->T22->T32 | T4 | 2
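As a rough illustration of the noise-adding step, Laplace noise may be applied to the true counts as sketched here. The sensitivity of 1, the value ε = 1.0, the random seed and the function names are assumptions. The post-processing shown (rounding and clamping to non-negative integers) is only a simple stand-in and is not the consistency constraint based on L_k and Q_k described above.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(counts, epsilon, rng, sensitivity=1.0):
    """Add Laplace noise of scale sensitivity/epsilon, then round and clamp."""
    scale = sensitivity / epsilon
    noisy = [c + laplace_noise(scale, rng) for c in counts]
    # Stand-in post-processing: published counts are non-negative integers.
    return [max(0, round(v)) for v in noisy]


# "True count" column of the count matrix C given earlier.
true_counts = [3, 2, 1, 2, 0, 1, 0, 1]
private_counts = privatize(true_counts, epsilon=1.0, rng=random.Random(42))
```

A smaller ε gives a larger noise scale and therefore stronger privacy at the cost of less accurate counts. Post-processing of this kind does not consume additional privacy budget because it only uses the already noisy values.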
The differential privacy publishing process of the method provided by the invention is shown in Fig. 2. Taking the generalized track T11->T21->T31 as an example, the synthetic tracks T1 and T2 are mapped to this generalized track after clustering. Because T1 and T2 are synthetic tracks output by the end-to-end deep learning model, their specific coordinate information cannot be recovered by reverse cracking. The track publishing mechanism then applies noise-limited differential privacy protection to the statistical count of the generalized track T11->T21->T31, which further protects the statistical privacy of the track data and better resists privacy attacks aimed at the statistical information of track data sets.
In summary, the track data privacy protection method for the intelligent transportation system provided by the embodiment of the invention replaces real tracks with synthetic tracks output by an end-to-end deep learning model. The synthetic tracks can serve as substitutes for the real tracks that a trusted third party would otherwise need for privacy protection processing, and can be used for data sharing and data publishing. The method uses the black-box property of machine learning to overcome the defect that prior-art protection can be reversely cracked to some extent. In addition, the track data is passed through a clustering algorithm once before the track data set is finally published, which better ensures the privacy of the track data while keeping the track data set highly usable. The model is also reusable: when the track data set is updated, the model does not need to be retrained, and the synthetic track data can be obtained simply by replacing the input, which greatly reduces the time consumed by data protection and improves computational efficiency. Finally, the method publishes the final track counts with Laplace noise and a consistency constraint, which satisfies differential privacy, ensures the privacy of the track data and improves the usefulness of the published track data.
The second embodiment:
referring to fig. 3 and 4, an embodiment of the present invention provides a trajectory data privacy protection system for an intelligent transportation system, which may be used to implement the method according to the first embodiment, and specifically includes:
the acquisition module is used for acquiring a real-time real track data set of a user;
the synthetic track module is used for loading the real track data set into a pre-constructed and trained end-to-end deep learning model to generate a synthetic track data set;
the clustering module is used for clustering track points of the synthetic track data set under each timestamp by using a k-means clustering algorithm based on Euclidean distance, and generating generalized tracks by randomly combining the track cluster centers under different timestamps after clustering;
and the noise adding and issuing module is used for adding Laplace noise and consistency constraint to the counting matrix of the generalized track to obtain a differential privacy counting matrix with limited noise quantity and issuing the differential privacy counting matrix.
As an embodiment of the present invention, the synthetic track module includes an acquisition unit; as shown in fig. 4, the end-to-end deep learning model includes a trajectory generator and a trajectory discriminator; the track generator comprises a first input layer, a first embedding layer, a first LSTM modeling layer and a first output layer; the track discriminator comprises a second input layer, a second embedding layer, a second LSTM modeling layer and a second output layer; wherein:
the acquisition unit is used for acquiring historical real track data of different users as original track data of the training model;
the first input layer is used for carrying out centroid standardization processing on the original track data to obtain a centroid deviation coordinate of each track point and obtain a standard original track after coding;
the first embedding layer is used for processing each centroid deviation coordinate into a 64-dimensional vector through a linear rectification function by utilizing a multilayer perceptron MLP; adding random noise to the vectors to enable each group of vectors in the original track to keep the same length as the longest track;
the first LSTM modeling layer is used for carrying out time series prediction processing on the 64-dimensional vectors through 64 LSTM Cell long short-term memory network units to obtain synthetic track data;
the first output layer is used for decoding the longitude and latitude deviation of the synthetic track data through a tanh hyperbolic tangent function by using the two dense layers Den to obtain a decoded synthetic track;
the second input layer is used for taking the standard original track and the synthesized track as input data of the track discriminator, carrying out centroid standardization processing on the synthesized track to obtain a corresponding centroid deviation coordinate of each track point and obtaining a coded standard synthesized track;
the second embedding layer is used for processing the centroid deviation coordinates of each standard original track and each standard synthesized track into corresponding 64-dimensional vectors through a linear rectification function by utilizing a multilayer perceptron MLP;
the second LSTM modeling layer is used for calculating a track similarity loss value through a track loss function by using the two groups of corresponding 64-dimensional vectors through 64 LSTM cells;
the second output layer is used for performing binary classification on the track similarity loss value through a sigmoid activation function by using a dense layer Den to obtain a discrimination result of the synthesized track; if the discrimination result is false, the original track data is input into the track generator again for training, the model optimization problem is solved with a back propagation algorithm to update the network parameters of the first LSTM modeling layer in the track generator, and training stops once the discrimination result is true; the synthetic tracks whose discrimination result is true are output to form the synthetic track data set.
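The preprocessing performed by the first input layer and the first embedding layer may be illustrated as follows. This is only a sketch under stated assumptions: the centroid is taken over all track points of the data set, the padding is applied to coordinate pairs rather than to 64-dimensional vectors, and the noise level, the sample coordinates and the function names are hypothetical.

```python
import random


def centroid(tracks):
    """Centroid of all track points in the data set (assumed definition)."""
    pts = [p for trk in tracks for p in trk]
    return tuple(sum(v) / len(pts) for v in zip(*pts))


def standardize(tracks):
    """Replace each point by its deviation from the centroid."""
    cx, cy = centroid(tracks)
    return [[(x - cx, y - cy) for x, y in trk] for trk in tracks]


def pad_with_noise(tracks, rng, noise=1e-3):
    """Pad shorter tracks with small random points up to the longest length."""
    longest = max(len(trk) for trk in tracks)
    return [
        trk + [(rng.gauss(0.0, noise), rng.gauss(0.0, noise))
               for _ in range(longest - len(trk))]
        for trk in tracks
    ]


# Hypothetical raw coordinates of two tracks with different lengths.
raw = [
    [(32.0, 118.0), (32.1, 118.1)],
    [(32.2, 118.3), (32.3, 118.2), (32.4, 118.4)],
]
standard = standardize(raw)
padded = pad_with_noise(standard, random.Random(0))
```

After standardization the deviations sum to approximately zero in each coordinate, and after padding every track has the length of the longest track, so the tracks can be processed as sequences of equal length.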
The track data privacy protection system for the intelligent transportation system provided by this embodiment is based on the same technical concept as the track data privacy protection method provided by the first embodiment and can produce the beneficial effects described there; for content not described in detail in this embodiment, refer to the first embodiment.
The third embodiment:
an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is configured to, when executed by a processor, implement the steps of a method as in any one of the embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, it is possible to make various improvements and modifications without departing from the technical principle of the present invention, and those improvements and modifications should be considered as the protection scope of the present invention.

Claims (10)

1. A trajectory data privacy protection method for an intelligent transportation system, the method comprising:
acquiring a real-time real track data set of a user;
loading the real track data set into a pre-constructed and trained end-to-end deep learning model to generate a synthetic track data set;
clustering track points of the synthetic track data set under each timestamp by using a k-means clustering algorithm based on Euclidean distance, and generating generalized tracks by randomly combining track cluster centers under different timestamps after clustering;
and adding Laplace noise and a consistency constraint to the count matrix of the generalized tracks to obtain a noise-limited differential privacy count matrix, and publishing the differential privacy count matrix.
2. The method for protecting the privacy of the trajectory data of the intelligent traffic system according to claim 1, wherein the training process of the end-to-end deep learning model comprises the following steps:
acquiring historical real track data of different users as original track data of a training model;
carrying out centroid standardization on the original trajectory data to obtain a centroid deviation coordinate of each trajectory point, and obtaining a standard original trajectory after coding;
processing each said centroid deviation coordinate into a 64-dimensional vector by a linear rectification function;
using 64 LSTM Cell long-short term memory network units to perform time sequence prediction on the 64-dimensional vectors to obtain synthetic track data, and decoding centroid deviation coordinates of the synthetic track data by adopting a tanh hyperbolic tangent function to obtain decoded synthetic tracks;
calculating a track similarity loss value of the synthesized track through a track loss function by combining the standard original track;
performing binary classification on the track similarity loss value through a sigmoid activation function to obtain a discrimination result of the synthesized track;
and if the judgment result is false, inputting the original trajectory data into the model again for repeated training, solving the problem of model optimization by using a back propagation algorithm to update the network parameters of the model, stopping training until the judgment result is true, and outputting the synthetic trajectory with the judgment result being true to form a synthetic trajectory data set.
3. The trajectory data privacy protection method for the intelligent transportation system according to claim 2, wherein after processing each centroid deviation coordinate into a 64-dimensional vector through a linear rectification function, the method further comprises: and adding random noise to the vectors to ensure that each group of vectors in the original track keeps the same length as the longest track.
4. The trajectory data privacy protection method for the intelligent transportation system according to claim 2, wherein the step of calculating the trajectory similarity loss of the synthesized trajectory through a trajectory loss function in combination with the standard original trajectory comprises:
carrying out centroid standardization processing on the synthesized track to obtain a corresponding centroid deviation coordinate of each track point, and obtaining a coded standard synthesized track;
respectively processing the centroid deviation coordinates of each standard original trajectory and each standard synthesized trajectory into corresponding 64-dimensional vectors through a linear rectification function;
and combining the two groups of corresponding 64-dimensional vectors and the track loss functions, and calculating and training by using a formula (1) to obtain a track similarity loss value TLoss of the synthetic track:
TLoss = αL_BCE(l_t, l_p) + βL_GPS(t_t, t_p)    (1)
in the formula, l_t and l_p denote the discrimination labels of the standard original trajectory and the standard synthesized trajectory, respectively; t_t and t_p denote the 64-dimensional vectors of the standard original trajectory and the corresponding standard synthesized trajectory, respectively; L_BCE denotes a binary cross-entropy loss function, and L_GPS denotes a loss function that measures the similarity between two trajectories using the least-squares error; α and β denote the weights of L_BCE and L_GPS, respectively.
5. The trajectory data privacy protection method for the intelligent transportation system according to claim 2, wherein the calculation formula for processing each centroid deviation coordinate into a 64-dimensional vector through a linear rectification function is as follows:
Figure FDA0003759324350000031
in the formula,
Figure FDA0003759324350000032
denotes the 64-dimensional vector of the track point numbered i; Δlat_i and Δlon_i denote the latitude deviation and the longitude deviation of the track point numbered i, respectively; f_GPS denotes the linear rectification function; and W_GPS denotes the vector weight of the centroid deviation coordinate (Δlat_i, Δlon_i) of track point i.
6. The method as claimed in claim 5, wherein the formula for performing time-series prediction on the 64-dimensional vectors using 64 LSTM Cell long short-term memory network units to obtain the synthetic trajectory data is as follows:
O = LSTM(T, W_lstm)    (3)
in the formula: T = {t_1, t_2, …, t_i, …, t_maxlength} denotes the coordinate features of all track points in the original trajectory data, where t_i denotes the 64-dimensional vector of the track point numbered i and maxlength denotes the longest length of a single trajectory in the original trajectory data; W_lstm denotes the weight matrix of the input data; O denotes the synthetic trajectory data, O = {o_1, o_2, …, o_i, …, o_maxlength}, where o_i denotes the synthesized coordinate value output after t_i is processed by the LSTM.
7. The trajectory data privacy protection method for the intelligent transportation system according to claim 6, wherein the formula for decoding the centroid deviation coordinate of the synthesized trajectory data by using the tanh hyperbolic tangent function is as follows:
Figure FDA0003759324350000033
in the formula,
Figure FDA0003759324350000034
denotes the decoded coordinates of the centroid deviation coordinates of track point i in the synthetic trajectory data;
Figure FDA0003759324350000035
and
Figure FDA0003759324350000036
denote the longitude deviation and the latitude deviation of track point i, respectively; W_dGPS denotes the decoding matrix weight of the coordinate vector; and d_GPS is the tanh hyperbolic tangent function.
8. A trajectory data privacy protection system for an intelligent transportation system, the system comprising:
the acquisition module is used for acquiring a real-time real track data set of a user;
the synthetic track module is used for loading the real track data set into a pre-constructed and trained end-to-end deep learning model to generate a synthetic track data set;
the clustering module is used for clustering track points of the synthetic track data set under each timestamp by using a k-means clustering algorithm based on Euclidean distance, and generating generalized tracks by randomly combining the track cluster centers under different timestamps after clustering;
and the noise adding and issuing module is used for adding Laplace noise and consistency constraint to the counting matrix of the generalized track to obtain a differential privacy counting matrix with limited noise quantity and issuing the differential privacy counting matrix.
9. The trajectory data privacy protection system for the intelligent transportation system of claim 8, wherein the synthetic trajectory module includes an acquisition unit; the end-to-end deep learning model comprises a track generator and a track discriminator; the track generator comprises a first input layer, a first embedding layer, a first LSTM modeling layer and a first output layer; the track discriminator comprises a second input layer, a second embedding layer, a second LSTM modeling layer and a second output layer; wherein:
the acquisition unit is used for acquiring historical real track data of different users as original track data of the training model;
the first input layer is used for carrying out centroid standardization processing on the original track data to obtain a centroid deviation coordinate of each track point and obtain a standard original track after coding;
a first embedding layer, for processing each centroid offset coordinate into a 64-dimensional vector by a linear rectification function using a multilayer perceptron MLP; adding random noise to the vectors to enable each group of vectors in the original track to keep the same length as the longest track;
the first LSTM modeling layer is used for carrying out time series prediction processing on the 64-dimensional vectors through 64 LSTM Cell long short-term memory network units to obtain synthetic track data;
the first output layer is used for decoding the longitude and latitude deviation of the synthetic track data through tanh hyperbolic tangent function by using the two dense layers Den to obtain a decoded synthetic track;
the second input layer is used for taking the standard original track and the synthesized track as input data of the track discriminator, carrying out centroid standardization processing on the synthesized track to obtain a corresponding centroid deviation coordinate of each track point and obtaining a coded standard synthesized track;
the second embedding layer is used for processing the centroid deviation coordinates of each standard original track and each standard synthesized track into corresponding 64-dimensional vectors through a linear rectification function by utilizing a multilayer perceptron MLP;
the second LSTM modeling layer is used for calculating a track similarity loss value through a track loss function by using the two groups of corresponding 64-dimensional vectors through 64 LSTM cells;
the second output layer is used for performing binary classification on the track similarity loss value through a sigmoid activation function by using a dense layer Den to obtain a discrimination result of the synthesized track; if the discrimination result is false, the original track data is input into the track generator again for training, the model optimization problem is solved with a back propagation algorithm to update the network parameters of the first LSTM modeling layer in the track generator, and training stops once the discrimination result is true; the synthetic tracks whose discrimination result is true are output to form the synthetic track data set.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the trajectory data privacy protection method for an intelligent transportation system according to any one of claims 1 to 7.
CN202210866318.2A 2022-07-22 2022-07-22 Track data privacy protection method and system for intelligent traffic system Pending CN115168900A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210866318.2A CN115168900A (en) 2022-07-22 2022-07-22 Track data privacy protection method and system for intelligent traffic system

Publications (1)

Publication Number Publication Date
CN115168900A true CN115168900A (en) 2022-10-11

Family

ID=83496677

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115952364A (en) * 2023-03-07 2023-04-11 之江实验室 Route recommendation method and device, storage medium and electronic equipment
CN116595254A (en) * 2023-05-18 2023-08-15 杭州绿城信息技术有限公司 Data privacy and service recommendation method in smart city
CN116595254B (en) * 2023-05-18 2023-12-12 杭州绿城信息技术有限公司 Data privacy and service recommendation method in smart city

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination