CN113438603B

CN113438603B - Track data release method and system based on differential privacy protection

Info

Publication number: CN113438603B
Application number: CN202110346868.7A
Authority: CN
Inventors: 徐小龙; 孔诚恺; 段卫华
Original assignee: Nanjing University of Posts and Telecommunications
Current assignee: Nanjing University of Posts and Telecommunications
Priority date: 2021-03-31
Filing date: 2021-03-31
Publication date: 2024-01-23
Anticipated expiration: 2041-03-31
Also published as: CN113438603A

Abstract

The invention discloses a track data release method and a track data release system based on differential privacy protection. The method and the system provided by the invention improve the privacy protection degree of the track data on the premise of ensuring the availability of the data.

Description

Track data release method and system based on differential privacy protection

Technical Field

The invention relates to the technical field of track privacy protection, in particular to a track data distribution method and system based on differential privacy protection.

Background

Track big data is an important branch of spatio-temporal big data. It consists of a series of time-stamped coordinates, and its spatio-temporal ordering is the characteristic that distinguishes it from general spatio-temporal big data. With the long-running development of the mobile internet and location-based services (Location Based Service, LBS), track data is being generated all the time. Track big data has the potential to help society solve problems that human development causes in traffic, the environment, resources and the like. While providing great convenience, it also poses a considerable potential threat to user privacy. Track data generally contains sensitive information about users, and massive track data contains private information such as users' behavior characteristics, personal hobbies, health conditions and social relations, so releasing and using it carelessly carries an extremely high risk of exposing that privacy. In 2014, the data scientist Tocher showed that, using only public data and public news photographs of well-known people taking taxis, he could identify the start and end points of their journeys and even the cost.

Any release of unprocessed track data may have disastrous consequences for users. The potential risk of privacy exposure makes both users and companies unwilling to provide and release track data, which greatly limits the ability of academia to analyze and study track data and to extract information of value to the public. In fact, very little track data is published without any protection. Therefore, a track data release scheme that can strongly protect privacy while guaranteeing data availability is urgently needed.

Disclosure of Invention

The invention aims to: the invention provides a track data release method and a track data release system with high privacy protection degree on the premise of guaranteeing the availability of data.

The technical scheme is as follows: the track data issuing method based on differential privacy protection is used for acquiring a generalization track of a target group moving in a target area within a preset time period and a count of the generalization track, wherein the target group comprises a plurality of target individuals; the method comprises the following steps:

step 1: obtaining a moving track of each target individual in a target area in a preset time period, and taking the moving track as an original track; then enter step 2;

step 2: acquiring position coordinates of each original track under each preset time stamp in a preset time period;

dividing the position coordinates on each original track according to the time stamps to obtain position coordinate sets corresponding to the time stamps respectively;

clustering each coordinate in the set to be processed by taking each position coordinate set as the set to be processed, acquiring each cluster corresponding to the set to be processed, and further acquiring the cluster corresponding to each position coordinate set;

sequentially connecting cluster centers of clusters corresponding to the time stamps along a time sequence, further obtaining a group of generalization tracks, extracting the generalization tracks corresponding to the original tracks, taking the generalization tracks as original generalization tracks, and taking the rest as standby generalization tracks;

then enter step 3;

step 3: based on the generalization tracks corresponding to the original tracks, obtaining the original generalization tracks that differ from one another, taking them as the non-repeated original generalization tracks, and obtaining the number of the non-repeated original generalization tracks; randomly selecting a preset number of generalization tracks from all standby generalization tracks as filling generalization tracks, and assigning a count value of zero to each filling generalization track; taking the filling generalization tracks and the non-repeated generalization tracks together as the target generalization tracks, and constructing a filling count matrix $M_{(1)}$ based on the counts of the target generalization tracks; based on a differential privacy noise generation algorithm, performing a Haar wavelet transform on the filling count matrix $M_{(1)}$ and adding Laplace noise to obtain a reconstructed count matrix $M'$; and applying a consistency constraint to the reconstructed count matrix $M'$ to obtain a target generalized track count matrix $M''$.

As a preferred embodiment of the present invention, after step 3, the method further comprises:

step 4: releasing the target generalization tracks together with the target generalization track count matrix $M''$.

As a preferred embodiment of the present invention, in step 2, the coordinates in each coordinate set are clustered using a K-means clustering algorithm.

As a preferred embodiment of the present invention, in step 3, the method for obtaining the reconstructed count matrix M' includes the steps of:

step 3.1: constructing an initial count matrix $M = \{tc_i \mid i = 1, \ldots, J'\}$ according to the number of each non-repeated original generalization track, where $tc_i$ is the $i$-th element of the initial count matrix $M$ and $J'$ is the number of non-repeated original generalization tracks; then enter step 3.2;

step 3.2: randomly selecting $J - J'$ filling generalization tracks from the standby generalization tracks, the number of each filling generalization track being 0, where $J$ is the number of original tracks; adding the number of each filling track to the count matrix $M$ as a filling element to obtain the filling count matrix $M_{(1)}$;

using 0 values as supplementary elements, padding the number of elements of the filling count matrix $M_{(1)}$ to $2^l$ to obtain the element-supplemented matrix, where the exponent $l$ is the minimum value satisfying $J \le 2^l$;
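The zero-padding described in step 3.2 can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name `pad_to_power_of_two` is hypothetical.

```python
def pad_to_power_of_two(counts):
    """Append zeros until the length is the smallest power of two >= len(counts).

    Returns the padded list and the exponent l with len(padded) == 2**l.
    """
    j = len(counts)
    l = 0
    while (1 << l) < j:  # find the minimum l with J <= 2**l
        l += 1
    padded = list(counts) + [0] * ((1 << l) - j)
    return padded, l


padded, l = pad_to_power_of_two([3, 1, 0, 2, 0])
print(len(padded), l)  # 8 3
```

With J = 1000, as in the embodiment described later, the same function pads the matrix to 1024 elements (l = 10).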

then enter step 3.3;

step 3.3: performing one-dimensional haar wavelet transformation on the matrix after element supplementation to obtain a wavelet coefficient matrix C, and adding corresponding Laplacian noise to each element in the wavelet coefficient matrix to obtain a noise-added wavelet coefficient matrix C';

then enter step 3.4;

step 3.4: reconstructing a noisy count matrix from the noisy wavelet coefficient matrix $C'$, deleting the supplementary elements from the noisy count matrix, and thereby obtaining the reconstructed count matrix $M' = \{nc_{i'} \mid i' = 1, \ldots, J\}$, where $nc_{i'}$ is the $i'$-th element of the reconstructed count matrix $M'$.

As a preferred embodiment of the present invention, the method for obtaining the target generalized trajectory count matrix $M''$ includes the steps of:

step 3.5: sorting the elements of the reconstructed count matrix $M'$ according to the size of each track count value in $M'$, thereby obtaining a sequence $S$; then enter step 3.6;

step 3.6: according to the following formula:

and combining the constraint $L_m = Q_m$, the count update value $L_m$ of the $m$-th element in the sequence $S$ is obtained, and thereby the count update value of each element in the sequence $S$;

where $Q_m$ denotes the value corresponding to the count update value $L_m$ of the $m$-th element in the sequence $S$; $i'$ and $j$ index the $i'$-th and $j$-th elements of the sequence $S$ respectively; $|S|$ denotes the number of elements in the sequence $S$; and $\mathrm{mean}[i', j]$ denotes the mean value of the $i'$-th through $j$-th elements of $S$;

step 3.7: constructing the target generalized track count matrix $M''$ based on the count update value of each element.
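The formula referenced in step 3.6 is not reproduced in this text. One form that is consistent with the symbols defined there ($L_m$, $Q_m$, $\mathrm{mean}[i', j]$, $|S|$) is the min-max characterization of a least-squares fit under an ordering constraint, which is commonly used to make sorted noisy counts consistent. It is shown here as an assumption, for a sequence $S$ that should be non-decreasing, and not as the patent's own equation:

```latex
L_m = \min_{j \in [m,\,|S|]} \; \max_{i' \in [1,\,j]} \mathrm{mean}[i', j],
\qquad
Q_m = \max_{i' \in [1,\,m]} \; \min_{j \in [i',\,|S|]} \mathrm{mean}[i', j]
```

Under this reading, the constraint $L_m = Q_m$ states that the two expressions agree, and their common value is the count update value of the $m$-th element.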

The invention also provides a track data release system based on differential privacy protection, which comprises a data acquisition and processing module; the data acquisition and processing module is used for executing the following steps:

step 1: obtaining a moving track of each target individual in a target area in a preset time period, and taking the moving track as an original track;

taking each position coordinate set in turn as the set to be processed, clustering the coordinates in the set to be processed to obtain the clusters corresponding to it, and thereby obtaining the clusters corresponding to each position coordinate set;

step 3: based on the generalization tracks corresponding to the original tracks, obtaining the original generalization tracks that differ from one another, taking them as the non-repeated original generalization tracks, and obtaining the number of the non-repeated original generalization tracks; randomly selecting a preset number of generalization tracks from the standby generalization tracks as filling generalization tracks, and assigning a count of zero to each filling generalization track; taking the filling generalization tracks and the non-repeated generalization tracks together as the target generalization tracks, and constructing a filling count matrix $M_{(1)}$ based on the counts of the target generalization tracks; based on a differential privacy noise generation algorithm, performing a Haar wavelet transform on the filling count matrix $M_{(1)}$ and adding Laplace noise to obtain a reconstructed count matrix $M'$; and applying a consistency constraint to the reconstructed count matrix $M'$ to obtain a target generalized track count matrix $M''$.

As a preferred scheme of the invention, the system further comprises a release module, wherein the release module is used for acquiring the target generalization tracks and the target generalization track count matrix $M''$ from the data acquisition and processing module, and executing the following steps:

As a preferred embodiment of the present invention, the data acquisition and processing module includes a reconstruction count matrix construction module, and the module is configured to perform the following steps:

step 3.2: randomly selecting $J - J'$ filling generalization tracks from the standby generalization tracks, the number of each filling generalization track being 0, where $J$ is the number of original tracks; adding the number of each filling track to the count matrix $M$ as a filling element to obtain the filling count matrix $M_{(1)}$.

then enter step 3.3;

then enter step 3.4;

As a preferred solution of the present invention, the data acquisition and processing module further includes a target generalized trajectory count matrix building module, where the module is configured to perform the following steps:

step 3.6: according to the following formula:

where $Q_m$ denotes the value corresponding to the count update value $L_m$ of the $m$-th element in the sequence $S$; $i'$ and $j$ index the $i'$-th and $j$-th elements of the sequence $S$ respectively; $|S|$ denotes the number of elements in the sequence $S$; and $\mathrm{mean}[i', j]$ denotes the mean value of the $i'$-th through $j$-th elements of $S$;

The invention also provides a track data release system based on differential privacy protection, which comprises a processor and a storage medium;

the storage medium is used for storing instructions;

the processor being operative according to the instructions to perform the steps of the method of any one of claims 1 to 5.

The beneficial effects are that: according to the method and the system provided by the invention, a group of target generalized tracks is obtained from the original tracks, a filling count matrix is constructed from the target generalized tracks, a Haar wavelet transform is applied to the elements of the filling count matrix, Laplace noise is added during reconstruction to obtain the reconstructed count matrix, and a consistency constraint is applied to the reconstructed count matrix to obtain the target generalized track count matrix.

Drawings

FIG. 1 is a flow chart of a track data issuing method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a result of clustering coordinates in coordinate sets of positions under each timestamp according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a trace data distribution result provided by an embodiment of the present invention;

FIG. 4 is a comparison chart of privacy protection degree index results of a track data distribution method according to an embodiment of the present invention;

FIG. 5 is a graph comparing the results of the track data distribution method according to the embodiment of the present invention;

FIG. 6 is a system architecture diagram of a track big data application management system according to an embodiment of the present invention;

FIG. 7 is a diagram of a first page of a track big data application management system according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of a generalized trajectory data publishing interface according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a cluster core query interface of a generalized trajectory data distribution interface provided by an embodiment of the present invention;

FIG. 10 is a generalized trajectory query pop-up diagram of a generalized trajectory data distribution interface provided by an embodiment of the present invention;

FIG. 11 is a schematic diagram of an administrator privacy budget setting page provided by an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.

In a first aspect, the present invention provides a track data publishing method, referring to fig. 1, the method includes the following steps:

and sequentially connecting cluster centers of clusters corresponding to the time stamps along a time sequence, further obtaining a group of generalization tracks, extracting the generalization tracks corresponding to the original tracks, taking the generalization tracks as original generalization tracks, and taking the rest as standby generalization tracks.

Wherein the generalized trajectories corresponding to each original trajectory are: and replacing the position coordinates of the original track under the corresponding time stamps by using the cluster center of the cluster where the position coordinates of the original track under each time stamp are located, so as to obtain the generalization track corresponding to the original track.

And then proceeds to step 3.
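The generalization in step 2 can be sketched as follows: the coordinates of all tracks are clustered per timestamp, and each track's coordinate is replaced by the center of its cluster. This is a minimal sketch under stated assumptions, not the patent's implementation: the plain Lloyd-style k-means, its deterministic seeding, and the names `kmeans` and `generalize` are all illustrative. Standby generalization tracks (cluster-center sequences that no original track maps to) are not enumerated here.

```python
import math
from collections import Counter


def kmeans(points, g, iterations=20):
    """Plain Lloyd's k-means on 2-D points; returns (centers, labels)."""
    # Deterministic seeding for the sketch: g points spread through the list.
    centers = [points[i * len(points) // g] for i in range(g)]

    def nearest(p):
        # Index of the closest center by Euclidean distance.
        return min(range(g), key=lambda k: math.dist(p, centers[k]))

    for _ in range(iterations):
        labels = [nearest(p) for p in points]
        for k in range(g):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, [nearest(p) for p in points]


def generalize(tracks, g):
    """tracks: equal-length lists of (x, y), one coordinate per timestamp."""
    generalized = [[] for _ in tracks]
    for t in range(len(tracks[0])):
        column = [track[t] for track in tracks]  # coordinate set at timestamp t
        centers, labels = kmeans(column, g)
        for idx, lab in enumerate(labels):
            generalized[idx].append(centers[lab])  # point -> cluster center
    return [tuple(track) for track in generalized]


tracks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.1, 0.0), (1.1, 1.0)],
    [(5.0, 5.0), (6.0, 6.0)],
]
generalized = generalize(tracks, g=2)
counts = Counter(generalized)  # distinct generalized track -> count
```

In this toy input the first two tracks fall into the same cluster at both timestamps, so they share one generalized track with count 2, while the third track keeps its own generalized track with count 1.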

In step 2, the method used to cluster the position coordinate sets is selected as follows:

obtaining clusters corresponding to each position coordinate set by using different clustering methods, and further generating a group of different generalization schemes;

the track data set is denoted $D$, and the set of position coordinates under the timestamp $t_i$ is denoted $T(t_i)$. The generalization scheme corresponding to each clustering method is denoted $p$, and the generalization scheme that partitions $T(t_i)$ using the k-means clustering algorithm is given its own notation.

According to the formula $r = |T(t_i)|$, the size $r$ of $T(t_i)$ is obtained; according to the formula $\tau = g^r$, the size $\tau$ of the set of all possible candidate generalization schemes is obtained, where $g$ is the number of clusters obtained after clustering the position coordinate set.

For each possible generalization scheme $p$ among the $\tau$ candidates, a utility function $u(D, p): D \times \tau \to \mathbb{R}$ is defined according to the following formula:

obtaining the utility function $u(D, p)$ of the generalization scheme $p$;

where MeanDist(p) denotes the mean value of the generalization scheme $p$. The other symbols in the formula denote: the set of GPS coordinates assigned to cluster $k$ in the generalization scheme $p$; the set of trajectories passing through these GPS coordinates; the number of trajectories in that set; $T_i$, the $i$-th trajectory in that set; and the average trajectory of that set. The mean value of the generalization scheme that uses the k-means clustering algorithm is calculated by the same formula.

The probability f (p) of each generalization scheme is calculated according to a utility function by using an exponential mechanism, and specifically, the probability f (p) is calculated according to the following formula:

acquiring the probability f (p) of the generalization scheme p, and further acquiring the probability of each generalization scheme;

where $\varepsilon_1$ is the first preset privacy budget value and $\Delta u$ is the preset sensitivity value of the exponential mechanism.

According to the formula for the probability of a generalization scheme, the probability $f(p)$ of the generalization scheme $p$ is proportional to an exponential function of its utility $u(D, p)$, scaled by $\varepsilon_1$ and $\Delta u$.

The generalization scheme with the highest probability among the generalization schemes is selected as the optimal generalization scheme. In this embodiment, the generalization scheme based on the k-means clustering algorithm is selected to obtain the generalization tracks; that is, $T(t_i)$ is divided into $g$ clusters by the k-means clustering algorithm using Euclidean distance, and a group of generalized tracks is generated from the cluster centers.
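The selection step can be illustrated with the textbook form of the exponential mechanism, in which the probability of a candidate is proportional to exp(ε·u / (2·Δu)). The exact expression used in the patent is not reproduced in this text, so the factor 2, the function name, and the utility values below are assumptions made for illustration only.

```python
import math


def exponential_mechanism_probabilities(utilities, epsilon, sensitivity):
    """Selection probabilities proportional to exp(epsilon * u / (2 * sensitivity))."""
    scores = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical utilities of three candidate generalization schemes.
probs = exponential_mechanism_probabilities([-3.0, -1.0, -2.0],
                                            epsilon=1.0, sensitivity=1.0)
best = max(range(len(probs)), key=probs.__getitem__)  # highest-probability scheme
```

Here the second candidate has the highest utility and therefore the highest probability, so `best` is its index. A strictly private use of the mechanism would sample from `probs` rather than take the maximum; the text above describes taking the highest-probability scheme.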

Step 3: based on the generalization tracks corresponding to the original tracks, obtaining the original generalization tracks that differ from one another, taking them as the non-repeated original generalization tracks, and obtaining the number of the non-repeated original generalization tracks; randomly selecting a preset number of generalization tracks from all standby generalization tracks as filling generalization tracks, and assigning a count of zero to each filling generalization track; taking the filling generalized tracks and the non-repeated generalized tracks together as the target generalized tracks, and constructing a filling matrix $M_{(1)}$ based on the counts of the target generalized tracks; performing a Haar wavelet transform on the filling matrix $M_{(1)}$ and adding Laplace noise to obtain a reconstructed count matrix $M'$; and applying a consistency constraint to the reconstructed count matrix $M'$ to obtain a target generalized track count matrix $M''$.

The method for obtaining the reconstructed count matrix M' comprises the following steps:

step 3.1: when the generalized track is acquired, the cluster center is used for replacing the position coordinates of the original track to further obtain the generalized track corresponding to the original track, and if the position coordinates of a group of original tracks under each time stamp are all located in the same cluster, the group of original tracks correspond to the same generalized track, that is, the same generalized track may correspond to a plurality of original tracks.

An initial count matrix $M = \{tc_i \mid i = 1, \ldots, J'\}$ is constructed according to the number of each non-repeated original generalization track, where $tc_i$ is the $i$-th element of the initial count matrix $M$ and $J'$ is the number of non-repeated original generalization tracks; then enter step 3.2;

step 3.2: randomly selecting $J - J'$ filling generalized tracks from the standby generalized tracks, and assigning the number of each filling generalized track to be 0, where $J$ is the number of original tracks; adding the number of each filling track to the count matrix $M$ as a filling element to obtain the filling matrix $M_{(1)}$;

using 0 values as supplementary elements, padding the number of elements of the filling matrix $M_{(1)}$ to $2^l$ to obtain the element-supplemented matrix, denoted $M_{(2)} = \{tc_i \mid i = 1, \ldots, 2^l - 1\}$, where the exponent $l$ is the minimum value satisfying $J \le 2^l$;
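Steps 3.1 and 3.2 can be sketched as follows: count the distinct generalized tracks, then add J − J′ randomly chosen standby tracks with count 0. This is an illustrative sketch; the function name and the string track labels are hypothetical, and the standby list is assumed to hold at least J − J′ tracks.

```python
import random
from collections import Counter


def build_filling_count_matrix(original_generalized, standby, rng):
    """Return (target tracks, filling count matrix M(1)) with J entries each."""
    counts = Counter(original_generalized)              # distinct track -> count
    distinct = list(counts)                              # the J' non-repeated tracks
    n_fill = len(original_generalized) - len(distinct)   # J - J'
    fillers = rng.sample(standby, n_fill)                # random zero-count tracks
    tracks = distinct + fillers
    matrix = [counts[t] for t in distinct] + [0] * n_fill
    return tracks, matrix


rng = random.Random(0)
tracks, m1 = build_filling_count_matrix(
    ["A", "B", "A", "C", "A"],   # generalized track of each of J = 5 originals
    ["D", "E", "F"],             # standby generalized tracks
    rng,
)
print(m1)  # [3, 1, 1, 0, 0]
```

The resulting matrix always has J entries and its entries sum to J, because every original track contributes to exactly one distinct generalized track.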

then enter step 3.3;

step 3.3: based on a differential privacy noise generation algorithm, performing a one-dimensional Haar wavelet transform on the element-supplemented matrix to obtain a wavelet coefficient matrix $C = \{c_i \mid i = 0, 1, \ldots, 2^l - 1\}$, and adding corresponding Laplace noise to each element of the wavelet coefficient matrix to obtain a noisy wavelet coefficient matrix $C'$. Specifically, noise following a Laplace distribution parameterized by $W(c_i)$ is added to each term, where $q$ is the height of $c_i$ in the wavelet coefficient tree and $W(c_i)$ is an intermediate parameter, $W(c_i) = 2^q$.

Then enter step 3.4;

step 3.4: reconstructing a noisy count matrix from the noisy wavelet coefficient matrix $C'$, deleting the supplementary elements from the noisy count matrix so that the number of elements is reduced to $J$, and thereby obtaining the reconstructed count matrix $M'$:

$M' = \{nc_{i'} \mid i' = 1, \ldots, J\}$, where $nc_{i'}$ is the $i'$-th element of the reconstructed count matrix $M'$.
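Steps 3.3 and 3.4 can be sketched as a Haar decomposition, per-coefficient Laplace noise, and reconstruction. The exact noise distribution is not reproduced in this text, so the sketch assumes a Laplace scale of `lam / W(c)` with W(c) = 2^q for a detail coefficient at height q, and W equal to the vector length for the overall-average coefficient. The parameter `lam` and all function names are hypothetical.

```python
import random


def haar_forward(values):
    """Averaging/differencing Haar transform; len(values) must be a power of two.

    Returns (overall average, detail lists); details[0] holds height-1 coefficients.
    """
    current, details = list(values), []
    while len(current) > 1:
        pairs = [(current[i], current[i + 1]) for i in range(0, len(current), 2)]
        details.append([(a - b) / 2 for a, b in pairs])
        current = [(a + b) / 2 for a, b in pairs]
    return current[0], details


def haar_inverse(base, details):
    """Rebuild the original values from the average and the detail coefficients."""
    current = [base]
    for diffs in reversed(details):
        nxt = []
        for avg, d in zip(current, diffs):
            nxt.extend((avg + d, avg - d))
        current = nxt
    return current


def laplace(rng, scale):
    """Laplace(0, scale) noise as the difference of two unit exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_counts(padded, n_keep, lam, rng):
    """Add weighted Laplace noise in the wavelet domain, then drop the padding."""
    base, details = haar_forward(padded)
    noisy_base = base + laplace(rng, lam / len(padded))
    noisy_details = [[c + laplace(rng, lam / 2 ** (level + 1)) for c in diffs]
                     for level, diffs in enumerate(details)]
    return haar_inverse(noisy_base, noisy_details)[:n_keep]


padded = [3, 1, 1, 0, 0, 0, 0, 0]  # J = 5 counts padded to 2**3 elements
m_prime = noisy_counts(padded, n_keep=5, lam=1.0, rng=random.Random(0))
```

Setting `lam` to 0 disables the noise and returns the original five counts, which is a convenient check that the transform and its inverse match.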

In steps 3.1 to 3.4, the reconstructed count matrix $M'$ is obtained based on the differential privacy noise generation algorithm; the target generalized trajectory count matrix $M''$ is then obtained according to the method described in steps 3.5 to 3.7:

step 3.6: according to the following formula:

and combining the constraint $L_m = Q_m$, the count update value $L_m$ of the $m$-th element in the sequence $S$ is obtained, and thereby the count update value of each element in the sequence $S$; when the count update values of all elements in $S$ are obtained, the corresponding constraint condition is $\{L_1, L_2, \ldots, L_{|S|}\} = \{Q_1, Q_2, \ldots, Q_{|S|}\}$.

Here $Q_m$ denotes the value corresponding to the count update value $L_m$ of the $m$-th element in the sequence $S$; $L_{|S|}$ denotes the count update value of the $|S|$-th element in $S$, and $Q_{|S|}$ denotes the value corresponding to that count update value; $i'$ and $j$ index the $i'$-th and $j$-th elements of $S$ respectively; $|S|$ denotes the number of elements in $S$; and $\mathrm{mean}[i', j]$ denotes the mean value of the $i'$-th through $j$-th elements of $S$;
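The formula of step 3.6 is not reproduced in this text. The sketch below assumes the min-max form of an order-constrained least-squares fit, which matches the symbols defined above: L_m is the minimum over j ≥ m of the maximum over i′ ≤ j of mean[i′, j], and Q_m is the maximum over i′ ≤ m of the minimum over j ≥ i′ of mean[i′, j]. The function names are hypothetical and indices are 0-based.

```python
def mean_of(seq, i, j):
    """Mean of seq[i..j], inclusive, with 0-based indices."""
    return sum(seq[i:j + 1]) / (j - i + 1)


def lower_value(seq, m):
    """L_m: min over j >= m of the max over i <= j of mean[i, j]."""
    return min(max(mean_of(seq, i, j) for i in range(j + 1))
               for j in range(m, len(seq)))


def upper_value(seq, m):
    """Q_m: max over i <= m of the min over j >= i of mean[i, j]."""
    return max(min(mean_of(seq, i, j) for j in range(i, len(seq)))
               for i in range(m + 1))


s = [1, 3, 2, 4]  # noisy counts that should be non-decreasing
updated = [lower_value(s, m) for m in range(len(s))]
print(updated)  # [1.0, 2.5, 2.5, 4.0]
```

The out-of-order pair (3, 2) is replaced by its mean 2.5, and `upper_value` gives the same result for every position, which is the equality L_m = Q_m. This direct evaluation is cubic in the sequence length; it is written for clarity rather than speed.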

In a second aspect, the invention also provides a track data release system based on differential privacy protection for implementing the method, which comprises a data acquisition and processing module; the data acquisition and processing module is used for executing the following steps:

step 3: based on the generalization tracks corresponding to the original tracks, obtaining the original generalization tracks that differ from one another, taking them as the non-repeated original generalization tracks, and obtaining the number of the non-repeated original generalization tracks; randomly selecting a preset number of generalization tracks from the standby generalized tracks as filling generalization tracks, and assigning a count of zero to each filling generalized track; taking the filling generalized tracks and the non-repeated generalized tracks together as the target generalized tracks, and constructing a filling matrix $M_{(1)}$ based on the counts of the target generalized tracks; performing a Haar wavelet transform on the filling matrix $M_{(1)}$ and adding Laplace noise to obtain a reconstructed count matrix $M'$; and applying a consistency constraint to the reconstructed count matrix $M'$ to obtain a target generalized track count matrix $M''$.

The system also comprises a release module, wherein the release module is used for acquiring the target generalization tracks and the target generalization track count matrix $M''$ from the data acquisition and processing module, and executing the following steps:

The data acquisition and processing module comprises a reconstruction counting matrix construction module and a target generalization track counting matrix construction module.

The reconstruction counting matrix construction module is used for executing the following steps:

step 3.2: randomly selecting $J - J'$ filling generalized tracks from the standby generalized tracks, and setting the number of each filling track to 0, where $J$ is the number of original tracks; adding the number of each filling track to the count matrix $M$ as a filling element to obtain the filling matrix $M_{(1)}$.

Using 0 values as supplementary elements, the number of elements of the filling matrix $M_{(1)}$ is padded to $2^l$ to obtain the element-supplemented matrix, where the exponent $l$ is the minimum value satisfying $J \le 2^l$;

then enter step 3.3;

step 3.3: performing one-dimensional haar wavelet transformation on the matrix after element supplementation to obtain a wavelet coefficient matrix C, and adding corresponding Laplacian noise to each element in the wavelet coefficient matrix to obtain a noise-added wavelet coefficient matrix C'; then enter step 3.4;

The data acquisition and processing module further comprises a generalization track count matrix construction module for executing the following steps:

step 3.6: according to the following formula:

In a third aspect, the present invention also provides a track data publishing system based on differential privacy protection, the system comprising a processor and a storage medium;

the storage medium is used for storing instructions;

In one embodiment, taxis in Chengdu are taken as the target group and 1000 taxis are taken as the target individuals for analysis. 1000 original tracks T are obtained, one per taxi over 20 consecutive timestamps; that is, each sample consists of 20 consecutive coordinates.

For the set $T(t_i)$ of all coordinates of the track data set T under the same timestamp $t_i$, the number $g$ of clusters is preset to 20; that is, the goal is to divide the set into 20 clusters. The generalization scheme that partitions $T(t_i)$ is denoted $p$, while the generalization scheme that partitions $T(t_i)$ using the k-means clustering algorithm is given its own notation. Let $r$ denote the size of $T(t_i)$, with $r = |T(t_i)| = 15$. Let $\tau$ denote the size of the set of all possible candidate generalization schemes; $\tau = 20^{15}$ is too large an order of magnitude, and evaluating the quality of every generalization scheme would incur excessive overhead.
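To make the order of magnitude concrete, the following worked arithmetic uses only the values g = 20 and r = 15 stated above:

```latex
\tau = g^{r} = 20^{15} = 2^{15} \times 10^{15} = 32768 \times 10^{15} \approx 3.28 \times 10^{19}
```

As a rough illustration under a hypothetical rate of one billion scheme evaluations per second, scoring every candidate would take about $3.28 \times 10^{10}$ seconds, which is on the order of a thousand years.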

For each possible generalization scheme $p$ among the $\tau$ candidates, a utility function $u(D, p): D \times \tau \to \mathbb{R}$ is defined as follows:

where:

The symbols in the formula denote: the set of GPS coordinates assigned to cluster $k$ in the generalization scheme $p$; the set of trajectories passing through these GPS coordinates; $T_i$, the $i$-th trajectory in that set; and the average trajectory of that set.

The method uses an exponential mechanism to calculate, from the utility function, the probability of adopting each generalization scheme; the probability that $p$ is selected is proportional to an exponential function of its utility $u(D, p)$. Accordingly, the selected generalization scheme uses the k-means clustering algorithm to divide $T(t_i)$ into 20 clusters. The cluster labels and cluster center coordinates are recorded, and the cluster centers replace the original points to generate new generalized tracks. The result is shown in FIG. 2.

Through track generalization, cluster centers replace the original points to generate new generalization tracks, and multiple tracks in the data set are merged into one generalization track. Generalizing the 1000 tracks produces 729 non-repeated generalization tracks in total, so 1000 − 729 = 271 generalization tracks must be randomly selected from the standby generalization tracks (those in the generalization track space other than the non-repeated generalization tracks) to fill $M_{(1)}$. The count value of these tracks is 0, which gives a filling count matrix $M_{(1)}$ of size [1000, 1].

To make the Haar wavelet transform of the filling count matrix $M_{(1)}$ convenient, $M_{(1)}$ is padded with 0 values to 1024 elements. A one-dimensional Haar wavelet transform is performed on the padded $M_{(1)}$ to obtain the wavelet coefficient matrix $C = \{c_i \mid i = 0, 1, \ldots, 1023\}$. Laplace noise is then added to the wavelet coefficient matrix $C$: to each term is added noise following a Laplace distribution parameterized by $W(c_i)$, where $W(c_i) = 2^l$ and $l$ is the height of $c_i$ in the wavelet coefficient tree. This gives the noisy wavelet coefficient matrix $C'$.

Using the noisy wavelet coefficient matrix $C'$, a noisy count matrix $M_{(3)} = \{nc_i \mid i = 0, 1, \ldots, 1023\}$ is reconstructed, and the number of elements of $M_{(3)}$ is cut down to 1000 by removing elements from the tail.

Step 3: The count matrix to be released is constrained using the real-world constraint that a generalized track containing more original tracks also has a larger count.

$M' = \{nc_i \mid i = 1, \ldots, 1000\}$ is sorted to obtain a sequence $S$; let

where $tc_m$ is the corresponding $m$-th element of the initial count matrix. With $S' = \{L_1, L_2, \ldots, L_{|S|}\} = \{Q_1, Q_2, \ldots, Q_{|S|}\}$ as the constraint, each element of $M'$ is updated, yielding $M'' = \{nc_i \mid i = 0, 1, \ldots, 999\}$ for release, as shown in FIG. 3.

The privacy loss of the release method is evaluated by calculating the mutual information between the true count matrix $M$ and $M''$. Formally, the mutual information $MI(X, Y)$ of two discrete random variables $X$ and $Y$ can be defined as:

$$MI(X, Y) = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}$$

where $x$ and $y$ denote specific values of the variables $X$ and $Y$, $p(x, y)$ denotes the joint probability distribution of $X$ and $Y$, and $p(x)$ and $p(y)$ denote the marginal probability distributions of $X$ and $Y$, respectively.
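A minimal sketch of the mutual information computation is given below. It assumes that the joint and marginal distributions are estimated empirically from paired samples and that the natural logarithm is used; how the patent derives these distributions from the count matrices is not stated, and the function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


same = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])         # fully dependent
independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])  # independent
```

For identical sequences the value equals the entropy of the sequence (log 2 here), and for independent sequences it is 0. A lower value between true and released counts means the release reveals less about the true counts.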

As shown in FIG. 4, the mutual information index of this method is generally lower than that of a comparable state-of-the-art method, and the privacy loss is reduced by about 80% to 91%.

The effect of noise generation on usability is measured using the average absolute error, the smaller the average absolute error, the higher the usability.

According to the following formula:

obtaining an average absolute error MAE of two discrete random variables X and Y;
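The formula itself is not reproduced in this text. The usual definition of the average absolute error between two equal-length sequences is the mean of the element-wise absolute differences, and the sketch below assumes that definition. The function name and sample values are illustrative.

```python
def mean_absolute_error(true_counts, noisy_counts):
    """Mean of |true - noisy| over paired elements of two equal-length lists."""
    if len(true_counts) != len(noisy_counts):
        raise ValueError("both count lists must have the same length")
    total = sum(abs(t - n) for t, n in zip(true_counts, noisy_counts))
    return total / len(true_counts)


mae = mean_absolute_error([3, 1, 0], [2.5, 1.5, 1.0])  # (0.5 + 0.5 + 1.0) / 3
```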

The average absolute errors between the true counts and the noisy counts are compared for this track data release method and a comparable state-of-the-art method under different values of g and ε; the results are shown in FIG. 5. The two methods obtain relatively similar results on the availability index: although the average absolute error of this method is higher than that of the comparable method when the privacy budget ε is small, it gradually decreases as ε increases, and at ε = 0.8 it is even lower than that of the comparable method.

Referring to FIG. 6, the release system is implemented with Web application technology and consists, from top to bottom, of a human-computer interaction layer, a service layer and a data management layer. The human-computer interaction layer is implemented with the Web front-end framework Vue and provides the administrator module and the user module with interfaces that present data dynamically and support interaction. The functional module layer is the core component of the system; it mainly extracts data from the database to render pages dynamically, and supports the subsequent functions of administrators and users, namely setting the privacy budget, querying track details and predicting travel time. The data management layer is the foundation of the system and stores related information such as the generalized tracks and the web page accesses of the system; data storage is implemented with the relational database MySQL.

The system adopts a development mode in which the front end and the back end are separated, and uses axios, a Promise-based HTTP library, for the interaction between them. Taking the cluster center query as an example: the page interacts with the user, and after the user enters the timestamp and the cluster center number to be queried, the front end packages them into a JSON object and sends a POST request to the API endpoint provided by the back-end service layer. The front end then receives the returned JSON object, parses it, renders it and displays it on the page.
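The request and response shapes in this round trip can be sketched from the back-end side. This is a hedged illustration in Python rather than the patent's implementation: the field names (`timestamp`, `cluster`, `lng`, `lat`), the in-memory dictionary standing in for the MySQL table, and the coordinate values are all hypothetical.

```python
import json

# Hypothetical stand-in for the database table of cluster centers:
# (timestamp, cluster center number) -> (longitude, latitude), made-up values.
CLUSTER_CENTERS = {
    (1, 0): (104.07, 30.57),
    (1, 1): (104.08, 30.66),
}


def handle_cluster_center_query(request_body: str) -> str:
    """Parse the JSON request body, look up the center, return a JSON reply."""
    try:
        payload = json.loads(request_body)
        key = (int(payload["timestamp"]), int(payload["cluster"]))
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing fields or non-numeric values.
        return json.dumps({"ok": False, "error": "bad request"})
    center = CLUSTER_CENTERS.get(key)
    if center is None:
        return json.dumps({"ok": False, "error": "not found"})
    lng, lat = center
    return json.dumps({"ok": True, "lng": lng, "lat": lat})


# What the front end would send as the POST body, and what it gets back.
body = json.dumps({"timestamp": 1, "cluster": 1})
print(handle_cluster_center_query(body))
```

Keeping the handler a pure function of the request body makes the JSON contract easy to check independently of the web framework and the database.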

All map-related functions on the front-end pages are implemented with the Baidu Map API. The Chengdu taxi data set is published using the track big data application management system, whose structure diagram is shown in fig. 4. The system home interface is shown in fig. 7 and comprises a brief introduction to the project and generalized track examples. The map to the right of the generalized track examples provides intuitive visualization: when any generalized track example on the left is clicked, its route is displayed on the map on the right, where the red markers are the generalized track coordinate points and the blue line along the roads represents the track. Clicking the "view all" button jumps directly to the generalized track data publishing interface.

Figs. 8, 9 and 10 show the generalized track data publishing interface, which mainly consists of the track data publishing table in the main body and a cluster center query button in the upper right corner. In the generalized track data, P denotes a timestamp, T denotes a track number, and L denotes the cluster center that replaces generalized track T at timestamp P. For the cluster center query in the upper right corner, the user selects the cluster center to be queried from two drop-down menus; the query result is shown in fig. 9. In addition, when the user wants the details of an entire generalized track, clicking the corresponding row of the table opens a popup window that returns those details, including the coordinates at each timestamp, and displays the track on the map, as shown in fig. 10.

FIG. 11 shows the privacy budget setting interface, through which an administrator can change the degree of privacy protection of the track data by changing the privacy budget, thereby generating new generalized track data. The back end generates a new batch of generalized tracks and counts under the new privacy budget and replaces the original data in the database. The main body of the interface shows the noise distribution under different privacy budgets; when the privacy budget changes, the distribution diagram on the interface changes accordingly.

Compared with the advanced track data release method of the same type, the method provided by the invention greatly improves the privacy protection index while maintaining the data availability index.
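The relationship between the privacy budget and the noise distribution shown on the privacy budget interface can be illustrated with a small sketch. It assumes Laplace noise with scale sensitivity/ε and a sensitivity of 1; both are assumptions made for illustration, and the function names are hypothetical. The patent's own noise calibration is not reproduced here.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as a difference of two exponentials."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def mean_abs_noise(epsilon: float, sensitivity: float = 1.0,
                   n: int = 20000, seed: int = 0) -> float:
    """Empirical mean |noise| when the scale is sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # assumed calibration, see lead-in
    return sum(abs(laplace_noise(scale, rng)) for _ in range(n)) / n


# A smaller privacy budget gives a wider noise distribution.
for eps in (0.1, 0.4, 0.8):
    print(f"epsilon={eps}: mean |noise| is about {mean_abs_noise(eps):.2f}")
```

Under this assumption the spread of the noise is inversely proportional to ε, which matches the behavior described for the interface: changing the budget changes the displayed distribution.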

In the experiments, mutual information is selected as the privacy protection index, and the privacy loss is reduced by about 80% to 91%. The degree of privacy protection and the availability of the data released by the method can be adjusted as needed by changing two parameters: the privacy budget ε and the number g of k-means clusters. The system provided by the invention gives users an intuitive and efficient way to explore the managed trajectory data sets and lets them potentially identify new trends by interacting with the trajectory data. The data set manager can comprehensively control the system from the back end, including the degree of privacy protection of the data sets released in the system.

The foregoing is merely a preferred embodiment of the present invention. It will be apparent to those skilled in the art that modifications and variations can be made without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as falling within the scope of protection of the invention.

Claims

1. The track data release method based on differential privacy protection is characterized by being used for acquiring a generalization track of a target group moving in a target area within a preset time period and a count of the generalization track, wherein the target group comprises a plurality of target individuals; the method comprises the following steps:

step 1: acquiring a moving track of each target individual in a target area within a preset time period, and taking the moving track as an original track; then enter step 2;

then enter step 3;

step 3: based on the generalization tracks corresponding to the original tracks, obtaining the original generalization tracks that differ from one another as non-repeated original generalization tracks, and obtaining the number of the non-repeated original generalization tracks; randomly selecting a preset number of generalization tracks from all standby generalization tracks as filling generalization tracks, and assigning a count of zero to each filling generalization track; taking the filling generalization tracks and the non-repeated generalization tracks together as target generalization tracks, and constructing a filling count matrix M_(1) based on the counts of the target generalization tracks; based on a differential privacy noise generation algorithm, performing Haar wavelet transformation on the filling count matrix M_(1) and adding Laplace noise to obtain a reconstructed count matrix M'; and applying a consistency constraint to the reconstructed count matrix M' to obtain a target generalized track count matrix M''.
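The transform, perturb and reconstruct sequence named in step 3 can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed algorithm: it uses an unnormalized one-dimensional Haar transform, zero-pads the count vector to a power-of-two length, and adds Laplace noise with one uniform scale 1/ε to every coefficient. The per-coefficient noise calibration and the consistency constraint of the method are not reproduced, and all function names are hypothetical.

```python
import random


def pad_to_power_of_two(counts):
    """Append zero counts until the length is a power of two."""
    size = 1
    while size < len(counts):
        size *= 2
    return list(counts) + [0] * (size - len(counts))


def haar_forward(values):
    """Unnormalized Haar transform: [overall average, details coarse to fine]."""
    data, details = list(values), []
    while len(data) > 1:
        avgs = [(data[i] + data[i + 1]) / 2 for i in range(0, len(data), 2)]
        diffs = [(data[i] - data[i + 1]) / 2 for i in range(0, len(data), 2)]
        details = diffs + details  # coarser levels end up in front
        data = avgs
    return data + details


def haar_inverse(coeffs):
    """Invert haar_forward level by level."""
    data, pos = list(coeffs[:1]), 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(data)]
        pos += len(data)
        data = [v for a, d in zip(data, diffs) for v in (a + d, a - d)]
    return data


def noisy_counts(counts, epsilon, rng):
    """Pad, transform, add Laplace noise to each coefficient, invert, trim."""
    coeffs = haar_forward(pad_to_power_of_two(counts))
    scale = 1.0 / epsilon  # assumed uniform scale, see lead-in
    noisy = [c + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
             for c in coeffs]
    return haar_inverse(noisy)[:len(counts)]  # drop the padding elements


print(noisy_counts([3, 1, 4, 1, 5], epsilon=0.8, rng=random.Random(0)))
```

The reconstructed values are real numbers and may be negative, which is one reason a post-processing step such as the consistency constraint in step 3 is applied before the counts are released.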

2. The track data distribution method based on differential privacy protection according to claim 1, wherein after step 3, the method further comprises:

3. The trajectory data distribution method based on differential privacy protection according to claim 1, wherein in step 2, the coordinates in each coordinate set are clustered using a K-means clustering algorithm.

4. The track data distribution method based on differential privacy protection according to claim 1, wherein in step 3, the method of acquiring the reconstructed count matrix M' comprises the steps of:

step 3.2: randomly selecting J-J' filling generalized tracks from the standby generalized tracks, wherein the count of each filling generalized track is 0 and J is the number of original tracks; adding the count of each filling track to the count matrix M as a filling element to obtain the filling count matrix M_(1);

then enter step 3.3;

then enter step 3.4;

step 3.4: reconstructing a noise-added count matrix based on the noise-added wavelet coefficient matrix C', and deleting the supplementary elements in the noise-added count matrix to obtain the reconstructed count matrix M', where M' = {nc_i' | i' = 1, ..., J} and nc_i' is the i'-th element of the reconstructed count matrix M'.

5. The track data distribution method based on differential privacy protection as set forth in claim 4, wherein the method for acquiring the target generalized track count matrix M'' comprises the steps of:

step 3.5: sorting the elements of the reconstructed count matrix M' by the size of each track count value to obtain a sequence S; then enter step 3.6;

step 3.6: according to the following formula:

and in combination with the constraint L_m = Q_m, acquiring the count update value L_m of the m-th element in the sequence S, and thereby obtaining the count update value of each element in the sequence S:

6. The track data release system based on differential privacy protection is characterized by comprising a data acquisition and processing module; the data acquisition and processing module is used for executing the following steps:

step 1: acquiring a moving track of each target individual in a target area within a preset time period, and taking the moving track as an original track;

step 3: based on the generalization tracks corresponding to the original tracks, obtaining the original generalization tracks that differ from one another as non-repeated original generalization tracks, and obtaining the number of the non-repeated original generalization tracks; randomly selecting a preset number of generalization tracks from the standby generalization tracks as filling generalization tracks, and assigning a count of zero to each filling generalization track; taking the filling generalization tracks and the non-repeated generalization tracks together as target generalization tracks, and constructing a filling count matrix M_(1) based on the counts of the target generalization tracks; based on a differential privacy noise generation algorithm, performing Haar wavelet transformation on the filling count matrix M_(1) and adding Laplace noise to obtain a reconstructed count matrix M'; and applying a consistency constraint to the reconstructed count matrix M' to obtain a target generalized track count matrix M''.

7. The differential privacy protection based trajectory data distribution system of claim 6, wherein the system further comprises a distribution module; the distribution module is used for acquiring the target generalization tracks and the target generalization track count matrix M'' from the data acquisition and processing module, and executing the following steps:

8. The differential privacy protection based trajectory data distribution system of claim 6, wherein the data acquisition and processing module includes a reconstruction count matrix construction module configured to perform the steps of:

step 3.2: randomly selecting J-J' filling generalization tracks from the standby generalization tracks, wherein the count of each filling generalization track is 0 and J is the number of original tracks; adding the count of each filling track to the count matrix M as a filling element to obtain the filling count matrix M_(1);

then enter step 3.3;

then enter step 3.4;

9. The differential privacy protection based trajectory data distribution system of claim 8, wherein the data acquisition and processing module further comprises a target generalized trajectory count matrix construction module configured to perform the steps of:

step 3.6: according to the following formula:

10. The track data release system based on differential privacy protection is characterized by comprising a processor and a storage medium;

the storage medium is used for storing instructions;