CN112861171A - Inter-track correlation privacy protection method and system based on particle swarm optimization algorithm - Google Patents

Publication number
CN112861171A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110074474.0A
Other languages
Chinese (zh)
Other versions
CN112861171B (en
Inventor
朱虹
余云凯
谢美意
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN202110074474.0A
Publication of CN112861171A
Application granted
Publication of CN112861171B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Complex Calculations (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and system, based on a particle swarm optimization algorithm, for protecting the privacy of correlation between tracks, and belongs to the field of computers. The invention discretizes the longitude-latitude space in which the track data lies, avoiding interference from useless information as far as possible. Grid access frequency vectors of the grid tracks are extracted on the grid space, and the similarity between two such vectors is used to quantify the correlation between the tracks; this quantification has O(n) time complexity. An improved particle swarm algorithm is used to solve the objective function, so that the correlations between the track to be protected and all remaining tracks are considered and reduced simultaneously, which overcomes the limitation that existing track correlation privacy protection techniques can only protect two tracks. The improvement to the particle swarm algorithm is that, in each iteration, after the position vector and velocity vector of each particle are updated, Laplace noise is added to the frequency vector corresponding to the position vector using the sparse vector technique, and the frequency vector is then normalized to obtain the perturbed position vector, which ensures the security of the iterative process.

Description

Inter-track correlation privacy protection method and system based on particle swarm optimization algorithm
Technical Field
The invention belongs to the field of computers, and particularly relates to a privacy protection method and system for correlation between tracks based on a particle swarm optimization algorithm.
Background
The correlation between the tracks of different users can be applied directly in many scenarios, such as targeted advertisement recommendation and epidemiological surveys. Although inter-track correlation brings many benefits, it also allows the relationship between the users to whom the tracks belong to be inferred with high probability, so that relationships among those users in terms of religious belief, health status and the like can easily be inferred, which poses a serious privacy threat.
At present, a few researchers have proposed privacy protection methods for inter-track correlation. However, these methods are limited to scenarios in which two tracks are published offline. For example, in well-known ride-hailing applications such as DiDi and Uber, these methods achieve the desired goal when two fellow passengers want to hide the correlation between the movement tracks they produced through DiDi or Uber. In practice, a real social network application (e.g., a microblog) is unlikely to have only two users, so when such an application wants to obtain services such as location-based recommendation from a third-party provider, it typically publishes far more than two tracks to that provider. Existing methods have not been developed further for this scenario.
Disclosure of Invention
In view of the defects of the prior art and the need for improvement, the invention provides a method and system, based on a particle swarm optimization algorithm, for protecting the privacy of correlation between tracks. Using an improved particle swarm optimization algorithm, the invention keeps the change in the statistical characteristics of the input track's grid access frequency vector before and after perturbation as small as possible while reducing its similarity to the grid access frequency vectors of the other tracks as much as possible, thereby protecting inter-track correlation privacy and ensuring that the track data published after perturbation has high data availability and security.
To achieve the above object, according to a first aspect of the present invention, there is provided a privacy protection method for inter-track correlation based on a particle swarm optimization algorithm, the method comprising the following steps:
S1: dividing the geographic space in which all tracks to be published lie into grids to obtain a grid domain G, where each track consists of longitude, latitude and timestamp records; mapping each track from longitude-latitude form to grid form; and then computing, for each grid track, its grid access frequency vector P_i on the grid domain G;
S2: for each grid track, solving the objective function with an improved particle swarm optimization algorithm to obtain an updated grid access frequency vector P_i', where the position vector of a particle is P_i', and the objective is to keep the P_i and P_i' of the grid track as similar as possible while minimizing the similarity between P_i' and the grid access frequency vectors P_j of the other grid tracks; in each iteration, the improved particle swarm optimization algorithm updates the position vector and velocity vector of each particle, adds Laplace noise of dimension |G| to the frequency vector corresponding to the particle's position vector using the sparse vector technique, and normalizes the perturbed frequency vector to obtain the perturbed position vector of the particle;
S3: for each updated grid access frequency vector P_i', generating the corresponding perturbed grid track from P_i', the grid track to which it belongs and the grid domain G, mapping that grid track to longitude-latitude form, and publishing it.
Advantageous effects:
(1) The invention discretizes the longitude-latitude space in which the track data set lies to obtain a grid space, avoiding interference from useless information as far as possible.
(2) Grid access frequency vectors of the grid tracks are extracted on the grid space, and the similarity between two such vectors is used to quantify the correlation between tracks. Vector similarity can be computed with, for example, cosine similarity or the Pearson correlation coefficient. Quantifying inter-track correlation this way has O(n) time complexity, so it is efficient and well suited to scenarios in which the correlation must be computed repeatedly.
(3) Compared with the original particle swarm algorithm, the improved particle swarm algorithm differs in that, in each iteration, after the position vector and velocity vector of each particle are updated, Laplace noise is added to the frequency vector corresponding to the particle's position vector using the sparse vector technique, and the perturbed frequency vector is then normalized to obtain the perturbed position vector of the particle, which ensures the security of the iterative process.
(4) The objective function solved by the improved particle swarm algorithm considers and reduces the correlations between the track to be protected and all other tracks simultaneously, overcoming the limitation that existing track correlation privacy protection techniques can effectively protect only two tracks.
(5) When synthesizing tracks, the invention avoids the inefficient approach of exhaustively enumerating all possible tracks and balances track synthesis efficiency against the data availability of the synthesized track.
Preferably, in step S1, each grid in the grid domain G is further divided into L × L sub-grids, where L is proportional to q(C_k):
Figure BDA0002907083400000031
where q(C_k) denotes the position diversity of grid C_k, T_i^C denotes the i-th grid track, ε1 denotes the privacy budget, and Lap(·) is the Laplace density function.
Advantageous effects: the invention improves on existing adaptive grid division methods, which consider only the number of positions falling in each grid when further dividing the grids in the grid domain G and therefore tend to over-divide each grid into overly dense sub-grids.
Preferably, in step S2, the objective function of the improved particle swarm optimization algorithm is as follows:
Figure BDA0002907083400000033
the constraint conditions of the improved particle swarm optimization algorithm are as follows:
$$\sum_{m=1}^{|P_i'|} P_i'[m] = 1$$
where the function sim(·) computes the correlation coefficient between two vectors; P_i denotes the grid access frequency vector corresponding to the input track; P_i' denotes the updated grid access frequency vector of the input track; P_j denotes the grid access frequency vector corresponding to another track; |P_i'| denotes the dimension of the updated grid access frequency vector of the input track; and P_i'[m] denotes the m-th dimension of the updated grid access frequency vector of the input track.
Advantageous effects: the invention optimizes an objective function under a specially designed constraint. The denominator of the objective function represents the magnitude of the correlation (or its reciprocal) between the grid access frequency vectors of the input track before and after updating, and serves to preserve the statistical characteristics of the grid access frequency vector of the given input track (the track to be protected) as far as possible. The numerator represents the sum of the magnitudes of the correlations (or of their reciprocals) between the updated grid access frequency vector of the input track and the grid access frequency vectors of the other tracks, and serves to minimize the correlation between the given input track and the remaining tracks in the track data set. The constraint means that the components of the grid access frequency vector sum to 1; it is applied when the local best and global best particles are updated in the particle swarm algorithm. Second, because the objective function operates on grid access frequency vectors, all extracted on the same grid space, every vector has the same number of dimensions regardless of track length, so differing track lengths and raw longitude-latitude coordinates need no separate handling. Finally, the objective function is solved with the improved particle swarm optimization algorithm, and the resulting solution minimizes the objective function, which corresponds to the numerator reaching a minimum while the denominator reaches a maximum.
A maximal denominator means that the statistical characteristics of the given input track's grid access frequency vector are well preserved, which benefits data availability; a minimal numerator means that the correlation between the given input track and the remaining tracks is well protected, which benefits the security of the perturbed track.
Preferably, the value range of each dimension P_i'[m] of P_i' must satisfy the following restrictions:
if the original grid corresponding to the j-th node of the input track is C_k, the grid corresponding to the updated j-th node can only be selected from the 9 grids of the 3 × 3 neighborhood around C_k;
after traversing P_i, the maximum possible number of accesses to each grid in {C_1, C_2, …, C_|G|} is obtained;
normalizing the maximum access count of each grid gives the upper bound of each dimension of P_i'; the lower bound of each dimension is 0.
Advantageous effects: by limiting the value range of each dimension of the grid access frequency vector, the invention ensures that the original grid of each node of the input track can only be perturbed to one of the 9 surrounding grids. This effectively bounds how far the perturbed grid track deviates from the original grid track, strengthens the indistinguishability of the grid track before and after perturbation, helps preserve the accuracy of location-based service requests as far as possible, and ensures data availability.
Preferably, in step S2, the privacy budget that the improved particle swarm optimization algorithm allocates to each iteration is calculated as follows:
Figure BDA0002907083400000051
where itr denotes the current iteration round of the improved particle swarm optimization algorithm, M denotes its maximum number of iterations, and ε2 denotes the privacy budget allocated to the improved particle swarm optimization algorithm.
Advantageous effects: the invention allocates an appropriate privacy budget to each iteration of the improved particle swarm optimization algorithm using the sequence of reciprocals of the triangular numbers. The budget allocated to an iteration increases with the iteration round, so later rounds receive more privacy budget and therefore less Laplace noise, which reduces the influence on the particles' position vectors in those rounds and helps the particles approach the solution of the objective function.
Preferably, step S3 includes:
S31: collecting the set S_C of grids whose access frequency in P_i' is greater than 0;
S32: for each node in turn, selecting from S_C the grid with the highest availability, finally obtaining a synthesized grid track with high availability
Figure BDA0002907083400000061
S33: for the grid corresponding to each node, selecting the sub-grid with the greatest position diversity within that grid, and then, from the set of positions that lie in that sub-grid and belong to the user of the given input track, selecting the position with the highest access frequency as the perturbed position for the current time step; the result is thereby converted into a track T_i' in longitude-latitude form, and T_i' is taken as the final published track.
Advantageous effects: when synthesizing the grid track, the invention selects for each node the new grid with the highest availability and dynamically evaluates the availability of the partial grid sequence synthesized so far. This ensures that the complete synthesized grid track has high availability, avoids the inefficiency of directly picking one grid track from all possible synthesized grid tracks, and strikes a good balance between execution efficiency and data availability. When the synthesized grid track is converted into longitude-latitude form, the invention controls the position diversity of the sub-grid and the access frequency of each position within it, so that the converted track has good availability and effectively preserves the access preference (frequency) for different positions and the grid access frequency vector characteristics contained in the input track.
Preferably, in step S32, the availability of each grid is defined as follows:
Figure BDA00029070834000000610
where t_j denotes the node index and C_k denotes the grid selected for the current node t_j;
Figure BDA0002907083400000062
denotes the probability that the original grid track of the input track is in grid C_k at node t_j;
Figure BDA0002907083400000063
denotes the similarity between the grid access frequency vectors of the synthesized grid track
Figure BDA0002907083400000064
and the original grid track
Figure BDA0002907083400000065
up to node t_j;
Figure BDA0002907083400000066
denotes the rank of the term
Figure BDA0002907083400000067
corresponding to grid C_k among the corresponding terms of all grids in S_C, and
Figure BDA0002907083400000068
denotes the rank of the term
Figure BDA0002907083400000069
corresponding to grid C_k among the corresponding terms of all grids in S_C.
Advantageous effects: the invention designs an availability function for each grid in the candidate grid set of the current node. As described above, the function has two parts. The first part selects from the candidate set the new grid that best matches the access preference of the input track at the current node's timestamp. The second part selects from the candidate set the new grid that gives the synthesized grid sequence the highest similarity to the input track up to the current node, so that the grid access frequency vectors of the grid track before and after perturbation are as similar as possible up to the current node, and the spatial preference within the time period containing the current node's timestamp is preserved as far as possible.
To achieve the above object, according to a second aspect of the present invention, there is provided a privacy protection system for inter-track correlation based on particle swarm optimization, including: a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is configured to read executable instructions stored in the computer-readable storage medium, and execute the inter-track correlation privacy protection method based on the particle swarm optimization algorithm according to the first aspect.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
Based on the improved particle swarm optimization algorithm, the invention keeps the change in the statistical characteristics of the input track's grid access frequency vector before and after perturbation small while reducing its similarity to the grid access frequency vectors of the other tracks as much as possible, so that inter-track correlation privacy is protected and the track data published after perturbation has high data availability and security. Compared with the existing methods used for comparison, the method of the invention has higher data availability and a stronger degree of privacy protection in most cases. The invention supports multiple methods for calculating the correlation between two tracks.
Drawings
FIG. 1 is a flowchart of a privacy protection method for inter-track correlation based on a particle swarm optimization algorithm according to the present invention;
FIG. 2 is a spatial distribution plot of a trajectory data set used in an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating calculation of an upper bound of values of each dimension of a grid access frequency vector corresponding to a track according to an embodiment of the present invention;
FIG. 4 is a graph comparing data availability of the present invention with AdaTrace, DPT and TGM;
FIG. 5 is a graph comparing the privacy protection strength of the present invention with AdaTrace, DPT and TGM.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in FIG. 1, the method of the present invention is implemented in three steps.
Before describing the implementation, the track data set and the key parameter settings used in this embodiment are described first.
The track data set used in this embodiment is the Yonsei data set, collected by a university in Seoul, Korea. It contains data generated over two months in 2011 by 9 researchers from the university using SmartDC, a mobile phone location service application. The embodiment uses the subset Yonsei-Seoul, formed by those check-in records in the Yonsei data set whose longitude-latitude coordinates lie within the Seoul city area. The Yonsei-Seoul data set contains only 9 users, but the average number of check-ins per user is as high as 4094.5, so it is a very dense track data set. As can be seen from FIG. 2, this data set is also highly concentrated spatially. Preliminary experiments on this track data set show that the improved particle swarm algorithm fluctuates little once the maximum iteration number M reaches 2000, at which point it can be considered approximately converged. This embodiment therefore sets M to 2000.
This embodiment requires a privacy budget parameter ε; the experiments use five values, 0.1, 1, 3, 5 and 10, to observe how the results change. In the method of the invention, both the position-diversity-adaptive grid division step and the particle-swarm-based grid access frequency vector update step consume privacy budget. This embodiment allocates them the budgets ε1 and ε2, which are 15% and 85% of ε, respectively.
Step one: position-diversity-adaptive grid division
Input data: a data set Yonsei-Seoul (denoted D) containing multiple tracks, each of the form T = {(x_1, y_1, t_1), …, (x_|T|, y_|T|, t_|T|)}, where each position tuple looks like (123.08, 36.92, 20201212164506), x and y denote longitude and latitude respectively, and t denotes the time at which the position (x, y) was generated; the division scale N × N of the grid domain G, e.g., 8 × 8; the total privacy budget parameter ε = ε1 + ε2, e.g., ε = 0.1.
Processing: find the maximum and minimum longitude and latitude of the geographic space containing the tracks in D, and divide the geographic area they bound into an N × N grid domain G; traverse each track in D and convert it into a track in grid form, such as
Figure BDA0002907083400000093
Based on the resulting grid tracks and the allocated privacy budget ε1, compute the noisy position diversity of all grid tracks on each grid in the grid domain G, and adaptively divide each grid in G into sub-grids based on that result.
Output data: the original track data set D; the grid track data set D_C; the grid domain G; the remaining privacy budget ε2.
This embodiment uses an 8 × 8 division of the grid domain. First, the geographic space containing the Yonsei-Seoul track data to be published is divided into 64 grids to obtain the grid domain G. Then each track in longitude-latitude form is converted into a track in grid form as follows: for each track T_i, traverse every longitude-latitude position loc_l in the track and determine which grid of G it falls in, say grid C_k; after the traversal, the grid track T_i^C corresponding to T_i is obtained, where i ∈ [1, 9], l ∈ [1, |T_i|], k ∈ [1, 64]. Next, position-diversity-adaptive grid division is performed based on the density with Laplace noise calculated by the following formula, dividing each grid C_k in G into several sub-grids.
Figure BDA0002907083400000092
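The conversion from a longitude-latitude track to a grid track described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `bounding_box` and `to_grid_track` are hypothetical, grids are numbered row by row starting from 0 (the text numbers them C_1 to C_64), and the noisy sub-grid division is not covered.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, int]  # (longitude x, latitude y, timestamp t)


def bounding_box(tracks: Sequence[Sequence[Point]]) -> Tuple[float, float, float, float]:
    """Return (min_x, max_x, min_y, max_y) over every position in every track."""
    xs = [x for track in tracks for x, _, _ in track]
    ys = [y for track in tracks for _, y, _ in track]
    return min(xs), max(xs), min(ys), max(ys)


def _axis_index(value: float, low: float, high: float, n: int) -> int:
    """Map a coordinate to one of n equal-width bins; the maximum falls in the last bin."""
    if high <= low:
        return 0
    return min(int((value - low) / (high - low) * n), n - 1)


def to_grid_track(track: Sequence[Point],
                  bbox: Tuple[float, float, float, float],
                  n: int) -> List[int]:
    """Replace each (x, y, t) position by the 0-based index of the grid it falls in.

    Assumption: grids are numbered row by row, index = row * n + col.
    """
    min_x, max_x, min_y, max_y = bbox
    cells = []
    for x, y, _ in track:
        col = _axis_index(x, min_x, max_x, n)
        row = _axis_index(y, min_y, max_y, n)
        cells.append(row * n + col)
    return cells
```

With N = 8 as in this embodiment, `to_grid_track(track, bounding_box(all_tracks), 8)` yields one grid index in [0, 63] per position of the track.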
Step two: updating of grid access frequency vector based on particle swarm optimization algorithm
Input data: the original track data set D; the grid track data set D_C; the grid domain G; the remaining privacy budget ε2.
Processing: from the grid track data set D_C, extract the grid access frequency vector p of each grid track on the grid domain G, of a form such as {0.1, 0, 0.3, 0.3, 0.25, 0.05}; update the grid access frequency vector of each grid track with the improved particle swarm optimization algorithm to obtain an updated vector p', and collect these into a vector set P'.
Output data: the original track data set D and the grid track data set D_C; the grid domain G; the updated grid access frequency vector set P', whose elements correspond one-to-one to the tracks.
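Extracting a grid access frequency vector amounts to counting visits per grid and dividing by the track length. The following is a minimal sketch, assuming a grid track is a list of 0-based grid indices; the function name `grid_access_frequency` is hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def grid_access_frequency(grid_track: Sequence[int], num_cells: int) -> List[float]:
    """Return the fraction of the track's nodes that fall in each grid.

    The result has one entry per grid of the grid domain, so every track
    yields a vector of the same dimension regardless of its length.
    """
    if not grid_track:
        raise ValueError("grid track must contain at least one node")
    counts = Counter(grid_track)
    total = len(grid_track)
    return [counts.get(k, 0) / total for k in range(num_cells)]
```

For example, `grid_access_frequency([0, 0, 1, 3], 4)` returns `[0.5, 0.25, 0.0, 0.25]`, whose components sum to 1 as the constraint of the optimization problem requires.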
For each grid track T_i^C, a 64-dimensional grid access frequency vector P_i of the track on the grid domain G is computed. For the input track T_i^C, its grid access frequency vector P_i and the grid access frequency vectors of the other tracks {P_j | j ∈ [1, 9], j ≠ i} are used to construct the following constrained optimization problem:
Figure BDA0002907083400000103
$$\sum_{m=1}^{|P_i'|} P_i'[m] = 1$$
where the first formula is the objective function and the second is the constraint. The function sim(·) computes the correlation coefficient between two vectors, which the invention uses to quantify the correlation between two tracks; this embodiment implements sim(·) with the Pearson correlation coefficient. P_i denotes the grid access frequency vector corresponding to the input track; P_i' denotes the updated grid access frequency vector of the input track, i.e., the solution vector at which the constrained optimization algorithm minimizes the objective function; P_j denotes the grid access frequency vector corresponding to another track. The denominator of the objective function represents the magnitude of the correlation (or its reciprocal) between the grid access frequency vectors of the input track before and after updating, and the numerator represents the sum of the magnitudes of the correlations (or of their reciprocals) between the updated grid access frequency vector of the input track and the grid access frequency vectors of the other tracks. The constraint means that the 64 components of the grid access frequency vector sum to 1. The constraint is applied when the local best and global best particles are updated in the particle swarm algorithm.
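A minimal sketch of how such an objective could be evaluated, with sim(·) implemented as the Pearson correlation coefficient as in this embodiment. The exact formula is available only as an image, so the details here are assumptions: taking absolute values for the "magnitude" of each correlation, guarding the denominator with a small `eps`, and the function names `pearson`, `objective` and `satisfies_constraint` are all illustrative choices.

```python
import math
from typing import Sequence


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return cov / (norm_u * norm_v)


def objective(p_new: Sequence[float], p_orig: Sequence[float],
              others: Sequence[Sequence[float]], eps: float = 1e-12) -> float:
    """Ratio to minimize: correlation with the other tracks over correlation with the original.

    Assumption: correlation magnitudes are absolute values; eps avoids division by zero.
    """
    numerator = sum(abs(pearson(p_new, p_j)) for p_j in others)
    denominator = abs(pearson(p_orig, p_new))
    return numerator / max(denominator, eps)


def satisfies_constraint(p_new: Sequence[float], tol: float = 1e-9) -> bool:
    """Check that the components of the frequency vector sum to 1."""
    return abs(sum(p_new) - 1.0) <= tol
```

A candidate P_i' that stays close to P_i while decorrelating from every P_j drives the ratio toward 0, which is the behaviour the text attributes to the minimum of the objective function.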
To ensure higher data availability, the invention limits the value range of each dimension of P_i'. If the original grid corresponding to the input track at time step j is C_k, the grid corresponding to the updated time step j can only be selected from the 9 grids of the 3 × 3 neighborhood around C_k, as shown in FIG. 3. This bounds how far the perturbed grid track can deviate from the original grid track and thereby ensures data availability. After traversing P_i, the maximum possible number of accesses to each grid in {C_1, C_2, ..., C_64} is obtained. Normalizing the maximum access count of each grid gives the upper bound of each dimension of P_i'; the lower bound of each dimension is 0.
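The upper bounds can be sketched as follows. This is an illustration under assumptions rather than the patent's procedure: the code walks the grid track node by node, credits one possible access to every grid in the 3 × 3 neighborhood of the node's grid (clipped at the border of the grid domain), and normalizes by the track length, which is one plausible reading of "normalizing the maximum access count". The function name `frequency_upper_bounds` and the row-by-row 0-based grid numbering are hypothetical.

```python
from typing import List, Sequence


def frequency_upper_bounds(grid_track: Sequence[int], n: int) -> List[float]:
    """Upper bound of each dimension of the updated frequency vector on an n x n grid domain.

    A node whose original grid is C_k may only move to C_k or one of its
    neighbours, so each of those grids could receive at most one access
    from that node.
    """
    if not grid_track:
        raise ValueError("grid track must contain at least one node")
    max_visits = [0] * (n * n)
    for cell in grid_track:
        row, col = divmod(cell, n)
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                r, c = row + d_row, col + d_col
                if 0 <= r < n and 0 <= c < n:  # neighbours outside the domain are skipped
                    max_visits[r * n + c] += 1
    total = len(grid_track)
    return [visits / total for visits in max_visits]
```

Grids that no node can reach get an upper bound of 0, so the corresponding dimensions of P_i' are forced to 0 (the lower bound).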
The constrained objective function is then solved with a particle swarm algorithm, in which the position vector of a particle is defined as the to-be-updated grid access frequency vector of the input track. The solving process is implemented by the following algorithm.
Figure BDA0002907083400000111
Because the original particle swarm optimization algorithm adds no noise during execution, it cannot guarantee security. The method therefore customizes the original particle swarm optimization algorithm, yielding the function ImprovedPSO. The pseudocode of ImprovedPSO is summarized as Algorithm 2.
Figure BDA0002907083400000112
Figure BDA0002907083400000121
In the improved particle swarm algorithm, PerturbedPositionVector(·) denotes the process of adding Laplace noise to the frequency vector corresponding to the position vector of each particle using the sparse vector technique and then normalizing the perturbed frequency vector to obtain the perturbed position vector. This step is what distinguishes the improved particle swarm algorithm from the original one.
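To make the difference from the original algorithm concrete, the following sketch shows one particle update in which the perturbation step is applied after the usual velocity and position update. It uses the textbook particle swarm update rule; the inertia weight `w`, the coefficients `c1` and `c2`, the clipping to per-dimension limits, and the function name `pso_step` are assumptions for illustration and are not taken from Algorithm 2.

```python
import random

def pso_step(x, v, pbest, gbest, lower, upper, perturb, rng,
             w=0.7, c1=1.5, c2=1.5):
    """One update of a single particle, followed by the perturbation step.

    x, v: current position and velocity; pbest, gbest: local and global best.
    lower, upper: per-dimension limits of the position vector.
    perturb: callable that adds noise to a position vector and normalizes it.
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = rng.random(), rng.random()
        vd = (w * v[d]
              + c1 * r1 * (pbest[d] - x[d])
              + c2 * r2 * (gbest[d] - x[d]))
        xd = min(max(x[d] + vd, lower[d]), upper[d])  # keep within the limits
        new_v.append(vd)
        new_x.append(xd)
    # The extra step of the improved algorithm: perturb, then normalize.
    return perturb(new_x), new_v
```

Passing the perturbation as a callable keeps the update rule separate from the noise mechanism, so the same step can be run with or without noise when comparing the two algorithms.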
To ensure that Laplace noise added in early iterations is not swamped by random changes in later iterations, the method allocates a share of the privacy budget to each iteration. The smaller itr is, the more randomly each particle's position vector changes; as itr grows, the particles move closer to the optimal particle. Later iterations should therefore be allocated more privacy budget, so that less Laplace noise is added and the optimal particles are affected less. The present invention achieves this using the sequence of reciprocals of the triangular numbers. The privacy budget consumed in each iteration is calculated as follows:
Figure BDA0002907083400000131
where itr denotes the current iteration of the particle swarm algorithm, M = 2000 denotes the maximum number of iterations of ImprovedPSO, and ε2 denotes the privacy budget that the invention allocates to the particle swarm algorithm.
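One way such an allocation could look is sketched below. The exact formula is given only in the formula image, so this is an assumed reading rather than the patented formula: iteration itr receives a weight equal to the reciprocal of the triangular number with index M - itr + 1, and the weights are rescaled so that they sum to ε2. The name `iteration_budgets` is hypothetical.

```python
def iteration_budgets(eps2, max_iter):
    """Split eps2 over iterations 1..max_iter, giving later iterations more.

    Assumption: the weight of iteration itr is 1 / T(n) with
    n = max_iter - itr + 1 and T(n) = n * (n + 1) / 2 (a triangular number).
    """
    weights = []
    for itr in range(1, max_iter + 1):
        n = max_iter - itr + 1
        weights.append(2.0 / (n * (n + 1)))
    total = sum(weights)  # equals 2 * max_iter / (max_iter + 1)
    return [eps2 * w / total for w in weights]
```

Under this reading the budgets increase monotonically with itr, so the Laplace noise scale, which is inversely proportional to the budget, shrinks as the particles converge.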
Because the position vector of each particle represents a grid access frequency vector, the object perturbed by the Laplace mechanism is a vector. The method therefore uses the sparse vector technique (SVT) to add Laplace noise to the frequency vector corresponding to the position vector of each particle, and then normalizes the perturbed frequency vector to obtain the perturbed position vector. The implementation is shown in the following algorithm.
Figure BDA0002907083400000132
Figure BDA0002907083400000141
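The perturb-then-normalize idea can be illustrated with the simplified sketch below. It applies the plain Laplace mechanism to every component and does not reproduce the sparse vector technique used in the algorithm above; the sensitivity value, the clipping of negative values to 0, the uniform fallback, and the function names are all assumptions made for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturbed_position_vector(position, epsilon, rng, sensitivity=1.0):
    """Add Laplace noise to each component, then renormalize to sum to 1."""
    scale = sensitivity / epsilon  # assumption: sensitivity of 1
    # Assumption: negative noisy frequencies are clipped to 0.
    noisy = [max(p + laplace_noise(scale, rng), 0.0) for p in position]
    total = sum(noisy)
    if total == 0.0:
        # Assumption: fall back to a uniform vector if everything was clipped.
        return [1.0 / len(position)] * len(position)
    return [value / total for value in noisy]
```

A larger `epsilon` gives a smaller noise scale, so the output stays closer to the input position vector.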
Step three: trajectory synthesis based on updated grid access frequency vectors
Input data: the raw trajectory dataset D and the grid trajectory dataset D_C; the grid domain G; the updated grid access frequency vectors P', in one-to-one correspondence with the tracks.
Processing: during track synthesis, the candidate grid set Set_C, obtained by statistics from the updated grid access frequency vector P', has the form Set_C = {C1, C1, C2, C2, C2, …}; the probability set Set_{C,hour}, used when computing the availability score of each candidate grid, has the form Set_{C,hour} = {0-4 o'clock: {C1: 0.3, C2: 0.7}, 5-8 o'clock: {C1: 0.45, C2: 0.55}, …}.
Output data (i.e., the final result): the perturbed trajectory dataset D'. The tracks in D' correspond one-to-one to the tracks in D and are organized in the same form, except that each track in D' is the perturbed version of the corresponding track in D.
After the objective function is solved with the improved particle swarm algorithm, the grid access frequency vector P_i' that minimizes the objective function is obtained, and the set S_C of grids whose access frequency in P_i' is greater than 0 can be obtained by statistics. The most suitable grid is then selected from S_C for each time step in turn, finally yielding the synthesized grid track with the highest availability. The fitness score of each grid is defined as follows:
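The candidate grid structures named in the processing step can be sketched as follows. The rule used to turn a frequency into a repeat count (rounding the frequency multiplied by the track length) is an assumption, as are the function names; the description only shows the resulting form of the candidate set.

```python
def candidate_grids(freq_vector):
    """Indices of grids whose updated access frequency is greater than 0."""
    return [k for k, f in enumerate(freq_vector) if f > 0]

def candidate_multiset(freq_vector, track_length):
    """Candidate set in which each grid is repeated according to its frequency.

    Assumption: a grid with frequency f appears round(f * track_length) times.
    """
    multiset = []
    for k, f in enumerate(freq_vector):
        multiset.extend([k] * round(f * track_length))
    return multiset
```

For a track of 5 time steps with frequencies 0.4 and 0.6 on two grids, this produces a candidate set of the same shape as the example {C1, C1, C2, C2, C2}.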
Figure BDA0002907083400000151
where t_j denotes the time step number and C_k denotes the grid selected for the current time step. The probability term (Figure BDA0002907083400000152) denotes the probability that the original grid track of the input track is at grid C_k at time t_j. The similarity term (Figure BDA0002907083400000153) denotes the similarity, computed with the Pearson correlation coefficient, between the grid access frequency vectors up to time t_j of the synthesized grid track (Figure BDA0002907083400000154) and of the original grid track (Figure BDA0002907083400000155). The Rank(·) function denotes the rank of the term corresponding to grid C_k (Figure BDA0002907083400000156, or Figure BDA0002907083400000157) among the corresponding terms of all grids in S_C.
The invention adopts the following iterative process to complete the synthesis of the grid track.
Figure BDA0002907083400000158
Figure BDA0002907083400000161
After the synthesized grid track T_C' is obtained, for the grid at each time step, the sub-grid with the highest check-in density is first selected from that grid; then, from the set of locations that fall within that sub-grid and belong to the user of the given input track, the location with the highest access frequency is selected as the perturbed location for the current time step. This preserves data availability during the conversion. In this way the invention converts the synthesized grid track T_C' into a track T' in longitude-latitude form and takes T' as the final published track.
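The conversion of one grid into a longitude-latitude location can be sketched as follows. The data layout (a nested dictionary of sub-grid densities and a list of the user's visits) and the name `grid_to_location` are hypothetical, and returning `None` when the user has no location in the chosen sub-grid is an assumption, since the description does not state a fallback.

```python
from collections import Counter

def grid_to_location(grid, subgrid_density, user_visits):
    """Pick the perturbed location for one time step of the synthesized track.

    grid: grid id at this time step.
    subgrid_density: {grid id: {sub-grid id: check-in density}}.
    user_visits: list of (grid id, sub-grid id, (lat, lon)) tuples for the
        user of the input track.
    """
    densities = subgrid_density[grid]
    best_sub = max(densities, key=densities.get)  # densest sub-grid
    counts = Counter(loc for g, s, loc in user_visits
                     if g == grid and s == best_sub)
    if not counts:
        return None  # assumption: no fallback is specified in the description
    return counts.most_common(1)[0][0]  # most frequently visited location
```

Applying this function to every time step of T_C' yields the sequence of locations that makes up the published track T'.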
The advantages of the invention are illustrated using three data availability indexes (the Jensen-Shannon divergence of the location access frequency vector, the Kendall coefficient of the location access frequency vector, and the average query error) and two security indexes (the ability to resist Bayesian attacks and the protection effect on track correlation). For the Jensen-Shannon divergence of the location access frequency vector and the average query error, a smaller value means higher availability; for the Kendall coefficient of the location access frequency vector, the opposite holds. For the index of resistance to Bayesian attacks, a smaller value means higher security; for the index of the protection effect on track correlation, the opposite holds. In the embodiment's experiments on the Yonsei-Seoul trajectory dataset, the method of the invention outperforms the three comparison methods, AdaTrace, DPT and TGM, on all three data availability indexes; on the two security indexes, it also provides stronger privacy protection than the comparison methods in most cases. The method of the invention therefore has clear advantages in both data availability and privacy protection.
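For reference, the Jensen-Shannon divergence between two location access frequency vectors can be computed as in the sketch below. It uses the standard definition with base-2 logarithms, which bounds the value between 0 and 1; the description does not state which logarithm base the experiments used, so the base is an assumption, as is the function name.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence of two probability vectors (base-2 logs)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x == 0 contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A value of 0 means the original and perturbed frequency vectors are identical, which is why smaller values indicate higher availability.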
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A privacy protection method for correlation between tracks based on a particle swarm optimization algorithm is characterized by comprising the following steps:
S1, dividing the geographic space in which all tracks to be published are located into grids to obtain a grid domain G, wherein each track comprises longitudes, latitudes and timestamps; mapping each track from longitude-latitude form to grid form; and then counting the grid access frequency vector P_i of each grid track in the grid domain G;
S2, for each grid track, solving the objective with an improved particle swarm optimization algorithm to obtain an updated grid access frequency vector P_i', wherein the position vector of a particle is P_i', and the objective is to preserve as far as possible the correlation between P_i and P_i' of the grid track while minimizing the correlation between P_i' and the grid access frequency vectors P_j of the other grid tracks; in each iteration, the improved particle swarm optimization algorithm updates the position vector and the velocity vector of each particle, adds Laplace noise of dimension |G| to the frequency vector corresponding to the position vector of the particle using the sparse vector technique, and normalizes the perturbed frequency vector to obtain the perturbed position vector of the particle;
S3, for each updated grid access frequency vector P_i', generating the corresponding perturbed grid track by combining the grid track to which P_i' belongs with the grid domain G, mapping the perturbed grid track into longitude-latitude form, and then publishing it.
2. The method of claim 1, wherein in step S1, each grid in the grid domain G is further divided into L × L sub-grids, where L is directly proportional to q(C_k),
Figure FDA0002907083390000019
wherein q(C_k) denotes the location diversity of grid C_k,
Figure FDA00029070833900000110
denotes the ith grid track, ε1 denotes the privacy budget, and Lap(·) is the Laplace density function.
3. The method of claim 1, wherein in step S2, the objective function of the improved particle swarm optimization algorithm is as follows:
Figure FDA0002907083390000021
the constraint conditions of the improved particle swarm optimization algorithm are as follows:
Figure FDA0002907083390000022
wherein the function sim(·) computes the correlation coefficient between two vectors; P_i denotes the grid access frequency vector of the input track; P_i' denotes the updated grid access frequency vector of the input track; P_j denotes the grid access frequency vector of another track; |P_i'| denotes the dimension of the updated grid access frequency vector of the input track; and P_i'[m] denotes the mth dimension of the updated grid access frequency vector of the input track.
4. The method of claim 1, wherein the value range of each dimension P_i'[m] of P_i' satisfies the following restrictions:
if the original grid corresponding to the jth node of the input track is C_k, then the updated grid corresponding to the jth node can only be selected from the 9 grids of the 3 × 3 neighborhood centered on C_k;
after traversing P_i, the maximum possible number of accesses to each grid in {C_1, C_2, …, C_|G|} is obtained;
normalizing the maximum number of accesses of each grid gives the upper limit of each dimension of P_i', and the lower limit of each dimension is 0.
5. The method of claim 1, wherein in step S2, the improved particle swarm optimization algorithm assigns a privacy budget for each iteration as follows:
Figure FDA0002907083390000023
wherein itr denotes the current iteration of the improved particle swarm optimization algorithm, M denotes the maximum number of iterations of the improved particle swarm optimization algorithm, and ε2 denotes the privacy budget allocated to the improved particle swarm optimization algorithm.
6. The method according to any one of claims 1 to 5, wherein step S3 includes:
S31, obtaining by statistics the set S_C of grids whose access frequency in P_i' is greater than 0;
S32, selecting from S_C, for each node in turn, the grid with the highest availability, thereby finally obtaining a synthesized grid track with high availability
Figure FDA0002907083390000031
S33, for the grid corresponding to each node, selecting from that grid the sub-grid with the greatest location diversity, and selecting, from the set of locations that fall within that sub-grid and belong to the user of the given input track, the location with the highest access frequency as the perturbed location for the current node, thereby converting the synthesized grid track into a track T_i' in longitude-latitude form, and taking T_i' as the final published track.
7. The method of claim 6, wherein in step S32, the availability of each grid is defined as follows:
Figure FDA0002907083390000032
wherein t_j denotes the node number and C_k denotes the grid selected for the current node t_j; the probability term (Figure FDA0002907083390000033) denotes the probability that the original grid track of the input track is at grid C_k at node t_j; the similarity term (Figure FDA0002907083390000034) denotes the similarity between the grid access frequency vectors up to node t_j of the synthesized grid track (Figure FDA0002907083390000035) and of the original grid track (Figure FDA0002907083390000036); the first rank function (Figure FDA0002907083390000037) denotes the rank of the term corresponding to grid C_k (Figure FDA0002907083390000038) among the corresponding terms of all grids in S_C; and the second rank function (Figure FDA0002907083390000039) denotes the rank of the term corresponding to grid C_k (Figure FDA00029070833900000310) among the corresponding terms of all grids in S_C.
8. A privacy protection system for correlation between tracks based on a particle swarm optimization algorithm, characterized by comprising: a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is used for reading executable instructions stored in the computer-readable storage medium and executing the privacy protection method based on inter-track correlation of particle swarm optimization according to any one of claims 1 to 7.
CN202110074474.0A 2021-01-20 2021-01-20 Particle swarm optimization algorithm-based inter-track correlation privacy protection method and system Active CN112861171B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110074474.0A CN112861171B (en) 2021-01-20 2021-01-20 Particle swarm optimization algorithm-based inter-track correlation privacy protection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110074474.0A CN112861171B (en) 2021-01-20 2021-01-20 Particle swarm optimization algorithm-based inter-track correlation privacy protection method and system

Publications (2)

Publication Number Publication Date
CN112861171A true CN112861171A (en) 2021-05-28
CN112861171B CN112861171B (en) 2024-04-09

Family

ID=76007588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110074474.0A Active CN112861171B (en) 2021-01-20 2021-01-20 Particle swarm optimization algorithm-based inter-track correlation privacy protection method and system

Country Status (1)

Country Link
CN (1) CN112861171B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114201695A (en) * 2021-12-17 2022-03-18 南京邮电大学 Moving track privacy protection matching method based on hotspot grid dimension conversion

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491730A (en) * 2018-03-08 2018-09-04 湖南大学 Correlation method for secret protection between track based on lagrangian optimization
CN112199581A (en) * 2020-09-11 2021-01-08 卞美玲 Cloud computing and information security oriented cloud service management method and artificial intelligence platform

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
胡德敏;詹涵;: "可预测的差分扰动用户轨迹隐私保护方法", 小型微型计算机系统, no. 06, 14 June 2019 (2019-06-14) *

Also Published As

Publication number Publication date
CN112861171B (en) 2024-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant