CN112861171A - Inter-track correlation privacy protection method and system based on particle swarm optimization algorithm - Google Patents
- Publication number
- CN112861171A (application number CN202110074474.0A)
- Authority
- CN
- China
- Prior art keywords
- grid
- track
- vector
- particle swarm
- swarm optimization
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
Abstract
The invention discloses a privacy protection method and system for inter-track correlation based on a particle swarm optimization algorithm, and belongs to the field of computers. The invention discretizes the longitude and latitude space in which the track data lies, avoiding the interference of useless information as far as possible. Grid access frequency vectors of grid tracks are extracted based on the grid space, and the similarity between two vectors is used to quantify the correlation between tracks, with a time complexity of O(n). An improved particle swarm algorithm is used to solve the objective function, so that the correlation between the track to be protected and all remaining tracks can be considered and reduced at the same time, which solves the problem that existing track correlation privacy protection techniques can only protect two tracks. The improvement lies in each iteration: after the position vector and velocity vector of each particle are updated, Laplace noise is added to the frequency vector corresponding to the position vector using the sparse vector technique, and the frequency vector is then normalized to obtain the perturbed position vector, which ensures the security of the iterative process.
Description
Technical Field
The invention belongs to the field of computers, and particularly relates to a privacy protection method and system for correlation between tracks based on a particle swarm optimization algorithm.
Background
The correlation between the tracks of different users can be applied directly in many scenarios, such as targeted advertisement recommendation and epidemiological surveys. Although inter-track correlation brings many benefits, it also allows the relationship between the users to whom the tracks belong to be inferred with high probability, so that sensitive attributes shared by those users, such as religious beliefs and health status, can be easily inferred, which poses a serious privacy threat.
Currently, a few researchers have proposed privacy protection methods for inter-track correlation. However, these methods are limited to scenarios in which two tracks are published offline. For example, in well-known ride-hailing applications such as DiDi and Uber, these methods achieve the desired goal when two passengers sharing a ride want to hide the correlation between the movement tracks they produce through DiDi or Uber. In practice, a real social network application (e.g., a microblog) is unlikely to have only two users, so when such an application wants to obtain services such as location-based recommendation from a third-party provider, it typically publishes far more than two tracks to that provider. Existing methods have not been developed further for this scenario.
Disclosure of Invention
In view of the defects of the prior art and the need for improvement, the invention provides a privacy protection method and system for inter-track correlation based on a particle swarm optimization algorithm. Based on an improved particle swarm optimization algorithm, the invention keeps the change in the statistical characteristics of the grid access frequency vector of an input track before and after perturbation as small as possible, while reducing as far as possible the similarity between that vector and the grid access frequency vectors of the other tracks. This protects the privacy of the correlation between tracks and ensures that the track data published after perturbation has high data availability and security.
To achieve the above object, according to a first aspect of the present invention, there is provided a privacy protection method for inter-track correlation based on a particle swarm optimization algorithm, the method comprising the following steps:
S1, performing grid division on the geographic space in which all tracks to be published are located to obtain a grid domain G, wherein each track comprises longitude, latitude and timestamps; mapping each track from longitude-latitude form to grid form; and then counting the grid access frequency vector P_i of each grid track in the grid domain G;
S2, for each grid track, solving the objective with an improved particle swarm optimization algorithm to obtain an updated grid access frequency vector P_i', wherein the position vector of a particle is P_i', and the objective is to keep P_i and P_i' of the grid track as similar as possible while minimizing the correlation between P_i' and the grid access frequency vectors P_j of the other grid tracks; in each iteration, after updating the position vector and the velocity vector of each particle, the improved particle swarm optimization algorithm adds Laplace noise of dimension |G| to the frequency vector corresponding to the position vector of the particle using the sparse vector technique, and normalizes the perturbed frequency vector to obtain the perturbed position vector of the particle;
S3, for each updated grid access frequency vector P_i', combining it with the grid track to which it belongs and the grid domain G to generate the correspondingly perturbed grid track, mapping that track into longitude-latitude form, and then publishing it.
Beneficial effects:
(1) The invention discretizes the longitude and latitude space in which the track data set lies to obtain a grid space, avoiding the interference of useless information as far as possible.
(2) The grid access frequency vectors of grid tracks are extracted based on the grid space, and the similarity between two vectors is used to quantify the correlation between tracks. Specific similarity measures include cosine similarity, the Pearson correlation coefficient, and the like. Quantifying inter-track correlation in this way has a time complexity of O(n), which makes the computation efficient and the method well suited to scenarios in which the correlation between tracks must be computed repeatedly.
(3) Compared with the original particle swarm algorithm, the improvement lies in each iteration: after the position vector and velocity vector of each particle are updated, Laplace noise is added to the frequency vector corresponding to the position vector of the particle using the sparse vector technique, and the perturbed frequency vector is then normalized to obtain the perturbed position vector of the particle, which ensures the security of the iterative process.
(4) The objective function solved by the improved particle swarm algorithm considers and reduces the correlation between the track to be protected and all other tracks at the same time, overcoming the limitation that existing track correlation privacy protection techniques can only effectively protect two tracks.
(5) When synthesizing a track, the invention avoids the inefficient approach of enumerating all possible tracks, and achieves a balance between track synthesis efficiency and the data availability of the synthesized track.
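The O(n) similarity computation mentioned in effect (2) can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names `pearson` and `cosine`, and the choice to return 0 for a constant or all-zero vector, are assumptions made here.

```python
import math

def pearson(p, q):
    """Pearson correlation coefficient of two equal-length vectors, O(n)."""
    n = len(p)
    mean_p = sum(p) / n
    mean_q = sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        return 0.0  # assumption: a constant vector is treated as uncorrelated
    return cov / math.sqrt(var_p * var_q)

def cosine(p, q):
    """Cosine similarity of two equal-length vectors, O(n)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0  # assumption: zero vector gives 0

# Two users who visit disjoint halves of a 4-grid domain.
p_a = [0.5, 0.5, 0.0, 0.0]
p_b = [0.0, 0.0, 0.5, 0.5]
r = pearson(p_a, p_b)   # strongly negative correlation
c = cosine(p_a, p_b)    # no overlap in visited grids
```

Both functions make a fixed number of passes over the vectors, so the cost depends only on the number of grids |G|, not on the length of the underlying tracks.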
Preferably, in step S1, each grid in the grid domain G is further divided into L × L sub-grids, where L is directly proportional to q(C_k),
wherein q(C_k) represents the location diversity of grid C_k, T_i^C denotes the i-th grid track, ε1 represents the privacy budget, and Lap(·) is the Laplace density function.
Beneficial effects: the invention improves the existing adaptive grid division method. When further dividing the grids in the grid domain G, the existing method considers only the number of positions falling in each grid, so a grid is easily over-divided into overly dense sub-grids.
Preferably, in step S2, the objective function of the improved particle swarm optimization algorithm is as follows:
the constraint condition of the improved particle swarm optimization algorithm is as follows: $\sum_{m=1}^{|P_i'|} P_i'[m] = 1$
wherein the function sim(·) calculates the correlation coefficient between two vectors; P_i represents the grid access frequency vector corresponding to the input track; P_i' represents the updated grid access frequency vector of the input track; P_j represents the grid access frequency vector corresponding to another track; |P_i'| denotes the dimension of the updated grid access frequency vector of the input track; and P_i'[m] represents the m-th dimension of the updated grid access frequency vector of the input track.
Beneficial effects: the method optimizes an objective function under a specially designed constraint. The denominator of the objective function represents the magnitude of the correlation (or its reciprocal) between the grid access frequency vectors of the input track before and after updating, so as to preserve the statistical characteristics of the grid access frequency vector of the given input track (the track to be protected) as far as possible. The numerator represents the sum of the magnitudes of the correlations (or of their reciprocals) between the updated grid access frequency vector of the input track and the grid access frequency vectors of the other tracks, so as to minimize the correlation between the given input track and the remaining tracks in the track data set. The constraint means that the components of the grid access frequency vector sum to 1, and it is applied when updating the locally optimal and globally optimal particles in the particle swarm optimization. Furthermore, because the objective function operates on grid access frequency vectors, all of which are extracted from the same grid space, every grid access frequency vector has the same number of dimensions regardless of the length of its track, so there is no need to deal separately with differences in track length or with raw longitude and latitude values. Finally, the objective function is solved by the improved particle swarm optimization algorithm, and the resulting solution minimizes the objective function, which is equivalent to the numerator attaining its minimum while the denominator attains its maximum.
A maximal denominator means that the statistical characteristics of the grid access frequency vector of the given input track are well preserved, which benefits data availability; a minimal numerator means that the correlation between the given input track and the remaining tracks is well protected, which benefits the security of the perturbed track.
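To make the numerator/denominator reading above concrete, the following sketch evaluates one possible form of the objective. It is an assumption based only on the verbal description: the exact formula is not available in this text, so the use of absolute correlation values, the small guard `eps`, and the names `objective` and `satisfies_constraint` are all illustrative choices.

```python
import math

def pearson(p, q):
    """Pearson correlation coefficient (one possible choice of sim)."""
    n = len(p)
    mp, mq = sum(p) / n, sum(q) / n
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    vp = sum((a - mp) ** 2 for a in p)
    vq = sum((b - mq) ** 2 for b in q)
    return cov / math.sqrt(vp * vq) if vp and vq else 0.0

def objective(p_orig, p_new, others, sim=pearson, eps=1e-12):
    """Assumed form: sum_j |sim(P_i', P_j)| / |sim(P_i, P_i')|, to be minimized."""
    numerator = sum(abs(sim(p_new, p_j)) for p_j in others)
    denominator = abs(sim(p_orig, p_new))
    return numerator / max(denominator, eps)  # eps avoids division by zero

def satisfies_constraint(p_new, tol=1e-9):
    """Constraint from the text: the components of P_i' sum to 1."""
    return abs(sum(p_new) - 1.0) <= tol

p_i = [0.4, 0.3, 0.2, 0.1]
others = [[0.1, 0.2, 0.3, 0.4]]
value = objective(p_i, p_i, others)  # unperturbed candidate as a baseline
```

A candidate P_i' scores lower (better) when it stays correlated with P_i while becoming less correlated with every P_j, which is the trade-off the paragraph above describes.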
Preferably, the value range of each dimension P_i'[m] of P_i' needs to satisfy the following restrictions:
if the original grid corresponding to the j-th node of the input track is C_k, then the grid corresponding to the updated j-th node can only be selected from the 9 grids of the 3 × 3 neighbourhood around C_k;
after traversing P_i, the maximum possible number of accesses of each grid in {C_1, C_2, …, C_|G|} is obtained;
normalizing the maximum number of accesses of each grid gives the upper bound of each dimension of P_i'; the lower bound of each dimension is 0.
Beneficial effects: by limiting the value range of each dimension of the grid access frequency vector, the invention ensures that the original grid of each node of the input track can only be perturbed to one of the 9 surrounding grids. This effectively controls how far the perturbed grid track deviates from the original grid track, enhances the indistinguishability of the grid track before and after perturbation, helps to maintain the accuracy of location-based service requests as far as possible, and ensures data availability.
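The bound computation described above can be sketched as follows, assuming a square N × N grid domain with 0-based row-major grid indices and normalization by the track length; both conventions, and the names `neighbours` and `upper_bounds`, are assumptions made for illustration.

```python
def neighbours(cell, n):
    """Indices of the 3 x 3 block around `cell`, clipped at the domain border."""
    row, col = divmod(cell, n)
    return [r * n + c
            for r in range(max(0, row - 1), min(n, row + 2))
            for c in range(max(0, col - 1), min(n, col + 2))]

def upper_bounds(grid_track, n):
    """Upper bound of each dimension of P_i' for a track given as grid indices.

    A grid can be visited at most once per node whose 3 x 3 neighbourhood
    contains it; dividing by the track length turns counts into frequencies.
    """
    max_visits = [0] * (n * n)
    for cell in grid_track:
        for nb in neighbours(cell, n):
            max_visits[nb] += 1
    length = len(grid_track)  # assumption: the track is non-empty
    return [v / length for v in max_visits]

# A 3-node track on a 3 x 3 domain: two visits to the centre, one to a corner.
bounds = upper_bounds([4, 4, 0], 3)
```

Grids outside every node's neighbourhood receive an upper bound of 0, so the optimizer cannot move any visit there.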
Preferably, in step S2, the privacy budget allocated to each iteration of the improved particle swarm optimization algorithm is calculated as follows:
wherein itr represents the current iteration round of the improved particle swarm optimization algorithm, M represents the maximum number of iterations of the improved particle swarm optimization algorithm, and ε2 represents the privacy budget allocated to the improved particle swarm optimization algorithm.
Beneficial effects: the invention allocates an appropriate privacy budget to each iteration of the improved particle swarm optimization algorithm using the sequence of reciprocals of the triangular numbers, and the budget allocated to an iteration grows with the iteration round. Later rounds therefore receive more privacy budget and less Laplace noise, so the position vectors of the particles in those rounds are disturbed less, which helps the particles approach the solution of the objective function.
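One schedule with the properties described above (reciprocals of triangular numbers, increasing with the iteration round) is sketched below. The exact formula is not available in this text, so the reversed indexing and the normalization that makes the shares sum to ε2 are assumptions, as is the name `budget_schedule`.

```python
def budget_schedule(eps2, max_iter):
    """Per-iteration privacy budgets that increase with the iteration round.

    Iteration itr (1-based) is weighted by the reciprocal of the k-th
    triangular number T_k = k (k + 1) / 2 with k = max_iter - itr + 1,
    so the last iteration gets the largest weight (1 / T_1 = 1).
    """
    weights = []
    for itr in range(1, max_iter + 1):
        k = max_iter - itr + 1
        weights.append(2.0 / (k * (k + 1)))
    total = sum(weights)
    # assumption: rescale so that the budgets sum exactly to eps2
    return [eps2 * w / total for w in weights]

budgets = budget_schedule(eps2=0.85, max_iter=5)
```

Under sequential composition the total privacy cost of the iterations is the sum of the per-iteration budgets, which is why this sketch rescales the weights to sum to ε2.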
Preferably, step S3 includes:
S31, collecting the set S_C of grids whose access frequency in P_i' is greater than 0;
S32, for each node in sequence, selecting from S_C the grid with the highest availability, finally obtaining a synthesized grid track with high availability;
S33, for the grid corresponding to each node, selecting from that grid the sub-grid with the greatest location diversity, and, from the set of positions that lie in the sub-grid and belong to the user of the given input track, selecting the position with the highest access frequency as the perturbed position for the current time step; the grid track is thereby converted into a track T_i' in longitude-latitude form, and T_i' is taken as the final published track.
Beneficial effects: when synthesizing the grid track, the invention selects for each node the new grid with the highest availability and dynamically evaluates the availability of the partial grid sequence synthesized so far. This ensures that the complete synthesized grid track has high availability, avoids the inefficiency of selecting one grid track directly from all possible synthesized grid tracks, and achieves a good balance between execution efficiency and data availability. When converting the synthesized grid track into a track in longitude-latitude form, the invention uses the location diversity of the sub-grids and the access frequency of each position within a sub-grid so that the converted track has good availability and effectively preserves both the access preference (frequency) for different positions and the grid access frequency vector of the input track.
Preferably, in step S32, the availability of each grid is defined as follows:
wherein t_j denotes the node index and C_k denotes the grid selected for the current node t_j; the first quantity is the probability that, in the original grid track T_i^C of the input track, the grid at node t_j is C_k; the second quantity is the similarity between the grid access frequency vectors of the synthesized grid track and the original grid track up to node t_j; and the two rank functions give, for the first and the second quantity respectively, the rank of the value corresponding to grid C_k among the corresponding values of all grids in S_C.
Beneficial effects: the availability function is defined for each grid in the candidate grid set of the current node and consists of two parts. The first part selects from the candidate set the new grid that best matches the access preference of the input track at the timestamp of the current node. The second part selects from the candidate set the new grid that gives the synthesized grid sequence the highest similarity to the input track up to the current node. As a result, the grid access frequency vectors of the grid track before and after perturbation are as similar as possible up to the current node, and the spatial preference within the time period containing the timestamp of the current node is preserved as far as possible.
To achieve the above object, according to a second aspect of the present invention, there is provided a privacy protection system for inter-track correlation based on particle swarm optimization, including: a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is configured to read executable instructions stored in the computer-readable storage medium, and execute the inter-track correlation privacy protection method based on the particle swarm optimization algorithm according to the first aspect.
Generally, by the above technical solution conceived by the present invention, the following beneficial effects can be obtained:
Based on the improved particle swarm optimization algorithm, the method keeps the change in the statistical characteristics of the grid access frequency vector of the input track before and after perturbation small, while reducing as far as possible its similarity to the grid access frequency vectors of the other tracks. The correlation privacy between tracks is thereby protected, and the track data published after perturbation has high data availability and security. Compared with the existing methods used for comparison, the method of the invention provides higher data availability and stronger privacy protection in most cases. The invention supports the use of multiple methods for calculating the correlation between two tracks.
Drawings
FIG. 1 is a flowchart of a privacy protection method for inter-track correlation based on a particle swarm optimization algorithm according to the present invention;
FIG. 2 is a spatial distribution plot of a trajectory data set used in an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating calculation of an upper bound of values of each dimension of a grid access frequency vector corresponding to a track according to an embodiment of the present invention;
FIG. 4 is a graph comparing data availability of the present invention with AdaTrace, DPT and TGM;
FIG. 5 is a graph comparing the privacy protection strength of the present invention with AdaTrace, DPT and TGM.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in FIG. 1, the method of the present invention is implemented in three steps.
Before describing the implementation, the track data set and some key parameter settings used in this embodiment are described first.
The track data set adopted by this embodiment is the Yonsei data set collected by Seoul university in Korea, which consists of data generated by 9 researchers from Korean universities using SmartDC, a mobile phone location service application, over two months in 2011. This embodiment uses the subset Yonsei-Seoul, formed by the check-in records in the Yonsei data set whose longitude and latitude coordinates lie within the city of Seoul. The Yonsei-Seoul data set contains only 9 users, but the average number of check-ins per user is 4094.5, so it is a very dense track data set. As can be seen from FIG. 2, this data set is spatially highly concentrated. Preliminary experiments on this track data set show that the improved particle swarm optimization fluctuates little once the maximum number of iterations M reaches 2000 and can be considered approximately converged. This embodiment therefore sets M to 2000.
This embodiment requires a privacy budget parameter ε; five different values, 0.1, 1, 3, 5 and 10, are used in the experiments to observe how the results vary. In the method of the present invention, the location-diversity-adaptive grid division step and the step of updating the grid access frequency vector based on the particle swarm optimization algorithm both consume privacy budget. This embodiment allocates privacy budgets ε1 and ε2 to these two steps, amounting to 15% and 85% of ε respectively.
Step one: location-diversity-adaptive grid division
Input data: a data set Yonsei-Seoul (denoted D) containing a plurality of tracks, each of the form T = {(x_1, y_1, t_1), …, (x_|T|, y_|T|, t_|T|)}, where each position tuple looks like (123.08, 36.92, 20201212164506), x and y denote longitude and latitude respectively, and t denotes the time at which the position (x, y) was generated; the division scale N × N of the grid domain G, such as 8 × 8; the total privacy budget parameter ε = ε1 + ε2, e.g., ε = 0.1.
Processing: from D, find the maximum and minimum longitude and latitude of the geographic space in which the tracks lie, and divide the geographic area they enclose into a grid domain G of N × N grids; traverse each track in D and convert it into a track in grid form; based on the resulting grid tracks and the allocated privacy budget ε1, count the noisy location diversity of all grid tracks on each grid in the grid domain G, and on that basis adaptively divide each grid in the grid domain G further into sub-grids.
Output data: the original track data set D; the data set D_C consisting of grid tracks; the grid domain G; the remaining privacy budget ε2.
This embodiment selects a division size of 8 × 8 for the grid domain. First, the geographic space in which the Yonsei-Seoul track data to be published lies is divided into 64 grids to obtain the grid domain G. Then each track in longitude-latitude form is converted into a track in grid form as follows: for each track T_i, traverse each longitude-latitude position loc_l in the track and determine which grid of the grid domain G it falls into, say grid C_k; after the traversal, the grid track T_i^C corresponding to T_i is obtained, where i ∈ [1, 9], l ∈ [1, |T_i|], k ∈ [1, 64]. Next, location-diversity-adaptive grid division is performed based on the density with Laplace noise calculated by the following formula, and each grid C_k in the grid domain G is divided into a plurality of sub-grids.
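The conversion from longitude-latitude form to grid form can be sketched as follows. The sketch assumes a uniform N × N split of the bounding box, 0-based row-major grid indices (the text numbers grids from 1), and a non-degenerate bounding box; the names `bounding_box` and `to_grid_track` are illustrative.

```python
def bounding_box(tracks):
    """(min_x, max_x, min_y, max_y) over all positions of all tracks."""
    xs = [x for track in tracks for x, _, _ in track]
    ys = [y for track in tracks for _, y, _ in track]
    return min(xs), max(xs), min(ys), max(ys)

def to_grid_track(track, bounds, n):
    """Map each (x, y, t) tuple to (grid index, t) on an n x n grid domain."""
    min_x, max_x, min_y, max_y = bounds
    cells = []
    for x, y, t in track:
        # Positions on the maximum edge are clamped into the last column/row.
        col = min(int((x - min_x) / (max_x - min_x) * n), n - 1)
        row = min(int((y - min_y) / (max_y - min_y) * n), n - 1)
        cells.append((row * n + col, t))
    return cells

# Toy coordinates chosen for readability, not real longitude/latitude values.
track = [(0.0, 0.0, 1), (10.0, 10.0, 2), (5.0, 2.4, 3)]
grid_track = to_grid_track(track, bounding_box([track]), 8)
```

Each position is mapped in constant time, so converting a track costs time linear in its length.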
Step two: updating of grid access frequency vector based on particle swarm optimization algorithm
Input data: the original track data set D; the data set D_C consisting of grid tracks; the grid domain G; the remaining privacy budget ε2.
Processing: from the grid track data set D_C, extract the grid access frequency vector p of each grid track over the grid domain G, of a form such as {0.1, 0, 0.3, 0.3, 0.25, 0.05}; update the grid access frequency vector of each grid track using the improved particle swarm optimization algorithm to obtain an updated vector p', and collect these into a vector set P'.
Output data: the original track data set D and the grid track data set D_C; the grid domain G; the updated set P' of grid access frequency vectors, in one-to-one correspondence with the tracks.
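Extracting a grid access frequency vector from a grid track amounts to counting visits per grid and dividing by the track length. A minimal sketch, assuming 0-based grid indices and a non-empty track (the function name is illustrative):

```python
from collections import Counter

def access_frequency_vector(grid_cells, num_grids):
    """Fraction of the track's nodes that fall in each grid of the domain."""
    counts = Counter(grid_cells)
    total = len(grid_cells)  # assumption: the track has at least one node
    return [counts.get(k, 0) / total for k in range(num_grids)]

# A 4-node grid track over a 4-grid domain.
p = access_frequency_vector([0, 2, 2, 3], 4)
```

The resulting vector always has |G| components that sum to 1, whatever the length of the track.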
For each grid track T_i^C, counting gives the 64-dimensional grid access frequency vector P_i of the track over the grid domain G. For the input track T_i^C, its grid access frequency vector P_i and the grid access frequency vectors of the other tracks, {P_j | j ∈ [1, 9], j ≠ i}, are used to construct the following constrained optimization problem,
wherein the first formula is the objective function and the second formula is the constraint. The function sim(·) calculates the correlation coefficient between two vectors, which the invention uses to quantify the correlation between two tracks; this embodiment adopts the Pearson correlation coefficient as the concrete implementation of sim(·). P_i represents the grid access frequency vector corresponding to the input track; P_i' represents the updated grid access frequency vector of the input track, which is the solution vector at which the constrained optimization algorithm minimizes the objective function; P_j represents the grid access frequency vector corresponding to another track. The denominator of the objective function represents the magnitude of the correlation (or its reciprocal) between the grid access frequency vectors of the input track before and after updating, and the numerator represents the sum of the magnitudes of the correlations (or of their reciprocals) between the updated grid access frequency vector of the input track and the grid access frequency vectors of the other tracks. The constraint means that the 64 components of the grid access frequency vector sum to 1, and it is applied when updating the locally optimal and globally optimal particles in the particle swarm optimization.
To ensure higher data availability, the invention restricts the value range of each dimension of P_i'. The invention stipulates that if the original grid corresponding to the input track at time step j is C_k, then the grid corresponding to the updated time step j can only be selected from the 9 grids of the 3 × 3 neighbourhood around C_k, as shown in FIG. 3, so that the distance by which the perturbed grid track deviates from the original grid track is controlled and data availability is ensured. After traversing P_i, the maximum possible number of accesses of each grid in {C_1, C_2, ..., C_64} is obtained. Normalizing the maximum number of accesses of each grid gives the upper bound of each dimension of P_i'; the lower bound of each dimension is 0.
And then solving an objective function with constraint conditions by using a particle swarm algorithm, wherein the position vector of the particle is defined as a grid access frequency vector to be updated of the input track. The specific solving process is realized by adopting the following algorithm.
Because the original particle swarm optimization algorithm adds no noise during execution, its security cannot be guaranteed; the invention therefore makes a customized modification of the original algorithm, namely the function ImprovedPSO. The pseudocode of ImprovedPSO is summarized in Algorithm 2.
In the improved particle swarm algorithm, PerturbedPositionVector(·) denotes the process of adding Laplace noise to the frequency vector corresponding to the position vector of each particle using the sparse vector technique and then normalizing the perturbed frequency vector to obtain the perturbed position vector; this step is what distinguishes the improved particle swarm algorithm from the original one.
To ensure that Laplace noise added in early iterations is not swamped by the random variation of later iterations, the method allocates a certain proportion of the privacy budget to each iteration. The smaller itr is, the more randomly each particle's position vector changes; as itr grows, the particles move closer to the optimal particle. Later iterations should therefore be allocated more privacy budget, so that less Laplace noise is added and the optimal particle is disturbed less. The present invention uses the sequence of reciprocals of the triangular numbers to achieve this. The privacy budget consumed in each iteration is therefore calculated as follows:
where itr denotes the current iteration of the particle swarm algorithm, M = 2000 denotes the maximum number of iterations of ImprovedPSO, and ε_2 denotes the privacy budget that the invention allocates to the particle swarm algorithm.
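The exact allocation formula is not reproduced in this text. Purely as an assumed illustration of how reciprocals of triangular numbers could give later iterations a larger share, the sketch below assigns iteration itr a weight proportional to 1/T_(M-itr+1), where T_k = k(k+1)/2, and scales the weights so that they sum to ε_2. The function name and this particular normalization are hypothetical.

```python
def iteration_budgets(eps2, max_iter):
    """Assumed schedule: per-iteration budgets that grow with the iteration index.

    Weights are reciprocals of the triangular numbers T_k = k * (k + 1) / 2,
    reversed so the last iteration gets the largest share, and scaled so
    that the budgets sum to eps2.
    """
    reciprocals = [2.0 / (k * (k + 1)) for k in range(1, max_iter + 1)]
    total = sum(reciprocals)
    return [eps2 * r / total for r in reversed(reciprocals)]
```

With `max_iter = 4` and `eps2 = 1.0`, the reciprocals are 1, 1/3, 1/6 and 1/10 (sum 1.6), so the first iteration receives 0.0625 and the last receives 0.625.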
Because the position vector of each particle represents a grid access frequency vector, i.e. the object perturbed by the Laplace mechanism is a vector, the method uses the sparse vector technique (SVT) to add Laplace noise to the frequency vector corresponding to the position vector of each particle and then normalizes the perturbed frequency vector to obtain the perturbed position vector. The specific implementation process is shown in the following algorithm.
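The "add noise, then normalize" part of this step can be sketched as below. Note the simplification: the sketch adds independent Laplace noise to every component and does not implement the sparse vector technique itself. The `sensitivity` parameter, the clipping of negative values to 0, and the uniform fallback for an all-zero result are assumptions made for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturbed_position_vector(position, epsilon, sensitivity=1.0, rng=None):
    """Add Laplace noise to each component, then renormalize to sum to 1."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # Assumption: negative noisy frequencies are clipped to 0.
    noisy = [max(0.0, x + laplace_noise(scale, rng)) for x in position]
    total = sum(noisy)
    if total == 0.0:
        # Assumption: fall back to a uniform vector if everything was clipped.
        return [1.0 / len(position)] * len(position)
    return [x / total for x in noisy]
```

A larger `epsilon` gives a smaller noise scale, which matches the statement that iterations with more privacy budget receive less noise.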
Step three: trajectory synthesis based on updated grid access frequency vectors
Input data: the raw track dataset D and the grid track dataset D_C; the grid domain G; and the updated grid access frequency vectors P', in one-to-one correspondence with the tracks.
Processing: during track synthesis, the candidate grid set Set_C, obtained by statistics from the updated grid access frequency vector P', has the form Set_C = {C_1, C_1, C_2, C_2, C_2, …}; the probability set Set_C,hour, used when calculating the usability score of each candidate grid, has the form Set_C,hour = {0-4 o'clock: {C_1: 0.3, C_2: 0.7}, 5-8 o'clock: {C_1: 0.45, C_2: 0.55}, …}.
Output data (i.e. the final result): the perturbed track dataset D', whose tracks correspond one-to-one to the tracks in D and have the same data organization, except that each track in D' is the perturbed version of the corresponding track in D.
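To make the two intermediate structures concrete, the sketch below builds a candidate grid multiset of the form {C_1, C_1, C_2, C_2, C_2, …} and a per-time-bucket probability table. The rule of repeating each grid round(frequency × track length) times, the fixed-width hour buckets, and both function names are assumptions made for illustration; the text does not specify how either set is computed.

```python
from collections import Counter, defaultdict

def candidate_grid_set(freq_vector, track_length):
    """Repeat each grid index in proportion to its updated access frequency."""
    candidates = []
    for grid, freq in enumerate(freq_vector):
        candidates.extend([grid] * round(freq * track_length))
    return candidates

def hourly_probabilities(visits, bucket_hours=4):
    """Per-bucket visit probabilities from (hour, grid) pairs of a track."""
    buckets = defaultdict(Counter)
    for hour, grid in visits:
        buckets[hour // bucket_hours][grid] += 1
    return {
        bucket: {grid: n / sum(counts.values()) for grid, n in counts.items()}
        for bucket, counts in buckets.items()
    }
```

For example, `candidate_grid_set([0.4, 0.6, 0.0], 5)` yields two copies of grid 0 and three copies of grid 1.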
After the objective function is solved using the improved particle swarm algorithm, the grid access frequency vector P_i' that minimizes the objective function is obtained, and the set S_C of grids whose access frequency in P_i' is greater than 0 is obtained by statistics. The most suitable grid is then selected from S_C for each time step in turn, finally yielding a synthesized grid track with the highest availability. The fitness score of each grid is defined in the present invention as follows:
where t_j denotes the time-step number and C_k denotes the grid selected for the current time step. The probability term is the probability that the original grid track of the input track is in grid C_k at time t_j. The similarity term is the similarity, calculated using the Pearson correlation coefficient, between the grid access frequency vectors of the synthesized grid track and of the original grid track up to time t_j. The Rank(·) function gives the rank of the probability term (or of the similarity term) corresponding to grid C_k among the corresponding entries of all grids in S_C.
The invention adopts the following iterative process to complete the synthesis of the grid track.
After the synthesized grid track T_C' is obtained, for the grid corresponding to each time step, the sub-grid with the highest check-in density is first selected from the grid; then, from the set of locations that fall in that sub-grid and belong to the user of the given input track, the location with the highest access frequency is selected as the perturbed location of the current time step, thereby ensuring data availability in the conversion process. In this way the invention converts the synthesized grid track T_C' into a track T' in longitude-latitude form and takes T' as the final released track.
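The per-time-step conversion just described can be sketched as follows. The data layout (a dictionary of sub-grid densities and a dictionary of the user's locations per sub-grid), the function name, and the choice to return `None` when the user has no location in the densest sub-grid are assumptions made for illustration; the text does not say how that case is handled.

```python
from collections import Counter

def pick_location(subgrid_density, user_points_by_subgrid):
    """Choose the perturbed location for one time step.

    subgrid_density: dict mapping sub-grid id -> check-in density.
    user_points_by_subgrid: dict mapping sub-grid id -> list of
        (latitude, longitude) locations of the input track's user.
    """
    # Step 1: the sub-grid with the highest check-in density.
    best = max(subgrid_density, key=subgrid_density.get)
    points = user_points_by_subgrid.get(best, [])
    if not points:
        return None  # Assumption: no fallback rule is given in the text.
    # Step 2: the user's most frequently visited location in that sub-grid.
    return Counter(points).most_common(1)[0][0]
```

Applying `pick_location` to every grid of the synthesized grid track, in time order, would give the released longitude-latitude track.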
The advantages of the invention are illustrated using three data availability indexes (the Jensen-Shannon divergence of the location access frequency vector, the Kendall coefficient of the location access frequency vector, and the average query error) and two security indexes (the ability to resist Bayesian attacks and the protection effect on track correlation). For the Jensen-Shannon divergence and the average query error, smaller values indicate higher availability; the opposite holds for the Kendall coefficient. For the index of resistance to Bayesian attacks, smaller values indicate higher security; the opposite holds for the index of protection of track correlation. In the experiments of the embodiment on the Yonsei-Seoul trajectory data set, the method of the invention outperforms the three comparison methods AdaTrace, DPT and TGM on all three data availability indexes; on the two security indexes, it also provides stronger privacy protection than the comparison methods in most cases. The method of the invention therefore has clear advantages in both data availability and privacy protection.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (8)
1. A privacy protection method for correlation between tracks based on a particle swarm optimization algorithm is characterized by comprising the following steps:
S1, performing grid division on the geographic space in which all tracks to be released are located to obtain a grid domain G, wherein each track comprises longitudes, latitudes and timestamps; mapping each track from longitude-latitude form to grid form; and then counting the grid access frequency vector of each grid track in the grid domain G;
S2, for each grid track, solving the objective with an improved particle swarm optimization algorithm to obtain an updated grid access frequency vector P_i', wherein the position vector of a particle is P_i', and the objective is to keep the vectors P_i and P_i' corresponding to the grid track as correlated as possible while minimizing the correlation between P_i' and the grid access frequency vectors of the other grid tracks; in each iteration, the improved particle swarm optimization algorithm updates the position vector and the velocity vector of each particle, adds Laplace noise of dimension |G| to the frequency vector corresponding to the position vector of the particle using the sparse vector technique, and normalizes the perturbed frequency vector to obtain the perturbed position vector of the particle;
2. The method of claim 1, wherein in step S1, each grid in the grid domain G is further divided into L × L sub-grids, wherein L is directly proportional to q(C_k),
3. The method of claim 1, wherein in step S2, the objective function of the improved particle swarm optimization algorithm is as follows:
the constraint conditions of the improved particle swarm optimization algorithm are as follows:
wherein the function sim(·) calculates the correlation coefficient between two vectors; P_i represents the grid access frequency vector corresponding to the input track; P_i' represents the updated grid access frequency vector of the input track; P_j represents the grid access frequency vectors corresponding to the other tracks; |P_i'| denotes the dimension of the updated grid access frequency vector of the input track; and P_i'[m] represents the mth dimension of the updated grid access frequency vector of the input track.
4. The method of claim 1, wherein the value range of each dimension P_i'[m] of P_i' needs to satisfy the following restrictions:
if the original grid corresponding to the jth node of the input track is C_k, then the updated grid corresponding to the jth node can only be selected from the 9 grids of the 3 × 3 block formed by C_k and its surrounding grids;
after traversing P_i, the maximum possible number of accesses of each grid in {C_1, C_2, …, C_|G|} is obtained;
normalizing the maximum access count of each grid yields the upper limit of each dimension of P_i', and the lower limit of each dimension is 0.
5. The method of claim 1, wherein in step S2, the improved particle swarm optimization algorithm assigns a privacy budget for each iteration as follows:
wherein itr represents the current iteration round executed by the improved particle swarm optimization algorithm, M represents the maximum number of iterations of the improved particle swarm optimization algorithm, and ε_2 represents the privacy budget allocated to the improved particle swarm optimization algorithm.
6. The method according to any one of claims 1 to 5, wherein step S3 includes:
S31, obtaining by statistics the set S_C of grids whose access frequency in P_i' is greater than 0;
S32, selecting from S_C, in turn, the grid with the highest availability for each node, thereby finally obtaining a synthesized grid track with high availability;
S33, for the grid corresponding to each node, selecting from the grid the sub-grid with the maximum position diversity, and selecting, from the set of locations that fall in the sub-grid and belong to the user corresponding to the given input track, the location with the highest access frequency as the perturbed location of the current time step, thereby converting the synthesized grid track into a track T_i' in longitude-latitude form and taking T_i' as the final released track.
7. The method of claim 6, wherein in step S32, the availability of each grid is defined as follows:
wherein t_j denotes the node number and C_k denotes the grid selected for the current node t_j; the probability term denotes the probability that the original grid track of the input track is in grid C_k at node t_j; the similarity term denotes the similarity between the grid access frequency vectors of the synthesized grid track and of the original grid track up to node t_j; and the two Rank(·) terms denote, respectively, the rank of the probability term and the rank of the similarity term corresponding to grid C_k among the corresponding entries of all grids in S_C.
8. A privacy protection system for inter-track correlation based on a particle swarm optimization algorithm, characterized by comprising: a computer-readable storage medium and a processor;
the computer-readable storage medium is used for storing executable instructions;
the processor is used for reading executable instructions stored in the computer-readable storage medium and executing the privacy protection method based on inter-track correlation of particle swarm optimization according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110074474.0A CN112861171B (en) | 2021-01-20 | 2021-01-20 | Particle swarm optimization algorithm-based inter-track correlation privacy protection method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110074474.0A CN112861171B (en) | 2021-01-20 | 2021-01-20 | Particle swarm optimization algorithm-based inter-track correlation privacy protection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112861171A true CN112861171A (en) | 2021-05-28 |
CN112861171B CN112861171B (en) | 2024-04-09 |
Family
ID=76007588
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110074474.0A Active CN112861171B (en) | 2021-01-20 | 2021-01-20 | Particle swarm optimization algorithm-based inter-track correlation privacy protection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112861171B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114201695A (en) * | 2021-12-17 | 2022-03-18 | 南京邮电大学 | Moving track privacy protection matching method based on hotspot grid dimension conversion |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108491730A (en) * | 2018-03-08 | 2018-09-04 | 湖南大学 | Correlation method for secret protection between track based on lagrangian optimization |
CN112199581A (en) * | 2020-09-11 | 2021-01-08 | 卞美玲 | Cloud computing and information security oriented cloud service management method and artificial intelligence platform |
Non-Patent Citations (1)
Title |
---|
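Hu Demin; Zhan Han: "A predictable differential-perturbation method for user trajectory privacy protection", Journal of Chinese Computer Systems (小型微型计算机系统), no. 06, 14 June 2019 (2019-06-14) * |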
Also Published As
Publication number | Publication date |
---|---|
CN112861171B (en) | 2024-04-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |