CN112182645A - Quantifiable privacy protection method, equipment and medium for destination prediction - Google Patents

Quantifiable privacy protection method, equipment and medium for destination prediction

Info

Publication number
CN112182645A
Authority
CN
China
Prior art keywords
privacy
track
protection model
noise
protection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010967393.9A
Other languages
Chinese (zh)
Other versions
CN112182645B (en)
Inventor
蒋洪波
王孟源
肖竹
刘代波
曾凡仔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University filed Critical Hunan University
Priority to CN202010967393.9A priority Critical patent/CN112182645B/en
Publication of CN112182645A publication Critical patent/CN112182645A/en
Application granted granted Critical
Publication of CN112182645B publication Critical patent/CN112182645B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G06F21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 - Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 - Protecting personal data, e.g. for financial or medical purposes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a quantifiable privacy protection method, device and medium for destination prediction. The method comprises the following steps: injecting noise with different radii into multiple copies of the same historical track data set using a Laplace mechanism to obtain corresponding track data sets; performing destination prediction on each track data set, calculating the corresponding privacy protection degree from the prediction results to construct training samples, and taking the noise radius level as the label value of each training sample; training a privacy quantification protection model based on multiple linear regression using all training samples and their label values; when a privacy protection degree requirement and a track to be protected are received, inputting the privacy protection degree requirement into the privacy quantification protection model; and injecting noise into the track to be protected, using the Laplace mechanism, according to the noise radius level output by the privacy quantification protection model. The method and device can fully meet the privacy requirements of the user and provide accurate and stable track privacy protection.

Description

Quantifiable privacy protection method, equipment and medium for destination prediction
Technical Field
The invention belongs to the field of privacy protection and information security, and provides a quantifiable privacy protection method, device and medium for destination prediction, addressing the location privacy threat posed by high-accuracy destination prediction services.
Background
With the popularization of GPS-equipped mobile devices and the continuous development of Internet of Things technology, more and more location-based services facilitate many aspects of our lives, such as business activities and personal health management. In recent years, techniques for predicting a user's destination have been developed. Such a technique predicts the position a mobile user intends to reach from the user's partial track and a prediction model built from a city's historical track database. This technology has significantly pushed the development of location-based services, but it also poses a significant privacy threat.
Applications using this technology are now emerging in many scenarios. For example, in automatic driving, destination prediction automatically generates the place a passenger is going to, for the passenger's convenience; in some social software, the technology predicts the place a user intends to go from the track the user has travelled, so as to push advertisements related to the user's destination; and the ride-hailing company DiDi uses the technology to improve the riding experience of users and the order-taking efficiency of its drivers. The latest research results from the DiDi research institute show that DiDi's prediction technology achieves extremely high accuracy, with an average distance error between the prediction result and the real destination within 1 km. A recent article shows that, given a sufficient data volume and a complete user travel track, the probability of successfully predicting the user's destination can reach 80%, while the average distance error can be kept within 500 meters.
High accuracy and precision bring better location-based application services to mobile users, but at the same time pose a great threat to user privacy and personal safety. Specifically: (1) past research on mobile device trajectories has shown that destination locations are often associated with a user's sensitive locations, such as the user's home, workplace, or a bar the user frequents, so exposure of sensitive locations is a serious violation of personal privacy; (2) if an attacker hostile to the user learns the user's accurate destination predicted through the location server, the attacker can go to the destination in advance, wait for the user and attack at an opportune moment, which is a great threat to the user's personal safety; (3) no one can guarantee that a server providing location-based services is absolutely trustworthy, and when the server holds the user's track and predicts the user's accurate destination, the destination is at risk of leakage.
In the past, few methods protected the privacy of a user's destination. A common method is endpoint removal: at the user's mobile terminal, the head and tail segments of the user's track are removed, and the truncated track is sent to the service provider, so that the destination predicted by the provider is not too accurate. However, this approach is too crude, provides extremely unstable privacy protection, and special cases exist (e.g., stops near the head and tail of a trip) that defeat it.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method, an apparatus and a medium for quantifiable privacy protection for destination prediction in location services, which can fully meet the privacy requirements of users and provide users with accurate and stable track privacy protection.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
a quantifiable privacy preserving method for destination prediction, comprising:
step 1, acquiring training data;
obtaining a historical track data set D_train, and copying it to obtain M identical historical track data sets;

using a Laplace mechanism, respectively injecting noise of radius r_i, i ∈ [0, 1, ..., M-1], into the M historical track data sets to obtain track data sets with different noise added:

D_i^train = { Lap_{r_i}(T_j^i) | j = 1, 2, ..., L_D },

r_i = r_0 + i × r_base, i ∈ [0, 1, ..., M-1],

where D_i^train denotes the i-th track data set, whose injected noise radius level is i; T_j^i denotes the j-th track in the i-th historical track data set; L_D denotes the size of the historical track data set; r_i denotes the noise radius injected into the i-th historical track data set; r_0 is a radius randomly generated by the Laplace mechanism; r_base is a preset noise base radius; and Lap_{r_i}(T_j^i) denotes injecting noise with a random angle and radius r_i into the track T_j^i;

performing destination prediction on the L_D tracks in each track data set, statistically calculating the privacy protection degree of each track data set from the prediction results, and then generating a plurality of derived values from the obtained privacy protection degree according to preset functions; the privacy protection degree and its derived values obtained from each track data set form a group of training samples, and the noise radius level i corresponding to each track data set is used as the label value of that training sample;
step 2, training a protection model;
training a privacy quantification protection model based on multiple linear regression by taking M groups of training samples as input and taking corresponding label values as output;
step 3, processing track noise;
when a privacy protection degree requirement and a track to be protected are received, inputting the privacy protection degree requirement into the privacy quantification protection model and outputting the noise radius level to be injected into the track; the track to be protected belongs to the same area as the tracks in the historical track data set D_train;
and injecting noise into the track to be protected according to the noise radius level output by the privacy quantization protection model by using a Laplace mechanism, and finally sending the track subjected to noise injection to a position server.
Further, the multiple-linear-regression-based privacy quantification protection model, after substituting the M groups of training samples and their corresponding label values, can be expressed as the following set of linear relations:

y_0 = λ_0 + λ_1·x_{0,1} + λ_2·x_{0,2} + ... + λ_q·x_{0,q} + ε_0
y_1 = λ_0 + λ_1·x_{1,1} + λ_2·x_{1,2} + ... + λ_q·x_{1,q} + ε_1
...
y_{M-1} = λ_0 + λ_1·x_{M-1,1} + λ_2·x_{M-1,2} + ... + λ_q·x_{M-1,q} + ε_{M-1}

where λ_0, λ_1, λ_2, ..., λ_q are the parameters of the privacy quantification protection model; ε_0, ε_1, ..., ε_{M-1} are the bias values of the protection model; x_{i,1}, x_{i,2}, ..., x_{i,q} are the i-th group of training samples; and y_i is the label value of the i-th group of training samples, i ∈ [0, 1, ..., M-1].

Training the multiple-linear-regression-based protection model means solving for the parameters λ_0, λ_1, λ_2, ..., λ_q that minimize the following optimization function:

SSE = Σ_{i=0}^{M-1} ( y_i - (λ_0 + λ_1·x_{i,1} + λ_2·x_{i,2} + ... + λ_q·x_{i,q}) )²
further, after a privacy quantization protection model is obtained through training by using a training sample, pruning simplification processing is further carried out on the obtained privacy quantization protection model; and 3, acquiring the Laplace noise radius grade to be injected into the track by using the simplified privacy quantization protection model.
Further, the pruning simplification treatment process comprises the following steps:
step A1, storing the parameters λ_0, λ_1, λ_2, ..., λ_q of the privacy quantification protection model obtained by training with the training samples in a parameter sequence λ* and in a cache parameter sequence λ_t, and denoting its optimization function value as SSE*; storing the M groups of training samples in a variable array A* and in a cache variable array A_t;

step A2, for the privacy quantification protection model formed by the parameter sequence λ*, calculating for each variable, based on the variable array A*, the correlation coefficient R_k between that variable and the label, and storing all the obtained correlation coefficients R_k in a list R*; the correlation coefficient R_k is calculated as:

R_k = Σ_{i=0}^{M-1} (x_{i,k} - x̄_k)(y_i - ȳ) / sqrt( Σ_{i=0}^{M-1} (x_{i,k} - x̄_k)² · Σ_{i=0}^{M-1} (y_i - ȳ)² ),

where x_{i,k} is the k-th variable value of the i-th group of training samples in the variable array A*, y_i is the label value corresponding to the i-th group of training samples, x̄_k is the mean of the k-th variable over the M training samples in A*, and ȳ is the mean of the M label values corresponding to the M groups of training samples;

step A3, selecting the minimum correlation coefficient from the list R*, denoting its subscript by s, and comparing the minimum correlation coefficient R_s with a preset correlation coefficient threshold:

if the minimum correlation coefficient R_s is greater than the preset correlation coefficient threshold, the privacy quantification protection model formed by the parameter sequence λ* is taken as the final simplified privacy quantification protection model;

if the minimum correlation coefficient R_s is less than or equal to the preset correlation coefficient threshold, the parameter λ_s with subscript s in the cache parameter sequence λ_t is set to 0, the variable values with subscript s are removed from the cache variable array A_t, the cache variable array A_t and the sample labels are then substituted into the privacy quantification protection model formed by the cache parameter sequence λ_t, the optimization function value is calculated and stored as SSE_t, and SSE_t is compared with SSE*:

if SSE_t > SSE*, the parameter sequence λ* is kept unchanged, and the privacy quantification protection model formed by the parameter sequence λ* is taken as the final simplified privacy quantification protection model;

if SSE_t ≤ SSE*, the parameter sequence λ* is updated to be the same as the cache parameter sequence λ_t, the variable array A* is updated to be the same as the cache variable array A_t, SSE* is replaced with SSE_t, and the process returns to step A2.
Further, the specific method of calculating the privacy protection degree of each track data set from the prediction results is:

letting the prediction accuracies of the L_D tracks in the i-th track data set D_i^train, over the G partitions in which the tracks are distributed and ranked from large to small, be P_{i,1}, P_{i,2}, ..., P_{i,G}, which together form the accuracy sequence P_i^train of the i-th track data set D_i^train; then calculating, from the accuracy sequence P_i^train, the privacy protection degree WAE_i of each track data set:

WAE_i = ( Σ_{g=1}^{G} (1/g)·| P_{i,g} - P'_g | ) / ( Σ_{g=1}^{G} (1/g) ),

where P'_g is the prediction accuracy of the historical track data set D_train at the g-th partition.
Further, the preset functions used for generating the derived values from the privacy protection degree include any one or more of the following: y = x², y = x³, y = x⁻¹, y = x⁻², y = √x, y = log x, y = ln x; where x is the privacy protection degree and y is the derived value.

Further, the preset noise base radius r_base = 500 m.
The invention also provides an apparatus comprising a processor and a memory; wherein: the memory is to store computer instructions; the processor is configured to execute the computer instructions stored in the memory, and in particular, to perform any of the methods described above.
The present invention also provides a computer storage medium storing a program for implementing any of the above methods when executed.
Advantageous effects
Before a user's track is sent from the mobile terminal to the location server, a privacy quantification protection model built by multiple linear regression is used to obtain the noise level corresponding to the user's privacy protection degree requirement for the track; then, using the Laplace privacy mechanism from differential privacy and the noise radius level output by the privacy quantification protection model, quantitative noise is added to the user's track to be protected. This fully meets the user's privacy requirements and provides the user with accurate and stable track privacy protection. In addition, the invention adopts a pruning simplification method to remove redundant parameters from the privacy quantification protection model, which reduces the amount of computation of the privacy quantification protection model and improves the processing speed of the privacy protection method.
Drawings
FIG. 1 is a technical route diagram of the method according to an embodiment of the invention;
FIG. 2 shows the accuracy (MAE) of the privacy protection provided under different privacy requirements, based on the Porto and Beijing taxi track data sets, in an experiment according to an embodiment of the present invention;
FIG. 3 shows the stability (RMSE) of the privacy protection provided under different privacy requirements, based on the Porto and Beijing taxi track data sets, in an experiment according to an embodiment of the present invention;
FIG. 4 shows the successful protection ratio at different user satisfaction thresholds in an experiment according to an embodiment of the present invention;
FIG. 5 shows the effect of the differential privacy parameter on the RMSE of the privacy protection provided, based on the Porto taxi track data set, obtained experimentally according to an embodiment of the present invention;
FIG. 6 shows the effect of the differential privacy parameter on the RMSE of the privacy protection provided, based on the Beijing taxi track data set, obtained experimentally according to an embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail, which are developed based on the technical solutions of the present invention, and give detailed implementation manners and specific operation procedures to further explain the technical solutions of the present invention.
The embodiment of the invention provides a quantifiable privacy protection method for destination prediction, which comprises the following steps:
step 1, acquiring training data;
(1) obtaining a historical track data set D_train, and copying it to obtain M identical historical track data sets;

(2) using a Laplace mechanism, respectively injecting noise of radius r_i, i ∈ [0, 1, ..., M-1], into the M historical track data sets to obtain track data sets with different noise added:

D_i^train = { Lap_{r_i}(T_j^i) | j = 1, 2, ..., L_D },

r_i = r_0 + i × r_base, i ∈ [0, 1, ..., M-1],

where D_i^train denotes the i-th track data set, whose injected noise radius level is i; T_j^i denotes the j-th track in the i-th historical track data set; L_D denotes the size of the historical track data set; r_i denotes the noise radius injected into the i-th historical track data set; r_0 is a radius randomly generated by the Laplace mechanism; r_base is a preset noise base radius, taken as r_base = 500 m in this embodiment; and Lap_{r_i}(T_j^i) denotes injecting noise with a random angle and radius r_i into the track T_j^i.

Each track in the historical track data set D_train can be denoted as T = {l_1, l_2, ..., l_c}, where l_i is a position point of the track. In this embodiment, according to the Laplace mechanism in differential privacy, noise is injected into each position point in the track to obtain a false position point deviating from the original position point, so that all the false position points form a false track containing noise; the false track, rather than the original real track, is sent to the server, thereby achieving privacy protection of the real track.
The radius r_0 is randomly generated using the Laplace mechanism as follows.

According to the Laplace mechanism, the probability density of a false position point is:

D_ε(r, θ) = (ε² / 2π) · e^(-ε·r),

where r denotes the distance between the false position point and the real position point l_i, and θ denotes the angle between the line connecting the false position point and the real position point and the horizontal coordinate axis of the rectangular coordinate system. Since r and θ are mutually independent random variables, integration yields the probability density functions of r and θ:

D_ε(θ) = 1 / (2π),

D_ε(r) = ε² · r · e^(-ε·r).

From the first formula, the angle corresponding to a false position can be generated randomly from a uniform distribution.

The cumulative distribution function of the distance r is:

C_ε(r) = 1 - (1 + ε·r) · e^(-ε·r).

Thus, the final Laplace distance r generating function is obtained:

r = -(1/ε) · ( W_{-1}( (p - 1) / e ) + 1 ),

where ε denotes the privacy sensitivity of the Laplace distance r (under the same conditions, the smaller ε is, the better the privacy protection effect), W_{-1} is the -1 branch of the Lambert W function, p is a random probability drawn uniformly from [0, 1], and e is the natural constant. The Laplace distance r obtained from the above formula is the radius r_0 randomly generated by the Laplace mechanism in this embodiment.
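For illustration, this sampling step could be sketched in Python as follows (a minimal sketch, assuming NumPy and SciPy; the helper name planar_laplace_point, the rough metre-to-degree conversion, and treating the extra radius as an additive offset i·r_base are assumptions of this sketch, not part of the patented method):

```python
import numpy as np
from scipy.special import lambertw

def planar_laplace_point(lat, lon, epsilon, extra_radius=0.0):
    """Perturb one position point with planar Laplace noise.

    theta is drawn uniformly, r_0 comes from the inverse CDF
    r_0 = -(1/epsilon) * (W_{-1}((p - 1)/e) + 1), and the final radius is
    r = r_0 + extra_radius (extra_radius = i * r_base for noise level i).
    """
    theta = np.random.uniform(0.0, 2.0 * np.pi)
    p = np.random.uniform(0.0, 1.0)
    r0 = -(1.0 / epsilon) * (np.real(lambertw((p - 1.0) / np.e, k=-1)) + 1.0)
    r = r0 + extra_radius
    # rough conversion from metres to degrees near latitude `lat`
    dlat = (r * np.sin(theta)) / 111320.0
    dlon = (r * np.cos(theta)) / (111320.0 * np.cos(np.radians(lat)))
    return lat + dlat, lon + dlon
```

Applying such a function to every point of a track would yield the false track that is uploaded in place of the real one.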
(3) performing destination prediction on the L_D tracks in each track data set, and statistically calculating the privacy protection degree of each track data set from the prediction results, with the following specific steps:

a1, letting the prediction accuracies of the L_D tracks in the i-th track data set D_i^train, over the G partitions in which the tracks are distributed and ranked from large to small, be P_{i,1}, P_{i,2}, ..., P_{i,G}, which together form the accuracy sequence P_i^train of the i-th track data set D_i^train.

Let the map over which the tracks in the M track data sets are distributed be divided into G cells. Predicting the L_D tracks in each track data set then yields L_D prediction results, each of which contains G prediction probabilities, one per partition. Sorting the prediction probabilities from large to small gives a ranking of possible destinations with G ranks. If the partition at a given rank of the ranking is the user's real destination, that rank's prediction is counted as accurate once. Thus, by predicting the L_D tracks and accumulating, for each rank, the number of times its prediction succeeds, a list N_{i,p1}, N_{i,p2}, ..., N_{i,pG} is obtained. Then, from this list and

P_{i,g} = N_{i,pg} / L_D,

the prediction accuracy of each track data set at each rank is calculated, i ∈ [0, 1, ..., M-1], g ∈ [1, 2, ..., G].
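A sketch of this counting step (Python/NumPy; the argument true_ranks, holding the rank at which each track's real destination partition appears in the predictor's output, is an assumed intermediate, not a name from the patent):

```python
import numpy as np

def per_rank_accuracy(true_ranks, G):
    """P_{i,g} = N_{i,pg} / L_D: the fraction of the L_D tracks whose real
    destination partition is predicted at rank g (ranks run from 1 to G)."""
    true_ranks = np.asarray(true_ranks)
    L_D = len(true_ranks)
    return np.array([(true_ranks == g).sum() / L_D for g in range(1, G + 1)])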
a2, then calculating, from the accuracy sequence P_i^train, the privacy protection degree WAE_i of each track data set:

WAE_i = ( Σ_{g=1}^{G} (1/g)·| P_{i,g} - P'_g | ) / ( Σ_{g=1}^{G} (1/g) ),

where P_{i,g} and P'_g denote the accuracy at the g-th ranked position after and before privacy protection, respectively.

Since the accuracy sequence P_i^train is arranged from large to small and higher-ranked partitions are more easily attacked, the invention aims to reduce the accuracy of the higher-ranked positions so that track privacy is fully protected, and therefore defines the privacy protection degree by the above weighted absolute error formula. Because an attacker tends to attack the prediction result at a ranked position with higher accuracy, the higher the accuracy of a higher-ranked position, the greater its influence and the greater the privacy weight assigned to it. The weight 1/g is therefore chosen to distinguish the effect of accuracy variations at different ranked positions on the privacy protection degree. For example, suppose the accuracies of the first- and fifth-ranked positions change by ΔP_1 and ΔP_5 with ΔP_1 = ΔP_5; from the formula, their coefficients are 1 and 1/5, respectively, so in the accumulation (1/1)·ΔP_1 > (1/5)·ΔP_5. This well distinguishes how the prediction accuracies at different ranked positions affect an attacker's judgment of the location of the user's real destination.

In D_0^train the corresponding noise is 0, i.e., it represents the true prediction without noise. Then, using the formula for calculating WAE, M privacy protection degree values WAE_0, WAE_1, ..., WAE_{M-1} are obtained, where WAE_0 = 0.
a3, and then generating a plurality of derived values from the obtained privacy protection degree according to the following preset functions: y = x², y = x³, y = x⁻¹, y = x⁻², y = √x, y = log x, y = ln x; where x is the privacy protection degree and y is the derived value. Therefore, the privacy protection degree and its derived values obtained from each track data set form a group of training samples, and the noise radius level i corresponding to each track data set is used as the label value of that training sample.
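This feature expansion could be sketched as follows (Python/NumPy; the small offset guarding the reciprocal and logarithm terms at WAE = 0 is an added assumption of this sketch):

```python
import numpy as np

def expand_features(wae, eps=1e-6):
    """Build the regression features [x, x^2, x^3, x^-1, x^-2, sqrt(x),
    log10(x), ln(x)] from a privacy protection degree x = WAE."""
    x = max(float(wae), eps)          # WAE_0 = 0 would break x^-1 and log x
    return np.array([x, x**2, x**3, x**-1, x**-2,
                     np.sqrt(x), np.log10(x), np.log(x)])
```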
step 2, training a protection model;
training the multiple-linear-regression-based privacy quantification protection model by taking the M groups of training samples as input and the corresponding label values as output; substituting the M groups of training samples and their corresponding label values gives the following set of linear relations:

y_0 = λ_0 + λ_1·x_{0,1} + λ_2·x_{0,2} + ... + λ_q·x_{0,q} + ε_0
y_1 = λ_0 + λ_1·x_{1,1} + λ_2·x_{1,2} + ... + λ_q·x_{1,q} + ε_1
...
y_{M-1} = λ_0 + λ_1·x_{M-1,1} + λ_2·x_{M-1,2} + ... + λ_q·x_{M-1,q} + ε_{M-1}

where λ_0, λ_1, λ_2, ..., λ_q are the parameters of the privacy quantification protection model; ε_0, ε_1, ..., ε_{M-1} are the bias values of the protection model; x_{i,1}, x_{i,2}, ..., x_{i,q} are the i-th group of training samples; and y_i is the label value of the i-th group of training samples, i ∈ [0, 1, ..., M-1].

Training the multiple-linear-regression-based protection model means solving for the parameters λ_0, λ_1, λ_2, ..., λ_q that minimize the following optimization function:

SSE = Σ_{i=0}^{M-1} ( y_i - (λ_0 + λ_1·x_{i,1} + λ_2·x_{i,2} + ... + λ_q·x_{i,q}) )².

In this embodiment, the Scikit-learn machine learning library is used to calculate the λ_0, λ_1, ..., λ_q that minimize the optimization function value SSE.
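Since the embodiment names Scikit-learn, the fitting step could be sketched as below (the helper expand_features from the sketch above and the array shapes are assumptions of this sketch):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def train_protection_model(wae_values, levels):
    """Fit the multiple-linear-regression privacy quantification model by
    minimising the SSE objective above, one row of features per data set."""
    X = np.vstack([expand_features(w) for w in wae_values])   # shape (M, q)
    y = np.asarray(levels, dtype=float)                       # noise radius levels 0..M-1
    model = LinearRegression().fit(X, y)
    return model   # model.intercept_ ~ lambda_0, model.coef_ ~ lambda_1..lambda_q
```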
After the privacy quantification protection model is obtained by training with the training samples, the obtained model is further pruned and simplified, so that the privacy quantification protection model is simplified while the privacy protection effect is preserved and the amount of privacy quantification computation is reduced. The pruning simplification process is as follows:
step A1, storing the parameters λ_0, λ_1, λ_2, ..., λ_q of the privacy quantification protection model obtained by training with the training samples in a parameter sequence λ* and in a cache parameter sequence λ_t, and denoting its optimization function value as SSE*; storing the M groups of training samples in a variable array A* and in a cache variable array A_t;

step A2, for the privacy quantification protection model formed by the parameter sequence λ*, calculating for each variable, based on the variable array A*, the correlation coefficient R_k between that variable and the label, and storing all the obtained correlation coefficients R_k in a list R*; the correlation coefficient R_k is calculated as:

R_k = Σ_{i=0}^{M-1} (x_{i,k} - x̄_k)(y_i - ȳ) / sqrt( Σ_{i=0}^{M-1} (x_{i,k} - x̄_k)² · Σ_{i=0}^{M-1} (y_i - ȳ)² ),

where x_{i,k} is the k-th variable value of the i-th group of training samples in the variable array A*, y_i is the label value corresponding to the i-th group of training samples, x̄_k is the mean of the k-th variable over the M training samples in A*, and ȳ is the mean of the M label values corresponding to the M groups of training samples;

step A3, selecting the minimum correlation coefficient from the list R*, denoting its subscript by s, and comparing the minimum correlation coefficient R_s with a preset correlation coefficient threshold:

if the minimum correlation coefficient R_s is greater than the preset correlation coefficient threshold, the privacy quantification protection model formed by the parameter sequence λ* is taken as the final simplified privacy quantification protection model;

if the minimum correlation coefficient R_s is less than or equal to the preset correlation coefficient threshold (set to 0.5 in this embodiment), the parameter λ_s with subscript s in the cache parameter sequence λ_t is set to 0, the variable values with subscript s are removed from the cache variable array A_t, the cache variable array A_t and the sample labels are then substituted into the privacy quantification protection model formed by the cache parameter sequence λ_t, the optimization function value is calculated and stored as SSE_t, and SSE_t is compared with SSE*:

if SSE_t > SSE*, the parameter sequence λ* is kept unchanged, and the privacy quantification protection model formed by the parameter sequence λ* is taken as the final simplified privacy quantification protection model;

if SSE_t ≤ SSE*, the parameter sequence λ* is updated to be the same as the cache parameter sequence λ_t, the variable array A* is updated to be the same as the cache variable array A_t, SSE* is replaced with SSE_t, and the process returns to step A2.
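The pruning loop can be sketched directly from steps A1-A3 (Python/NumPy; zeroing the coefficient λ_s is used here as the equivalent of removing the variable from the cached array, which is an implementation assumption of this sketch):

```python
import numpy as np

def prune_model(lambdas, intercept, X, y, threshold=0.5):
    """Steps A1-A3: repeatedly zero the coefficient of the variable least
    correlated with the label while the SSE does not increase."""
    lam = np.asarray(lambdas, dtype=float).copy()
    sse = lambda l: float(np.sum((y - (intercept + X @ l)) ** 2))
    best_sse = sse(lam)
    while True:
        # correlation R_k of each still-active variable with the label
        corr = np.array([abs(np.corrcoef(X[:, k], y)[0, 1]) if lam[k] != 0 else np.inf
                         for k in range(len(lam))])
        s = int(np.argmin(corr))
        if corr[s] > threshold:          # every remaining variable is relevant
            break
        trial = lam.copy()
        trial[s] = 0.0                   # drop the least correlated variable
        trial_sse = sse(trial)
        if trial_sse > best_sse:         # SSE_t > SSE*: keep the current model
            break
        lam, best_sse = trial, trial_sse # SSE_t <= SSE*: accept and continue
    return lam, best_sse
```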
Step 3, processing track noise;
as shown in fig. 1, when a privacy protection degree requirement (PR) and a track to be protected are received, the privacy protection degree requirement is input into the final simplified privacy quantification protection model obtained from the pruning simplification processing, and the noise radius level to be injected into the track is output; the track to be protected belongs to the same area as the tracks in the historical track data set D_train;
according to the noise radius grade output by the privacy quantization protection model, noise is injected into the track to be protected, the track after the noise is injected is sent to the position server instead of the original real track, the real track is protected according to the privacy protection degree requirement, and an attacker is prevented from carrying out destination prediction attack on the user through the track received by the server.
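Putting the pieces together at the mobile terminal, step 3 might look like the following usage sketch (it reuses the illustrative helpers from the earlier sketches; user_pr, track_to_protect, epsilon, model and R_BASE are assumed inputs, not names from the patent):

```python
R_BASE = 500.0   # preset noise base radius r_base, in metres

# 1. turn the privacy protection degree requirement PR into regression features
features = expand_features(user_pr).reshape(1, -1)
# 2. let the (pruned) privacy quantification protection model output a noise level
level = max(0, int(round(model.predict(features)[0])))
# 3. perturb every point of the track before uploading it to the location server
noisy_track = [planar_laplace_point(lat, lon, epsilon,
                                    extra_radius=level * R_BASE)
               for (lat, lon) in track_to_protect]
```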
Experimental setup
In order to verify the effectiveness of the proposed method in different scenarios and against different attack methods, two data sets are used: one is a Beijing taxi data set from the T-Drive project, collected from September 1 to October 25, 2013; the other is a Porto taxi data set from Kaggle, collected from July 1, 2013 to June 30, 2014.
Three types of attack models are considered: SubSyn and T-DesP are destination prediction models based on Markov transition matrices; Distribution is a prediction model based on Gaussian probability distributions and trajectory clustering; and H-TALL is a prediction model based on a neural network.
The experiment uses three indexes to show the reliability of the protection effect:

MAE = (1/N) · Σ_{i=1}^{N} | WAE_i - PR_i |,

RMSE = sqrt( (1/N) · Σ_{i=1}^{N} ( WAE_i - PR_i )² ),

ER = |{ i : | WAE_i - PR_i | ≤ ω }| / N,

where N is the number of protection runs, PR_i is the user's privacy protection degree requirement and WAE_i is the privacy protection degree actually provided. MAE and RMSE represent the error and the stability of the protection provided by our protection framework, and ER represents the degree of user satisfaction, i.e., the ratio of the number of protections that satisfy the user to the total number, under different threshold values ω.
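These three indexes could be computed as in the following sketch (Python/NumPy; taking the achieved WAE values and the requested PR values as the operands is an assumption based on the description above):

```python
import numpy as np

def evaluation_metrics(achieved_wae, requested_pr, omega=0.03):
    """MAE, RMSE and satisfaction ratio ER over a set of protection runs."""
    err = np.asarray(achieved_wae) - np.asarray(requested_pr)
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))
    er = float(np.mean(np.abs(err) <= omega))   # fraction within threshold omega
    return mae, rmse, er
```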
The experimental contents have the following three aspects:
(1) effect against different attack methods (destination prediction methods);
(2) effects in different scenarios (datasets of different sizes);
(3) the influence of parameter setting on the effect;
setting parameters:
PR:0.1,0.2,0.3,0.4,0.5,0.6;
ε: 0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7;
ω:0.01,0.02,0.03,0.04,0.05;
results of the experiment
Fig. 2 and 3 show how our protection model provides quantitative privacy protection on the two data sets against the four different attack methods, according to the user's privacy requirements. Fig. 2 and 3 illustrate, respectively, the accuracy (MAE) and the stability (RMSE) of the quantitative privacy protection our framework provides for the user's privacy needs. First, in Fig. 2, the average bias of the protection we provide is at most 0.04 across the different data sets and attack methods, and it exceeds 0.02 only against the Distribution attack method when PR equals 0.2. That is, in most cases our privacy-preserving framework provides extremely precise protection, with a maximum deviation of PR ± 0.02. Fig. 3 further shows that the maximum root mean square error of our protection framework does not exceed 0.04; that is, the protection effect provided by our framework is very stable over multiple privacy protections.
Fig. 4 shows the relationship between the satisfaction threshold and the degree of satisfaction. It can be seen that when the user's requirement is strict (i.e., ω < 0.03), our protection can satisfy most users, and in the best case the satisfaction can reach more than 90%. When the user's requirement is more relaxed (i.e., ω > 0.03), the satisfaction with the protection effect is extremely close to 100% in more than half of the cases.
Finally, consider the influence of the noise parameter ε on the protection effect; here the relationship between the RMSE and the noise parameter ε is used. Fig. 5 and 6 plot the experimental results on the different data sets. As the noise parameter increases, the privacy protection effect for the user weakens, so a lower differential privacy parameter ε should be selected to better protect privacy.
The above embodiments are preferred embodiments of the present application, and those skilled in the art can make various changes or modifications without departing from the general concept of the present application, and such changes or modifications should fall within the scope of the claims of the present application.

Claims (9)

1. A quantifiable privacy preserving method for destination prediction, comprising:
step 1, acquiring training data;
obtaining a historical track data set D_train, and copying it to obtain M identical historical track data sets;

using a Laplace mechanism, respectively injecting noise of radius r_i, i ∈ [0, 1, ..., M-1], into the M historical track data sets to obtain track data sets with different noise added:

D_i^train = { Lap_{r_i}(T_j^i) | j = 1, 2, ..., L_D },

r_i = r_0 + i × r_base, i ∈ [0, 1, ..., M-1],

where D_i^train denotes the i-th track data set, whose injected noise radius level is i; T_j^i denotes the j-th track in the i-th historical track data set; L_D denotes the size of the historical track data set; r_i denotes the noise radius injected into the i-th historical track data set; r_0 is a radius randomly generated by the Laplace mechanism; r_base is a preset noise base radius; and Lap_{r_i}(T_j^i) denotes injecting noise with a random angle and radius r_i into the track T_j^i;

performing destination prediction on the L_D tracks in each track data set, statistically calculating the privacy protection degree of each track data set from the prediction results, and then generating a plurality of derived values from the obtained privacy protection degree according to preset functions; the privacy protection degree and its derived values obtained from each track data set form a group of training samples, and the noise radius level i corresponding to each track data set is used as the label value of that training sample;
step 2, training a protection model;
training a privacy quantification protection model based on multiple linear regression by taking M groups of training samples as input and taking corresponding label values as output;
step 3, processing track noise;
when a privacy protection degree requirement and a track to be protected are received, inputting the privacy protection degree requirement into the privacy quantification protection model and outputting the noise radius level to be injected into the track; the track to be protected belongs to the same area as the tracks in the historical track data set D_train;
and injecting noise into the track to be protected according to the noise radius level output by the privacy quantization protection model by using a Laplace mechanism, and finally sending the track subjected to noise injection to a position server.
2. The method of claim 1, wherein the multiple-linear-regression-based privacy quantification protection model, after substituting the M groups of training samples and their corresponding label values, can be expressed as the following set of linear relations:

y_0 = λ_0 + λ_1·x_{0,1} + λ_2·x_{0,2} + ... + λ_q·x_{0,q} + ε_0
y_1 = λ_0 + λ_1·x_{1,1} + λ_2·x_{1,2} + ... + λ_q·x_{1,q} + ε_1
...
y_{M-1} = λ_0 + λ_1·x_{M-1,1} + λ_2·x_{M-1,2} + ... + λ_q·x_{M-1,q} + ε_{M-1}

where λ_0, λ_1, λ_2, ..., λ_q are the parameters of the privacy quantification protection model; ε_0, ε_1, ..., ε_{M-1} are the bias values of the protection model; x_{i,1}, x_{i,2}, ..., x_{i,q} are the i-th group of training samples; and y_i is the label value of the i-th group of training samples, i ∈ [0, 1, ..., M-1];

training the multiple-linear-regression-based protection model means solving for the parameters λ_0, λ_1, λ_2, ..., λ_q that minimize the following optimization function:

SSE = Σ_{i=0}^{M-1} ( y_i - (λ_0 + λ_1·x_{i,1} + λ_2·x_{i,2} + ... + λ_q·x_{i,q}) )².
3. The method according to claim 2, characterized in that after the privacy quantification protection model is obtained by training with the training samples, the obtained privacy quantification protection model is further pruned and simplified; and in step 3, the simplified privacy quantification protection model is used to obtain the Laplace noise radius level to be injected into the track.
4. The method of claim 3, wherein the pruning reduction process is:
step A1, storing the parameters λ_0, λ_1, λ_2, ..., λ_q of the privacy quantification protection model obtained by training with the training samples in a parameter sequence λ* and in a cache parameter sequence λ_t, and denoting its optimization function value as SSE*; storing the M groups of training samples in a variable array A* and in a cache variable array A_t;

step A2, for the privacy quantification protection model formed by the parameter sequence λ*, calculating for each variable, based on the variable array A*, the correlation coefficient R_k between that variable and the label, and storing all the obtained correlation coefficients R_k in a list R*; wherein the correlation coefficient R_k is calculated as:

R_k = Σ_{i=0}^{M-1} (x_{i,k} - x̄_k)(y_i - ȳ) / sqrt( Σ_{i=0}^{M-1} (x_{i,k} - x̄_k)² · Σ_{i=0}^{M-1} (y_i - ȳ)² ),

where x_{i,k} is the k-th variable value of the i-th group of training samples in the variable array A*, y_i is the label value corresponding to the i-th group of training samples, x̄_k is the mean of the k-th variable over the M training samples in A*, and ȳ is the mean of the M label values corresponding to the M groups of training samples;

step A3, selecting the minimum correlation coefficient from the list R*, denoting its subscript by s, and comparing the minimum correlation coefficient R_s with a preset correlation coefficient threshold:

if the minimum correlation coefficient R_s is greater than the preset correlation coefficient threshold, taking the privacy quantification protection model formed by the parameter sequence λ* as the final simplified privacy quantification protection model;

if the minimum correlation coefficient R_s is less than or equal to the preset correlation coefficient threshold, setting the parameter λ_s with subscript s in the cache parameter sequence λ_t to 0, removing the variable values with subscript s from the cache variable array A_t, then substituting the cache variable array A_t and the sample labels into the privacy quantification protection model formed by the cache parameter sequence λ_t, calculating its optimization function value, storing it as SSE_t, and comparing SSE_t with SSE*:

if SSE_t > SSE*, keeping the parameter sequence λ* unchanged, and taking the privacy quantification protection model formed by the parameter sequence λ* as the final simplified privacy quantification protection model;

if SSE_t ≤ SSE*, updating the parameter sequence λ* to be the same as the cache parameter sequence λ_t, updating the variable array A* to be the same as the cache variable array A_t, replacing SSE* with SSE_t, and returning to step A2.
5. The method according to claim 1, wherein the specific method for statistically calculating the privacy protection degree of each group of track data sets according to the prediction result is as follows:
letting the prediction accuracies of the L_D tracks in the i-th track data set D_i^train, over the G partitions in which the tracks are distributed and ranked from large to small, be P_{i,1}, P_{i,2}, ..., P_{i,G}, which together form the accuracy sequence P_i^train of the i-th track data set D_i^train; then calculating, from the accuracy sequence P_i^train, the privacy protection degree WAE_i of each track data set:

WAE_i = ( Σ_{g=1}^{G} (1/g)·| P_{i,g} - P'_g | ) / ( Σ_{g=1}^{G} (1/g) ),

where P'_g is the prediction accuracy of the historical track data set D_train at the g-th partition.
6. The method of claim 1, wherein the preset functions used to generate the derived values from the privacy protection degree comprise any one or more of: y = x², y = x³, y = x⁻¹, y = x⁻², y = √x, y = log x, y = ln x; where x is the privacy protection degree and y is the derived value.
7. The method of claim 1, wherein the preset noise base radius r_base = 500 m.
8. An apparatus comprising a processor and a memory; wherein: the memory is to store computer instructions; the processor is configured to execute the computer instructions stored by the memory, in particular to perform the method according to any one of claims 1 to 7.
9. A computer storage medium storing a program which, when executed, performs the method of any one of claims 1 to 7.
CN202010967393.9A 2020-09-15 2020-09-15 Quantifiable privacy protection method, equipment and medium for destination prediction Active CN112182645B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010967393.9A CN112182645B (en) 2020-09-15 2020-09-15 Quantifiable privacy protection method, equipment and medium for destination prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010967393.9A CN112182645B (en) 2020-09-15 2020-09-15 Quantifiable privacy protection method, equipment and medium for destination prediction

Publications (2)

Publication Number Publication Date
CN112182645A true CN112182645A (en) 2021-01-05
CN112182645B CN112182645B (en) 2022-02-11

Family

ID=73921078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010967393.9A Active CN112182645B (en) 2020-09-15 2020-09-15 Quantifiable privacy protection method, equipment and medium for destination prediction

Country Status (1)

Country Link
CN (1) CN112182645B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113301562A (en) * 2021-05-20 2021-08-24 山东大学 Second-order multi-autonomous system differential privacy convergence method and system for quantitative communication
CN113420333A (en) * 2021-07-16 2021-09-21 合肥工业大学 Privacy-protection online taxi appointment boarding point recommendation system and method
CN114065287A (en) * 2021-11-18 2022-02-18 南京航空航天大学 Track difference privacy protection method and system for resisting prediction attack

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0001510D0 (en) * 2000-01-25 2000-03-15 Ross Gordon Methods for transmitting information to individuals and groups by exploiting locality factors whilst preserving user privacy
US20110208763A1 (en) * 2010-02-25 2011-08-25 Microsoft Corporation Differentially private data release
KR20150107331A (en) * 2014-03-14 2015-09-23 국방과학연구소 Method and device for generating random noise data preserving the correlation on privacy preserving time-series databases
CN105069371A (en) * 2015-07-28 2015-11-18 武汉大学 Geospatial data based user privacy protection method and system
CN105912616A (en) * 2016-04-07 2016-08-31 电子科技大学 Enhanced privacy protection method based on track reconstruction
CN106650486A (en) * 2016-09-28 2017-05-10 河北经贸大学 Trajectory privacy protection method in road network environment
CN108427891A (en) * 2018-03-12 2018-08-21 南京理工大学 Neighborhood based on difference secret protection recommends method
CN108763954A (en) * 2018-05-17 2018-11-06 西安电子科技大学 Linear regression model (LRM) multidimensional difference of Gaussian method for secret protection, information safety system
CN109409125A (en) * 2018-10-12 2019-03-01 南京邮电大学 It is a kind of provide secret protection data acquisition and regression analysis

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
侯尧 et al.: "Personal trajectory information protection mechanism based on differential privacy", Computer Engineering and Applications *
叶阿勇 et al.: "Trajectory differential privacy protection mechanism based on prediction and sliding window", Journal on Communications *
李敏 et al.: "Research on Adam optimization algorithm with differential privacy protection", Computer Applications and Software *
王宝楠 et al.: "Linear regression analysis based on differential privacy", Computer Knowledge and Technology *
许斌 et al.: "Non-interactive query differential privacy protection model in a big data environment", Computer Engineering and Applications *
郑剑 et al.: "Linear regression analysis algorithm with differentiated privacy budget allocation", Computer Applications and Software *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113301562A (en) * 2021-05-20 2021-08-24 山东大学 Second-order multi-autonomous system differential privacy convergence method and system for quantitative communication
CN113420333A (en) * 2021-07-16 2021-09-21 合肥工业大学 Privacy-protection online taxi appointment boarding point recommendation system and method
CN113420333B (en) * 2021-07-16 2022-10-04 合肥工业大学 Privacy-protection online taxi appointment and boarding point recommendation system and method
CN114065287A (en) * 2021-11-18 2022-02-18 南京航空航天大学 Track difference privacy protection method and system for resisting prediction attack
CN114065287B (en) * 2021-11-18 2024-05-07 南京航空航天大学 Track differential privacy protection method and system for resisting predictive attack

Also Published As

Publication number Publication date
CN112182645B (en) 2022-02-11

Similar Documents

Publication Publication Date Title
CN112182645B (en) Quantifiable privacy protection method, equipment and medium for destination prediction
Bahn et al. Can niche‐based distribution models outperform spatial interpolation?
CN106600052B (en) User attribute and social network detection system based on space-time trajectory
CN106446005B (en) Factorization model
Yang et al. Predicting next location using a variable order Markov model
CN110855648B (en) Early warning control method and device for network attack
US11797839B2 (en) Training neural networks using priority queues
CN111652378B (en) Learning to select vocabulary for category features
CN109643323B (en) Selecting content items using reinforcement learning
CN111639291A (en) Content distribution method, content distribution device, electronic equipment and storage medium
Xi et al. A hybrid algorithm of traffic accident data mining on cause analysis
Yang et al. Recurrent spatio-temporal point process for check-in time prediction
Li et al. Differentially private trajectory analysis for points-of-interest recommendation
Zhang et al. An ensemble method for job recommender systems
CN110602631B (en) Processing method and processing device for location data for resisting conjecture attack in LBS
CN112488163A (en) Abnormal account identification method and device, computer equipment and storage medium
CN117540106B (en) Social activity recommendation method and device for protecting multi-mode data privacy
Poornalatha et al. Web page prediction by clustering and integrated distance measure
CN114394099B (en) Method and device for identifying abnormal running of vehicle, computer equipment and storage medium
CN115495478A (en) Data query method and device, electronic equipment and storage medium
CN115225359A (en) Honeypot data tracing method and device, computer equipment and storage medium
Shahrasbi et al. On Detecting Data Pollution Attacks On Recommender Systems Using Sequential GANs
Lin et al. Finding similar users from GPS data based on assignment problem
Park et al. Spatio‐temporal query contextualization for microtext retrieval in social media
Ugli et al. Movie Recommendation System Using Community Detection Based on the Girvan–Newman Algorithm

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant