CN111770454B - Game method for position privacy protection and platform task allocation in mobile crowd sensing - Google Patents

Game method for position privacy protection and platform task allocation in mobile crowd sensing

Info

Publication number
CN111770454B
CN111770454B
Authority
CN
China
Prior art keywords
user
task
privacy
platform
disturbance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010629965.2A
Other languages
Chinese (zh)
Other versions
CN111770454A (en
Inventor
沈航
蔡威
白光伟
王天荆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tech University
Original Assignee
Nanjing Tech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Tech University filed Critical Nanjing Tech University
Priority to CN202010629965.2A priority Critical patent/CN111770454B/en
Publication of CN111770454A publication Critical patent/CN111770454A/en
Application granted granted Critical
Publication of CN111770454B publication Critical patent/CN111770454B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02Services making use of location information
    • H04W4/029Location-based management or tracking services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W12/00Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/02Protecting privacy or anonymity, e.g. protecting personally identifiable information [PII]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Medical Informatics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a game method for location privacy protection and platform task allocation in mobile crowd sensing. First, a trusted third party simulates the interaction between the users and the platform: each user selects a privacy budget with which noise is added to its location, and the platform allocates tasks according to each user's perturbed location. The interaction process is then modeled as a game and its equilibrium point is derived. Finally, a reinforcement learning method continually tries different location perturbation strategies and outputs the optimal location perturbation scheme. Experimental results show that the mechanism improves the users' overall utility as much as possible while optimizing the task allocation utility, so that the users and the platform reach a win-win outcome. The method addresses the problem that, although an MCS system must provide personalized privacy protection to attract more users to participate in tasks, the presence of malicious attackers means that strengthening users' privacy protection degrades location usability and reduces the task allocation utility.

Description

Game method for position privacy protection and platform task allocation in mobile crowd sensing
Technical Field
The technical scheme belongs to the field of network technology, and in particular relates to a win-win game method for location privacy protection and platform task allocation in mobile crowd sensing (MCS).
Background
In recent years, the explosive development of Internet of Things technology has greatly promoted the popularity of Mobile Crowd Sensing (MCS). A typical MCS system consists of data requesters, a server (the MCS platform) and mobile users. The server distributes the data requesters' tasks to the mobile users in the MCS system; the mobile users complete data collection with their smart mobile devices, send the data back to the server, and receive a certain reward.
Task allocation is one of the most important links in an MCS system. Its goal is to optimize the utility of the overall system while accomplishing all (or most) of the tasks in the target sensing area. Minimizing the travel distance is typically chosen as the optimization target of MCS task allocation. However, the travel distance cannot be computed without the users' location information, and if true locations are uploaded to the MCS platform, users face the risk of personal privacy disclosure. Therefore, to attract more users to participate in sensing tasks, the MCS system must provide location privacy protection for its users.
The spatial cloaking techniques used in traditional location privacy protection can also be applied to protect user location privacy in MCS task allocation. However, the privacy level provided by such techniques is easily weakened if a malicious attacker in the MCS system possesses some prior knowledge. Differential privacy techniques can provide strong location privacy protection for users regardless of the adversary's prior knowledge. In addition, since different users have different privacy protection requirements, the MCS system needs to offer privacy protection with several different privacy budgets for users to choose from.
The travel distance is an important index for measuring MCS task allocation cost. Researchers have proposed the ActiveCrowd task allocation framework, which considers time sensitivity, aims at minimizing the total movement distance, and solves the user selection problem of multi-task allocation in MCS. Since the MCS platform must know the true locations of all users, this may leak user location privacy and reduce users' willingness to participate in sensing. Other researchers apply the traditional spatial cloaking techniques from LBS to protect user location privacy during task allocation. A spatial crowdsourcing mechanism based on differential privacy and geographic positioning has also been proposed, which provides efficient services while protecting all users' locations with the same privacy budget. Some researchers use differential privacy to obfuscate user locations and provide all users with the same degree of location privacy protection during task allocation; however, such frameworks have difficulty accommodating users' differentiated privacy protection requirements. Considering users' personalized privacy protection requirements, a personalized privacy-preserving task allocation framework has also been proposed, which allows users to specify their own privacy budgets based on the idea of K-anonymity, thereby providing personalized location privacy protection. However, users choose privacy budgets rather arbitrarily; in particular, when a malicious attacker exists in the MCS system, users tend to choose budgets with stronger privacy protection, which reduces the usability of their locations and hinders the MCS platform's task allocation.
Disclosure of Invention
As can be seen from the foregoing discussion of the prior art, when designing a task allocation framework that provides personalized privacy protection, one must not only ensure that the MCS platform allocates tasks efficiently, but also establish the most appropriate location privacy protection for each user.
Game theory is an effective approach to performance trade-off problems in MCS systems; for example, in related studies of MCS incentive mechanisms, game theory provides auction-based, pricing-based and reputation-based mechanisms to incentivize users to participate in MCS sensing. The trusted third party (TTP) is the most important part of the proposed mechanism: it not only provides location privacy protection for users, but also simulates the interaction between users selecting privacy budgets and the MCS platform allocating tasks, thereby establishing the most appropriate personalized privacy protection for each user.
A mobile crowd sensing (MCS) system needs to provide personalized privacy protection in order to attract more users to participate in tasks. However, because malicious attackers exist, strengthening users' privacy protection degrades location usability and reduces the task allocation utility.
The invention provides a game method for location privacy protection and platform task allocation in mobile crowd sensing, which is a reinforcement-learning-based win-win game method between the users and the platform and comprises the following steps:
first, a trusted third party (TTP) simulates the interaction between the users and the MCS platform: each user selects a privacy budget with which noise is added to its location, and the MCS platform allocates tasks according to each user's perturbed location;
then, the interaction process is modeled as a game and its equilibrium point is derived;
finally, a reinforcement learning method continually tries different location perturbation strategies and outputs the optimal location perturbation scheme.
The win-win game method for user location privacy protection and platform task allocation uses a reinforcement learning algorithm to train an offline model that outputs the optimal location perturbation strategy by continually trying combinations of location perturbation schemes for all users. Experimental results show that, in an MCS system providing personalized privacy protection, the privacy budget-task allocation game can establish personalized and most appropriate location privacy protection for each user, improving user privacy protection as much as possible while guaranteeing the task allocation utility, so that the users and the platform reach a win-win outcome.
Drawings
FIG. 1 is an overall frame of an MCS system;
FIG. 2 is a schematic illustration of a privacy budget-task allocation game in a trusted third party TTP;
FIG. 3 is a decision framework based on reinforcement learning;
FIGS. 4a and 4b are schematic diagrams comparing the performance of the algorithm of the present invention with a random algorithm;
wherein: FIG. 4a is user overall utility and FIG. 4b is task assignment utility;
FIGS. 5a and 5b are schematic diagrams of the effect of the number of users;
wherein: FIG. 5a is the user's overall utility and FIG. 5b is the average travel distance;
FIGS. 6a and 6b are schematic diagrams of the effect of the task publication radius;
wherein: fig. 6a is the user's overall utility and fig. 6b is the average travel distance.
Detailed Description
The technical scheme is further explained below with reference to the drawings and specific embodiments.
1 Overview
A mobile crowd sensing (MCS) system needs to provide personalized privacy protection in order to attract more users to participate in tasks. However, because malicious attackers exist, strengthening users' privacy protection degrades location usability and reduces the task allocation utility.
To address this problem, the game method provided by the invention first simulates the interaction between the users and the platform through a trusted third party: each user selects a privacy budget with which noise is added to its location, and the platform allocates tasks according to each user's perturbed location. The interaction process is then modeled as a game and its equilibrium point is derived. Finally, a reinforcement learning method continually tries different location perturbation strategies and outputs the optimal location perturbation scheme. Experimental results show that the mechanism improves the users' overall utility as much as possible while optimizing the task allocation utility, so that the users and the platform reach a win-win outcome.
In a mobile crowd sensing (MCS) system, after receiving a task request the MCS platform issues tasks; users who wish to perform a task provide location information to the MCS platform; and the MCS platform selects users and allocates tasks. The game method is characterized in that a trusted third party (TTP) simulates the interaction between the users and the MCS platform, and comprises the following steps: 1) for the tasks issued by the MCS platform, users who wish to execute tasks transmit their true distances to the applied tasks and their privacy budgets to the TTP; 2) the interaction between the users and the MCS platform is simulated inside the TTP, and the optimal perturbed locations of the users are obtained; 3) the MCS platform selects users and allocates tasks according to the optimal perturbed locations of the users provided by the TTP;
in step 2), the interaction between the users and the MCS platform is simulated as a Stackelberg game: the users as a whole act as the leader in the Stackelberg game model, and the MCS platform acts as the follower; the leader-follower interaction proceeds as follows:
2.1) the leader selects privacy budgets and communicates the resulting location perturbation strategy to the follower;
2.2) the follower allocates tasks according to the leader's perturbation strategy so as to minimize the travel distance;
2.3) after receiving the follower's task allocation result, the leader adjusts the perturbation strategy and transmits the new location perturbation strategy to the follower; step 2.2) is repeated until the equilibrium point is reached, at which point the loop ends and the optimal location perturbation strategy is obtained; at the equilibrium point the system is in the optimal state in which the leader's utility is maximized while the task allocation utility is guaranteed;
2.4) the optimal perturbed locations of the users are obtained from the optimal location perturbation strategy at the equilibrium point, and the method then proceeds to step 3).
In the steps 2.2) and 2.3), different position disturbance strategies are continuously tried by using a reinforcement learning method, and finally, an optimal position disturbance strategy is obtained.
First, the process of obtaining the optimal perturbation strategy is represented as a Markov decision process; then a Q-learning algorithm is adopted to solve the Markov decision process, finding the sequence of actions, executed from the initial state $s^{(1)}$, that maximizes the converged accumulated reward.
the following were used:
section 2 describes a system model of the present invention;
section 3 carries on problem modeling to the proposed game mechanism;
part 4, MDP the decision problem and use Q-learning algorithm to solve;
section 5 is comparison of algorithm performance and analysis of experimental results.
2 System model
As shown in fig. 1, the overall framework of the MCS system includes an MCS platform, a user and a trusted third party TTP.
Upon receiving a task request, the platform issues a task set $\mathcal{T}=\{t_1,t_2,\dots,t_m\}$, and the corresponding set of task release radii is $\mathcal{R}=\{r_1,r_2,\dots,r_m\}$. In the task allocation stage, the platform allocates tasks with the goal of minimizing the travel distance, based on the task application information forwarded by the trusted third party, and generates the allocation matrix $A_{n\times m}$ to complete the task allocation.
The set of mobile users in the system is denoted $\mathcal{W}=\{w_1,w_2,\dots,w_n\}$. After the platform releases the tasks, each user $w_i$ sends a triplet $\langle \varepsilon_i^{\max}, \mathcal{T}_i, d_i\rangle$ to the trusted third party. Here the variable $\varepsilon_i^{\max}$ is the maximum privacy budget accepted by user $w_i$ (the larger the privacy budget, the weaker the privacy protection and the greater the possibility of location privacy disclosure), the set $\mathcal{T}_i\subseteq\mathcal{T}$ is the set of tasks applied for by user $w_i$, and the vector $d_i=(d_{i1},d_{i2},\dots,d_{ik_i})$ is user $w_i$'s true distance vector to its $k_i$ applied tasks. After the platform allocates tasks, each user travels to the specific task location assigned to it according to the allocation matrix $A_{n\times m}$.
The trusted third party is both the provider of location privacy protection and the decision maker of each user's optimal location perturbation scheme, and is an extremely important part of the system. Let $\mathcal{E}=\{\varepsilon^{(1)},\varepsilon^{(2)},\dots,\varepsilon^{(h)}\}$ denote the set of privacy budgets with $h$ different protection degrees offered by the trusted third party. After receiving the triplets uploaded by the users, the trusted third party provides each user $w_i$ with a privacy budget $\varepsilon_i$ (with $\varepsilon_i\le\varepsilon_i^{\max}$, i.e., satisfying user $w_i$'s personalized requirement), obtaining the location perturbation strategy vector $\pi=(\varepsilon_1,\varepsilon_2,\dots,\varepsilon_n)$ of all users. It then simulates the platform allocating tasks with minimized travel distance and generates the allocation matrix $A_{n\times m}$; next, it adjusts the users' location perturbation strategy $\pi$ according to the allocation matrix so as to maximize the users' utility, performs task allocation again, and iterates continuously to obtain the optimal location perturbation strategy $\pi^*$. Finally, the trusted third party uploads user $w_i$'s task application information $\langle\varepsilon_i,\mathcal{T}_i,\hat d_i\rangle$ to the real MCS platform.
It is assumed that each user can apply for multiple tasks, while each task is assigned to at most one user and each user executes at most one task.
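For concreteness, the task application information exchanged in this model can be represented as in the following minimal Python sketch; the container and field names (TaskApplication, eps_max, tasks, true_distances) are illustrative assumptions, not identifiers defined by the invention:

from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class TaskApplication:
    # Triplet <eps_i_max, T_i, d_i> that user w_i sends to the trusted third party.
    eps_max: float                     # maximum accepted privacy budget eps_i_max
    tasks: Tuple[str, ...]             # ids of the applied tasks T_i
    true_distances: Dict[str, float]   # task id -> true distance d_ij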
3 Problem modeling
This section first introduces the concept of generalized differential privacy, then analyzes the location privacy protection provided by the system, then introduces the platform's task allocation method, and finally explains the user privacy protection-platform task allocation game and derives the equilibrium point of the game.
3.1 generalized differential privacy
For any two adjacent data sets $x$, $x'$ and any output set $Y$, if the output probability distributions of $M(x)$ and $M(x')$ differ on $Y$ by at most a factor $e^{\varepsilon}$, i.e., $M(x)(Y)\le e^{\varepsilon}M(x')(Y)$, then the mechanism $M$ satisfies differential privacy with privacy budget $\varepsilon$. For any two locations $x$ and $x'$, if their Euclidean distance satisfies $d(x,x')\le r$, then under the obfuscation mechanism $M$ the outputs $M(x)$ and $M(x')$ differ by at most a factor $e^{\varepsilon r}$, where $\varepsilon$ represents the privacy budget per unit distance. In this case, even a malicious attacker who knows the obfuscation mechanism $M$ cannot distinguish the true location.
Definition 1 ($d_x$-differential privacy). A mechanism $M$ satisfies $d_x$-differential privacy if and only if, for arbitrary $x,x'\in\mathcal{X}$ and any output set $Y$,
$$M(x)(Y)\le e^{d_x(x,x')}M(x')(Y),$$
where $M(x)(Y)$ denotes the probability that the output of $M(x)$ belongs to the set $Y$, and $d_x(x,x')=\varepsilon\, d(x,x')$, with $\varepsilon$ the privacy budget (the smaller $\varepsilon$, the stronger the privacy protection) and $d(x,x')$ the distance between $x$ and $x'$.
Definition 2 (Laplace mechanism). Assume $\mathcal{X}$ and $\mathcal{Y}$ are two sets and $d_x$ is a metric on $\mathcal{X}$. For every element $x\in\mathcal{X}$, define the probability density function
$$D_{\varepsilon}(x)(y)=c\,e^{-\varepsilon\, d(x,y)},$$
where $c$ is a normalization constant. Then the mechanism $M$ that maps $(\mathcal{X},d_x)$ to $\mathcal{Y}$ by drawing its output according to $D_{\varepsilon}(x)$ satisfies $d_x$-differential privacy.
In particular, when the elements of $\mathcal{X}$ and $\mathcal{Y}$ are one-dimensional, the Laplace mechanism states that the perturbed value $y$ is obtained by adding the corresponding noise to the initial value $x$, i.e., $y=x+\mathrm{Lap}(1/\varepsilon)$. In this case the mechanism $M$ satisfies $d_x$-differential privacy with $d_x(x,x')=\varepsilon|x-x'|$.
Proposition 1. If $d_x\le d_x'$, then a mechanism satisfying $d_x$-differential privacy also satisfies $d_x'$-differential privacy.
Obviously, for any mechanism $M$ satisfying $d_x$-differential privacy, when $d_x\le d_x'$, $M$ also satisfies $d_x'$-differential privacy.
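As an illustration of the one-dimensional case of Definition 2 and the bound of Definition 1, the following Python sketch perturbs a scalar value and checks the density-ratio bound $e^{\varepsilon|x-x'|}$; all function names and numeric values are illustrative assumptions, not part of the invention:

import numpy as np

def laplace_mechanism_1d(x, eps, rng=None):
    # One-dimensional case of Definition 2: y = x + Lap(1/eps).
    rng = rng or np.random.default_rng()
    return x + rng.laplace(loc=0.0, scale=1.0 / eps)

def density_ratio(x, x_prime, y, eps):
    # Ratio of the output densities at y for inputs x and x';
    # by Definition 1 it should not exceed exp(eps * |x - x'|).
    p = np.exp(-eps * abs(y - x))
    q = np.exp(-eps * abs(y - x_prime))
    return p / q

# Example: the ratio stays below the differential-privacy bound.
ratio = density_ratio(x=0.3, x_prime=0.8, y=0.5, eps=1.0)
bound = np.exp(1.0 * abs(0.3 - 0.8))
assert ratio <= bound + 1e-12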
3.2 location privacy protection
Users who are willing to perform tasks upload their true distances to the applied tasks and their privacy budgets to the TTP. The TTP adds the corresponding Laplace noise to the true distances according to the received privacy budget, so that an attacker cannot infer the user's true location information even if the specific location obfuscation mechanism is known, thereby protecting the user's location privacy.
Because a user uploads the true distances of the applied tasks to the TTP, and the TTP eventually uploads the perturbed distances to the MCS platform, the more tasks a user applies for, the more location information is exposed and the greater the possibility of privacy disclosure. Meanwhile, the possibility of privacy disclosure is directly related to the privacy budget.
Theorem 1. Let $d_i=(d_{i1},d_{i2},\dots,d_{ik_i})$ denote user $w_i$'s true distance vector to the applied task set $\mathcal{T}_i$, let $\varepsilon_i$ be user $w_i$'s privacy budget, and let $\{r_j:t_j\in\mathcal{T}_i\}$ be the set of release radii of the applied tasks. Then the mechanism $M$: $M(d_i,\varepsilon_i)=d_i+\mathrm{Lap}(1/\varepsilon_i)$ satisfies differential privacy at level $\varepsilon_i\sum_{t_j\in\mathcal{T}_i}r_j$.
Proof: For arbitrary $d_i$ and $d_i'$, where $d_{ij}\in d_i$ and $d_{ij}'\in d_i'$ both represent possible true distances of user $w_i$ to task $t_j$, we have $|d_{ij}-d_{ij}'|\le r_j$. Let $\hat d_i$ denote the perturbed distance vector of user $w_i$ to the task set $\mathcal{T}_i$ reported to the MCS platform, i.e., $\hat d_i=(d_{i1}+\eta_1,d_{i2}+\eta_2,\dots,d_{ik_i}+\eta_{k_i})$, where $\eta_1,\eta_2,\dots,\eta_{k_i}$ are $k_i$ independent and identically distributed random variables drawn from $\mathrm{Laplace}(0,1/\varepsilon_i)$. Thus
$$\frac{\Pr[M(d_i,\varepsilon_i)=\hat d_i]}{\Pr[M(d_i',\varepsilon_i)=\hat d_i]}=\prod_{j=1}^{k_i}\frac{\exp\big(-\varepsilon_i|\hat d_{ij}-d_{ij}|\big)}{\exp\big(-\varepsilon_i|\hat d_{ij}-d_{ij}'|\big)}\le\prod_{j=1}^{k_i}\exp\big(\varepsilon_i|d_{ij}-d_{ij}'|\big),$$
and furthermore
$$\prod_{j=1}^{k_i}\exp\big(\varepsilon_i|d_{ij}-d_{ij}'|\big)\le\exp\Big(\varepsilon_i\sum_{t_j\in\mathcal{T}_i}r_j\Big),$$
so that $M$: $M(d_i,\varepsilon_i)=d_i+\mathrm{Lap}(1/\varepsilon_i)$ satisfies differential privacy at level $\varepsilon_i\sum_{t_j\in\mathcal{T}_i}r_j$.
Theorem 1 shows that: the privacy level of the user is related to the chosen privacy budget and the task applied. The smaller the privacy budget is, the greater the privacy protection degree is; the fewer the number of the tasks applied, the less the exposed position information; the smaller the issue radius of the application task, the greater the indistinguishability of two real locations in the same task area.
After receiving the triplet $\langle\varepsilon_i^{\max},\mathcal{T}_i,d_i\rangle$ transmitted by user $w_i$, the trusted third party provides location privacy protection for the user with $M(d_i,\varepsilon_i)$, where $\varepsilon_i\le\varepsilon_i^{\max}$, i.e., it provides privacy protection at least as strong as the user requires.
From Proposition 1 it follows that, in this case, the perturbation mechanism $M(d_i,\varepsilon_i)$ still satisfies differential privacy at the promised level $\varepsilon_i^{\max}\sum_{t_j\in\mathcal{T}_i}r_j$.
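The TTP-side perturbation described above can be sketched as follows; this is a minimal illustration of the mechanism $M(d_i,\varepsilon_i)$ of Theorem 1 under the stated assumptions, and the helper names are introduced only for illustration:

import numpy as np

def perturb_distance_vector(true_distances, eps_i, eps_i_max, rng=None):
    # Sketch of M(d_i, eps_i) = d_i + Lap(1/eps_i) from Theorem 1.
    # The chosen budget must not exceed the user's maximum accepted budget,
    # so the protection is at least as strong as requested (eps_i <= eps_i_max).
    assert eps_i <= eps_i_max
    rng = rng or np.random.default_rng()
    d = np.asarray(true_distances, dtype=float)
    return d + rng.laplace(loc=0.0, scale=1.0 / eps_i, size=d.shape)

def privacy_level(eps_i, release_radii):
    # Composed differential-privacy level from Theorem 1: eps_i * sum of the radii r_j.
    return eps_i * float(np.sum(release_radii))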
3.3 Task allocation
According to each user's privacy budget, application task set and perturbed distance vector forwarded by the trusted third party, the platform sorts the applicants of each task in descending order of their probability of being closer to the task. After computing the ordered applicant sequences of all tasks, each task is allocated to its nearest user.
Suppose users $w_a$ and $w_b$ are any two applicants for task $t_j$, and $d_{aj}$ and $d_{bj}$ denote their true distances to the task, respectively. When $d_{aj}<d_{bj}$, it is more likely that $t_j$ is allocated to $w_a$. In other words, when
$$\Pr\big[\hat d_{aj}<\hat d_{bj}\big]>\tfrac12,$$
user $w_a$ is ranked before user $w_b$ in the ordered sequence of task $t_j$. Since $\hat d_{aj}$ is obtained by adding Laplace noise to $d_{aj}$, we have $\hat d_{aj}=d_{aj}+\mu_a$, and similarly $\hat d_{bj}=d_{bj}+\mu_b$, where $\mu_a$ and $\mu_b$ are random variables drawn from $\mathrm{Laplace}(0,1/\varepsilon_a)$ and $\mathrm{Laplace}(0,1/\varepsilon_b)$, respectively. Hence
$$\Pr\big[\hat d_{aj}<\hat d_{bj}\big]=\Pr\big[\mu_a-\mu_b<d_{bj}-d_{aj}\big].\qquad(3)$$
Denoting the planar region $D=\{(\mu_a,\mu_b):\mu_a-\mu_b<d_{bj}-d_{aj}\}$, equation (3) can be further expressed as
$$\Pr\big[\hat d_{aj}<\hat d_{bj}\big]=\iint_{D}\frac{\varepsilon_a}{2}e^{-\varepsilon_a|\mu_a|}\cdot\frac{\varepsilon_b}{2}e^{-\varepsilon_b|\mu_b|}\,d\mu_a\,d\mu_b.\qquad(4)$$
Evaluating the double integral in equation (4) yields the probability that user $w_a$ is closer than user $w_b$, which determines the relative order of $w_a$ and $w_b$ in the sequence of task $t_j$. By comparing every pair of applicants of task $t_j$ in this way, a user sequence for $t_j$ ordered by ascending distance can be obtained. Performing the same computation for the other tasks yields a ranking matrix $S_{m\times n}$, whose row $S_j$ is the ordered sequence of task $t_j$; the element $s_{ji}=k$ means that user $w_k$, who applied to execute task $t_j$, is ranked in the $i$-th position among all applicants. When $i$ is greater than the number of applicants of $t_j$, $s_{ji}=\infty$. The task allocation problem aiming at minimizing the overall travel distance then reduces to assigning each task to the first user of its row in $S_{m\times n}$. However, when the same user occupies the first position of several tasks, a conflict occurs, i.e., several tasks would all be allocated to that user; in this case a 0-1 integer linear program, combined with equation (4), is solved to eliminate the conflict and obtain the optimal allocation scheme.
The final result of task allocation is the allocation matrix $A_{n\times m}$. For any $a_{ij}\in A_{n\times m}$ we have $a_{ij}\in\{0,1\}$; $a_{ij}=1$ indicates that task $t_j$ is allocated to user $w_i$. The constraint $\sum_{i=1}^{n}a_{ij}\le 1$ means that each task is allocated to at most one user for execution, and $\sum_{j=1}^{m}a_{ij}\le 1$ means that each user executes at most one task.
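The pairwise probability used to build the ranking can also be estimated without evaluating the double integral of equation (4) in closed form; the following Monte Carlo sketch is an illustrative assumption, not the invention's exact procedure, and approximates $\Pr[\hat d_{aj}<\hat d_{bj}]$ by sampling the two Laplace noises:

import numpy as np

def prob_closer(d_aj, eps_a, d_bj, eps_b, n_samples=100_000, rng=None):
    # Estimate Pr[ d_aj + Lap(1/eps_a) < d_bj + Lap(1/eps_b) ],
    # i.e. the probability that applicant w_a is ranked before w_b for task t_j.
    rng = rng or np.random.default_rng(0)
    mu_a = rng.laplace(0.0, 1.0 / eps_a, n_samples)
    mu_b = rng.laplace(0.0, 1.0 / eps_b, n_samples)
    return float(np.mean(d_aj + mu_a < d_bj + mu_b))

# Example: the closer applicant wins with probability greater than 1/2.
p = prob_closer(d_aj=0.4, eps_a=2.0, d_bj=0.9, eps_b=1.0)
assert p > 0.5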
3.4 Privacy budget-task allocation game
To provide the most appropriate privacy protection for the users, the TTP needs to simulate the users selecting perturbation strategies, simulate the platform allocating tasks, and simulate the user-platform interaction. This interaction process is modeled as a Stackelberg game: the users as a whole act as the leader and convey their overall location perturbation strategy to the platform; the MCS platform acts as the follower and allocates tasks according to the users' perturbation strategy with the goal of minimizing the travel distance; after receiving the platform's task allocation result, the users adjust the overall perturbation strategy to maximize their overall utility, and the interaction thus continues.
The two game parties are two virtual entities inside the TTP: the leader and the follower. The leader simulates the users selecting perturbation strategies, and the follower simulates the platform allocating tasks. As shown in Fig. 2, the leader first selects a privacy budget $\varepsilon_i$ for each user $w_i$ and provides the protection mechanism $M$ satisfying the privacy budget $\varepsilon_i$; the overall protection strategy of the users is denoted $\pi$. The mechanism $M(d_i,\varepsilon_i)$ perturbs the true distance vector $d_i$ uploaded by user $w_i$ for the applied tasks into the vector $\hat d_i=d_i+\mathrm{Lap}(1/\varepsilon_i)$. The leader then uploads the current strategy $\pi$ to the platform. According to the received $\pi$, the follower allocates tasks with the goal of minimizing the travel distance and obtains the task allocation matrix $A_{n\times m}$, whose entries $a_{ij}$ take the value 0 or 1: $a_{ij}=1$ indicates that task $t_j$ is allocated to user $w_i$, and $a_{ij}=0$ indicates that task $t_j$ is not allocated to user $w_i$.
After the platform has allocated tasks, the expected utility of user $w_i$ is given by the utility function of equation (9), which combines the user's expected location privacy gain with the task allocation result. Here $\lambda_i$ is user $w_i$'s privacy weight coefficient, representing the strength of the user's preference between location privacy protection and being assigned a task; $\lambda_i>1$ indicates that the user prefers location privacy to be protected. Under the differential privacy protection with budget $\varepsilon_i$ provided by the trusted third party, the expected deviation between user $w_i$'s blurred distance vector and true distance vector is $\mathbb{E}\big[\lVert\hat d_i-d_i\rVert_1\big]=k_i/\varepsilon_i$. The expected utility of the users as a whole, given by equation (10), can be expressed as the sum of the individual users' expected utilities, $U_w=\sum_{i=1}^{n}u_{w_i}$.
The platform utility function is expressed as the reciprocal of the expected average travel distance of the allocated tasks, $u_p=N_a/\mathbb{E}[D_a]$, where $N_a=\sum_{i=1}^{n}\sum_{j=1}^{m}a_{ij}$ denotes the number of allocated tasks and $\mathbb{E}[D_a]$ denotes the expected total travel distance for the allocated tasks. The greater the average travel distance, the lower the utility of the platform.
A rational user always seeks to maximize its own utility: after being assigned a task, it increases the privacy protection strength in an attempt to better protect its privacy; if it is not assigned a task, it reduces the privacy protection degree so as to have a better chance of being selected, thereby improving its individual utility. Therefore, after the follower simulates each task allocation, the leader adjusts the privacy protection strategies of all users according to the current allocation matrix so as to maximize the users' overall expected utility, and the follower then re-allocates tasks according to the adjusted privacy strategy to minimize the travel distance. Through continuous interaction, the leader and the follower finally reach an equilibrium point, namely
$$U_w(\pi^*,A^*)\ge U_w(\pi,A^*)\quad\text{for all feasible }\pi,$$
$$u_p(\pi^*,A^*)\ge u_p(\pi^*,A)\quad\text{for all feasible }A.$$
This equilibrium point is the optimal state in which the users' overall utility is maximized while the task allocation utility is optimized. At this point, the optimal perturbation strategy selected by the users given the current task allocation result is the current strategy, and the optimal allocation produced by the platform given the current users' perturbation strategy is the current task allocation result.
Since the selection space of the strategy $\pi$ is $\mathcal{E}^n$, traversing it has time complexity $O(h^n)$. The time complexity of one task allocation is approximately $O(n^2)$, so the overall time complexity of exhaustive search is about $O(h^n n^2)$. Since the number $n$ of users in the system is often very large, this complexity is prohibitive, and brute-force enumeration is clearly not a suitable method for solving the problem.
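To make the leader-follower interaction concrete, the following simplified Python sketch simulates one round of the game inside the TTP; the greedy allocation and the one-step budget adjustment rule are illustrative assumptions that stand in for the ranking of Section 3.3 and the utility-maximizing adjustment, not the invention's exact procedures:

import numpy as np

def follower_allocate(applications, budgets, rng):
    # Follower: give each task to the applicant whose perturbed distance is smallest,
    # while respecting "each user executes at most one task".
    perturbed = {i: {t: d + rng.laplace(0.0, 1.0 / budgets[i])
                     for t, d in app["dists"].items()}
                 for i, app in enumerate(applications)}
    assignment, busy = {}, set()
    for t in sorted({t for app in applications for t in app["dists"]}):
        cands = [(perturbed[i][t], i) for i, app in enumerate(applications)
                 if t in app["dists"] and i not in busy]
        if cands:
            _, winner = min(cands)
            assignment[t], busy = winner, busy | {winner}
    return assignment

def leader_adjust(applications, budgets, assignment, budget_set):
    # Leader: assigned users move one step toward a smaller budget (stronger privacy),
    # unassigned users move one step toward their maximum accepted budget.
    # budgets[i] is assumed to already be one of the allowed values for user i.
    assigned, new = set(assignment.values()), list(budgets)
    for i, app in enumerate(applications):
        allowed = sorted(e for e in budget_set if e <= app["eps_max"])
        k = allowed.index(budgets[i])
        new[i] = allowed[max(k - 1, 0)] if i in assigned else allowed[min(k + 1, len(allowed) - 1)]
    return new

Alternating follower_allocate and leader_adjust until the strategy vector stops changing approximates the equilibrium conditions stated above; the reinforcement learning method of Section 4 replaces this naive alternation with Q-learning over the same interaction.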
4 Location perturbation decision based on reinforcement learning
Reinforcement learning is well suited to problems in which an agent maximizes its return while interacting with an environment, and a commonly used model is the standard Markov decision process (MDP). Therefore, the invention adopts a reinforcement learning method to solve the perturbation strategy decision problem of maximizing user utility under efficient task allocation. This section first formulates the location perturbation decision problem as an MDP and then presents the Q-learning algorithm used to solve for the optimal perturbation strategy.
4.1 MDP formulation of the decision problem
The Markov decision process is a sequential decision model used to describe an agent executing actions and collecting rewards in an environment whose state has the Markov property. It is usually expressed as a five-tuple $\langle S,A,P,R,\gamma\rangle$, where $S$ denotes the system states, $A$ the agent's actions, $P$ the transition function between system states, $R$ the reward, and $\gamma$ the discount factor.
The process by which the trusted third party selects the optimal perturbation strategy for the users can be regarded as a Markov process. The agent is the leader inside the trusted third party, and the environment is the interaction process between the leader and the follower. The five MDP elements of the location perturbation strategy decision problem are described in detail below:
the system state is composed of a disturbance strategy vector pi and a task allocation matrix A. Initial state s(1)=[π(0),A(0)]In which pi(0)The privacy budget of each user is represented as an initial value uploaded to a trusted third party, namely the minimum privacy protection degree accepted by the user.
Perturbation strategy
Figure GDA0002589550860000111
An action that is a leader. Since each user can select a set of privacy budgets provided by a trusted third party
Figure GDA0002589550860000112
Any one of the location perturbation schemes meeting the privacy requirement of the leader is adopted, so that the action strategy space of the leader is
Figure GDA0002589550860000113
At time t, the system state s(t)Taking action pi(t)Post arrival state s(t+1). Because the state is composed of the disturbance strategy and the task allocation matrix, and the task allocation matrix depends on the disturbance strategy, the state at the next moment is determined by the current state and the current action, and the conditions are met
P(s(t+1)|s(1)(1),s(2)(2),...,s(t)(t))=P(s(t+1)|s(t)(t)) (14)
I.e. the state transition has markov properties.
The reward R represents the reward for performing the corresponding action in the current state. Using equation (10) as a reward value calculation equation, i.e. at state s(t)Take action pi(t)And then the return value is equal to the utility value of the user at the moment.
The discount factor γ,0 ≦ γ ≦ 1, indicating how important the future and current rewards are, γ ≦ 0 meaning that looking at the current reward alone, γ ≦ 1 indicates that the future reward is as important as the current reward.
Since both the state space and the motion space are finite, the perturbation decision problem is a finite markov decision process. After the perturbation decision is converted into the MDP, the optimal perturbation selection problem in the privacy protection task allocation game is converted into: finding the initial state s(1)A final execution action is initiated that maximizes convergence of the accumulated reward values.
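Since the action space is the finite set $\mathcal{E}^n$ restricted by each user's maximum accepted budget, it can be enumerated explicitly for tabular Q-learning; the following sketch is an illustrative assumption about how actions might be indexed, not part of the invention:

from itertools import product

def enumerate_actions(budget_set, eps_max_per_user):
    # All perturbation strategies pi = (eps_1, ..., eps_n) with eps_i <= eps_i_max,
    # i.e. the leader's finite action space used to index the Q table.
    per_user = [[e for e in sorted(budget_set) if e <= eps_max]
                for eps_max in eps_max_per_user]
    return list(product(*per_user))

# Example: 3 users, budgets {1,...,5}, each accepting at most budget 5.
actions = enumerate_actions({1, 2, 3, 4, 5}, [5, 5, 5])
assert len(actions) == 5 ** 3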
4.2 Q-learning based location perturbation decision algorithm
The Q-learning algorithm is an effective model-free reinforcement learning algorithm for solving Markov decision processes. The agent finds the optimal policy, under which the accumulated reward converges to its maximum, by continual trial-and-error learning in the environment.
In the Q-learning algorithm, the agent creates a decision matrix $Q$ whose rows represent states and whose columns represent actions, storing the values of the state-action pairs $(s,\pi)$; it is initialized to the zero matrix. The $Q$ matrix is iteratively updated by the Bellman equation as follows:
$$Q(s,\pi)\leftarrow(1-\alpha)Q(s,\pi)+\alpha\big(u_w(s,\pi)+\gamma V(s')\big),\qquad(15)$$
$$V(s')=\max_{\pi'}Q(s',\pi'),\qquad(16)$$
where $\alpha\in(0,1)$ is the learning rate (the larger its value, the less of the previous training result is retained); $u_w(s,\pi)$ is the reward obtained by executing action $\pi$ in state $s$; $s'$ is the next state after executing action $\pi$ in state $s$; $\gamma$ is the discount factor, $0\le\gamma\le 1$, indicating the influence of future and current rewards on the action-value function (Q function): $\gamma=0$ means the action value depends only on the current reward, while $\gamma=1$ means future rewards are as important as the current reward; and the function $V(\cdot)$ returns the maximum Q value attainable in the next state.
Based on the decision matrix $Q$ and the current state $s$, the leader uses an e-greedy strategy to prevent the algorithm from falling into a local optimum: in state $s$, the leader performs the currently optimal action $\pi=\arg\max_{\pi'}Q(s,\pi')$ with probability $1-e$, and selects an action at random with probability $e$.
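One update step of this procedure can be sketched as follows; the reward_fn and next_state_fn callables are hypothetical stand-ins for the follower's task allocation and the resulting state, introduced only for illustration:

import numpy as np

def q_learning_step(Q, s, reward_fn, next_state_fn, alpha=0.2, gamma=0.7, e=0.2, rng=None):
    # One iteration: e-greedy action choice, then the update of equations (15)-(16).
    # Q is a 2-D array indexed by [state, action]; s is the current state index.
    rng = rng or np.random.default_rng()
    n_actions = Q.shape[1]
    if rng.random() < e:                    # explore with probability e
        a = int(rng.integers(n_actions))
    else:                                   # exploit: currently optimal action
        a = int(np.argmax(Q[s]))
    r = reward_fn(s, a)                     # u_w(s, pi): users' overall utility
    s_next = next_state_fn(s, a)
    v_next = np.max(Q[s_next])              # equation (16)
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + gamma * v_next)   # equation (15)
    return s_next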
The Q-learning based perturbation scheme decision algorithm is described as follows:
Input: the initial (maximum accepted) privacy budgets $\varepsilon^{\max}=(\varepsilon_1^{\max},\varepsilon_2^{\max},\dots,\varepsilon_n^{\max})$ of all users and the selectable privacy budget set $\mathcal{E}$
Output: $\pi^*$
Begin
Step 1. Initialize $\alpha$, $\gamma$, $e$, $\pi$, $Q(s,\pi)=0$, $A=0$
Step 2. for $k\leftarrow 1$ to episode do
Step 3.   $s^{(k)}=[A^{(k-1)},\pi^{(k-1)}]$
Step 4.   Select action $\pi^{(k)}$ through the e-greedy algorithm
Step 5.   Execute action $\pi^{(k)}$ and upload the perturbed user locations to the follower
Step 6.   The follower allocates tasks and generates the allocation matrix $A^{(k)}$
Step 7.   for $i\leftarrow 1$ to $n$ do
Step 8.     User $w_i$ computes its utility according to equation (9)
Step 9.   end for
Step 10.  Compute $u_w(s^{(k)},\pi^{(k)})$ according to equation (10)
Step 11.  Update $Q(s^{(k)},\pi^{(k)})$ according to equation (15)
Step 12.  Update $V(s^{(k)})$ according to equation (16)
Step 13. end for
Step 14. return $\pi^*$
End
The algorithm takes as input the initial privacy budgets of all users, i.e., the maximum accepted budgets $\varepsilon^{\max}=(\varepsilon_1^{\max},\dots,\varepsilon_n^{\max})$, together with the system's selectable privacy budget set $\mathcal{E}$, and outputs the optimal perturbation strategy $\pi^*$.
In step 1, a learning rate alpha and a discount factor gamma used in the algorithm are initialized, a decision matrix Q is initialized to a zero matrix, and a task allocation matrix is initialized to the zero matrix.
Steps 2-13 form the loop body, where episode denotes the maximum number of training iterations. In step 4, during the first iteration the leader provides privacy protection at the level given by the initial privacy budget uploaded by each user; from the second iteration onwards, the leader selects a perturbation scheme with the e-greedy algorithm, exploiting the best perturbation strategy trained so far with probability $1-e$ and selecting a perturbation strategy at random with probability $e$ to avoid local optima. In steps 5-6, the follower allocates tasks according to the received user privacy budgets, perturbed locations and application task sets, and generates the allocation matrix. Steps 7-9 compute the utility of each user based on the current allocation matrix. Step 10 computes the overall utility from the individual users' utilities, i.e., the reward for taking action $\pi^{(k)}$ in the current state $s^{(k)}$. Steps 11-12 update the values of the state-action pairs in the decision matrix $Q$.
Step 14 outputs the location perturbation strategy $\pi^*$ once convergence is reached or the iterations are exhausted.
The algorithm loops episode times in total. In each iteration, the leader obtains the currently optimal location perturbation strategy $\pi$ from the Q table in $O(1)$ time, the follower's task allocation takes $O(n^2)$ time, and computing the utilities of all users takes $O(n)$ time. In summary, the time complexity of the proposed Q-learning based location perturbation decision algorithm is $O(\mathrm{episode}\times\max(n,n^2))$.
5 Experiments and analysis of results
The performance of the privacy budget-task allocation gaming mechanism is evaluated through simulation experiments. The following describes specific experimental environmental parameters and analyzes the experimental results.
Table 1 lists the value settings of the basic parameters in the experiment. In a sensing area of 5 km × 5 km, 10 users participate in sensing and the platform has 5 sensing tasks to allocate, each with a release radius of 1 km. Each user selects the maximum privacy budget it can accept; the initial privacy budget of each user is assumed to be 5, and the most appropriate privacy budget is then selected for each user during the algorithm iterations. The privacy weight coefficient λ_i of each user follows a normal distribution with mean 1 and variance 5, because on the whole location privacy protection and being assigned a task are equally important to users. The learning rate, discount factor and greedy strategy coefficient in Q-learning are set to 0.2, 0.7 and 0.8, respectively.
Table 1 Experimental parameter settings
Sensing area: 5 km × 5 km
Number of users: 10
Number of tasks: 5
Task release radius: 1 km
Initial privacy budget of each user: 5
Privacy weight coefficient λ_i: normal distribution with mean 1 and variance 5
Learning rate α: 0.2
Discount factor γ: 0.7
Greedy strategy coefficient: 0.8
5.1 evaluation of Q-learning Algorithm Performance
A random algorithm that randomly selects a perturbation strategy for each user is used as the baseline against which the Q-learning algorithm of the invention is compared.
Fig. 4a and Fig. 4b compare the performance of the Q-learning algorithm used in the invention and the random algorithm in terms of the users' overall utility and the task allocation utility, respectively. The experimental figures show that the Q-learning algorithm clearly outperforms the random algorithm in both the users' overall utility and the task allocation utility. In the random algorithm, a privacy budget is randomly selected for each user in every iteration, so the task allocation result differs from round to round, the expected user utility and task allocation utility fluctuate, and no convergence is achieved. Fig. 4a shows that in the Q-learning algorithm the users' overall utility first increases and then levels off. This is because, when the Q-learning algorithm has just started, the initial privacy budget uploaded by each user, which gives the weakest privacy protection, is selected by default, so the expected utility of the users assigned tasks is low. As the number of iterations grows, the algorithm keeps selecting more appropriate privacy budgets for the users, and the users' overall expected utility increases. Likewise, in Fig. 4b, since user privacy protection is weak at the beginning, the usability of user locations in the task allocation stage is high, so the allocation result is close to the optimum; as the users' expected utility increases, their privacy protection becomes stronger and location usability decreases, so the expected travel distance increases slightly and the task allocation utility drops slightly. The experimental results show that the proposed mechanism can better protect users' location privacy and improve the users' overall utility while optimizing the task allocation utility, achieving a win-win situation between the users and the platform.
5.2 influence of the number of Users on System Performance
The number of mobile users, an indispensable element of an MCS system, is an important factor in measuring system performance. Fig. 5a and Fig. 5b show the influence of the number of users on system performance in an MCS system with 5 tasks and a task release radius of 1 km. As can be seen from Fig. 5b, the average travel distance of both the No-privacy scheme and the Q-learning algorithm proposed by the invention decreases as the number of users increases. This is because an increase in the number of users brings new candidates who are closer to the tasks; when a task is assigned to such a new candidate, the average travel distance drops markedly, improving the overall task allocation utility. Meanwhile, the average travel distance of the randomly selecting baseline may increase, because candidates farther from the tasks may also appear. Since the number of tasks is fixed and users close to the tasks can choose stronger protection schemes, the utility of the users assigned tasks does not fluctuate much as the total number of users changes. The experimental results show that increasing the number of users effectively reduces the average travel distance, which approaches the optimal value obtained without privacy protection, and noticeably improves the task allocation utility.
5.3 impact of task publishing radius on System Performance
The release radius of a task also affects system performance: if the release radius is too small, there may be no user within the task release range, and the task cannot be distributed and executed. Fig. 6a and Fig. 6b show the effect of the task release radius on system performance in an MCS system with 10 users and 5 tasks. As can be seen from Fig. 6a and Fig. 6b, when the release radius is less than 1 km, the users' overall utility and the average travel distance both grow as the task release radius increases. This is because tasks that previously had no users within their release areas are applied for and successfully allocated as the radius grows. When the radius exceeds 1 km, the users' overall utility and the average travel distance tend to become stable. One reason is that all tasks have been allocated and no new users are assigned tasks; the other is that the task allocation matrix no longer changes as the release radius increases.
The experimental result shows that the algorithm can improve the overall utility of the user while ensuring the task allocation utility in the MCS system providing the personalized privacy protection. Meanwhile, the effect is better in an MCS system with larger task release radius and more users participating in sensing tasks.
6 Concluding remarks
The invention provides a win-win game mechanism for user location privacy protection and platform task allocation in mobile crowd sensing (MCS), and solves for the equilibrium point by means of reinforcement learning. The core ideas are: providing personalized location privacy protection for users so as to attract more users to participate in MCS sensing tasks, and using the game to improve the users' overall utility as much as possible while optimizing the platform's task allocation utility. Experimental results show that the proposed game mechanism handles the trade-off between task allocation and user location privacy protection well, and performs even better in systems with a larger task release radius and more users participating in sensing tasks.

Claims (2)

1. A game method for location privacy protection and platform task allocation in mobile crowd sensing, wherein in a mobile crowd sensing (MCS) system, after receiving a task request the MCS platform issues tasks; users who wish to perform tasks provide location information to the MCS platform; and the MCS platform selects users and allocates tasks; the method being characterized in that a trusted third party (TTP) simulates the interaction between the users and the MCS platform, and comprising the following steps: 1) for the tasks issued by the MCS platform, users who wish to execute tasks transmit their true distances to the applied tasks and their privacy budgets to the TTP; 2) simulating the interaction between the users and the MCS platform inside the TTP and obtaining the optimal perturbed locations of the users; 3) the MCS platform selecting users and allocating tasks according to the optimal perturbed locations of the users provided by the TTP;
in step 2), the interaction between the users and the MCS platform is simulated as a Stackelberg game: the users as a whole act as the leader in the Stackelberg game model, and the MCS platform acts as the follower; the leader-follower interaction proceeds as follows:
2.1) the leader selects privacy budgets and communicates the resulting location perturbation strategy to the follower;
2.2) the follower allocates tasks according to the leader's perturbation strategy so as to minimize the travel distance;
2.3) after receiving the follower's task allocation result, the leader adjusts the perturbation strategy and transmits the new location perturbation strategy to the follower; step 2.2) is repeated until the equilibrium point is reached, at which point the loop ends and the optimal location perturbation strategy is obtained; at the equilibrium point the system is in the optimal state in which the leader's utility is maximized while the task allocation utility is guaranteed;
2.4) the optimal perturbed locations of the users are obtained from the optimal location perturbation strategy at the equilibrium point, and the method then proceeds to step 3).
2. The game method for location privacy protection and platform task allocation in mobile crowd sensing according to claim 1, wherein in the steps 2.2) and 2.3), different location perturbation strategies are tried continuously by using a reinforcement learning method, and finally an optimal location perturbation strategy is obtained;
firstly, a Markov decision process is used for representing a process of obtaining an optimal disturbance strategy:
in the Markov decision process, the agent is the leader, and the environment is the interaction process between the leader and the follower; the five elements of the Markov decision process are:
element 1: at time $t$, the system state $s^{(t)}$ consists of the location perturbation strategy $\pi^{(t-1)}$ and the task allocation matrix $A^{(t-1)}$;
the initial state is $s^{(1)}=[\pi^{(0)},A^{(0)}]$, where $\pi^{(0)}$ means that each user's privacy budget takes the initial value transmitted to the TTP, namely the minimum privacy protection strength accepted by the user;
element 2: the location perturbation strategy $\pi=(\varepsilon_1,\varepsilon_2,\dots,\varepsilon_n)$ is the leader's action; each user can select any location perturbation scheme from the privacy budget set $\mathcal{E}$ provided by the TTP that meets the leader's privacy requirement, so the leader's action strategy space is $\mathcal{E}^n$;
element 3: at time $t$, the system in state $s^{(t)}$ takes action $\pi^{(t)}$ and arrives at state $s^{(t+1)}$; the system state consists of the location perturbation strategy and the task allocation matrix, and the task allocation matrix depends on the perturbation strategy, so the state at the next moment is determined by the current state and the current action, i.e., $P(s^{(t+1)}\mid s^{(1)},\pi^{(1)},s^{(2)},\pi^{(2)},\dots,s^{(t)},\pi^{(t)})=P(s^{(t+1)}\mid s^{(t)},\pi^{(t)})$, and the state transitions have the Markov property;
element 4: the reward $R$ represents the reward for executing the corresponding action in the current state; after taking action $\pi^{(t)}$ in state $s^{(t)}$, the reward equals the users' overall utility at that moment;
element 5: the discount factor $\gamma$, $0\le\gamma\le 1$, represents the relative importance of future and current rewards; $\gamma=0$ means that only the current reward is considered, and $\gamma=1$ means that future rewards are as important as the current reward;
because both the state space and the action space are finite, the location perturbation decision problem is a finite Markov decision process;
then, a Q-learning algorithm is adopted to solve the Markov decision process, finding the sequence of actions, executed from the initial state $s^{(1)}$, that maximizes the converged accumulated reward;
in the Q-learning algorithm, a decision matrix Q is created by the agent, where rows represent states and columns represent actions, used to store the values of the state-action pairs;
initialization: initializing a learning rate alpha and a discount factor gamma used in the algorithm, initializing a decision matrix Q to a zero matrix, and initializing a task allocation matrix to the zero matrix;
first, the leader provides privacy protection at the privacy level given by the initial privacy budget values and selects an action $\pi$ through the e-greedy algorithm; then action $\pi$ is executed and the perturbed user locations are uploaded to the follower; the follower allocates tasks according to the received privacy budgets, perturbed locations and application task sets, and generates the allocation matrix $A^{(k)}$;
Calculating the utility of each user according to the current distribution matrix;
then the overall utility is calculated from the individual users' utilities, namely the reward for taking action $\pi^{(k)}$ in the current state $s^{(k)}$;
iteratively updating the values of the state-action pairs in the Q matrix through a Bellman equation;
repeating the above process to find the final sequence of actions that maximizes the converged accumulated reward;
and outputting the location perturbation strategy $\pi^*$ when convergence is reached or the number of iterations is exhausted.
CN202010629965.2A 2020-07-03 2020-07-03 Game method for position privacy protection and platform task allocation in mobile crowd sensing Active CN111770454B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010629965.2A CN111770454B (en) 2020-07-03 2020-07-03 Game method for position privacy protection and platform task allocation in mobile crowd sensing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010629965.2A CN111770454B (en) 2020-07-03 2020-07-03 Game method for position privacy protection and platform task allocation in mobile crowd sensing

Publications (2)

Publication Number Publication Date
CN111770454A CN111770454A (en) 2020-10-13
CN111770454B true CN111770454B (en) 2021-06-01

Family

ID=72723507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010629965.2A Active CN111770454B (en) 2020-07-03 2020-07-03 Game method for position privacy protection and platform task allocation in mobile crowd sensing

Country Status (1)

Country Link
CN (1) CN111770454B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288478A (en) * 2020-10-28 2021-01-29 中山大学 Edge computing service incentive method based on reinforcement learning
CN112543420B (en) * 2020-11-03 2024-04-16 深圳前海微众银行股份有限公司 Task processing method, device and server
CN112967118B (en) * 2021-02-03 2023-06-20 华南理工大学 Mobile crowd sensing excitation method, device, system and storage medium
CN112866993B (en) * 2021-02-06 2022-10-21 北京信息科技大学 Time sequence position publishing method and system
CN113377655B (en) * 2021-06-16 2023-06-20 南京大学 Task allocation method based on MAS-Q-learning
CN114254722B (en) * 2021-11-17 2022-12-06 中国人民解放军军事科学院国防科技创新研究院 Multi-intelligent-model fusion method for game confrontation
CN114415735B (en) * 2022-03-31 2022-06-14 天津大学 Dynamic environment-oriented multi-unmanned aerial vehicle distributed intelligent task allocation method
CN116744289B (en) * 2023-06-02 2024-02-09 中国矿业大学 Intelligent position privacy protection method for 3D space mobile crowd sensing application

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103533078B (en) * 2013-10-24 2017-07-21 无锡赛思汇智科技有限公司 A kind of method and system for generating map
CN103761485B (en) * 2014-01-13 2017-01-11 清华大学 Privacy protection method
CN105407482B (en) * 2015-11-04 2019-01-22 上海交通大学 The guard method of user location privacy in mobile gunz sensing network
CN105528248B (en) * 2015-12-04 2019-04-30 北京邮电大学 Intelligent perception incentive mechanism under multitask collaboration application
US10111031B2 (en) * 2016-01-22 2018-10-23 The United States Of America As Represented By The Secretary Of The Air Force Object detection and tracking system
CN108200610B (en) * 2018-02-26 2021-10-22 重庆邮电大学 Crowd sensing resource allocation method adopting distributed game
CN108668253A (en) * 2018-04-09 2018-10-16 南京邮电大学 A kind of gunz cooperative sensing motivational techniques based on evolutionary Game
CN109214205B (en) * 2018-08-01 2021-07-02 安徽师范大学 K-anonymity-based position and data privacy protection method in crowd-sourcing perception
CN110390560A (en) * 2019-06-28 2019-10-29 浙江师范大学 A kind of mobile intelligent perception multitask pricing method based on Stackelberg game

Also Published As

Publication number Publication date
CN111770454A (en) 2020-10-13

Similar Documents

Publication Publication Date Title
CN111770454B (en) Game method for position privacy protection and platform task allocation in mobile crowd sensing
Wang et al. Dependent task offloading for edge computing based on deep reinforcement learning
CN113434212B (en) Cache auxiliary task cooperative unloading and resource allocation method based on meta reinforcement learning
CN111754000A (en) Quality-aware edge intelligent federal learning method and system
Kaur et al. Deep‐Q learning‐based heterogeneous earliest finish time scheduling algorithm for scientific workflows in cloud
CN111866954B (en) User selection and resource allocation method based on federal learning
Kaur et al. A novel multi-objective bacteria foraging optimization algorithm (MOBFOA) for multi-objective scheduling
CN110458663B (en) Vehicle recommendation method, device, equipment and storage medium
CN112052071B (en) Cloud software service resource allocation method combining reinforcement learning and machine learning
CN110009233B (en) Game theory-based task allocation method in crowd sensing
CN109308246A (en) Optimization method, device and the equipment of system parameter, readable medium
Wang et al. Joint service caching, resource allocation and computation offloading in three-tier cooperative mobile edge computing system
CN112905013B (en) Agent control method, device, computer equipment and storage medium
CN113778691A (en) Task migration decision method, device and system
Li et al. Batch jobs load balancing scheduling in cloud computing using distributional reinforcement learning
Chen et al. A novel marine predators algorithm with adaptive update strategy
Alexandrescu et al. A genetic algorithm for mapping tasks in heterogeneous computing systems
Chen et al. A pricing approach toward incentive mechanisms for participant mobile crowdsensing in edge computing
CN110743164B (en) Dynamic resource partitioning method for reducing response delay in cloud game
Park et al. Cracking the Code of Negative Transfer: A Cooperative Game Theoretic Approach for Cross-Domain Sequential Recommendation
CN114385359B (en) Cloud edge task time sequence cooperation method for Internet of things
Mouli et al. Making the most of preference feedback by modeling feature dependencies
Huang et al. Multi-objective task offloading for highly dynamic heterogeneous Vehicular Edge Computing: An efficient reinforcement learning approach
CN118093102B (en) Resource allocation method in crowd sensing
Gandhi et al. Optimizing Workload Scheduling in Cloud Paradigm using Robust Neutrosophic C-Means Clustering Boosted with Fish School Search

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant