CN111770454B - Game method for position privacy protection and platform task allocation in mobile crowd sensing - Google Patents
- Publication number
- CN111770454B (application CN202010629965.2A)
- Authority
- CN
- China
- Prior art keywords
- user
- task
- privacy
- platform
- disturbance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/02—Services making use of location information
- H04W4/029—Location-based management or tracking services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/02—Protecting privacy or anonymity, e.g. protecting personally identifiable information [PII]
Abstract
The invention provides a game method for position privacy protection and platform task allocation in mobile crowd sensing. First, the interaction between the users and the platform is simulated through a trusted third party: each user selects a privacy budget to add noise to its location, and the platform allocates tasks according to each user's perturbed location. The interaction process is then modeled as a game and its equilibrium point is derived. Finally, a reinforcement learning method is used to continuously try different location perturbation strategies and output the optimal location perturbation scheme. Experimental results show that the mechanism improves the overall utility of the users as much as possible while optimizing the task allocation utility, so that the users and the platform achieve a win-win outcome. The method addresses the following problem: an MCS system must provide personalized privacy protection to attract more users to participate in tasks, yet in the presence of malicious attackers, strengthening the users' privacy protection degrades location usability and reduces task allocation utility.
Description
Technical Field
The technical scheme belongs to the field of network technology, and particularly relates to a win-win game method for protecting location privacy and allocating platform tasks in mobile crowd sensing (MCS).
Background
In recent years, the explosive development of Internet of Things technology has greatly promoted the popularity of mobile crowd sensing (MCS). A typical MCS system consists of data requesters, a server (the MCS platform), and mobile users. The server distributes the data requesters' tasks to the mobile users in the MCS system; the mobile users complete data collection with their mobile smart devices, send the data back to the server, and receive a certain reward.
Task allocation is one of the most important links in an MCS system. Its goal is to optimize the utility of the overall system while accomplishing all (or most) of the tasks in the target sensing area. Minimizing the travel distance is typically selected as the optimization target of MCS task allocation. However, the travel distance cannot be calculated without the users' location information, and if the true location is transmitted to the MCS platform, the user is exposed to the risk of personal privacy disclosure. Therefore, to attract more users to participate in sensing tasks, the MCS system must provide location privacy protection for the users.
The spatial cloaking technique from traditional location privacy protection can also be used to protect user location privacy in MCS task allocation. However, the level of privacy protection this technique provides is easily weakened if a malicious attacker in the MCS system has some prior knowledge. Differential privacy techniques can provide strong location privacy protection for users regardless of the adversary's prior knowledge. In addition, considering that different users have different privacy protection requirements, the MCS system needs to offer privacy protection with several different privacy budgets for users to choose from.
The travel distance is an important index for measuring MCS task allocation cost. Researchers have proposed the ActiveCrowd task allocation framework, which considers time sensitivity, aims at minimizing the total movement distance, and solves the user selection problem of multi-task allocation in MCS. Because the MCS platform must know the true locations of all users, this may leak user location privacy and reduce users' willingness to participate in sensing. Other researchers use the traditional spatial cloaking technique from LBS to protect users' location privacy during task allocation. Researchers have also proposed a spatial crowdsourcing mechanism based on differential privacy and geographic positioning, which provides efficient services while giving all users location privacy protection under the same privacy budget. Some researchers use differential privacy to obfuscate user locations and provide all users with the same degree of location privacy protection during task allocation. However, such frameworks can hardly accommodate users' differentiated privacy protection requirements. Considering users' personalized privacy protection requirements, researchers have further proposed a personalized privacy-preserving task allocation framework that lets users specify their privacy budgets following the idea of K-anonymity, thereby providing personalized location privacy protection. However, users choose their privacy budgets rather arbitrarily; in particular, when a malicious attacker exists in the MCS system, users select privacy budgets with stronger protection, which reduces the usability of their locations and is unfavorable to the MCS platform's task allocation.
Disclosure of Invention
From the foregoing discussion of the prior art, it can be seen that when designing a task allocation framework that provides personalized privacy protection, besides ensuring that the MCS platform allocates tasks efficiently, the framework should also provide each user with the most appropriate location privacy protection.
Game theory is an effective approach to the performance trade-off problem in MCS systems; for example, in related studies of MCS incentive mechanisms, game theory provides auction-based, pricing-based, and reputation-based mechanisms to motivate users to participate in MCS sensing. The trusted third party (TTP) is the most important component of the mechanism: it not only provides location privacy protection for the users, but also simulates the interaction between the users' privacy budget selection and the MCS platform's task allocation, and thereby establishes the most appropriate personalized privacy protection for each user.
The mobile crowd sensing (MCS) system needs to provide personalized privacy protection for users in order to attract more users to participate in tasks. However, because malicious attackers exist, strengthening the users' privacy protection degrades location usability and reduces task allocation utility.
The invention provides a game method for location privacy protection and platform task allocation in mobile crowd sensing, which is a reinforcement-learning-based win-win game method between users and the platform and comprises the following steps:
first, the interaction between the users and the MCS platform is simulated through a trusted third party (TTP): each user selects a privacy budget to add noise to its location, and the MCS platform allocates tasks according to each user's perturbed location;
then, the interaction process is modeled as a game and the equilibrium point is derived;
finally, a reinforcement learning method is used to continuously try different location perturbation strategies and output the optimal location perturbation scheme.
The win-win game method for user location privacy protection and platform task allocation uses a reinforcement learning algorithm to train an offline model that outputs the optimal location perturbation strategy by continuously trying combinations of all users' location perturbation schemes. Experimental results show that, in an MCS system providing personalized privacy protection, the privacy budget-task allocation game can establish personalized and most appropriate location privacy protection for each user, improving the users' privacy protection as much as possible while guaranteeing the task allocation utility, thereby achieving a win-win situation between the users and the platform.
Drawings
FIG. 1 is an overall frame of an MCS system;
FIG. 2 is a schematic illustration of a privacy budget-task allocation game in a trusted third party TTP;
FIG. 3 is a decision framework based on reinforcement learning;
FIGS. 4a and 4b are schematic diagrams comparing the performance of the algorithm of the present invention with a random algorithm;
wherein: FIG. 4a is user overall utility and FIG. 4b is task assignment utility;
FIGS. 5a and 5b are schematic diagrams of the effect of the number of users;
wherein: FIG. 5a is the user's overall utility and FIG. 5b is the average travel distance;
FIGS. 6a and 6b are schematic diagrams of the effect of the task publication radius;
wherein: fig. 6a is the user's overall utility and fig. 6b is the average travel distance.
Detailed Description
The technical scheme is further explained below with reference to the drawings and the detailed embodiments.
1 Overview
The mobile crowd sensing (MCS) system needs to provide personalized privacy protection for users in order to attract more users to participate in tasks. However, because malicious attackers exist, strengthening the users' privacy protection degrades location usability and reduces task allocation utility.
To address this problem, the game method provided by the invention first simulates the interaction between the users and the platform through a trusted third party: each user selects a privacy budget to add noise to its location, and the platform allocates tasks according to each user's perturbed location. The interaction process is then modeled as a game and its equilibrium point is derived. Finally, a reinforcement learning method is used to continuously try different location perturbation strategies and output the optimal location perturbation scheme. Experimental results show that the mechanism improves the overall utility of the users as much as possible while optimizing the task allocation utility, so that the users and the platform achieve a win-win outcome. The method thus addresses the problem that an MCS system must provide personalized privacy protection to attract more users to participate in tasks, yet in the presence of malicious attackers, strengthening the users' privacy protection degrades location usability and reduces task allocation utility.
In the mobile crowd sensing (MCS) system, after receiving a task request, the MCS platform issues the tasks; users who wish to perform a task provide location information to the MCS platform; and the MCS platform selects users and distributes the tasks. The game method is characterized in that a trusted third party (TTP) simulates the interaction between the users and the MCS platform, and comprises the following steps: 1) for the tasks issued by the MCS platform, users who wish to execute tasks transmit their true distances to the applied tasks and their privacy budgets to the TTP; 2) the interaction process between the users and the MCS platform is simulated inside the TTP, and each user's optimal perturbed location is obtained; 3) the MCS platform selects users and allocates tasks according to the optimal perturbed locations received from the TTP.
In step 2), the interaction process between the users and the MCS platform is simulated by a Stackelberg game: the users as a whole act as the leader in the Stackelberg game model, and the MCS platform acts as the follower in the model. The leader-follower interaction proceeds as follows:
2.1) the leader selects a privacy budget and communicates to the follower the perturbation policy of its location;
2.2) the follower allocates tasks according to the leader's perturbation strategy with the goal of minimizing the travel distance;
2.3) after receiving the follower's task allocation result, the leader adjusts the perturbation strategy and transmits the new location perturbation strategy to the follower; step 2.2) is repeated until the equilibrium point is reached, at which point the loop ends and the optimal location perturbation strategy is obtained; at the equilibrium point, the leader's utility is maximized while the task allocation utility is guaranteed, which is the optimal state;
2.4) obtaining the optimal disturbance position of the user according to the optimal position disturbance strategy during the balance point, and then entering the step 3) for processing.
In the steps 2.2) and 2.3), different position disturbance strategies are continuously tried by using a reinforcement learning method, and finally, an optimal position disturbance strategy is obtained.
First, a Markov decision process is used to represent the process of obtaining the optimal perturbation strategy; then a Q-learning algorithm is adopted to solve the Markov decision process and obtain, starting from the initial state s^(1), the final execution action that maximizes the converged cumulative reward value.
the following were used:
2 System model
As shown in fig. 1, the overall framework of the MCS system includes an MCS platform, a user and a trusted third party TTP.
Upon receiving a task request, the platform issues a set of tasks {t_1, t_2, ..., t_n}, whose corresponding set of task release radii is {r_1, r_2, ..., r_n}. In the task allocation stage, the platform allocates tasks with the goal of minimizing the travel distance according to the task application information transmitted by the trusted third party, generates an allocation matrix A_{n×m}, and thereby completes task allocation.
The set of mobile users in the system is denoted {w_1, w_2, ..., w_m}. After the platform releases the tasks, each user w_i sends a triplet to the trusted third party consisting of: the maximum privacy budget that user w_i accepts (the larger the privacy budget, the weaker the privacy protection and the greater the possibility of location privacy disclosure), the set of tasks that user w_i applies for, and the true distance vector d_i from user w_i to its k_i applied tasks. After the platform allocates the tasks, the user travels to the specific task location and executes the task assigned to it according to the allocation matrix A_{n×m}.
The trusted third party is the provider of location privacy protection as well as the decision maker of each user's optimal location perturbation scheme, and is an extremely important part of the system. It offers a set of privacy budgets with h different degrees of protection. After receiving the triplets uploaded by the users, the trusted third party provides each user w_i with a privacy budget ε_i that satisfies w_i's personalized requirement, obtaining the location perturbation strategy vector π = (ε_1, ε_2, ..., ε_m). It then simulates the platform allocating tasks with minimized travel distance and generates an allocation matrix A_{n×m}; next, it adjusts the users' location perturbation strategy π according to the allocation matrix so as to maximize the users' utility, performs task allocation again, and iterates continuously until the optimal location perturbation strategy π* is obtained. Finally, the trusted third party uploads user w_i's task application information (the selected privacy budget, the applied task set, and the perturbed distance vector) to the real platform.
It is assumed that each user can apply for several tasks, each task can be assigned to only one user, and each user can execute only one task.
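The following minimal sketch (an illustration only, not taken from the patent text; all field and function names are assumptions) shows how the triplet each user sends to the TTP and the task allocation matrix could be represented:

```python
# Illustrative data structures for the system model; names are assumptions.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class UserReport:
    """Triplet that user w_i sends to the trusted third party."""
    max_privacy_budget: float    # largest epsilon the user accepts (weakest protection tolerated)
    requested_tasks: List[int]   # indices of the k_i tasks the user applies for
    true_distances: List[float]  # true distance to each applied task, in km

def empty_allocation(n_tasks: int, m_users: int) -> np.ndarray:
    """Allocation matrix with one 0/1 entry per (task, user) pair."""
    return np.zeros((n_tasks, m_users), dtype=int)

# Example: a user applying for tasks 0 and 3, accepting at most epsilon = 5
report = UserReport(max_privacy_budget=5.0,
                    requested_tasks=[0, 3],
                    true_distances=[0.8, 1.6])
A = empty_allocation(n_tasks=5, m_users=10)
```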
3 problem modeling
This section first introduces the concept of generalized differential privacy, then analyzes the location privacy protection provided by the system, then introduces the platform's task allocation method, and finally explains the user privacy protection-platform task allocation game and derives its equilibrium point.
3.1 generalized differential privacy
For any two adjacent data sets x and x' and any output set Y, if the output distributions M(x) and M(x') differ on Y by at most a factor of e^ε, i.e., M(x)(Y) ≤ e^ε · M(x')(Y), then the mechanism M satisfies differential privacy with privacy budget ε. For any two locations x and x', if their Euclidean distance satisfies d(x, x') ≤ r, then under the obfuscation mechanism M the output distributions M(x) and M(x') differ by at most a factor of e^(εr), where ε represents the privacy budget per unit distance. In this case, even a malicious attacker who knows the obfuscation mechanism M cannot discern the true location.
Definition 1 (d_x-differential privacy). A mechanism M satisfies d_x-differential privacy if and only if, for arbitrary inputs x, x' and any output set Y,
M(x)(Y) ≤ e^(d_x(x, x')) · M(x')(Y),
where M(x)(Y) denotes the probability that the output M(x) belongs to the set Y, and d_x(x, x') = ε · d(x, x'); here ε is the privacy budget (the smaller ε, the stronger the privacy protection), and d(x, x') denotes the distance between x and x'.
Definition 2 (Laplace mechanism). Suppose X and Y are two sets and d_x is a metric on X. If, for every element x in X, the mechanism outputs y in Y with a probability density proportional to e^(−ε·d(x, y)), then this mechanism maps (X, d_x) to Y and satisfies d_x-differential privacy.
In particular, when the elements of X and Y are one-dimensional, the Laplace mechanism means that the perturbed value y is obtained by adding the corresponding noise to the initial value x, i.e., y = x + Lap(1/ε). In this case the mechanism M satisfies d_x-differential privacy with d_x = ε·|x − x'|.
Obviously, for any mechanism M that satisfies d_x-differential privacy, if d_x ≤ d'_x, then M also satisfies d'_x-differential privacy.
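As a small illustration of the one-dimensional Laplace mechanism described above (a sketch only; the function name and the use of NumPy are assumptions, not from the patent), the perturbed value is simply y = x + Lap(1/ε), so a smaller ε injects more noise and yields stronger protection:

```python
import numpy as np

def laplace_perturb(x: float, epsilon: float, rng=np.random.default_rng()) -> float:
    """Return y = x + Lap(0, 1/epsilon); satisfies d_x-differential privacy with d_x = epsilon*|x - x'|."""
    return x + rng.laplace(loc=0.0, scale=1.0 / epsilon)

# Stronger protection (epsilon = 0.5) scatters the output far more than weaker protection (epsilon = 5)
strong = [laplace_perturb(2.0, 0.5) for _ in range(5)]
weak = [laplace_perturb(2.0, 5.0) for _ in range(5)]
```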
3.2 location privacy protection
Users who are willing to perform a task need to upload the true distance and privacy budget to the applied task to the TTP. The TTP adds corresponding Laplace noise to the real distance according to the received privacy budget, so that an attacker cannot deduce the real position information of the user even knowing a specific position fuzzy mechanism, and the position privacy of the user is protected.
Because the user needs to upload the true distance of the applied task to the TTP, and the TTP finally uploads the disturbed disturbance distance to the MCS platform, the more the number of tasks applied by the user is, the more the exposed location information is, and the possibility of privacy disclosure is correspondingly increased. Meanwhile, the possibility of privacy disclosure has a direct relation with the privacy budget.
Proposition 1: the mechanism M(d_i, ε_i) = d_i + Lap(1/ε_i), which adds independent Laplace noise to every component of user w_i's true distance vector, satisfies differential privacy at level ε_i·Σ_j r_j, where the sum runs over the k_i tasks that user w_i applies for.
Proof: For arbitrary distance vectors d_i and d'_i, let d_ij ∈ d_i and d'_ij ∈ d'_i both denote possible true distances from user w_i to task t_j, so that |d_ij − d'_ij| ≤ r_j. Let d̃_i denote the perturbed distance vector of user w_i to its applied task set that is reported to the MCS platform, i.e., d̃_i = d_i + (η_1, η_2, ..., η_{k_i}), where η_1, η_2, ..., η_{k_i} are k_i independent and identically distributed random variables following Laplace(0, 1/ε_i). Thus the ratio of the output densities under d_i and d'_i is at most exp(ε_i·Σ_j |d_ij − d'_ij|) ≤ exp(ε_i·Σ_j r_j), so the mechanism M(d_i, ε_i) = d_i + Lap(1/ε_i) satisfies differential privacy at level ε_i·Σ_j r_j.
After the trusted third party receives the triplet transmitted by user w_i, it provides location privacy protection for w_i with M(d_i, ε_i), where the budget ε_i it selects does not exceed the maximum budget the user accepts, i.e., it provides privacy protection at least as strong as the user requires. By Proposition 1, the perturbation mechanism M(d_i, ε_i) then still satisfies the corresponding level of differential privacy.
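The perturbation applied by the TTP can be sketched as follows (an illustrative sketch under the assumptions above; the function name is not from the patent): each of the k_i true distances receives independent Laplace(0, 1/ε_i) noise.

```python
import numpy as np

def perturb_distance_vector(d_true: np.ndarray, epsilon_i: float,
                            rng=np.random.default_rng()) -> np.ndarray:
    """d_tilde = d_true + (eta_1, ..., eta_k), with eta_j ~ Laplace(0, 1/epsilon_i) i.i.d."""
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon_i, size=d_true.shape)
    return d_true + noise

d_i = np.array([0.8, 1.6, 0.3])                        # true distances to the applied tasks (km)
d_tilde = perturb_distance_vector(d_i, epsilon_i=2.0)  # what the TTP reports to the platform
```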
3.3 task Allocation
According to the users' privacy budgets, applied task sets, and perturbed distance vectors transmitted by the trusted third party, the platform sorts the applicants of each task in descending order of their probability of being closer to the task. After the descending sequences of applicants for all tasks have been calculated, each task is assigned to the nearest user.
Suppose users w_a and w_b are any two applicants for task t_j, and d_aj and d_bj denote their true distances to the task, respectively. When d_aj < d_bj, task t_j is more likely to be assigned to w_a; in other words, when d_aj < d_bj, user w_a should be ranked before user w_b in the ordered sequence of task t_j. The perturbed distance d̃_aj is obtained by adding Laplace noise to d_aj, so d_aj = d̃_aj − μ_a, and similarly d_bj = d̃_bj − μ_b, where μ_a and μ_b are random variables following Laplace(0, 1/ε_a) and Laplace(0, 1/ε_b), respectively. Therefore

P(d_aj < d_bj) = P(d̃_aj − μ_a < d̃_bj − μ_b).   (4)

By evaluating the double integral in equation (4), the probability that user w_a is closer to the task than user w_b can be calculated, which determines the relative order of w_a and w_b in the sequence of task t_j. For all applicants of task t_j, pairwise comparison yields a user sequence ordered by ascending distance to t_j. Performing the same calculation for the other tasks yields the ranking matrix S_{n×m}.
Row S_j represents the ordered sequence for task t_j; the element s_ji = k means that user w_k, who applied to execute task t_j, is ranked at the i-th position among all applicants. When i is greater than the number of applicants for t_j, s_ji = ∞. The task assignment problem aiming at minimizing the overall travel distance then reduces to assigning each task to the first user in its row of the ranking matrix S_{n×m}. However, when the same user is ranked first for several tasks, a conflict occurs, i.e., those tasks would all be allocated to that user; the conflict is eliminated, and the optimal allocation scheme obtained, by solving a 0-1 integer linear program in combination with equation (4).
The end result of task assignment is an assignment matrix A_{n×m}. For any a_ij ∈ A_{n×m}, a_ij ∈ {0, 1}, and a_ij = 1 indicates that task t_j is allocated to user w_i. The constraint Σ_i a_ij ≤ 1 states that each task is distributed to at most one user, and the constraint Σ_j a_ij ≤ 1 states that each user executes at most one task.
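The ordering step can be illustrated with the following sketch. The patent evaluates the closeness probability in equation (4) as a closed-form double integral over the two Laplace densities; here it is estimated by Monte Carlo purely to keep the example short, and the 0-1 integer program for conflict resolution is not shown. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_a_closer(d_tilde_a, eps_a, d_tilde_b, eps_b, n_samples=100_000):
    """Estimate P(d_a < d_b), where d_x = d_tilde_x - mu_x and mu_x ~ Laplace(0, 1/eps_x)."""
    mu_a = rng.laplace(0.0, 1.0 / eps_a, n_samples)
    mu_b = rng.laplace(0.0, 1.0 / eps_b, n_samples)
    return np.mean((d_tilde_a - mu_a) < (d_tilde_b - mu_b))

def rank_applicants(applicants):
    """applicants: list of (user_id, perturbed_distance, epsilon); rank by pairwise 'closer' wins."""
    wins = {uid: 0.0 for uid, _, _ in applicants}
    for idx, (ua, da, ea) in enumerate(applicants):
        for ub, db, eb in applicants[idx + 1:]:
            p = prob_a_closer(da, ea, db, eb)
            wins[ua] += p
            wins[ub] += 1.0 - p
    return sorted(wins, key=wins.get, reverse=True)  # most-likely-closest applicant first

# Example: three applicants for one task; a strongly protected user (small epsilon) is less certain
order = rank_applicants([("w1", 0.9, 2.0), ("w2", 1.1, 0.5), ("w3", 1.4, 5.0)])
```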
3.4 privacy budget-task distribution Game
To provide the most appropriate privacy protection for the users, the TTP needs to simulate the users selecting perturbation strategies, simulate the platform assigning tasks, and simulate the user-platform interaction. This interaction process is modeled as a Stackelberg game: the users as a whole act as the leader and convey their overall location perturbation strategy to the platform; the MCS platform acts as the follower and allocates tasks with the goal of minimizing the travel distance according to the users' perturbation strategy; after receiving the platform's task allocation result, the users adjust their overall perturbation strategy to maximize their overall utility, and the interaction continues in this way.
The two game parties are two virtual entities inside the TTP: a leader and a follower. The leader simulates the users selecting perturbation strategies, and the follower simulates the platform allocating tasks. As shown in FIG. 2, the leader first selects a privacy budget ε_i for user w_i and provides the protection mechanism M(d_i, ε_i) that satisfies the privacy budget ε_i; the users' overall protection strategy is recorded as π. The mechanism M(d_i, ε_i) perturbs the true distance vector d_i that user w_i uploaded for its applied tasks into the perturbed vector d̃_i = d_i + Lap(1/ε_i), and the leader passes the current strategy π to the follower. According to the received π, the follower allocates tasks with the goal of minimizing the travel distance and obtains the task allocation matrix A_{n×m}, whose entries a_ij take the value 0 or 1: a_ij = 1 indicates that task t_j is assigned to user w_i, and a_ij = 0 indicates that task t_j is not allocated to user w_i.
After the platform's tasks have been allocated, the expected utility of user w_i is given by equation (9), which trades off the location privacy protection the user obtains against whether the user is assigned a task. Here λ_i is user w_i's privacy weight coefficient, representing the strength of the user's preference between location privacy protection and being assigned tasks; λ_i > 1 indicates that the user prefers protecting location privacy. After the trusted third party applies differential privacy protection with privacy budget ε_i, the expected deviation between user w_i's perturbed distance vector and its true distance vector is k_i/ε_i. The expected utility of the users as a whole is then given by equation (10), obtained by aggregating the expected utilities of all individual users.
The platform's utility function is expressed in terms of the number of assigned tasks and the expected travel distance of the assigned tasks: the platform's utility is the inverse of the average travel distance, so the greater the average travel distance, the lower the platform's utility.
A rational user wants to maximize its personal utility as much as possible. That is, after being assigned a task, the user tends to increase its privacy protection strength to better protect its privacy; if it is not assigned a task, it tends to reduce its privacy protection degree so that it has a better chance of being selected, thereby improving its personal utility. Therefore, after each simulated task allocation by the follower, the leader adjusts the privacy protection strategies of all users according to the current allocation matrix so as to maximize the users' overall expected utility, and the follower then re-allocates the tasks according to the adjusted privacy strategy with the goal of minimizing the travel distance. Through this continuous interaction, the leader and the follower finally reach an equilibrium point.
This equilibrium point is the optimal state point at which the users' overall utility is maximized while the task allocation utility is optimized. At the equilibrium, the optimal perturbation strategy the users would select given the current task allocation result is exactly the current strategy, and the optimal task allocation the platform would produce given the current perturbation strategy is exactly the current allocation result.
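A compact sketch of this leader-follower iteration is given below (illustrative only; `assign` and `best_response` are placeholders standing in for the follower's minimum-travel-distance allocation and the leader's utility-maximizing budget adjustment, which the patent defines through its utility equations):

```python
from typing import Callable, List

def stackelberg_fixed_point(pi0: List[float],
                            assign: Callable[[List[float]], object],
                            best_response: Callable[[List[float], object], List[float]],
                            max_rounds: int = 100) -> List[float]:
    """Iterate leader/follower best responses until the strategy stops changing."""
    pi = list(pi0)
    for _ in range(max_rounds):
        allocation = assign(pi)                   # follower: allocate tasks, minimizing travel distance
        pi_next = best_response(pi, allocation)   # leader: adjust per-user budgets to raise overall utility
        if pi_next == pi:                         # fixed point = equilibrium of the game
            break
        pi = pi_next
    return pi
```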
Since each of the m users chooses its privacy budget from h candidate values, the selection space of the strategy π has size h^m, and the time complexity of traversing it is O(h^m). The time complexity of one task allocation is approximately O(n²), so the overall time complexity is about O(h^m·n²). Because the number m of users in the system is often very large, this time complexity is prohibitive, and brute-force enumeration is clearly not the best way to solve the problem.
4 location perturbation decision based on reinforcement learning
Reinforcement learning is useful for solving the problem of an agent maximizing return value during interaction with the environment, and a common model is the standard Markov Decision Process (MDP). Therefore, the invention adopts a reinforcement learning method to solve the problem of disturbance strategy decision for maximizing the utility of the user under the condition of high-efficiency task allocation. This section introduces the MDP of the location perturbation strategy decision problem, followed by the Q-learning algorithm for solving the optimal perturbation strategy.
4.1 MDP of decisions
The Markov decision process is a sequential decision model used to simulate an agent executing actions and obtaining returns in an environment whose system state has the Markov property. It is usually expressed as a five-tuple <S, A, P, R, γ>, where S denotes the system state, A denotes the agent's action, P denotes the transition function between system states, R denotes the reward, and γ denotes the discount factor.
The process by which the trusted third party selects the optimal perturbation strategy for the users can be seen as a Markov process. The agent is the leader inside the trusted third party, and the environment is the interaction process between the leader and the follower. The five elements of the MDP for the location perturbation strategy decision problem are described in detail below.
The system state consists of the perturbation strategy vector π and the task allocation matrix A. The initial state is s^(1) = [π^(0), A^(0)], where π^(0) sets each user's privacy budget to the initial value it uploaded to the trusted third party, i.e., the minimum degree of privacy protection the user accepts.
A perturbation strategy π is an action of the leader. Since each user may adopt any location perturbation scheme whose privacy budget is taken from the set provided by the trusted third party and meets the user's privacy requirement, the leader's action strategy space is the set of all such strategy vectors.
At time t, the system in state s^(t) takes action π^(t) and arrives at state s^(t+1). Because the state consists of the perturbation strategy and the task allocation matrix, and the task allocation matrix depends on the perturbation strategy, the state at the next moment is determined by the current state and the current action, satisfying

P(s^(t+1) | s^(1), π^(1), s^(2), π^(2), ..., s^(t), π^(t)) = P(s^(t+1) | s^(t), π^(t)),   (14)

i.e., the state transition has the Markov property.
The reward R represents the reward for executing the corresponding action in the current state. Equation (10) is used to calculate the reward value: after action π^(t) is taken in state s^(t), the return value equals the users' overall utility at that moment.
The discount factor γ, with 0 ≤ γ ≤ 1, indicates the relative importance of future and current rewards: γ = 0 means that only the current reward is considered, while γ = 1 means that the future reward is as important as the current reward.
Since both the state space and the action space are finite, the perturbation decision problem is a finite Markov decision process. After the perturbation decision is converted into an MDP, the optimal perturbation selection problem in the privacy-preserving task allocation game becomes: find, starting from the initial state s^(1), the final execution action that maximizes the converged cumulative reward value.
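The following minimal sketch shows one possible encoding of this MDP (names and the exact state packing are assumptions made only for illustration): a state bundles the current strategy π with the allocation it induces, an action is the next strategy vector, and the reward is the users' overall utility under the new allocation.

```python
from dataclasses import dataclass
from typing import Callable, Tuple
import numpy as np

@dataclass(frozen=True)
class State:
    pi: Tuple[float, ...]          # one privacy budget per user
    allocation: Tuple[int, ...]    # flattened 0/1 task-allocation matrix

def step(state: State,
         action: Tuple[float, ...],
         assign: Callable[[Tuple[float, ...]], np.ndarray],
         overall_utility: Callable[[Tuple[float, ...], np.ndarray], float]):
    """Apply the action (a new strategy), let the simulated platform re-assign, return (s', r)."""
    new_alloc = assign(action)                                # deterministic given the action
    next_state = State(pi=action,
                       allocation=tuple(int(v) for v in np.ravel(new_alloc)))
    reward = overall_utility(action, new_alloc)               # equation (10) stands in here
    return next_state, reward
```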
4.2 Q-learning based location perturbation decision algorithm
The Q-learning algorithm is an effective model-free reinforcement learning algorithm for solving the Markov decision process. By continual trial-and-error learning in different environments, the agent finds the best strategy, i.e., the one that maximizes the converged return value.
In the Q-learning algorithm, the agent creates a decision matrix Q, where rows represent states and columns represent actions, storing the values of the state-action pairs (s, pi), and initializing to a zero matrix. The Q matrix is iteratively updated by Bellman Equation (Bellman Equation) as follows:
Q(s, π) ← (1 − α)·Q(s, π) + α·(u_w(s, π) + γ·V(s')),   (15)

where α ∈ (0, 1) is the learning rate, and the larger its value, the less of the previous training result is retained; u_w(s, π) is the return value obtained by executing action π in state s; s' is the next state reached after executing action π in state s; γ is the discount factor, 0 ≤ γ ≤ 1, indicating how strongly the future reward and the current reward affect the action-value function (Q function), where γ = 0 means the action value depends only on the current reward and γ = 1 means the future reward is as important as the current reward; and the function V(·) returns the maximum Q value of the next state.
Based on the decision matrix Q and the current state s, the leader uses an e-greedy strategy to keep the algorithm from falling into a local optimum: in state s, the leader performs the currently optimal action with probability 1 − e and selects an action at random with probability e.
The Q-learning based perturbation scheme decision algorithm is described as follows:
Input: the privacy budget initial values uploaded by all users and the set of privacy budget values selectable in the system
Output: π*
Begin
Step 1. Initialize the learning rate α, the discount factor γ, the decision matrix Q as a zero matrix, and the task allocation matrix as a zero matrix
Step 2. for k ← 1 to episode do
Step 3.   s^(k) = [A^(k−1), π^(k−1)]
Step 4.   The leader selects a perturbation strategy π^(k) (the users' initial privacy budgets in the first iteration; e-greedy selection afterwards)
Step 5.   The leader perturbs the user locations under π^(k) and transmits the privacy budgets, perturbed locations, and applied task sets to the follower
Step 6.   The follower allocates the tasks and generates the allocation matrix A^(k)
Step 7.   for i ← 1 to m do
Step 8.     User w_i calculates its utility according to equation (9)
Step 9.   end for
Step 10.  Calculate the users' overall utility, i.e., the reward, according to equation (10)
Step 11.  Update Q(s^(k), π^(k)) according to equation (15)
Step 12.  Update V(s^(k)) according to equation (16)
Step 13. end for
Step 14. return π*
End
The algorithm takes as input the privacy budget initial values uploaded by all users and the set of privacy budget values selectable in the system, and outputs the optimal perturbation strategy π*.
In step 1, a learning rate alpha and a discount factor gamma used in the algorithm are initialized, a decision matrix Q is initialized to a zero matrix, and a task allocation matrix is initialized to the zero matrix.
Steps 2-13 form a loop, with episode denoting the maximum number of training iterations. In step 4, during the first iteration the leader provides privacy protection at the privacy level given by the users' uploaded initial privacy budgets; from the second iteration onwards, the leader selects a perturbation scheme with the e-greedy algorithm, exploiting the best perturbation strategy trained so far with probability 1 − e and selecting a perturbation strategy at random with probability e to avoid local optima. In steps 5-6, the follower allocates tasks according to the received user privacy budgets, perturbed locations, and applied task sets, and generates the allocation matrix. Steps 7-9 calculate the utility of each user based on the current allocation matrix. In step 10, the overall utility, i.e., the reward for taking action π^(k) in the current state s^(k), is calculated from the individual users' utilities. Steps 11-12 update the values of the state-action pairs in the decision matrix Q.
Step 14 outputs the location perturbation strategy π* once convergence is reached or the number of iterations is exhausted.
The algorithm loops a total of episode times. In each iteration, the leader can obtain the current optimal location perturbation strategy π from the Q table with time complexity O(1); the time complexity of the follower's task allocation is O(n²); and calculating the utility of all users takes O(m). In summary, the time complexity of the proposed Q-learning based location perturbation decision algorithm is O(episode × max(m, n²)).
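A runnable sketch of this decision loop is given below under simplifying assumptions (it is an illustration, not the patent's exact algorithm): the strategy space is enumerated explicitly from a small set of budgets, and `assign` and `overall_utility` are placeholders for the follower's minimum-travel-distance allocation and the overall user utility of equation (10).

```python
import itertools
import random
from collections import defaultdict

def q_learning_perturbation(budget_options, n_users, assign, overall_utility,
                            episodes=200, alpha=0.2, gamma=0.7, exploit_prob=0.8):
    """Return the per-user budget vector with the highest learned Q value."""
    actions = list(itertools.product(budget_options, repeat=n_users))  # all strategy vectors
    Q = defaultdict(float)                    # Q[(state, action)], implicitly a zero matrix
    state = "s1"                              # abstract initial state s(1)
    for _ in range(episodes):
        if random.random() < exploit_prob:    # exploit the currently best-valued action
            action = max(actions, key=lambda a: Q[(state, a)])
        else:                                 # explore a random strategy to escape local optima
            action = random.choice(actions)
        allocation = assign(action)           # follower: allocate tasks under these budgets
        reward = overall_utility(action, allocation)       # users' overall utility (reward)
        next_state = action                   # the allocation is determined by the strategy
        v_next = max(Q[(next_state, a)] for a in actions)  # V(s') = max_a Q(s', a)
        Q[(state, action)] = (1 - alpha) * Q[(state, action)] + alpha * (reward + gamma * v_next)
        state = next_state
    return max(actions, key=lambda a: Q[(state, a)])
```

The enumeration of all h^m strategy vectors is only workable for small examples; it is used here purely to keep the sketch short.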
5 Experimental and results analysis
The performance of the privacy budget-task allocation gaming mechanism is evaluated through simulation experiments. The following describes specific experimental environmental parameters and analyzes the experimental results.
Table 1 lists the value settings of the basic parameters in the experiment. In a sensing area of 5 km × 5 km, 10 users participate in sensing, 5 sensing tasks are to be distributed by the platform, and the release radius of each task is 1 km. Each user selects the maximum privacy budget it can accept; the initial privacy budget of each user is assumed to be 5, and the most appropriate privacy budget is then selected for each user during the algorithm iterations. The privacy weight coefficient λ_i of each user follows a normal distribution with mean 1 and variance 5, because location privacy protection and being assigned tasks are, on the whole, equally important to the users. The learning rate, discount factor, and greedy strategy coefficient in Q-learning are set to 0.2, 0.7, and 0.8, respectively.
Table 1 Experimental environment parameter settings

Parameter | Value
---|---
Sensing area | 5 km × 5 km
Number of users | 10
Number of tasks | 5
Task release radius | 1 km
Initial privacy budget of each user | 5
Privacy weight coefficient λ_i | normal distribution (mean 1, variance 5)
Learning rate | 0.2
Discount factor | 0.7
Greedy strategy coefficient | 0.8
5.1 evaluation of Q-learning Algorithm Performance
A random algorithm that randomly selects a perturbation strategy for each user is used as the baseline against the Q-learning algorithm of the present invention.
FIGS. 4a and 4b compare the performance of the Q-learning algorithm used in the present invention and the random algorithm in terms of the users' overall utility and the task allocation utility, respectively. The figures show that the Q-learning algorithm clearly outperforms the random algorithm in both the users' overall utility and the task allocation utility. In the random algorithm, a privacy budget is selected at random for each user in every iteration, so the task allocation result differs from iteration to iteration, the expected user utility and task allocation utility fluctuate up and down, and convergence cannot be reached. FIG. 4a shows that the users' overall utility under the Q-learning algorithm first increases and then levels off. This is because, when the Q-learning algorithm has just started, the initial privacy budget uploaded by each user, which carries the least privacy protection, is selected by default, so the expected utility of the users assigned tasks is low. As the number of iterations grows, the algorithm keeps selecting more appropriate privacy budgets for the users, and the users' overall expected utility increases. Likewise, in FIG. 4b, since the users' privacy protection is weak at the beginning, the usability of the user locations in the task allocation stage is high, so the task allocation result is close to the optimum. As the users' expected utility increases, their privacy protection becomes stronger and location usability decreases, so the expected travel distance increases slightly and the task allocation utility decreases slightly. The experimental results show that the proposed mechanism can better protect the users' location privacy and improve the users' overall utility while optimizing the task allocation utility, achieving a win-win situation between the users and the platform.
5.2 influence of the number of Users on System Performance
The number of mobile users, an indispensable part of the MCS system, is an important factor in measuring system performance. FIGS. 5a and 5b show the influence of the number of users on system performance in an MCS system with 5 tasks and a task release radius of 1 km. FIG. 5b shows that the average travel distance of both the No-privacy scheme and the Q-learning algorithm proposed by the present invention decreases as the number of users increases. This is because an increase in the number of users produces new candidates closer to the tasks; when a task is assigned to such a new candidate, the average travel distance drops markedly, improving the overall task allocation utility. Meanwhile, the average travel distance of the randomly selecting baseline may increase, because candidates farther from the tasks also appear. Since the number of tasks is fixed and users close to the tasks can choose stronger protection schemes, the utility of the users assigned tasks does not fluctuate greatly with the total number of users. The experimental results show that increasing the number of users effectively reduces the average travel distance, bringing it close to the optimal value without privacy protection, and markedly improves the task allocation utility.
5.3 impact of task publishing radius on System Performance
The release radius of a task also affects system performance: if the release radius is too small, there may be no user within the task's release range, and the task cannot be distributed and executed. FIGS. 6a and 6b show the effect of the task release radius on system performance in an MCS system with 10 users and 5 tasks. As can be seen from FIGS. 6a and 6b, when the release radius is less than 1 km, the users' overall utility and the average travel distance both grow as the task release radius increases. This is because tasks that previously had no users within their release area gain applicants and are successfully distributed as the radius grows. When the radius exceeds 1 km, the users' overall utility and the average travel distance level off. One reason is that all tasks have been allocated and no new users are assigned tasks; the other is that the task allocation matrix no longer changes as the release radius increases.
The experimental result shows that the algorithm can improve the overall utility of the user while ensuring the task allocation utility in the MCS system providing the personalized privacy protection. Meanwhile, the effect is better in an MCS system with larger task release radius and more users participating in sensing tasks.
6 concluding remarks
The invention provides a win-win game mechanism for user location privacy protection and platform task allocation in mobile crowd sensing (MCS), and solves for the equilibrium point by means of reinforcement learning. The core ideas are: providing personalized location privacy protection for users so as to attract more users to participate in MCS sensing tasks; and improving the overall user utility as much as possible while optimizing the platform's task allocation utility through the game. Experimental results show that the proposed game mechanism handles the trade-off between task allocation and user location privacy protection well, and performs better in systems with a larger task release radius and more users participating in sensing tasks.
Claims (2)
1. A game method for position privacy protection and platform task allocation in mobile crowd sensing is characterized in that in a mobile crowd sensing system MCS, after receiving a task request, an MCS platform issues a task; users who wish to perform tasks provide location information to the MCS platform; the MCS platform selects users and distributes tasks, and is characterized in that a trusted third party TTP simulates the interaction between the users and the MCS platform; the method comprises the following steps: 1) for the tasks issued by the MCS platform, users who wish to execute the tasks transmit the true distance and privacy budget to the applied tasks to the TTP; 2) simulating the interaction process of the user and the MCS platform in the TTP, and obtaining the optimal disturbance position of the user; 3) the MCS platform selects a user allocation task according to the optimal disturbance position of the user from the TTP;
in the step 2), the interaction process between the users and the MCS platform is simulated by a Stackelberg game: the users as a whole act as the leader in the Stackelberg game model, and the MCS platform acts as the follower in the model; the leader-follower interaction proceeds as follows:
2.1) the leader selects a privacy budget and communicates to the follower the perturbation policy of its location;
2.2) the follower allocates tasks according to the leader's perturbation strategy with the goal of minimizing the travel distance;
2.3) after receiving the follower's task allocation result, the leader adjusts the perturbation strategy and transmits the new location perturbation strategy to the follower; step 2.2) is repeated until the equilibrium point is reached, at which point the loop ends and the optimal location perturbation strategy is obtained; at the equilibrium point, the leader's utility is maximized while the task allocation utility is guaranteed, which is the optimal state;
2.4) obtaining the optimal disturbance position of the user according to the optimal position disturbance strategy during the balance point, and then entering the step 3) for processing.
2. The game method for location privacy protection and platform task allocation in mobile crowd sensing according to claim 1, wherein in the steps 2.2) and 2.3), different location perturbation strategies are tried continuously by using a reinforcement learning method, and finally an optimal location perturbation strategy is obtained;
firstly, a Markov decision process is used for representing a process of obtaining an optimal disturbance strategy:
in the Markov decision process, the agent is the leader, and the environment is the interaction process between the leader and the follower; the five elements of the Markov decision process are:
element 1: at time t, the system state s^(t) consists of the location perturbation strategy π^(t−1) and the task allocation matrix A^(t−1);
the initial state is s^(1) = [π^(0), A^(0)], where π^(0) represents each user's privacy budget at the initial value transmitted to the TTP, namely the minimum privacy protection strength accepted by the user;
element 2: a location perturbation strategy π is an action of the leader; since each user may adopt any location perturbation scheme whose privacy budget is taken from the set provided by the TTP and meets the leader's privacy requirement, the leader's action strategy space is the set of all such strategy vectors;
Element 3: at time t, the system state s(t)Taking action pi(t)Post arrival state s(t+1)(ii) a The system state is composed of a position disturbance strategy and a task allocation matrix, and the task allocation matrix depends on the disturbance strategy, so that the state at the next moment is determined by the current state and the current action, P(s)(t+1)|s(1),π(1),s(2),π(2),..,s(t),π(t))=P(s(t+1)|s(t),π(t)) I.e., state transitions are markov;
element 4: the reward R represents the reward for executing the corresponding action in the current state; after taking action π^(t) in state s^(t), the return value equals the users' overall utility at that moment;
element 5: the discount factor γ, with 0 ≤ γ ≤ 1, represents the relative importance of future and current rewards; γ = 0 means that only the current reward is considered, and γ = 1 means that the future reward is as important as the current reward;
because both the state space and the action space are limited, the position disturbance decision problem is a limited Markov decision process;
then, a Q-learning algorithm is adopted to solve the Markov decision process and obtain, starting from the initial state s^(1), the final execution action that maximizes the converged cumulative reward value;
in the Q-learning algorithm, a decision matrix Q is created by the agent, where rows represent states and columns represent actions, used to store the values of the state-action pairs;
initialization: initializing a learning rate alpha and a discount factor gamma used in the algorithm, initializing a decision matrix Q to a zero matrix, and initializing a task allocation matrix to the zero matrix;
first, the leader provides privacy protection at the corresponding privacy level using the privacy budget initial values, and selects an action π through the e-greedy algorithm;
Then, executing an action pi, and uploading the disturbed user position to a follower; the follower distributes tasks according to the received privacy budget, the disturbance position and the application task set and generates a distribution matrix A(k);
Calculating the utility of each user according to the current distribution matrix;
then the overall utility is calculated from the utilities of the individual users, i.e., the reward for taking action π^(k) in the current state s^(k);
iteratively updating the values of the state-action pairs in the Q matrix through a Bellman equation;
the above process is repeated until the final execution action that maximizes the converged cumulative reward value is obtained;
and the location perturbation strategy π* is output when convergence is reached or the number of iterations ends.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010629965.2A CN111770454B (en) | 2020-07-03 | 2020-07-03 | Game method for position privacy protection and platform task allocation in mobile crowd sensing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010629965.2A CN111770454B (en) | 2020-07-03 | 2020-07-03 | Game method for position privacy protection and platform task allocation in mobile crowd sensing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111770454A CN111770454A (en) | 2020-10-13 |
CN111770454B true CN111770454B (en) | 2021-06-01 |
Family
ID=72723507
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010629965.2A Active CN111770454B (en) | 2020-07-03 | 2020-07-03 | Game method for position privacy protection and platform task allocation in mobile crowd sensing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111770454B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112288478A (en) * | 2020-10-28 | 2021-01-29 | 中山大学 | Edge computing service incentive method based on reinforcement learning |
CN112543420B (en) * | 2020-11-03 | 2024-04-16 | 深圳前海微众银行股份有限公司 | Task processing method, device and server |
CN112967118B (en) * | 2021-02-03 | 2023-06-20 | 华南理工大学 | Mobile crowd sensing excitation method, device, system and storage medium |
CN112866993B (en) * | 2021-02-06 | 2022-10-21 | 北京信息科技大学 | Time sequence position publishing method and system |
CN113377655B (en) * | 2021-06-16 | 2023-06-20 | 南京大学 | Task allocation method based on MAS-Q-learning |
CN114254722B (en) * | 2021-11-17 | 2022-12-06 | 中国人民解放军军事科学院国防科技创新研究院 | Multi-intelligent-model fusion method for game confrontation |
CN114415735B (en) * | 2022-03-31 | 2022-06-14 | 天津大学 | Dynamic environment-oriented multi-unmanned aerial vehicle distributed intelligent task allocation method |
CN116744289B (en) * | 2023-06-02 | 2024-02-09 | 中国矿业大学 | Intelligent position privacy protection method for 3D space mobile crowd sensing application |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103533078B (en) * | 2013-10-24 | 2017-07-21 | 无锡赛思汇智科技有限公司 | A kind of method and system for generating map |
CN103761485B (en) * | 2014-01-13 | 2017-01-11 | 清华大学 | Privacy protection method |
CN105407482B (en) * | 2015-11-04 | 2019-01-22 | 上海交通大学 | The guard method of user location privacy in mobile gunz sensing network |
CN105528248B (en) * | 2015-12-04 | 2019-04-30 | 北京邮电大学 | Intelligent perception incentive mechanism under multitask collaboration application |
US10111031B2 (en) * | 2016-01-22 | 2018-10-23 | The United States Of America As Represented By The Secretary Of The Air Force | Object detection and tracking system |
CN108200610B (en) * | 2018-02-26 | 2021-10-22 | 重庆邮电大学 | Crowd sensing resource allocation method adopting distributed game |
CN108668253A (en) * | 2018-04-09 | 2018-10-16 | 南京邮电大学 | A kind of gunz cooperative sensing motivational techniques based on evolutionary Game |
CN109214205B (en) * | 2018-08-01 | 2021-07-02 | 安徽师范大学 | K-anonymity-based position and data privacy protection method in crowd-sourcing perception |
CN110390560A (en) * | 2019-06-28 | 2019-10-29 | 浙江师范大学 | A kind of mobile intelligent perception multitask pricing method based on Stackelberg game |
2020
- 2020-07-03 CN CN202010629965.2A patent/CN111770454B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN111770454A (en) | 2020-10-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |