CN111291973B - Space crowdsourcing task allocation method based on alliance - Google Patents


Info

Publication number
CN111291973B
CN111291973B (application CN202010051354.4A)
Authority
CN
China
Prior art keywords
task
worker
workers
alliance
tasks
Prior art date
Legal status
Active
Application number
CN202010051354.4A
Other languages
Chinese (zh)
Other versions
CN111291973A (en)
Inventor
郑凯 (Zheng Kai)
赵艳 (Zhao Yan)
李响 (Li Xiang)
Current Assignee
Maikesi Wuxi Data Technology Co ltd
Original Assignee
Maikesi Wuxi Data Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Maikesi Wuxi Data Technology Co ltd filed Critical Maikesi Wuxi Data Technology Co ltd
Priority application: CN202010051354.4A
Publication of CN111291973A
Application granted
Publication of CN111291973B
Legal status: Active

Classifications

    • G06Q10/06311 — Scheduling, planning or task assignment for a person or group
    • G06Q10/06312 — Adjustment or analysis of established resource schedule, e.g. resource or task levelling, or dynamic rescheduling
    • G06N3/006 — Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]


Abstract

The invention discloses a coalition-based spatial crowdsourcing task allocation method, comprising the following steps: (1) organizing workers into worker coalitions; (2) establishing a contract-based greedy task allocation method; (3) formulating an equilibrium-based algorithm; (4) formulating a simulated annealing method to find a better Nash equilibrium. By this means, given a set of workers and a set of tasks, the method determines a stable worker coalition for each task so as to maximize the total reward of task allocation, and the coalition-based spatial crowdsourcing task allocation method has broad market prospects.

Description

Coalition-based spatial crowdsourcing task allocation method
Technical field
The invention relates to the field of spatial crowdsourcing, and in particular to a coalition-based spatial crowdsourcing task allocation method.
Background art
Spatial crowdsourcing (SC) is a new class of crowdsourcing that enables people carrying multi-mode sensing devices to collect and share various types of high-fidelity spatiotemporal data instantaneously while moving. In particular, task requesters can issue spatial tasks to the SC server, which then treats smart-device carriers as workers; workers physically travel to a designated location and complete the tasks, a process known as task allocation.
Much of the current research focuses on single-task allocation, where each task can only be allocated to one worker. In practice, however, there are inevitably SC applications in which a single worker cannot perform a task efficiently alone, such as monitoring traffic conditions or cleaning rooms. Workers must therefore be grouped into a team, or coalition, to collectively accomplish tasks that exceed the capabilities of individual workers. In addition, each worker may prefer to work with other workers who can bring her a better reputation or monetary reward.
Recent work has explored redundant task allocation methods that allow each spatial task to be allocated to several reachable workers near the task. However, a significant portion of these studies employs a centralized task allocation system; in a large-scale SC scenario, computing a matching over all reachable workers and tasks can incur significant computational cost, which greatly increases the difficulty of implementing such systems and the human effort of applying these methods in practice. In addition, they disregard differences in worker behavior during coalition formation, which leads to coalition instability because the utility of coalition members cannot be guaranteed. In other words, members may leave the coalition when the utility they obtain is unsatisfactory, causing the task to fail. More importantly, the existing strategies described above inherently lack coordination. Recently, Cheng Peng et al. were the first to consider collaboration in task allocation, where workers work together and jointly complete tasks to achieve a higher overall cooperation quality score. They assume that workers are willing to perform tasks gratuitously, which is impractical: workers have no incentive to perform the assigned tasks unless they receive a satisfactory reward as compensation for their participation costs (e.g., mobile-device battery consumed for sensing and data processing). In addition, Cheng Peng et al. only aim to find one Nash equilibrium, without further exploring better equilibrium points that may exist.
Disclosure of Invention
The invention mainly provides a coalition-based spatial crowdsourcing task allocation method. A novel form of task allocation, namely coalition-based task allocation, is formulated for spatial crowdsourcing tasks, in which workers interact with others by forming worker coalitions to execute the corresponding tasks. Given a set of workers and a set of tasks, a stable worker coalition is determined for each task so as to maximize the total reward of task allocation. Further, a contract-based greedy task allocation method is adopted to allocate tasks effectively, in which penalty contracts are formulated and enforced so that penalties are applied to workers who leave their coalitions. The coalition-based task allocation problem is then converted into a multiplayer game, and an equilibrium-based solution is provided, in which a Nash equilibrium is found based on a best-response framework. Meanwhile, a simulated annealing strategy is further introduced into the process of updating worker strategies, so as to improve the effectiveness of task allocation. The coalition-based spatial crowdsourcing task allocation method thus has broad market prospects.
In order to solve the above technical problems, the invention provides a coalition-based spatial crowdsourcing task allocation method comprising the following steps:
(1) Workers form worker coalitions to execute the corresponding tasks, interacting with other workers;
(2) A contract-based greedy task allocation method is formulated to form a group of worker coalitions so as to allocate tasks effectively;
(3) An equilibrium-based algorithm is formulated, in which a Nash equilibrium is found based on a best-response framework and a stable worker coalition is formed for each task to obtain a higher total reward;
(4) Because the equilibrium point obtained by the best-response method in step (3) is not unique and is not optimal in terms of total reward, a simulated annealing method is formulated to find a better Nash equilibrium; worker strategies are continuously and asynchronously updated using the best-response method until a pure Nash equilibrium is reached, and both the best-response and simulated annealing methods converge to an equilibrium point.
In a preferred embodiment of the invention, a worker refers broadly to a person who performs spatial tasks only for pay.
In a preferred embodiment of the invention, a worker's operating modes include an online mode and an offline mode; the online mode indicates that the worker is ready to accept tasks, and once the server assigns a task to a worker, the worker is considered to be in the offline mode until the assigned task is completed.
In a preferred embodiment of the invention, the constraints in step (1) include the reachable range and the task failure time.
In a preferred embodiment of the invention, the contract-based greedy task allocation method in step (2) assumes that all workers act in a selfish manner.
In a preferred embodiment of the invention, the contract-based greedy task allocation method in step (2) specifically formulates and enforces penalty contracts, applying penalties to workers who leave their coalitions.
In a preferred embodiment of the invention, the penalty contract in step (2) specifically provides that once a worker coalition is established to perform a task, any member who leaves the coalition is penalized by a fine equal to the reward earned for the task.
In a preferred embodiment of the invention, the equilibrium-based algorithm in step (3) specifically has workers form coalitions sequentially and update their strategies in turn, each selecting the best-response task to maximize her own utility, until a Nash equilibrium is reached in which no worker can unilaterally switch from her assigned coalition to another coalition to increase her utility.
In a preferred embodiment of the invention, the utility is the reward a worker obtains from the coalition to which she belongs.
The beneficial effects of the invention are as follows: in the coalition-based spatial crowdsourcing task allocation method provided by the invention, a novel form of task allocation, namely coalition-based task allocation, is formulated for spatial crowdsourcing, in which workers interact with others by forming worker coalitions. Given a set of workers and a set of tasks, a stable worker coalition is determined for each task so as to maximize the total reward of task allocation. Further, a contract-based greedy task allocation method is adopted to allocate tasks effectively, in which penalty contracts are formulated and enforced and penalties are applied to workers who leave their coalitions. The coalition-based task allocation problem is converted into a multiplayer game and an equilibrium-based solution is provided, in which a Nash equilibrium is found based on a best-response framework. Meanwhile, a simulated annealing strategy is further introduced into the process of updating worker strategies, improving the effectiveness of task allocation and achieving the highest total reward. The coalition-based spatial crowdsourcing task allocation method therefore has broad market prospects.
Drawings
For a clearer description of the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings described below are only some embodiments of the present invention, and that a person of ordinary skill in the art can obtain other drawings from them without inventive effort, wherein:
FIG. 1 is a running example of a preferred embodiment of the coalition-based spatial crowdsourcing task allocation method of the present invention;
FIG. 2 is the task reward pricing model of a preferred embodiment of the coalition-based spatial crowdsourcing task allocation method of the present invention;
FIG. 3 is the workload allocation of worker coalition {w2, w3} for task s2 in a preferred embodiment of the coalition-based spatial crowdsourcing task allocation method of the present invention;
FIG. 4 illustrates the effect of |S| on the synthetic dataset in a preferred embodiment of the coalition-based spatial crowdsourcing task allocation method of the present invention;
FIG. 5 illustrates the effect of |S| on the gMission dataset in a preferred embodiment of the coalition-based spatial crowdsourcing task allocation method of the present invention;
FIG. 6 illustrates the effect of |W| on the synthetic dataset in a preferred embodiment of the coalition-based spatial crowdsourcing task allocation method of the present invention;
FIG. 7 illustrates the effect of |W| on the gMission dataset in a preferred embodiment of the coalition-based spatial crowdsourcing task allocation method of the present invention;
FIG. 8 illustrates the effect of r on the synthetic dataset in a preferred embodiment of the coalition-based spatial crowdsourcing task allocation method of the present invention;
FIG. 9 illustrates the effect of e on the synthetic dataset in a preferred embodiment of the coalition-based spatial crowdsourcing task allocation method of the present invention;
FIG. 10 illustrates the effect of d − e on the synthetic dataset in a preferred embodiment of the coalition-based spatial crowdsourcing task allocation method of the present invention;
FIG. 11 is the main flow of the contract-based greedy method in a preferred embodiment of the coalition-based spatial crowdsourcing task allocation method of the present invention;
FIG. 12 is the main flow of the best-response method in a preferred embodiment of the coalition-based spatial crowdsourcing task allocation method of the present invention;
FIG. 13 is an experimental diagram of a preferred embodiment of the coalition-based spatial crowdsourcing task allocation method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below. It is apparent that the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art without inventive effort based on the embodiments of the present invention fall within the scope of protection of the present invention.
Referring to fig. 1 to 13, an embodiment of the present invention includes:
A coalition-based spatial crowdsourcing task allocation method comprises the following steps:
(1) Workers form worker coalitions to execute the corresponding tasks, interacting with other workers;
(2) A contract-based greedy task allocation method is formulated to form a group of worker coalitions so as to allocate tasks effectively;
(3) An equilibrium-based algorithm is formulated, in which a Nash equilibrium is found based on a best-response framework and a stable worker coalition is formed for each task to obtain a higher total reward;
(4) Because the equilibrium point obtained by the best-response method in step (3) is not unique and is not optimal in terms of total reward, a simulated annealing method is formulated to find a better Nash equilibrium; worker strategies are continuously and asynchronously updated using the best-response method until a pure Nash equilibrium is reached, and both the best-response and simulated annealing methods converge to an equilibrium point.
Preferably, a worker refers broadly to a person who performs spatial tasks only for pay.
Preferably, a worker's operating modes include an online mode and an offline mode; the online mode indicates that the worker is ready to accept tasks, and once the server assigns a task to a worker, the worker is considered to be in the offline mode until the assigned task is completed.
Preferably, the constraints in step (1) include the reachable range and the task failure time.
Preferably, the contract-based greedy task allocation method in step (2) assumes that all workers act in a selfish manner.
Preferably, the contract-based greedy task allocation method in step (2) specifically formulates and enforces penalty contracts, applying penalties to workers who leave their coalitions.
Preferably, the penalty contract in step (2) specifically provides that once a worker coalition is established to perform a task, any member who leaves the coalition is penalized by a fine equal to the reward earned for the task.
Preferably, the equilibrium-based algorithm in step (3) specifically has workers form coalitions sequentially and update their strategies in turn, each selecting the best-response task to maximize her own utility, until a Nash equilibrium is reached in which no worker can unilaterally switch from her assigned coalition to another coalition to increase her utility.
Preferably, the utility is the reward a worker obtains from the coalition to which she belongs.
In this embodiment, the relevant symbols are defined as follows: s denotes a spatial task; s.l the location of task s; s.p the release time of s; s.e the expected completion time of s; s.d the failure time of s; s.wl the workload of s; s.maxR the maximum reward of s; s.pr the penalty rate of s; S the set of spatial tasks; w a worker; w.l the location of worker w; w.r the reachable radius of worker w; W the set of workers; AW(s) the available worker set of task s; t(a, b) the travel time from location a to location b; d(a, b) the travel distance from location a to location b; w.RS the reachable task set of worker w; WC(s) a worker coalition of task s; $R_{WC(s)}$ the reward for completing task s by worker coalition WC(s); $s.t_e$ the completion time of task s by worker coalition WC(s); $s.t_s$ the start time (i.e., allocation time) of task s; $T_{WC(s)}$ the duration of task s; w.wl(WC(s)) the workload contribution of worker w when executing task s in WC(s); MWC(s) the minimal worker coalition of task s; A a spatial task allocation; $R_A$ the total reward of task allocation A; $\mathbb{A}$ the set of all possible allocations; and $A^*$ the globally optimal allocation.
1. Definitions
Definition 1 (spatial task): A spatial task can be expressed as $s = \langle s.l, s.p, s.e, s.d, s.wl, s.maxR, s.pr \rangle$, where s.l denotes the location at which spatial task s is issued and can be described by a point (x, y) in two-dimensional space, s.p denotes the release time of s, s.e the time by which s is expected to be completed, and s.d the failure time of s. Each task is also labeled with the workload s.wl required for an ordinary worker to complete it (in the present invention we represent s.wl by the time required to complete the task). s.maxR denotes the maximum reward that the requester of task s can offer, and s.pr is a penalty rate that establishes the correlation between task completion time and reward.
Definition 2 (worker): A worker refers to a person who performs spatial tasks only for pay, denoted $w = \langle w.l, w.r \rangle$. A worker may select the online mode or the offline mode; when a worker is in the online mode, she is ready to accept tasks.
Each online worker w has a current location w.l, and her reachable area is a circular region centered at w.l with radius w.r; she can only accept spatial tasks allocated within this region.
In our model, each worker can handle only one task at a particular time, whether she works alone or as part of a coalition that works together to accomplish the task, which is reasonable in real life. Once the server assigns a task to a worker, the worker is considered to be in the offline mode until the assigned task is completed.
Due to the constraints of worker reachable ranges and task failure times, each task can be completed by only a small fraction of the workers (i.e., a set of workers called the available worker set), which is defined as follows.
Definition 3 (available worker set): The available worker set of task s, denoted AW(s), is the set of workers satisfying the following two conditions:
1) worker w can arrive at the location of task s before s fails, i.e., $t_{now} + t(w.l, s.l) < s.d$;
2) task s is within the reachable range of worker w, i.e., $d(w.l, s.l) \le w.r$;
where $t_{now}$ denotes the current time, t(a, b) the travel time from location a to location b, and d(a, b) the distance between locations a and b. These two conditions ensure that a worker can travel directly from her current location to the location of a reachable task s before the task fails. If worker w is available for task s, i.e., $w \in AW(s)$, we say s is a reachable task of w, and the reachable task set of worker w is denoted w.RS.
Definition 4 (worker coalition): Given a task s to be allocated and its available worker set AW(s), a worker coalition of task s, denoted WC(s), is a subset of AW(s) such that all workers in WC(s) together have enough time to complete task s before it fails, i.e., $\sum_{w \in WC(s)} (s.d - (t_{now} + t(w.l, s.l))) \ge s.wl$.
Taking FIG. 1 as an example, the available worker set of task s1 is {w2, w3, w5, w6}, in which every available worker can reach s1.l before s1.d and s1 lies within each worker's reachable range. Since all the workers in each of the following coalitions can cooperatively complete s1 before s1.d, the worker coalitions of task s1 include {w3}, {w2, w3} and {w2, w3, w6}.
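Definitions 3 and 4 can be sketched in code. The following is a minimal illustration rather than the patent's implementation; Euclidean distance, a unit travel speed, and all field names are assumptions:

```python
import math

def dist(a, b):
    """Travel distance d(a, b) between two (x, y) locations (Euclidean, assumed)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def travel_time(a, b, speed=1.0):
    """Travel time t(a, b), assuming a constant speed."""
    return dist(a, b) / speed

def available_workers(task, workers, t_now):
    """AW(s): workers who can reach task['l'] before the failure time task['d']
    (condition 1) and whose reachable radius covers the task (condition 2)."""
    return [w for w in workers
            if t_now + travel_time(w['l'], task['l']) < task['d']  # condition 1
            and dist(w['l'], task['l']) <= w['r']]                 # condition 2

def is_coalition(task, coalition, t_now):
    """Definition 4: the members' combined time remaining before the
    failure time must cover the workload s.wl."""
    slack = sum(task['d'] - (t_now + travel_time(w['l'], task['l']))
                for w in coalition)
    return slack >= task['wl']

# Toy usage with hypothetical values
s1 = {'l': (0.0, 0.0), 'd': 10.0, 'wl': 6.0}
w2 = {'l': (1.0, 0.0), 'r': 5.0}
w3 = {'l': (0.0, 2.0), 'r': 5.0}
print(len(available_workers(s1, [w2, w3], t_now=0.0)))  # 2
print(is_coalition(s1, [w2, w3], t_now=0.0))            # True: (10-1)+(10-2) >= 6
```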
As for task reward calculation, we use a Reward Pricing Model (RPM), which can effectively quantify the time constraints of a task. More specifically, RPM considers a single task s and one of its worker coalitions, with emphasis on the task completion time and the true reward (i.e., the requester's actual payment for the task), as shown in FIG. 2.
Constrained by the main time points of the task (i.e., the release time s.p, the expected completion time s.e and the failure time s.d), the penalty rate s.pr and the maximum reward s.maxR, the RPM can be expressed as follows:

$$R_{WC(s)} = \begin{cases} s.maxR, & s.t_e \le s.e \\ s.maxR - s.pr \cdot (s.t_e - s.e), & s.e < s.t_e \le s.d \\ 0, & s.t_e > s.d \end{cases} \qquad (1)$$

where $R_{WC(s)}$ denotes the actual reward of task s provided by the task requester to the workers in WC(s), and $s.t_e$ denotes the completion time of task s by worker coalition WC(s).
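A minimal sketch of the RPM follows. This is an assumption-laden illustration, not the patent's code: it assumes the reward stays at s.maxR until the expected completion time s.e and then decreases linearly at the penalty rate s.pr until the failure time s.d, as FIG. 2 and the surrounding description suggest; all field names are hypothetical.

```python
def rpm_reward(task, t_complete):
    """Reward Pricing Model (assumed piecewise-linear form): full reward
    s.maxR up to s.e, linear decay at rate s.pr between s.e and s.d,
    and zero once the task has failed."""
    if t_complete <= task['e']:
        return task['maxR']
    if t_complete <= task['d']:
        return max(0.0, task['maxR'] - task['pr'] * (t_complete - task['e']))
    return 0.0

s = {'e': 5.0, 'd': 10.0, 'maxR': 8.0, 'pr': 1.0}
print(rpm_reward(s, 4.0))   # 8.0  (on time: maximum reward)
print(rpm_reward(s, 7.0))   # 6.0  (two units late: penalty 2 * 1.0)
print(rpm_reward(s, 11.0))  # 0.0  (task failed)
```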
To calculate $s.t_e$, we use $s.t_s$ to denote the start time (i.e., allocation time) of task s, $T_{WC(s)}$ to denote the duration of task s (i.e., the elapsed time from allocation to completion), and w.wl(WC(s)) to denote the workload contribution (measured in time) of worker w when executing task s in WC(s). FIG. 3, which depicts the workload allocation of worker coalition {w2, w3} for task s1 (the workers and tasks come from the running example in FIG. 1), makes this easy to understand. For any $w \in WC(s)$, the duration of the task equals the travel time of worker w plus her workload contribution, i.e.:

$T_{WC(s)} = t(w.l, s.l) + w.wl(WC(s))$,
Summing the above formula over all workers in the worker coalition WC(s) yields:

$|WC(s)| \cdot T_{WC(s)} = \sum_{w \in WC(s)} t(w.l, s.l) + \sum_{w \in WC(s)} w.wl(WC(s))$.

In view of $s.wl = \sum_{w \in WC(s)} w.wl(WC(s))$, the above formula can be rewritten as:

$T_{WC(s)} = \dfrac{\sum_{w \in WC(s)} t(w.l, s.l) + s.wl}{|WC(s)|}$.

Finally, $s.t_e$ can be calculated as $s.t_e = s.t_s + T_{WC(s)}$, and the workload of each worker as $w.wl(WC(s)) = T_{WC(s)} - t(w.l, s.l)$. From the perspective of a worker coalition, we consider the goal of a worker joining a coalition to be promoting the overall reward of the coalition, thereby bringing satisfactory monetary rewards and reputation gains to the worker.
Notably, since we require w.wl(WC(s)) > 0, if a worker's travel time exceeds the duration of the task, i.e., $t(w.l, s.l) \ge T_{WC(s)}$, then worker w contributes nothing to task s and should be deleted from WC(s). In addition, as the RPM in FIG. 2 shows, when a worker coalition WC(s) can cooperatively complete task s before its expected completion time s.e, the coalition obtains the maximum reward (s.maxR) for the task, and adding more workers to WC(s) cannot yield a higher reward. In other words, more workers in WC(s) does not mean an earlier completion time or a larger total reward. This observation motivates the concept of the minimal worker coalition.
Definition 5 (minimal worker coalition): A worker coalition WC(s) of task s is minimal, denoted MWC(s), if no proper subset $WC'(s) \subset WC(s)$ can achieve the reward $R_{WC(s)}$.
Although {w3}, {w2, w3} and {w2, w3, w6} are all worker coalitions of task s1, {w2, w3, w6} is not a minimal worker coalition, because {w2, w3} can generate the same reward as {w2, w3, w6} (i.e., $R_{\{w_2,w_3\}} = R_{\{w_2,w_3,w_6\}}$). For task s4, {w7} and {w1, w7} are both worker coalitions, but only {w7} is a minimal worker coalition, because when w7 executes task s4 alone, she can obtain the reward of s4 without the cooperation of others.
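Definition 5 can be checked by brute force: a coalition is minimal exactly when no proper subset earns at least the same reward. A sketch with a toy reward function (hypothetical, standing in for formula (1)):

```python
from itertools import combinations

def is_minimal(coalition, reward_fn):
    """A coalition is minimal if no proper subset earns at least the
    same reward (Definition 5)."""
    full = reward_fn(coalition)
    for k in range(1, len(coalition)):
        for sub in combinations(coalition, k):
            if reward_fn(list(sub)) >= full:
                return False
    return True

# Toy reward: a flat 8.0 once total capacity covers the workload of 4
def toy_reward(coalition):
    capacity = sum(coalition)  # each number stands for a worker's capacity
    return 8.0 if capacity >= 4 else 0.0

print(is_minimal([2, 2], toy_reward))     # True: neither worker alone covers 4
print(is_minimal([2, 2, 2], toy_reward))  # False: {2, 2} earns the same 8.0
```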
Definition 6 (spatial task allocation): Given a worker set W and a task set S, a spatial task allocation A consists of $\langle task, MWC \rangle$ pairs of the form $\langle s_1, MWC(s_1) \rangle, \langle s_2, MWC(s_2) \rangle, \ldots, \langle s_{|S|}, MWC(s_{|S|}) \rangle$, where the minimal worker coalitions are pairwise disjoint, since each worker can perform only one task at a particular time.
We use $R_A$ to denote the total reward of task allocation A, i.e., $R_A = \sum_{\langle s, MWC(s) \rangle \in A} R_{MWC(s)}$, where $R_{MWC(s)}$ can be calculated from formula (1), and we use $\mathbb{A}$ to denote the set of all possible allocations. The problem studied in the present invention can be formally stated as follows.
Problem definition: given a set of workers W and a set of tasks S at a time, a joint task allocation (CTA) problem aims to find a globally optimal allocationSo that for->
Lemma 1: the CTA problem is the NP-hard problem.
Demonstration 1: this quotation can be demonstrated by the protocol to 0-1 knapsack problem. The 0-1 backpack problem can be described as follows: given a C set containing n items, wherein each item C i E C are marked with weight m i Sum v i . The 0-1 knapsack problem is to determine a subset C' of C such thatAt->Maximization is effected, where M represents the maximum weighted capacity.
A given 0-1 knapsack problem can be converted into an instance of the CTA problem as follows. We construct a spatial task set S containing n tasks, where each task $s_i$ is associated with release time $s_i.p = 0$, expected completion time $s_i.e = 1$, failure time $s_i.d = 1$, workload $s_i.wl = m_i$ and maximum reward $s_i.maxR = v_i$. In addition, we construct a worker set W containing M workers, each worker $w_i$ being able to complete only one unit of workload. All workers and tasks are located at the same location. In this case, to obtain the reward $v_i$ of each task $s_i$, the system needs to allocate $m_i$ workers to task $s_i$.
Given this mapping, we can show that the converted CTA problem can be solved if and only if the 0-1 knapsack problem can be solved. Since the 0-1 knapsack problem is known to be NP-hard, the CTA problem is also NP-hard.
2. Contract-based greedy method
A straightforward solution to maintaining worker-coalition stability is to enforce contracts under which workers cannot leave the coalitions they have formed at will. Thus, in this section we design a contract-based greedy algorithm that encourages worker coalitions to obtain a larger total reward and manages worker behavior through contracts. The algorithm is based on the following observations: because rewards do not increase over time (as the Reward Pricing Model in FIG. 2 shows, the maximum reward provided by the task requester first remains stable and then gradually decreases), workers who choose to perform nearby tasks can generate higher rewards; and nearby workers can handle more workload and thereby earn more reward. In addition, to eliminate worker-coalition instability (i.e., some workers leaving the coalitions they formed, so that the corresponding tasks cannot be completed), we design a mandatory contract between workers. The basic principle of the contract is that once a worker coalition is established to perform a task, any member who leaves the coalition is penalized by a fine equal to the reward earned for the task.
Algorithm 1 outlines the main flow of the contract-based greedy method. It takes a worker set W and a task set S as input and first computes the available worker set AW(s) of each task s. After initialization, for each task $s \in S$, the algorithm generates the minimal worker coalition MWC(s) by selecting nearby workers who can contribute a higher total reward for task s, and assigns this coalition MWC(s) to s. Furthermore, if a worker $w \in MWC(s)$ leaves MWC(s) and stops executing task s, the algorithm invokes the penalty clause in the contract, and worker w is penalized by paying $R_{MWC(s)}$ (calculated based on the total reward of coalition MWC(s)). This means that the assignment of task s fails and s will be reassigned (line 22). Finally, Algorithm 1 obtains a proper task allocation result.
Algorithm 1 is shown in figure 11 of the specification.
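To make the flow concrete, the following minimal Python sketch mimics Algorithm 1's greedy coalition construction with the penalty contract attached to each assignment. The reward model and the field names (capability, workload, max_reward, reachable) are illustrative assumptions, not the patent's exact notation, which depends on the RPM pricing curve.

```python
def coalition_reward(task, coalition):
    # Toy stand-in for the RPM-based reward: the task's maximum reward is
    # earned only if the coalition's combined capability covers the workload.
    if sum(w["capability"] for w in coalition) >= task["workload"]:
        return task["max_reward"]
    return 0.0

def contract_greedy_assign(workers, tasks):
    """Greedy sketch of Algorithm 1: for each task (highest reward first),
    grow a minimum worker coalition MWC(s) from the available workers and
    attach a contract whose penalty equals the coalition's reward."""
    assignment, free = {}, set(range(len(workers)))
    for task in sorted(tasks, key=lambda t: -t["max_reward"]):
        # available workers AW(s): free workers that can reach the task
        avail = [i for i in free if task["id"] in workers[i]["reachable"]]
        mwc = []
        for i in sorted(avail, key=lambda i: -workers[i]["capability"]):
            mwc.append(i)
            if coalition_reward(task, [workers[j] for j in mwc]) > 0:
                break
        r = coalition_reward(task, [workers[j] for j in mwc])
        if r > 0:
            # Penalty clause: leaving the coalition costs the task's reward.
            assignment[task["id"]] = {"coalition": mwc, "penalty": r}
            free -= set(mwc)
    return assignment
```

A worker leaving a formed coalition would then be charged assignment[s]["penalty"], after which the task re-enters the pool for reassignment (line 22 of Algorithm 1).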
As illustrated in FIG. 1, the contract-based greedy algorithm may obtain the task assignment {<s_1, {w_2, w_3, w_5}>, <s_3, {w_1, w_4}>, <s_4, {w_7}>}, whose total reward is 8.53.
3. Equilibrium-based method
While the contract-based algorithm can alleviate the problem of coalition instability to some extent, it reduces workers' interest in joining existing coalitions in the first place. The essence of the CTA problem is that during task assignment each worker must select a task to perform by interacting with other workers; that is, one worker's task selection depends on the decisions made by the others. Such interdependent decisions can be modeled by game theory, with the workers regarded as independent players of a game. On this basis, the CTA problem can be viewed as a multiplayer game.
More specifically, our problem can be modeled as a potential game, which has at least one Nash equilibrium in pure strategies (i.e., a pure Nash equilibrium). We then use the best response algorithm, one of the most basic algorithms for exact potential games, because it can effectively resolve conflicts between players. In best response dynamics, each participant myopically updates its strategy in turn, sequentially and asynchronously, based on the best-response utility given the other participants' strategies, until a pure Nash equilibrium is finally reached. A Nash equilibrium is a state of the game in which no single worker can improve her utility by unilaterally deviating from her assigned coalition to another coalition while the others remain in their assigned coalitions; it indicates that workers voluntarily keep their assigned tasks even when they are free to choose. In this case the formed worker coalitions can be regarded as stable. However, since multiple equilibrium points may exist, the Nash equilibrium reached by the best response algorithm is not necessarily optimal. To address this problem, we introduce a Simulated Annealing (SA) strategy into the best response dynamics to find a better Nash equilibrium corresponding to a near-optimal task assignment. With the help of the SA strategy, the update process has a better chance of reaching a near-optimal task assignment. Finally, we analyze the feasibility of our algorithms.
(I) Game modeling and Nash equilibrium
We first formulate the CTA problem as a strategic game of n players, expressed as Γ = ⟨W, ST, U⟩, which consists of the players, the strategy space, and the utility functions. The details are as follows:
W = {w_1, …, w_n} (n ≥ 2) represents a finite set of workers, who act as the players of the game. In the following, we use the terms player and worker interchangeably when the context is clear.
ST = ST_1 × … × ST_n is the set of all players' strategies (i.e., the strategy space of the game). ST_i is w_i's strategy set, which contains w_i's reachable task set and the empty task (meaning that w_i selects no task to perform); it can be represented as ST_i = w_i.RS ∪ {null}, where w_i.RS denotes worker w_i's reachable task set and null denotes the empty task.
U = {U_1, …, U_n} denotes the utility functions of all players, where U_i: ST → R is player w_i's utility function. For each joint strategy st ∈ ST with st_i = s, player w_i's utility can be calculated as

U_i(st) = (R_MWC(s)∪{w_i} − R_MWC(s)) + (R_MWC(s_0)−{w_i} − R_MWC(s_0)),    (7)

where R_MWC(s)∪{w_i} is the total reward of the coalition MWC(s) ∪ {w_i}; MWC(s) ∪ {w_i} is the new worker coalition consisting of MWC(s) and w_i; MWC(s_0) − {w_i} denotes the worker coalition after deleting w_i from MWC(s_0); and MWC(s) and MWC(s_0) are, respectively, the worker coalition w_i is willing to join and the worker coalition w_i currently belongs to. In other words, U_i(st) is the net change in total reward caused by w_i moving from s_0's coalition to s's coalition. When the context is clear we write U_i for U_i(st).
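As a concrete reading of this utility, the sketch below computes U_i from a reward oracle R(task, coalition). The toy oracle (each worker contributes a per-task unit reward up to the task's need) is an assumption for illustration only.

```python
def utility(R, w, s, s0, mwc):
    """Net change in total reward if worker w moves from her current task s0
    to task s: the two bracketed terms of the utility definition above."""
    gain = R(s, mwc[s] | {w}) - R(s, mwc[s])        # joining MWC(s)
    loss = R(s0, mwc[s0] - {w}) - R(s0, mwc[s0])    # leaving MWC(s0)
    return gain + loss

# Toy reward oracle (assumed): unit reward per worker, capped at the task's need.
NEED = {"s1": 2, "s2": 1}
UNIT = {"s1": 3.0, "s2": 4.0}

def R(s, coalition):
    return min(len(coalition), NEED[s]) * UNIT[s]
```

For example, with MWC(s1) = {w1} and MWC(s2) = {w2}, worker w2's utility for moving to s1 is (6 − 3) + (0 − 4) = −1, so she prefers to stay.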
In the strategic game Γ, a strategy profile π = (π_1, …, π_n), where π_i: ST_i → [0,1] is a probability distribution over ST_i, is a mixed-strategy Nash equilibrium (i.e., a mixed Nash equilibrium) if and only if, for any w_i ∈ W,

E[U_i(π_i, π_−i)] ≥ E[U_i(π'_i, π_−i)]  for all π'_i ∈ Σ_i,

where Σ_i denotes player w_i's mixed-strategy space. A Nash equilibrium is a pure Nash equilibrium (i.e., a Nash equilibrium in pure strategies) only when every player plays a deterministic strategy, meaning that worker w_i selects one strategy from ST_i with probability 1 and every other strategy in ST_i with probability 0.
As proved by John F. Nash, every game with a finite number of players and finite strategy sets has a mixed Nash equilibrium, which only implies a stable probability distribution over strategy profiles rather than a fixed play of one particular joint strategy profile. Such uncertainty is not acceptable in our task assignment scenario, where each worker needs an explicit strategy (i.e., choose a task to perform or do nothing). We therefore next show that our CTA game has a pure Nash equilibrium, in which every player can select a strategy deterministically.
Given a joint strategy st = (st_1, …, st_n), st_i (i.e., some s ∈ w_i.RS or null) denotes the strategy chosen by w_i (0 < i ≤ n). For player w_i, a joint strategy can also be written as st = (st_i, st_−i), where st_−i denotes the joint strategy of all the other players.
Lemma 2: The CTA game has a pure Nash equilibrium.
Proof 2: to prove primer 2, it is necessary to prove that CTA gaming is a strict potential force field game (Exact Potential Game, EPG) where all players' motivations can be mapped to the global potential function. For EPGs, the best response framework can always converge to pure nash equalization for a countable strategy.
Next, we introduce the definition of an EPG and show that the CTA game is one.
Definition 7 (Exact Potential Game): A strategic game Γ is an exact potential game (EPG) if and only if there exists a function Φ: ST → R such that for all st ∈ ST and any worker w_i ∈ W,

Φ(st'_i, st_−i) − Φ(st_i, st_−i) = U_i(st'_i, st_−i) − U_i(st_i, st_−i),

where st'_i and st_i are strategies available to worker w_i and st_−i is the joint strategy of all players other than w_i. The function Φ is called an exact potential function of the game Γ.
Lemma 3: The CTA game is an exact potential game (EPG).
Demonstration 3: we define the potential function asIt represents the total rewards for all tasks in S. It can then be calculated as follows:
Wherein in policy st i ' and st i The tasks selected in (a) are s k Sum s g The strategic game of CTA problem is a strict potential field game according to definition 7, for which CTA game has nash equilibrium in a pure strategy.
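The exactness condition can be sanity-checked numerically. The toy reward oracle below (unit reward per worker, capped at the task's need) is an illustrative assumption; for a unilateral deviation, the change in the potential Φ equals the deviating worker's utility change, both being −1 here.

```python
NEED = {"s1": 2, "s2": 1}
UNIT = {"s1": 3.0, "s2": 4.0}

def R(s, coalition):
    # Toy reward oracle, assumed for illustration.
    return min(len(coalition), NEED[s]) * UNIT[s]

def phi(mwc):
    # Potential function of equation (8): total reward over all tasks.
    return sum(R(s, c) for s, c in mwc.items())

# Worker w2 unilaterally deviates from task s2 to task s1.
before = {"s1": {"w1"}, "s2": {"w2"}}
after = {"s1": {"w1", "w2"}, "s2": set()}

delta_phi = phi(after) - phi(before)
delta_u = (R("s1", {"w1", "w2"}) - R("s1", {"w1"})) \
        + (R("s2", set()) - R("s2", {"w2"}))
```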
We use BR_i(st_−i) = argmax_{st_i ∈ ST_i} U_i(st_i, st_−i) to denote player w_i's best response to the other players' joint strategy st_−i, i.e., the strategy maximizing her utility given st_−i. A joint strategy st* in which every st*_i is a best response to st*_−i is a pure Nash equilibrium, so that no player can gain any utility by unilaterally changing her strategy.
(II) Best response method
Because the CTA game has a pure Nash equilibrium, we solve the CTA problem with a best response method that creates a set of stable worker coalitions to perform the tasks by reaching a pure Nash equilibrium. Specifically, in the designed best response algorithm the players adjust their own strategies in turn according to the latest strategies of the others, finally reaching a Nash equilibrium corresponding to a locally optimal task assignment. Algorithm 2 describes the overall framework of the best response method.
Given a worker set W and a task set S to be assigned, the task assignment A is initialized to the empty set (line 1). The algorithm first selects one available worker for each task, obtaining each worker's initial strategy (i.e., one reachable task or nothing) and the corresponding initial assignment A. The algorithm then iteratively adjusts each worker's strategy to her best response strategy, which maximizes the reward growth of the coalitions (as shown in equation (9)) given the others' current joint strategy, until a Nash equilibrium is found (i.e., no one changes her strategy). In each iteration only one worker is allowed to select her best response strategy, and the game is played sequentially.
Algorithm 2 is shown in figure 12 of the specification.
Specifically, for each worker w_i ∈ W, we first find the best response task s* with the largest reward increase, which can be calculated by equation (9):

s* = argmax_{s ∈ w_i.RS ∪ {null}} U_i(st_i = s, st_−i).    (9)
For the current task assignment, if worker w_i has no best response task, w_i does not alter her strategy. For a worker who does have a best response task, we check her current strategy as follows:
If her current strategy is to do nothing, i.e., w_i.st = null, we assign the best response task to her (i.e., w_i.st = s*) and update task s*'s minimum worker coalition.
If her current strategy contains a task (denoted w_i.st), meaning that task w_i.st is assigned to a coalition MWC(w_i.st) containing w_i, then the strategies of the other workers in MWC(w_i.st) depend on whether they can still complete task w_i.st on time. Subsequently, w_i's strategy is updated.
Finally, we update the task assignment A according to the Nash equilibrium.
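The loop structure of Algorithm 2 can be sketched as follows. The congestion-style utility used in the demo (a task's reward split evenly among the workers on it) is an assumption chosen only because it forms a potential game, not the patent's coalition utility.

```python
def best_response_dynamics(workers, reachable, utility, max_rounds=100):
    """Workers take turns switching to their best-response strategy until no
    one changes, i.e., a pure Nash equilibrium is reached."""
    strategy = {w: None for w in workers}       # None is the empty task
    changed, rounds = True, 0
    while changed and rounds < max_rounds:
        changed, rounds = False, rounds + 1
        for w in workers:                       # sequential, one at a time
            options = list(reachable[w]) + [None]
            best = max(options, key=lambda s: utility(w, s, strategy))
            if utility(w, best, strategy) > utility(w, strategy[w], strategy):
                strategy[w] = best              # adopt the best response
                changed = True
    return strategy
```

With two workers who can both reach tasks A (reward 10) and B (reward 6), the dynamics settle with one worker on each task: once the first worker takes A, the second prefers B (6) over sharing A (5).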
(III) optimization strategy based on simulated annealing
While the pure Nash equilibrium computed by the best response algorithm yields an acceptable task assignment with stable worker coalitions, it is a locally optimal solution to the CTA problem and is not necessarily unique. When multiple pure Nash equilibria exist, we would like a better result than the one the best response algorithm produces. As its name suggests, Simulated Annealing (SA) originates from the annealing process of metals and is a stochastic optimization procedure. Inspired by David E. Kaufman et al., who used SA to solve discrete optimization problems, we use SA to find a better task assignment that approximates the global optimum.
Specifically, when each worker sequentially updates her strategy st_i according to a given st_−i so as to maximize her utility U_i(st_i, st_−i), the workers may reach a Nash equilibrium, i.e., a steady state. Considering that the search space is discrete (i.e., the strategy space ST is discrete), a simulated annealing method can be applied to the update of each worker's strategy to reach a better local optimum. SA is regarded as an effective probabilistic scheme for discrete optimization problems and can be described by a discrete-time non-homogeneous Markov chain x(k) = (st_1, …, st_n). In the present invention, the state x(k) = (st_1, …, st_n) is the workers' joint strategy in the k-th iteration of Algorithm 2. Worker w_i's strategy st_i can either keep the current task s_0 or change to another reachable task (i.e., one in w_i.RS − {s_0}). To simulate heat (randomness), we let worker w_i randomly change her current strategy to one of the other (|w_i.RS| − 1) reachable tasks with equal probability 1/(|w_i.RS| − 1), i.e., st'_i = s with s ∈ w_i.RS − {s_0}. Each worker then updates her strategy according to the following rules in order.
If U_i(st'_i, st_−i) ≥ U_i(st_i, st_−i), then st_i ← st'_i.

If U_i(st'_i, st_−i) < U_i(st_i, st_−i), then st_i ← st'_i with probability exp((U_i(st'_i, st_−i) − U_i(st_i, st_−i)) / T(k)), where T(k) > 0 denotes the temperature of the k-th iteration, which gradually decreases throughout the update process.

Otherwise, st_i remains unchanged.
By following the above rules we can update lines 15-28 in Algorithm 2; we omit the detailed pseudocode for lack of space. Formally, the transition probability can be calculated as

P(x(k+1) = (st'_i, st_−i) | x(k) = (st_i, st_−i)) = (1 / (|w_i.RS| − 1)) · min{1, exp((U_i(st'_i, st_−i) − U_i(st_i, st_−i)) / T(k))}.    (11)

The function T(k): N+ → (0, ∞), where N+ is the set of positive integers, is non-increasing and is called the cooling schedule. From equation (11), strategy selection is almost random when T(k) is large, whereas better strategies with greater utility are more likely to be selected as T(k) approaches 0. Although occasionally accepting a task selection with smaller utility temporarily decreases the overall utility, this "irregular" strategy selection is more likely to lead to a better Nash equilibrium (i.e., a better task assignment), as verified in the experiments.
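One annealed update step under these rules might look like the sketch below; the cooling schedule T(k) = β/log(k + 1) is the form assumed later in the convergence analysis.

```python
import math
import random

def sa_update(w, strategy, reachable, utility, k, beta=2.0):
    """One simulated-annealing update for worker w at iteration k: pick one
    of the other reachable tasks uniformly, accept it if it is no worse, and
    accept a worse one with probability exp(du / T(k))."""
    others = [s for s in reachable[w] if s != strategy[w]]
    if not others:
        return strategy
    candidate = random.choice(others)  # probability 1/(|RS| - 1) each
    du = utility(w, candidate, strategy) - utility(w, strategy[w], strategy)
    temperature = beta / math.log(k + 1)
    if du >= 0 or random.random() < math.exp(du / temperature):
        strategy = dict(strategy)      # keep the update functional
        strategy[w] = candidate
    return strategy
```

An improving candidate (du ≥ 0) is always accepted; as k grows, T(k) shrinks and worse candidates are accepted ever more rarely, so the chain eventually "freezes".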
(IV) Convergence analysis
In the field of game theory, the problem of convergence to a Nash equilibrium has attracted a great deal of attention. We therefore show that our solutions converge to a pure Nash equilibrium point, at which no worker has an incentive to deviate unilaterally.
Lemma 4: The best response algorithm converges to a pure Nash equilibrium.
Demonstration 4: as shown in equation (8), the utility of all workers is mapped to a potential function (i.e., Φ), which indicates that each worker's individual adjustment of his strategy results in the same number of changes in utility and potential function. For a potential game, each worker sequentially updates its strategy for maximum utility by an optimal response algorithm, and the potential function reaches a local maximum accordingly, where the optimal response is dynamically equivalent to a local search of the potential game's potential function. The Noam Nisan et al has demonstrated that in any limited potential game, sequential updates with optimal response dynamics always converge to nash equilibrium.
Regarding convergence time, Fabrikant et al. showed that finding a pure Nash equilibrium in a potential game is Polynomial Local Search (PLS)-complete when each player's best response can be found in polynomial time. In our CTA game, each worker w_i has only |w_i.RS| strategies related to her reachable task set w_i.RS, and due to the spatio-temporal constraints of workers and tasks, |w_i.RS| is not large in practice. Each worker can choose the task maximizing her utility from her reachable tasks in polynomial time, so convergence to a Nash equilibrium is quite fast.
And (5) lemma: when the cooling schedule is adjusted, the best response algorithm with simulated annealing optimization converges to pure Nash equilibrium.
Demonstration 5: by integrating the simulated annealing strategy into the optimal response algorithm, randomness is added to the update process of the worker strategy (i.e., worker w i Other (|w) with probabilities can be used i Rs| -1) reachable task to change its current policy randomly). Specifically, this process is "heated" prior to "cooling" to helpThe potential function avoids local optima (obtained by the best response algorithm) and converges to another better nash equilibrium in which the cooling schedule should be adjusted so that the process will eventually "freeze" (i.e., converge). LiuYanqing et al have demonstrated that when the cooling schedule is set toTime (beta is greater than or equal to d) * Is a normal number) can ensure the convergence of the simulated annealing strategy. If- >To->With one path, d * Represents the maximum depth of a path from any joint policy +.>Start and end at final federation policy +.>
The convergence of the best response algorithm with simulated annealing is necessarily slower than that of the plain best response algorithm. In the experiments we provide empirical evidence for this statement and show that a pure Nash equilibrium can be computed efficiently within a limited time, as stated in Lemma 4 and Lemma 5.
4. Experiment
In this section we evaluate our methods on real and synthetic datasets. All experiments were performed on a machine with an Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz and 128GB RAM.
(I) Experimental setup
The experiments are based on two datasets: the gMission dataset (labeled GM) and a synthetic dataset (labeled SYN). Specifically, gMission is an open-source crowdsourcing platform in which each task is associated with its location, deadline, and reward, and each worker is associated with her location and reachable radius. Since the gMission data lacks the generation time, workload, expected completion time, and penalty rate of tasks, we generate these attributes from uniform distributions. For the synthetic dataset, based on observations of the real dataset, we generate the tasks' locations, generation times, expected completion times, deadlines, workloads, and penalty rates from uniform distributions, and their maximum rewards from a normal distribution. The workers' locations and reachable radii are generated from uniform distributions.
We study and compare the performance of the following algorithms:
OTA: an Optimal Task Assignment algorithm based on a tree decomposition strategy. OTA finds the minimum worker coalition for each task by a dynamic programming strategy and then obtains the optimal task assignment maximizing the total reward with the tree-decomposition-based algorithm.
CGTA: our Contract-based Greedy Task Assignment (CGTA) algorithm.
BR: our proposed Best-Response (BR) method.
BR+SA: our best response method with simulated annealing optimization, in which the cooling schedule is set to T(k) = β / log(k + 1), where k denotes the k-th iteration of the algorithm.
We compare the above algorithms with three metrics: total reward, CPU time, and the number of updates, i.e., the number of strategy updates the workers make before the final task assignment is found. Figure 13 of the specification shows our experimental settings, in which all default parameter values are underlined.
(II) Experimental results
a) Influence of |S|. To study the scalability of all algorithms, we randomly generate 5 datasets from the synthetic dataset (resp. the gMission dataset) containing 1000 to 5000 (resp. 100 to 500) tasks. As shown in FIG. 4(a) and FIG. 5(a), the total rewards of all methods show a similar increasing trend as |S| grows. OTA obtains the highest total reward on both the synthetic and the gMission dataset, followed by BR+SA, BR, and CGTA. The reward of BR+SA is always higher than that of BR, which demonstrates the benefit of the simulated annealing optimization strategy. In FIG. 4(b), FIG. 4(c), FIG. 5(b), and FIG. 5(c), although the CPU time of all methods increases with |S|, the proposed algorithms (BR+SA, BR, and CGTA) perform significantly better than OTA, whose efficiency deteriorates markedly faster. Although CGTA produces a smaller reward than the other algorithms, as expected, it is the fastest. BR+SA offers an excellent trade-off between efficiency and accuracy. As for the numbers of updates in FIG. 4(d) and FIG. 5(d), workers update their strategies more frequently in BR+SA than in BR, because BR+SA allows a worker to change her current strategy randomly, whereas a worker in BR updates her current strategy only when a best response strategy exists. We also note that the numbers of updates of both BR+SA and BR grow with the number of tasks |S|. The reason is self-evident: the more tasks to be assigned, the more reachable tasks (i.e., available strategies) each worker has, and hence the more opportunities to update her strategy.
b) Influence of |W|. Next we study the effect of |W|, the number of workers to be assigned. As shown in FIG. 6(a), FIG. 6(b), FIG. 7(a), and FIG. 7(b), BR+SA and BR obtain a higher total reward than CGTA at the cost of some efficiency; still, the computational cost of BR+SA and BR is acceptable. Although the total reward of OTA is certainly the highest, OTA is also the most time-consuming, as seen in FIG. 6(c) and FIG. 7(c). More precisely, BR+SA obtains 98.6% of the maximal reward while its CPU time is almost negligible compared with OTA. From FIG. 6(d) and FIG. 7(d) we see that the number of updates rises as |W| grows, because more workers need to change their strategies when the number of workers to be assigned becomes larger. Notably, the number of worker strategy updates is much lower than the number of workers to be assigned. This can be explained by two reasons. First, some workers stick to their initially assigned tasks (the BR+SA and BR algorithms give each worker an initial strategy, i.e., one reachable task or the empty task; see Algorithm 2) without any strategy update. Second, some workers have no reachable task, so their number of strategy updates is 0. To save space, in the following experiments we show neither the CPU time of OTA (which is large) nor the results on the gMission dataset (which are similar to those on the synthetic dataset).
c) Influence of r. FIG. 8 depicts the effect of the workers' reachable radius r, ranging from 1 km to 5 km, on the performance of the algorithms. As the workers' reachable radius grows, workers have more reachable tasks and consequently more opportunities to select tasks with higher rewards, which explains the rising trend of the total reward with increasing r in FIG. 8(a). Accordingly, in FIG. 8(b) the CPU time of all methods increases with r, because a worker must search more reachable tasks to find a suitable one. In addition, as seen in FIG. 8(c), the number of updates of the BR+SA algorithm is always higher than that of the BR algorithm regardless of r.
d) Influence of e. Next we study how the expected completion time e of tasks affects the efficiency and effectiveness of all methods. As shown in FIG. 9(a), the total reward of every method gradually increases with e, since a larger e means more tasks can reach their maximum reward. OTA still obtains the largest reward, and BR+SA performs better than BR and CGTA. Notably, all methods tend to level off when e > 8h, presumably because most tasks can be completed within 8h and thus reach their maximum rewards. As for CPU time, all methods first rise and then remain stable: initially, as e increases, each worker has more reachable tasks and needs more CPU time to search them; then, as the expected completion time keeps extending, the number of reachable tasks per worker stays stable due to the spatio-temporal constraints (e.g., the workers' reachable radii), so the CPU time for finding a task assignment stays stable as well. In FIG. 9(c), the numbers of updates of BR+SA and BR decrease as e increases, since with a larger e workers are more likely to pick a satisfactory task at once and avoid strategy updates.
e) Influence of d − e. In the last set of experiments we study the effect of d − e, the slack between deadline and expected completion time. Not surprisingly, as shown in FIG. 10(a) and FIG. 10(b), all methods achieve higher rewards and need more CPU time when deadlines are more relaxed: a larger d − e means that, on average, each worker has more reachable tasks, which leads to a larger total reward and more CPU time spent. Another observation is that the gap in total reward between the BR-based methods and the CGTA algorithm widens, since the total reward is more sensitive to the average number of available workers per task when the BR-based algorithms are applied; in this case the benefit of the BR-based methods becomes more pronounced. As shown in FIG. 10(c), the numbers of updates of BR and BR+SA both keep increasing.
The alliance-based space crowdsourcing task allocation method has the following beneficial effects:
A novel form of task allocation for space crowdsourcing is formulated, namely alliance-based task allocation, in which workers must form worker alliances, interacting with others, to perform the corresponding tasks. Given a set of workers and a set of tasks, a stable worker alliance is determined for each task so that the total return of the task allocation is maximized. Furthermore, a contract-based greedy task allocation method is adopted to allocate tasks effectively, with a penalty contract formulated and enforced that punishes workers who leave their alliances. The alliance-based task allocation problem is then cast as a multiplayer game and an equilibrium-based solution is proposed, in which a Nash equilibrium is found with a best response framework; in addition, a simulated annealing strategy is introduced into the worker strategy update process to improve the effectiveness of task allocation and achieve the highest total return.
The foregoing description is only exemplary of the present invention and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures disclosed herein or modifications in equivalent processes, or any application of the invention directly or indirectly to other related fields of practice, which are within the scope of the present invention.

Claims (4)

1. A space crowdsourcing task allocation method based on alliance, characterized by comprising the following steps:
(1) Constructing a worker alliance by workers to execute corresponding tasks so as to interact with other people, wherein the workers are limited by constraint conditions;
(2) A contract-based greedy task allocation method is formulated to form a group of worker alliances so as to allocate tasks effectively, under the assumption that workers act selfishly; in the contract-based greedy task allocation method a penalty contract is formulated and enforced, applying a penalty to workers who leave their alliances, the penalty contract providing that once a worker alliance is established to execute a task to be executed, any alliance member who leaves the alliance is punished with a fine whose amount equals the reward of the task;
(3) An equilibrium-based algorithm is formulated, wherein a Nash equilibrium is found based on a best response framework and stable worker alliances are formed for workers to obtain a higher total return; the equilibrium-based algorithm is characterized in that workers form alliances in sequence and update their strategies in turn, selecting best response tasks to maximize their own utility until a Nash equilibrium is reached, at which no worker can improve her utility by unilaterally switching from her allocated alliance to another alliance;
(4) Since the equilibrium point obtained by the best response method in step (3) is not unique and is not necessarily optimal in terms of total reward, a simulated annealing method is formulated to find a better Nash equilibrium; the workers' strategies are continuously and asynchronously updated with the best response method until a pure Nash equilibrium is reached, and the best response and simulated annealing methods converge to an equilibrium point.
2. The alliance-based space crowdsourcing task allocation method of claim 1, wherein the worker generally refers to a person who is paid only for performing space tasks.
3. The alliance-based space crowdsourcing task allocation method of claim 2, wherein the worker's working modes comprise an online mode and an offline mode; the online mode indicates that the worker is ready to accept tasks, and once the server allocates a task to a worker, the worker is considered to be in the offline mode until the allocated task is completed.
4. The federation-based spatial crowdsourcing tasking method of claim 1, wherein the constraints in step (1) comprise reach and task failure time.
CN202010051354.4A 2020-01-17 2020-01-17 Space crowdsourcing task allocation method based on alliance Active CN111291973B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010051354.4A CN111291973B (en) 2020-01-17 2020-01-17 Space crowdsourcing task allocation method based on alliance

Publications (2)

Publication Number Publication Date
CN111291973A CN111291973A (en) 2020-06-16
CN111291973B true CN111291973B (en) 2023-09-29

Family

ID=71018057



Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106204117A (en) * 2016-06-30 2016-12-07 河南蓝海通信技术有限公司 Mass-rent platform pricing method under multitask environment
CN106570679A (en) * 2016-10-27 2017-04-19 南京邮电大学 Participant capability based excitation method for crowdsourcing task of group
CN107944768A (en) * 2017-12-18 2018-04-20 扬州大学 A kind of method for allocating tasks under overlapping coalition formation based on negotiation
CN108133330A (en) * 2018-01-12 2018-06-08 东北大学 One kind is towards social crowdsourcing method for allocating tasks and its system




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230831

Address after: A603, Cancer Tower, Software Park, No. 18-14 Zhenze Road, Xinwu District, Wuxi City, Jiangsu Province, 214028

Applicant after: Maikesi (Wuxi) Data Technology Co.,Ltd.

Address before: Room 1105-2, 11 / F, building 1, No. 2000, Majian Road, high tech Zone, Suzhou, Jiangsu 215000

Applicant before: Mycos (Suzhou) data Technology Co.,Ltd.

GR01 Patent grant