CN114282645A - DQN-based space-time crowdsourcing task allocation method - Google Patents

DQN-based space-time crowdsourcing task allocation method

Info

Publication number
CN114282645A
Authority
CN
China
Prior art keywords: task, worker, tasks, crowdsourcing, dqn
Prior art date
Legal status
Granted
Application number
CN202111404758.8A
Other languages
Chinese (zh)
Other versions
CN114282645B (en)
Inventor
Peng Zhankui (彭占魁)
Li Yu (李玉)
Yin Yuyu (殷昱煜)
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority: CN202111404758.8A
Publication of CN114282645A
Application granted
Publication of CN114282645B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a DQN-based space-time crowdsourcing task allocation method, aiming to provide an allocation strategy for space-time crowdsourcing tasks. Feature vectors are extracted from the history records of crowdsourcing requesters and workers and continuously trained in a neural network structure designed on the basis of DQN, so that the characteristics and preferences of requesters and workers are better identified and the allocation of space-time crowdsourcing tasks is completed.

Description

DQN-based space-time crowdsourcing task allocation method
Technical Field
The invention relates to the field of space-time crowdsourcing task allocation, and in particular to a DQN (Deep Q-Network)-based method for allocating space-time crowdsourcing tasks.
Background
Crowdsourcing refers to the practice of a company or organization outsourcing tasks formerly performed by employees to an unspecified (and often large) network of people on a voluntary basis. Crowdsourced tasks are typically undertaken by individuals, but a task that requires multiple people to collaborate may also take a form resembling open-source production.
Spatial crowdsourcing is the process of crowdsourcing a set of spatial tasks to a set of workers, and it requires workers to be physically present at the task location to perform the corresponding tasks. The flow of a spatial crowdsourcing task is as follows: 1. a Requester submits its task and the information accompanying the task to a crowdsourcing platform (broker); 2. the platform publishes a set of location-related tasks in crowdsourced form to crowdsourcing practitioners (Workers); 3. the practitioners accept the tasks delivered by the platform and go to the designated locations to perform them. Many practical problems can therefore be modeled and solved as spatial crowdsourcing task problems.
DQN (Deep Q-Network) is a deep reinforcement learning algorithm that combines deep learning with reinforcement learning, and it is a variant of the Q-learning algorithm. Q-learning is a value-based reinforcement learning algorithm, where Q denotes Q(s, a): the expected return of taking action a (a ∈ A) in state s (s ∈ S) at a certain moment, with the environment feeding back a corresponding reward according to the agent's action. The iterative formula of Q-learning is as follows:

Q(s_i, a_i) ← Q(s_i, a_i) + α[r_{i+1} + γ max_{a_{i+1}} Q(s_{i+1}, a_{i+1}) - Q(s_i, a_i)]

where Q(s_i, a_i) is the value of the state-action pair at time i, r_{i+1} is the actual gain of the current action, γ is the discount (attenuation) factor, max_{a_{i+1}} Q(s_{i+1}, a_{i+1}) is the maximum value obtainable from the Q-table in the next state s_{i+1}, and α is the learning rate.
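For illustration, the update above can be written as a minimal tabular Q-learning sketch; the names and constants here are illustrative, not taken from the patent:

    from collections import defaultdict

    # Q-table keyed by (state, action); missing entries default to 0.0
    Q = defaultdict(float)
    alpha, gamma = 0.1, 0.98   # learning rate and discount factor (illustrative values)

    def q_update(state, action, reward, next_state, next_actions):
        # Q(s_i,a_i) <- Q(s_i,a_i) + alpha * [r + gamma * max_a' Q(s_{i+1},a') - Q(s_i,a_i)]
        best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])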
DQN adds a neural network on top of Q-learning; unlike plain Q-learning, it can handle problems with stochastic transitions and rewards without modification. DQN combines a convolutional neural network with Q-learning and introduces an experience replay mechanism, so that a computer can learn a control strategy directly from high-dimensional sensory input.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a DQN-based space-time crowdsourcing task allocation method.
The invention comprises the following steps:
Step 1: first, obtain a worker W_i and its selectable task list T_i from the environment. This specifically comprises the following substeps:
Step 1.1: at time i, a requester issues a space-time crowdsourcing task to the crowdsourcing platform.
Step 1.2: when a worker W_i arrives, a series of crowdsourcing tasks is obtained.
Step 1.3: according to a series of simple constraints (removing tasks that are too far away, tasks already completed, and the like), the tasks are screened to form the selectable task list T_i; a sketch of this screening is given below.
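As a sketch of how the step 1.3 screening might be realized (the distance threshold, field names, and distance approximation are assumptions, not specified by the patent):

    from dataclasses import dataclass

    @dataclass
    class Task:
        task_id: str
        lat: float
        lon: float
        completed: bool

    def build_selectable_list(worker_lat, worker_lon, tasks, max_dist_km=5.0):
        """Screen tasks with the simple constraints of step 1.3:
        drop tasks that are already completed or too far away."""
        def approx_km(t):
            # Coarse equirectangular distance; adequate for a city-scale filter
            dlat = (t.lat - worker_lat) * 111.0
            dlon = (t.lon - worker_lon) * 111.0 * 0.83  # ~cos(latitude) near Xi'an
            return (dlat ** 2 + dlon ** 2) ** 0.5
        return [t for t in tasks if not t.completed and approx_km(t) <= max_dist_km]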
Step 2: extract the feature vector of the worker W_i and the feature vector of the selectable task list T_i, and concatenate them into a feature vector f_si. This specifically comprises the following substeps:
Step 2.1: extract the feature vector of the selectable task list T_i. Current research generally holds that an allocation strategy for space-time crowdsourcing tasks needs to consider factors such as reward, cost, task type, and current location, so those factors are considered here as well, as shown in Table 1. The reward is represented as the ranking of the task's revenue among all available tasks. The cost comprises two parts: the pick-up cost (the ranking of reaching the task among all tasks) and the delivery cost (the ranking of completing the task among all tasks). For the task type, the stage of the day at which the task is initiated (such as morning, noon, or evening) is considered. Locations are encoded by the geohash method. The resulting vector is denoted f_ti.
Type        Meaning
Reward      Ranking of the task's revenue among all available tasks
Cost        Pick-up cost and delivery cost
Task type   Stage of the day at which the task is initiated
Location    Geohash encoding of the task position

Table 1. Composition of the task feature vector
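A minimal sketch of assembling the Table 1 entries for one task; the normalized-rank scheme, the day-stage buckets, and the injected geohash helper are assumptions for illustration (any geohash library could supply the encoding):

    def rank_among(values, v, higher_is_better=True):
        # Normalized rank of v among all candidates (0.0 = best)
        order = sorted(values, reverse=higher_is_better)
        return order.index(v) / max(len(order) - 1, 1)

    def day_stage(hour):
        # Coarse stage of the day: night / morning / noon / evening
        return ("night", "morning", "noon", "evening")[min(hour // 6, 3)]

    def task_features(task, all_tasks, geohash_encode):
        """Build the f_ti entries of Table 1 for one candidate task.
        `geohash_encode(lat, lon)` is an injected helper."""
        rewards = [t["reward"] for t in all_tasks]
        pickups = [t["pickup_cost"] for t in all_tasks]
        deliveries = [t["delivery_cost"] for t in all_tasks]
        return {
            "reward_rank": rank_among(rewards, task["reward"]),
            "pickup_rank": rank_among(pickups, task["pickup_cost"], higher_is_better=False),
            "delivery_rank": rank_among(deliveries, task["delivery_cost"], higher_is_better=False),
            "day_stage": day_stage(task["start_hour"]),
            "location": geohash_encode(task["lat"], task["lon"]),
        }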
Step 2.2: extract the feature vector of the worker W_i. The features of worker W_i's recently completed tasks can be used to approximate the probability that W_i will complete a task in the future. Therefore, worker W_i is represented by the weighted mean of recently completed task features, denoted f_wi. Among the completed tasks, the closer worker W_i's completion time is to time i, the larger that task's share in the feature vector. The expression is:

f_wi = Σ_{k=1}^{n} (1/2)^{n-k} · f(T_k)

where n denotes the total number of worker W_i's recently completed tasks (ordered chronologically from oldest to newest), T_k is the k-th completed task, f(T_k) is its feature vector, and (1/2)^{n-k} is the attenuation factor.
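Under this reading of the formula, the worker feature vector can be computed as in the following sketch (NumPy is an implementation choice, not mandated by the patent):

    import numpy as np

    def worker_features(completed_task_features):
        """Weighted combination of the n most recently completed task
        feature vectors, ordered oldest -> newest; the k-th vector gets
        weight (1/2)**(n-k), so recent tasks dominate."""
        n = len(completed_task_features)
        weights = np.array([0.5 ** (n - k) for k in range(1, n + 1)])
        stacked = np.stack(completed_task_features)       # shape (n, d)
        return (weights[:, None] * stacked).sum(axis=0)   # f_wi, shape (d,)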
Step 2.3: combine worker W_i's feature vector and the selectable task list T_i's feature vector into a complete feature vector f_si, obtained by concatenating f_ti and f_wi; f_si serves as the input of the DQN. In f_ti, the task features include the rankings of reward and cost, so different available tasks have different feature values. In f_wi, worker W_i's features come from the history of tasks the worker has completed, with decay over time taken into account, so f_wi characterizes worker W_i's behavior when processing tasks. Since the number of selectable tasks differs at different times, the maximum number of tasks is capped (set to maxT), and when there are fewer selectable tasks, the vector is padded with 0 to fix the size of f_si. Thus f_si contains maxT entries.
Step 3: predict the likelihood of recommending tasks to W_i through a neural network. The specific steps are:
Step 3.1: put the feature vector f_si into the Q-network representing the worker (Q-network(W)) and the Q-network representing the requester (Q-network(R)), respectively, and predict the score of each recommended action a_i in both networks. The two networks share a similar structure, shown in FIG. 2: they consist of two types of layers, Linear Layers (row-wise) and Attention Layers (multi-head), which convert the feature vector f_si into Q-values. The Linear Layers are row-wise feed-forward layers (rFF), and the Attention Layers are attention-mechanism layers. The input feature vector f_si represents the features of worker W_i and the selectable task list T_i. The initial Linear Layer lifts the feature vector f_si into higher-dimensional features. The conversion formula is:

rFF(X) = relu(XW + b)

where X is the input, W and b are learned parameters, and relu is the activation function. The Attention Layers compute feature weights for different combinations of worker W_i and the selectable task list T_i. Two types of attention layers, a soft-attention layer and a self-attention layer, are used here to obtain more accurate feature weights.
Then, after the Attention Layers above, a Linear Layer is applied, which helps keep the network stable.
Next, two Attention Layers are used again, enabling the Q-network to compute high-order pairwise interactions between worker W_i and the selectable task list T_i.
A final Linear Layer reduces the features of each element to a single value.
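A compact PyTorch sketch of a Q-network in this spirit follows; the rFF/attention alternation matches the description above, but the layer sizes, head count, and the use of nn.MultiheadAttention for both attention types are assumptions (the patent does not fix these details):

    import torch
    import torch.nn as nn

    class QNetwork(nn.Module):
        """rFF -> 2x attention -> rFF (residual) -> 2x attention -> rFF.
        Input: (batch, maxT, feat_dim) padded feature vectors f_si.
        Output: (batch, maxT) Q-values, one per candidate task."""
        def __init__(self, feat_dim, hidden=160, heads=4):
            super().__init__()
            self.lift = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())  # rFF(X) = relu(XW + b)
            self.attn1 = nn.MultiheadAttention(hidden, heads, batch_first=True)
            self.attn2 = nn.MultiheadAttention(hidden, heads, batch_first=True)
            self.mid = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
            self.attn3 = nn.MultiheadAttention(hidden, heads, batch_first=True)
            self.attn4 = nn.MultiheadAttention(hidden, heads, batch_first=True)
            self.head = nn.Linear(hidden, 1)  # reduce each element's features to a single value

        def forward(self, x):
            h = self.lift(x)
            h, _ = self.attn1(h, h, h)
            h, _ = self.attn2(h, h, h)
            h = self.mid(h) + h               # residual helps keep the network stable
            h, _ = self.attn3(h, h, h)
            h, _ = self.attn4(h, h, h)
            return self.head(h).squeeze(-1)   # one Q-value per candidate task

The step 3.2 combination then amounts to something like 0.5 * qw(x) + 0.5 * qr(x), followed by sorting the candidate tasks by the combined value.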
Step 3.2: the scores from the two Q-networks are combined by a weighted average and sorted into a recommendation list.
Step 4: after worker W_i obtains the recommendation list, W_i is assumed to browse it in order and select one task to complete.
Step 5: according to the task worker W_i completes, the recommendation list is quantified into an evaluation r_i.
Step 6: record the successful transition (S_i, a_i, r_i, S_{i+1}) (when a_i is the task selected and successfully completed by worker W_i) and the failed transitions (S_i, a_i, 0, S_{i+1}) (where S_i denotes W_i's current state, a_i is a task not selected and completed by worker W_i, and S_{i+1} denotes W_i's state after performing the action), and put them into a training pool (memory pool).
Step 7: use the data in the training pool to train Q-network(W).
Step 8: use the data in the training pool to train Q-network(R).
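Steps 6-8 amount to standard DQN experience replay; a minimal sketch follows, in which the buffer size, batch size, and discount follow Table 2 below and everything else is illustrative. The same routine would be run once for Q-network(W) and once for Q-network(R), each with its own reward signal:

    import random
    from collections import deque

    import torch
    import torch.nn.functional as F

    buffer = deque(maxlen=500)   # training pool (memory pool)

    def store(s, a, r, s_next):
        # Both successful (r = r_i) and failed (r = 0) transitions are stored
        buffer.append((s, a, r, s_next))

    def train_step(q_net, target_net, optimizer, gamma=0.98, batch_size=64):
        if len(buffer) < batch_size:
            return
        s, a, r, s_next = zip(*random.sample(buffer, batch_size))
        s, s_next = torch.stack(s), torch.stack(s_next)
        a = torch.tensor(a, dtype=torch.long)
        r = torch.tensor(r, dtype=torch.float32)
        q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)            # Q(S_i, a_i)
        with torch.no_grad():                                        # frozen target network
            target = r + gamma * target_net(s_next).max(dim=1).values
        loss = F.mse_loss(q, target)
        optimizer.zero_grad(); loss.backward(); optimizer.step()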
The invention has the following beneficial effects: it provides a DQN-based space-time crowdsourcing task allocation method, aiming to provide an allocation strategy for space-time crowdsourcing tasks. Feature vectors are extracted from the history records of crowdsourcing requesters and workers and continuously trained in a neural network structure designed on the basis of DQN, so that the network can better identify the characteristics and preferences of requesters and workers and complete the allocation of space-time crowdsourcing tasks.
Drawings
FIG. 1 is a block diagram of the DAIN system;
FIG. 2 is a diagram of the DAIN Q-network architecture;
FIG. 3 is a comparison chart of CR for Workers;
FIG. 4 is a comparison chart of nDCG-CR for Workers;
FIG. 5 is a comparison chart of CR for Requesters;
FIG. 6 is a comparison chart of nDCG-CR for Requesters;
FIG. 7 is a comparison chart of CR for Balance;
FIG. 8 is a comparison chart of nDCG-CR for Balance.
Detailed Description
The invention provides a DQN-based space-time crowdsourcing task allocation method, aiming to provide an allocation strategy for space-time crowdsourcing tasks. A neural network structure is designed based on DQN; feature vectors are extracted from the history records of crowdsourcing requesters and workers for continuous training, so that the characteristics and preferences of requesters and workers are identified and the allocation of space-time crowdsourcing tasks is completed.
In spatiotemporal crowdsourcing tasks, the participants typically include Requesters, a crowdsourcing platform (broker), and Workers. The working process is as follows: at some moment, a requester initiates a space-time crowdsourcing task; the platform allocates suitable workers to the task; and the workers complete the task.
For the above problem, the invention models a Markov Decision Process so that the allocation strategy can be optimized.
FIG. 1 is a block diagram of the DAIN (Deep Adaptive Interest Network for Task Assignment in Spatial Crowdsourcing) system. At time i, a requester issues a space-time crowdsourcing task to the crowdsourcing platform. When a worker W_i arrives, a series of crowdsourcing tasks T_i is acquired and screened into a selectable task list according to a series of constraints (step 1). The feature vector of worker W_i and the feature vector of the selectable task list T_i are concatenated into a feature vector f_si (step 2). Then the feature vector f_si is put into two DQNs (Q-network(W) and Q-network(R), representing the worker and the requester, respectively), and the score of each action a_i in the two DQNs is predicted. The two scores are combined and sorted into a recommendation list (step 3). After worker W_i sees the recommendation list, W_i is assumed to browse it in order and select one task to complete (step 4). According to the task worker W_i completes, the recommendation list is quantified into an evaluation r_i (step 5). For Q-network(W), the r_i calculation formula is as follows:
(Equation images in the original: the r_i calculation for Q-network(W).)
For Q-network(R), the r_i calculation formula is as follows:
(Equation image in the original: the r_i calculation for Q-network(R).)
Next, the successful transition (S_i, a_i, r_i, S_{i+1}) (when a_i is the task selected and successfully completed by worker W_i) and the failed transitions (S_i, a_i, 0, S_{i+1}) (when a_i is a task not selected and completed by worker W_i) are recorded and placed into the training pool (memory pool) (step 6). The data in the training pool is used to train Q-network(W) and Q-network(R) (steps 7 and 8).
The features of worker W_i's recently completed tasks are used to approximate the probability that W_i will complete a task in the future. Therefore, worker W_i is represented as a weighted average of the features of its most recently completed tasks.
Current research generally holds that an allocation strategy for space-time crowdsourcing tasks needs to consider factors such as reward, cost, task type, and current location. As shown in Table 1: the reward is represented as the ranking of the task's revenue among all available tasks. The cost comprises two parts: the pick-up cost (the ranking of reaching the task among all tasks) and the delivery cost (the ranking of completing the task among all tasks). For the task type, the stage of the day at which the task is initiated (such as morning, noon, or evening) is considered. Locations are encoded by the geohash method; the resulting vector is denoted f_ti.
Type        Meaning
Reward      Ranking of the task's revenue among all available tasks
Cost        Pick-up cost and delivery cost
Task type   Stage of the day at which the task is initiated
Location    Geohash encoding of the task position

Table 1. Composition of the task feature vector
For worker W_i's feature vector: the features of W_i's recently completed tasks can be used to approximate the probability that W_i will complete a task in the future, so worker W_i is represented by the weighted mean of recently completed task features, denoted f_wi. Among the completed tasks, the closer worker W_i's completion time is to time i, the larger that task's share in the feature vector. The expression is:

f_wi = Σ_{k=1}^{n} (1/2)^{n-k} · f(T_k)

where n denotes the total number of worker W_i's recently completed tasks (ordered chronologically from oldest to newest), T_k is the k-th completed task, f(T_k) is its feature vector, and (1/2)^{n-k} is the attenuation factor.
f_si is obtained by concatenating f_ti and f_wi and serves as the input of the DQN. In f_ti, the task features include the rankings of reward and cost, so different available tasks have different feature values. In f_wi, worker W_i's features come from the history of completed tasks, with decay over time taken into account, so f_wi characterizes worker W_i's behavior when processing tasks. The number of selectable tasks differs at different times, so the maximum number of tasks is capped (set to maxT), and when there are fewer selectable tasks the vector is padded with 0 to fix the size of f_si; thus f_si contains maxT entries.
As shown in FIG. 2, Linear Layers and Attention Layers convert the feature vector f_si into Q-values. The Linear Layers are row-wise feed-forward (rFF) layers, and the Attention Layers implement an attention mechanism. The input feature vector f_si represents the features of worker W_i and the selectable task list T_i. The initial Linear Layer lifts the feature vector f_si into higher-dimensional features. The conversion formula is:

rFF(X) = relu(XW + b)

where X is the input, W and b are learned parameters, and relu is the activation function. The Attention Layers compute feature weights for different combinations of worker W_i and the selectable task list T_i; both a soft-attention layer and a self-attention layer are used to obtain more accurate feature weights. Then an rFF layer is added on top of the original features, which helps keep the network stable. Two Attention Layers are then used again, enabling the Q-network to compute high-order pairwise interactions between them, and a final rFF layer reduces the features of each element to a single value.
The above is the main idea of the invention; the validity of the method is then verified on a real data set. The experimental data set uses records of about 30,000 tasks from Xi'an, China, in October 2016, provided by the Didi Chuxing (GAIA) open data program. Each record includes: driver ID, order ID, timestamp, longitude, and latitude. From these data, the information of each task is obtained. Considering that either a single task or a task list may be recommended, the following evaluation criteria are chosen:
CR (worker Completion Rate): when a worker arrives, the platform recommends a task or a task list; if the recommendation matches the task the worker is set to select, the value is 1;
nDCG-CR: nDCG denotes the normalized Discounted Cumulative Gain. The calculation formula is as follows:
DCG@k = Σ_{i=1}^{k} rel_i / log2(i + 1),    nDCG@k = DCG@k / IDCG@k

where rel_i is the relevance at rank i and IDCG@k is the DCG@k of the ideal ordering.
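Since the original formula is given only as an image, the standard nDCG definition is assumed above; a direct sketch of that computation:

    import math

    def dcg(relevances):
        # DCG = sum_i rel_i / log2(i + 1), with positions indexed from 1
        return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

    def ndcg(ranked_relevances):
        ideal = dcg(sorted(ranked_relevances, reverse=True))
        return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0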
the present invention will be compared with four other types of methods. All these methods are trained in real data. The methods are respectively random, Greedy + cosine similarity, neural network Greedy + neural network and DQN-based neural network, the methods calculate the return of tasks, and a proper task is selected for recommendation by predicting the return of the calculation tasks.
Random: a task recommendation is randomly selected for it in the selectable task list.
Greedy CS: the cosine similarity between the worker features and the task features is used as a completion rate, and the tasks are greedy selected or ordered according to the completion rate.
Greedy NN: the worker and task features are input into a neural network of two hidden layers to predict completion rates.
DDQN: two DQNs, based on a framework of deep reinforcement learning, maximize the benefit of the worker and requester, respectively. The Q-network is composed of a layer of Linear Layers and Attention Layers.
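For instance, the Greedy CS baseline reduces to a cosine-similarity ranking, as in this sketch (feature vectors as in step 2; the names are illustrative):

    import numpy as np

    def greedy_cs_recommend(worker_vec, task_vecs):
        """Rank candidate tasks by cosine similarity between the worker
        feature vector and each task feature vector (Greedy CS)."""
        w = worker_vec / (np.linalg.norm(worker_vec) + 1e-9)
        sims = [float(t @ w) / (np.linalg.norm(t) + 1e-9) for t in task_vecs]
        return np.argsort(sims)[::-1]   # task indices, best match first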
The model of the invention consists of two DQNs; through experimental evaluation, the number of Q-network neurons is set to 160. The other hyperparameters are set as in Table 2:

Parameter                    Value
Number of neurons            160
Target-Q update frequency    every 50 steps
Buffer size                  500
Learning rate                0.002
Reward discount γ            0.98
Training batch size          64

Table 2. Hyperparameter settings
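The Table 2 settings map onto a conventional DQN target-network schedule; a sketch of how they might be wired together, reusing the QNetwork sketch above (the optimizer choice and feature dimension are assumptions):

    import copy
    import torch

    GAMMA, LR, BATCH, BUFFER, TARGET_EVERY = 0.98, 0.002, 64, 500, 50  # per Table 2

    q_net = QNetwork(feat_dim=32, hidden=160)        # 160 neurons per Table 2; feat_dim illustrative
    target_net = copy.deepcopy(q_net).eval()
    optimizer = torch.optim.Adam(q_net.parameters(), lr=LR)

    for step in range(10_000):
        # ... collect a transition, store() it, then call train_step(q_net, target_net, optimizer) ...
        if step % TARGET_EVERY == 0:
            target_net.load_state_dict(q_net.state_dict())   # target-Q update every 50 steps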
PyTorch is used to implement the entire algorithm, and the code runs on a single GeForce RTX 2080 Ti GPU. In terms of efficiency, the DAIN model of the invention is similar to the DDQN baseline.
Since the data provided by Didi does not reflect the diverse preferences of workers, the invention preprocesses it for evaluation. Workers are divided into 4 main types:
prefers long-distance orders and rejects traffic congestion;
prefers short-distance orders and rejects traffic congestion;
prefers long-distance orders and accepts traffic congestion;
prefers short-distance orders and accepts traffic congestion.
Each worker is randomly assigned one of these preference types, and its "selected" tasks are set to the tasks in its selectable task set that match the preference.
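A sketch of this preprocessing (the threshold for "long-distance" and the congestion flag are assumptions; the patent does not name them):

    import random

    PREFERENCES = [
        {"long_distance": True,  "accepts_congestion": False},
        {"long_distance": False, "accepts_congestion": False},
        {"long_distance": True,  "accepts_congestion": True},
        {"long_distance": False, "accepts_congestion": True},
    ]

    def assign_preference(rng=random):
        # Each worker is randomly given one of the four preference types
        return rng.choice(PREFERENCES)

    def matches(pref, task, long_km=8.0):
        # A task matches if its distance agrees with the preference and, when
        # the worker rejects congestion, the route is not congested
        is_long = task["distance_km"] >= long_km
        if is_long != pref["long_distance"]:
            return False
        return pref["accepts_congestion"] or not task["congested"]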
As shown in FIGS. 3-8, the DAIN model of the invention comprises two DQNs, so DQN(R) and DQN(W) are also compared with each algorithm separately. Random performs poorly because it does not predict whether the worker will complete the task. Greedy CS and Greedy NN consider only the immediate short-term reward, which leads to poor performance; such algorithms cannot work effectively on complex crowdsourcing tasks. The DDQN algorithm performs relatively well, but because the DDQN model does not take the spatio-temporal behavior of crowdsourcing tasks into account, the DAIN model of the invention performs better.
In the DAIN model of the invention, the task order is sorted according to the Q-values aggregated from Q-network(W) and Q-network(R). In FIGS. 7 and 8, performance is evaluated in terms of overall benefit. Clearly, neither Random nor the Greedy methods are effective. DDQN achieves good results by modeling the complex relationships between workers and tasks through neural networks, predicting current and future rewards, and updating its parameters; however, its model structure is relatively simple for handling complex spatio-temporal crowdsourcing tasks. Compared with DDQN, the DQN structure of the DAIN model is more elaborate and its experimental performance is better: the DAIN model retains the advantages of DDQN while being better suited to spatio-temporal crowdsourcing tasks.

Claims (4)

1. A DQN-based space-time crowdsourcing task allocation method is characterized by comprising the following steps:
step 1, obtaining a worker W_i and its selectable task list T_i from the environment;
step 2, extracting the feature vector of the worker W_i and the feature vector of the selectable task list T_i and concatenating them into a feature vector f_si, which specifically comprises:
step 2.1, extracting the feature vector f_ti of the selectable task list T_i;
for the reward, it is expressed as the ranking of the task's revenue among all available tasks;
for the cost, two parts are considered: the pick-up cost, namely the ranking of reaching the task among all tasks, and the delivery cost, namely the ranking of completing the task among all tasks;
for the task type, the stage of the day at which the task is initiated is considered;
for the locations, they are encoded by the geohash method;
step 2.2, extracting the feature vector of the worker W_i;
the feature vector f_wi of the worker W_i is expressed as a weighted average of the features of its most recently completed tasks;
step 2.3, combining the feature vector of the worker W_i and the feature vector of the selectable task list T_i into a complete feature vector f_si, obtained by concatenating f_ti and f_wi, and taking f_si as the input of the DQN;
step 3, predicting recommendations for W_i through a neural network, which specifically comprises:
step 3.1, putting the feature vector f_si into the Q network representing the worker and the Q network representing the requester respectively, and predicting the score of each recommended action a_i in the two networks;
step 3.2, weighting and averaging the two scores, and sorting them into a recommendation list;
step 4, after the worker W_i obtains the recommendation list, assuming that W_i browses it in order and selects one task to complete;
step 5, quantifying the recommendation list into an evaluation r_i according to the task completed by the worker W_i;
step 6, recording the successful transitions (S_i, a_i, r_i, S_{i+1}) and the failed transitions (S_i, a_i, 0, S_{i+1}) and putting them into a training pool (memory pool);
step 7, training the Q network representing the worker with the data in the training pool;
step 8, training the Q network representing the requester with the data in the training pool.
2. The DQN-based space-time crowdsourcing task allocation method according to claim 1, wherein the step 1 specifically comprises:
step 1.1, at time i, a requester issues a space-time crowdsourcing task to a crowdsourcing platform;
step 1.2, when a worker W_i arrives, a series of crowdsourcing tasks is acquired;
step 1.3, the tasks are screened into the selectable task list T_i according to the constraints.
3. The DQN-based space-time crowdsourcing task allocation method according to claim 2, wherein in step 2.2: among the completed tasks, the closer worker W_i's completion time is to time i, the larger that task's share in the feature vector.
4. The DQN-based space-time crowdsourcing task allocation method according to claim 1, wherein in step 2.2:
at different moments the number of selectable tasks differs, so the maximum number of tasks is capped and set to maxT;
when the number of selectable tasks is insufficient, the vector is padded with 0 to fix the size of f_si.
CN202111404758.8A 2021-11-24 2021-11-24 Space-time crowdsourcing task allocation method based on DQN Active CN114282645B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111404758.8A CN114282645B (en) 2021-11-24 2021-11-24 Space-time crowdsourcing task allocation method based on DQN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111404758.8A CN114282645B (en) 2021-11-24 2021-11-24 Space-time crowdsourcing task allocation method based on DQN

Publications (2)

Publication Number    Publication Date
CN114282645A          2022-04-05
CN114282645B (en)     2023-04-21

Family

ID=80870016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111404758.8A Active CN114282645B (en) 2021-11-24 2021-11-24 Space-time crowdsourcing task allocation method based on DQN

Country Status (1)

Country Link
CN (1) CN114282645B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116302448A (en) * 2023-05-12 2023-06-23 中国科学技术大学先进技术研究院 Task scheduling method and system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596335A (en) * 2018-04-20 2018-09-28 浙江大学 A kind of adaptive crowdsourcing method based on deeply study
CN110430547A (en) * 2019-07-24 2019-11-08 河海大学常州校区 More AUV collaboration data collection algorithms in UASNs based on Q-learning
CN110554964A (en) * 2019-09-03 2019-12-10 大连海事大学 Web service crowdsourcing test task allocation method based on deep reinforcement learning
CN110737529A (en) * 2019-09-05 2020-01-31 北京理工大学 cluster scheduling adaptive configuration method for short-time multiple variable-size data jobs
CN111311115A (en) * 2020-03-12 2020-06-19 电子科技大学 Group task allocation method based on space crowdsourcing social influence preference
CN112328914A (en) * 2020-11-06 2021-02-05 辽宁工程技术大学 Task allocation method based on space-time crowdsourcing worker behavior prediction
CN112541037A (en) * 2020-11-25 2021-03-23 福建师范大学 Spatial crowdsourcing method and terminal based on block chain and deep reinforcement learning
CN112819210A (en) * 2021-01-20 2021-05-18 杭州电子科技大学 Online single-point task allocation method capable of being rejected by workers in space crowdsourcing
CN112965499A (en) * 2021-03-08 2021-06-15 哈尔滨工业大学(深圳) Unmanned vehicle driving decision-making method based on attention model and deep reinforcement learning
US20210357731A1 (en) * 2018-11-16 2021-11-18 Deepmind Technologies Limited Controlling agents using amortized q learning

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596335A (en) * 2018-04-20 2018-09-28 浙江大学 A kind of adaptive crowdsourcing method based on deeply study
US20210357731A1 (en) * 2018-11-16 2021-11-18 Deepmind Technologies Limited Controlling agents using amortized q learning
CN110430547A (en) * 2019-07-24 2019-11-08 河海大学常州校区 More AUV collaboration data collection algorithms in UASNs based on Q-learning
CN110554964A (en) * 2019-09-03 2019-12-10 大连海事大学 Web service crowdsourcing test task allocation method based on deep reinforcement learning
CN110737529A (en) * 2019-09-05 2020-01-31 北京理工大学 cluster scheduling adaptive configuration method for short-time multiple variable-size data jobs
CN111311115A (en) * 2020-03-12 2020-06-19 电子科技大学 Group task allocation method based on space crowdsourcing social influence preference
CN112328914A (en) * 2020-11-06 2021-02-05 辽宁工程技术大学 Task allocation method based on space-time crowdsourcing worker behavior prediction
CN112541037A (en) * 2020-11-25 2021-03-23 福建师范大学 Spatial crowdsourcing method and terminal based on block chain and deep reinforcement learning
CN112819210A (en) * 2021-01-20 2021-05-18 杭州电子科技大学 Online single-point task allocation method capable of being rejected by workers in space crowdsourcing
CN112965499A (en) * 2021-03-08 2021-06-15 哈尔滨工业大学(深圳) Unmanned vehicle driving decision-making method based on attention model and deep reinforcement learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CAIHUA SHAN等: "《An End-to-End Deep RL Framework for Task Arrangement in Crowdsourcing Platforms》", 《ARXIV》 *
SNEHA CHAUDHARI等: "《Attention Mechanism In Deep Learning》", 《ARXIV》 *
LIN Huihui et al.: "Online task allocation with rejection in spatial crowdsourcing", Journal of Zhejiang University of Science and Technology *
WANG Yuandou: "Research on multi-objective workflow scheduling based on Deep-Q-network multi-agent reinforcement learning", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116302448A (en) * 2023-05-12 2023-06-23 中国科学技术大学先进技术研究院 Task scheduling method and system
CN116302448B (en) * 2023-05-12 2023-08-11 中国科学技术大学先进技术研究院 Task scheduling method and system

Also Published As

Publication number Publication date
CN114282645B (en) 2023-04-21


Legal Events

Code  Description
PB01  Publication
SE01  Entry into force of request for substantive examination
CB03  Change of inventor or designer information (inventors after: Yin Yuyu, Li Yu, Peng Zhankui; inventors before: Peng Zhankui, Li Yu, Yin Yuyu)
GR01  Patent grant