CN112700074A - Express task planning method and device - Google Patents

Express task planning method and device

Info

Publication number
CN112700074A
CN112700074A (application CN201911007457.4A; granted as CN112700074B)
Authority
CN
China
Prior art keywords
order
target
return value
dispatching
distribution
Prior art date
Legal status
Granted
Application number
CN201911007457.4A
Other languages
Chinese (zh)
Other versions
CN112700074B (en)
Inventor
刘宇航
王晗
郑欣欣
Current Assignee
Navinfo Co Ltd
Original Assignee
Navinfo Co Ltd
Priority date
Filing date
Publication date
Application filed by Navinfo Co Ltd filed Critical Navinfo Co Ltd
Priority to CN201911007457.4A
Publication of CN112700074A
Application granted
Publication of CN112700074B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311Scheduling, planning or task assignment for a person or group
    • G06Q10/063114Status monitoring or status determination for a person or group
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06316Sequencing of tasks or work
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/08Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083Shipping
    • G06Q10/0835Relationships between shipper or supplier and carriers
    • G06Q10/08355Routing methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing


Abstract

Embodiments of the invention provide a method and a device for planning express pickup tasks. By combining a Markov decision process with express task planning, a suitable dispatcher is matched to each order while satisfying the dispatchers' location constraints and the pickup-time constraint, improving the dispatchers' work efficiency.

Description

Express task planning method and device
Technical Field
The invention relates to the technical field of logistics, in particular to a method and a device for planning an express task.
Background
With the rapid development of internet technology and electronic commerce, the operating model of the logistics industry has undergone sweeping changes. For example, with the appearance of express self-service pickup points and smart parcel lockers, people can send and collect parcels by themselves, which has made daily life far more convenient. Moreover, as mobile applications (APPs) continue to proliferate, the number of logistics-related applications keeps growing, and by installing such an APP a user can place orders, send parcels, and track shipments without leaving home.
In a logistics system, the sending and collecting of parcels by dispatchers is an extremely important link. For parcel collection, the traditional approach is to allocate a pickup order, according to the pickup address it contains, to the dispatcher responsible for that address; the dispatcher then chooses a route based on experience and contacts the customer at the time reserved in the order to collect the parcel.
With this approach, the dispatchers' work efficiency is low.
Disclosure of Invention
Embodiments of the invention provide a method and a device for planning express tasks, which are used to improve the work efficiency of dispatchers.
In a first aspect, an embodiment of the present invention provides a method for planning an express task, including:
obtaining an order and determining a policy set according to the pickup address in the order, wherein the policy set comprises a plurality of allocation policies, each allocation policy indicating that the order is allocated to one dispatcher within a preset range around the pickup address;
obtaining, according to a Markov decision process, a return value corresponding to each allocation policy in the policy set, wherein the return value represents the distance by which a dispatcher would deviate from a first path if the order were allocated to that dispatcher, the first path being the path corresponding to the orders already allocated to the dispatcher;
and allocating the order to a target dispatcher according to the return values, wherein the target dispatcher is one of the plurality of dispatchers.
Optionally, the target dispatcher is the dispatcher with the smallest return value among the plurality of dispatchers.
Optionally, the Markov decision process is defined by a five-tuple comprising: the dispatchers' order allocation status, a state transition probability matrix, a reward function, a discount factor, and the policy set;
the state transition probability matrix represents the probability that users in a preset area place pickup orders;
the reward function is used to calculate the reward value obtained when a dispatcher takes an action;
the discount factor is the coefficient by which the reward value of an action is discounted relative to the reward value of the preceding action.
Optionally, obtaining the return value corresponding to each allocation policy in the policy set according to the Markov decision process comprises:
obtaining the return value of the dispatcher moving from the current state to the next state according to the reward function, the discount factor, and the state transition probability matrix;
and taking the next state as the new current state, iterating the computation until the return value converges, and determining the converged value as the return value corresponding to the allocation policy.
Optionally, the method further comprises:
if the dispatcher with the smallest return value does not accept the order, allocating the order to the next dispatcher in ascending order of return value, and if the dispatcher with the second-smallest return value accepts the order, determining that dispatcher as the target dispatcher;
if the dispatcher with the second-smallest return value does not accept the order either, continuing to allocate the order to the next dispatcher in ascending order of return value;
and repeating the above process; if none of the first N dispatchers in ascending order of return value accepts the order, adding the order to an order-grabbing pool.
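The fallback cascade just described can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: `accepts` is a hypothetical stand-in for whatever order-acceptance signal the dispatch system exposes.

```python
def allocate_order(order, return_values, accepts, n_max, grab_pool):
    """Offer `order` to dispatchers in ascending order of return value.

    return_values: {dispatcher: return value} (smaller = less deviation)
    accepts(dispatcher, order) -> bool  (hypothetical acceptance check)
    If the first n_max dispatchers all decline, the order goes to the
    order-grabbing pool and None is returned.
    """
    ranked = sorted(return_values, key=return_values.get)  # smallest first
    for dispatcher in ranked[:n_max]:
        if accepts(dispatcher, order):
            return dispatcher          # target dispatcher found
    grab_pool.append(order)            # first n_max dispatchers all declined
    return None
```

The ascending sort mirrors the claim: the smallest-return dispatcher is tried first, then the second-smallest, and so on up to N.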
Optionally, after allocating the order to the target dispatcher, the method further comprises:
re-planning a second path for the target dispatcher according to an ant colony algorithm, the order, and the target dispatcher's order allocation status.
Optionally, after allocating the order to the target dispatcher, the method further comprises:
inserting the pickup address in the order into the first path corresponding to the target dispatcher to obtain a second path.
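The path-insertion option above can be sketched as a cheapest-insertion heuristic. This is a sketch under the assumption of planar coordinates; a real dispatch system would use road-network or geodesic distances.

```python
import math


def insert_pickup(path, pickup):
    """Insert `pickup` (x, y) into `path` (an ordered list of (x, y) stops)
    at the position that adds the least extra travel distance."""
    d = math.dist  # Euclidean distance (Python 3.8+)
    if len(path) < 2:
        return path + [pickup]
    # extra cost of inserting between stops i-1 and i
    best_i = min(
        range(1, len(path)),
        key=lambda i: d(path[i - 1], pickup) + d(pickup, path[i]) - d(path[i - 1], path[i]),
    )
    return path[:best_i] + [pickup] + path[best_i:]
```

Cheapest insertion keeps the already-planned first path intact apart from one new stop, which matches the intent of deriving the second path from the first.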
Optionally, the method further comprises:
updating the second path according to any order of the target dispatcher whose reserved time has changed.
In a second aspect, an embodiment of the present invention provides an express task planning device, where the device includes:
a first acquisition module, configured to acquire an order and determine a policy set according to the pickup address in the order, wherein the policy set comprises a plurality of allocation policies, each allocation policy indicating that the order is allocated to one dispatcher within a preset range around the pickup address;
a second acquisition module, configured to obtain, according to a Markov decision process, a return value corresponding to each allocation policy in the policy set, wherein the return value represents the distance by which a dispatcher would deviate from a first path if the order were allocated to that dispatcher, the first path being the path corresponding to the orders already allocated to the dispatcher;
and an allocation module, configured to allocate the order to a target dispatcher according to the return values, wherein the target dispatcher is one of the dispatchers.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: memory, processor, and computer program instructions;
wherein the memory is to store the computer program instructions;
the processor executes the computer program instructions to perform the method of the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a readable storage medium, comprising: a program;
wherein the program, when executed by a processor, is operable to perform the method of the first aspect.
Embodiments of the invention provide a method and a device for planning express pickup tasks. By combining a Markov decision process with express task planning, a suitable dispatcher is matched to each order while satisfying the dispatchers' location constraints and the pickup-time constraint, improving the dispatchers' work efficiency.
Drawings
To illustrate the embodiments of the present invention or the prior-art technical solutions more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1A is a schematic flow chart of a method for planning an express delivery task according to the present invention;
fig. 1B is a schematic flow chart of a first method for planning an express delivery task according to an embodiment of the present invention;
FIG. 1C is a diagram illustrating a Markov reward process provided by the present invention;
FIG. 1D is a schematic diagram illustrating how orders change during decision optimization according to the present invention;
fig. 2 is a schematic flow chart of a second express task planning method provided by the present invention;
fig. 3 is a schematic flow chart of a third method for planning an express delivery task according to an embodiment of the present invention;
fig. 4 is a schematic flow chart of a fourth method for planning an express delivery task according to the present invention;
fig. 5 is a schematic structural diagram of a first express task planning device provided by the present invention;
fig. 6 is a schematic structural diagram of a second express task planning device provided by the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to a first embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In a logistics system, the sending and collecting of parcels by dispatchers is an extremely important link. In particular, for parcel collection, the conventional approach is to allocate an order, according to its pickup address, to the dispatcher responsible for the parcel area to which that address belongs; the dispatcher then chooses a route based on experience and contacts the customer at the pickup time reserved in the order to collect the parcel.
With this approach, orders are allocated to the dispatchers of areas drawn up by the logistics company, without considering whether those dispatchers are the most suitable ones; the dispatchers' work efficiency is therefore low, and resource utilization cannot be maximized. Embodiments of the invention therefore provide an express task planning method that addresses these problems of the prior art, improving dispatcher efficiency and maximizing resource utilization.
Referring to fig. 1A, the express task planning method provided by the embodiment of the present invention takes which dispatcher accepts which order as the decision variable, takes the dispatchers' positions and the pickup times in the orders as constraints, and takes as the optimization objective minimizing the distance (or the time consumed) by which a dispatcher deviates from its original route after accepting an order while still meeting the reserved pickup time, thereby improving dispatcher efficiency and maximizing resource utilization.
The express task planning method provided by the embodiment of the invention is described in detail through a plurality of specific embodiments.
Fig. 1B is a schematic flow chart of a first express task planning method provided by an embodiment of the present invention. The method may be executed by the express task planning device provided by the embodiment of the present invention, and the device may be implemented in any combination of software and/or hardware.
For example, the express task planning device may be an electronic device such as a terminal device, a computer system, or a server, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with such electronic devices include, but are not limited to: personal computer systems, server computer systems, handheld or laptop devices, microprocessor-, CPU-, or GPU-based systems, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above systems, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
In this embodiment, the express task planning device is taken as the execution body for description.
As shown in fig. 1B, the method of the present embodiment includes:
s101, obtaining an order and determining a strategy set according to a pickup address in the order.
Specifically, when a user needs to send a parcel, the user can place an order through a client, generating an order that may include information such as the pickup address, the pickup time, and the sender's contact name and telephone number. After acquiring the order, the express task planning device determines, according to the pickup address in the order, a plurality of dispatchers within a preset range around the pickup address, thereby generating a policy set; that is, the policy set comprises a plurality of allocation policies, and each allocation policy indicates that the order is allocated to one of the dispatchers. The policy set in this embodiment thus reflects the candidate allocation targets of the order.
For example, the express task planning device may determine the policy set in the following ways:
according to a possible implementation manner, the planning device of the express task determines the dispatchers in a range taking the pickup address as the center and taking the preset distance as the radius according to the pickup address in the order, and generates a strategy set according to the determined dispatchers. For example, if the longitude and latitude information corresponding to the pickup address is (a, B), the express task planning apparatus may determine that the current location is a recipient within a radius range of 3 kilometers around (a, B), and generate a policy set according to the determined recipient.
In another possible implementation, the express task planning device determines, according to the pickup address in the order, the dispatchers of the operation area in which the pickup address lies and of at least one adjacent operation area, and generates the policy set from them. Specifically, logistics operations usually divide territory into operation areas by geographical location, each area served by one or more dispatchers. The device determines the operation area to which the pickup address belongs and its one or more dispatchers, determines the one or more dispatchers of at least one adjacent operation area, and generates the policy set from the dispatchers so determined.
For example, suppose the device determines that the pickup address belongs to operation area A, which is served by dispatchers a and b, and that the areas adjacent to area A include operation areas B and C, where area B is served by dispatcher c and area C by dispatchers d and e. The policy set may then be generated from dispatchers a, b, c, d, and e, or only from dispatchers a, b, and c.
It should be noted that the two ways of determining the policy set shown above are only exemplary, and the policy set may also be determined in other ways in practical applications, which is not limited by the present invention.
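The first of the implementations above (radius-based candidate selection) can be sketched as follows. Planar coordinates and the 3 km radius are illustrative assumptions; a real system would work with latitude/longitude and geodesic distance.

```python
import math


def build_policy_set(pickup_xy, dispatcher_positions, radius_km=3.0):
    """Return the candidate dispatchers for an order: all dispatchers
    currently within `radius_km` of the pickup address.

    pickup_xy: (x, y) position of the pickup address (planar, in km)
    dispatcher_positions: {dispatcher_id: (x, y)}
    """
    return [
        d for d, xy in dispatcher_positions.items()
        if math.dist(pickup_xy, xy) <= radius_km
    ]
```

Each dispatcher in the returned list corresponds to one allocation policy in the policy set; the area-based variant would filter by operation-area membership instead of distance.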
S102, obtaining a return value corresponding to each distribution strategy in the strategy set according to the Markov decision process.
To make the technical solution of the embodiment of the present invention clearer, the Markov decision process is first described in detail here:
Markov process:
A Markov process is a class of stochastic processes in which the future state depends only on the state at the current time, not on states at historical times.
Let {X(t), t ∈ T} be a stochastic process, where X(t) denotes the state at time t and E denotes the state space. If, for any t1 < t2 < … < tn < t and any x1, x2, …, xn, x ∈ E, the conditional distribution of X(t) given X(t1) = x1, …, X(tn) = xn depends only on X(tn) = xn and is independent of X(t1) = x1, …, X(tn−1) = xn−1, i.e. the conditional distribution function satisfies

F(x, t | xn, xn−1, …, x2, x1, tn, tn−1, …, t2, t1) = F(x, t | xn, tn),

then this property is called the Markov property, and a stochastic process satisfying the Markov property is called a Markov process.
A stochastic process that has the Markov property and whose index set and state space are both discrete is called a Markov chain. A Markov chain is characterized by its state transition probabilities: the probability that the random variable moves from state si at one time step to state sj at the next can be written as

P(i → j) = Pij = P(Xt+1 = sj | Xt = si)

where Pij is the state transition probability, S is the state set, si is the state at step i, sj is the state at step j, and si, sj ∈ S.
Markov reward process:
A Markov reward process may be defined by a quadruple ⟨S, P, U, γ⟩.
Here S denotes the state set, si ∈ S, where si is the state at step i.
P denotes the state transition probability; specifically, P gives the probability distribution over the states reachable from state si ∈ S after taking an action a ∈ A, where A is the policy set and a is one policy in the set. For example, the probability of transitioning to state sj by executing policy a (equivalently, action a) in state si can be written as P(sj | si, a).
It can be understood that, in this embodiment, executing a certain allocation policy in the policy set is to execute a certain action in the policy set.
U denotes the reward function. If (si, a) transitions to the next state sj, the reward function can be written as U(sj | si, a); if the next state sj corresponding to (si, a) is unique, it can also be written as U(si, a). Here (si, a) denotes executing action a in state si, whereby the state transitions to sj.
γ denotes the discount factor, specifically the discount applied to each reward relative to the previous reward; it attenuates the influence that executing some policy in the future has on the current state.
In a Markov reward process, the expected return of a state is

H(s) = U(s) + γ Σ_{s'} P(s, s') H(s')

where H(s) is the return value of state s, γ is the discount factor of state s' relative to state s, P(s, s') is the probability of transitioning from state s to state s', and H(s') is the return value of state s'.
The Markov reward process is illustrated here with a concrete example. As shown in fig. 1C, the process has six nodes, node 0 through node 5, each in some state; the states of different nodes may be the same or different. Taking node 1 as the starting point (node 1 represents the state at the current moment), executing one of the policies in the policy set transitions node 1 from the current state to a next state, i.e., to a next node: as shown in fig. 1C, node 1 may transition to the state of node 2 or node 3. Similarly, node 3 may execute a policy and transition to the state of node 4 or node 5, and each executed policy has a corresponding reward value.
As shown in fig. 1C, the return value of node 1 is 25.9: it equals the sum of the reward 20 given by node 1's own reward function, the product of node 2's reward 5 and the discount factor 0.1, and the product of node 3's reward 6 and the discount factor 0.9.
Similarly, if the current state is node 3, its return value is 13.6: the sum of node 3's own reward 6, the product of node 4's reward 2 and the discount factor 0.2, and the product of node 5's reward 9 and the discount factor 0.8.
In the embodiment of the application, when computing the return generated by executing each policy at a node, the discount factor attenuates the influence of future steps on the current step.
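The two worked values from fig. 1C can be checked in a few lines; the rewards and discount factors are taken directly from the example above.

```python
def node_return(own_reward, successors):
    """Return of a node: its own reward plus each successor's reward
    weighted by the discount factor on that branch.

    successors: list of (discount_factor, successor_reward) pairs.
    """
    return own_reward + sum(g * r for g, r in successors)


r1 = node_return(20, [(0.1, 5), (0.9, 6)])  # node 1: 20 + 0.5 + 5.4 = 25.9
r3 = node_return(6, [(0.2, 2), (0.8, 9)])   # node 3: 6 + 0.4 + 7.2 = 13.6
```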
Markov decision process:
Adding a policy set A to the Markov reward process yields the Markov decision problem. Solving the Markov decision problem means finding the policy that obtains the maximum expected return; in the embodiment of the invention, it means finding the most suitable dispatcher.
In the Markov decision process, an iterative computation is performed for each policy a in the policy set A to obtain its return value, and the policy that obtains the maximum expected return is then determined from the return value of each policy a.
Specifically, in the Markov decision process, the return value of each policy can be obtained as follows:
Step one: initialize the reward function U(s) and the return values H(s) for all policies;
Step two: for each policy, evaluate the current policy using the current reward function U(s) to obtain the policy's return value;
Step three: iterate the computation for each policy until its return value converges. The iteration proceeds as follows:
assuming the number of iterations is k, the return value corresponding to the policy is calculated by the formula
H(s)_k = U(s) + γ^1 U(s1) + γ^2 U(s2) + ⋯ + γ^k U(sk)

where k is the number of iterations, γ^1 is the discount factor at the first iteration, γ^2 is the discount factor at the second iteration, …, and γ^k is the discount factor at the k-th iteration, si being the state reached after i steps.
The physical meaning is that k future cases are taken into account, i.e., in this embodiment, the k orders that may appear in the future are considered.
Since γ < 1, γ^k tends to 0 as the number of iterations increases, and the value of H(s) tends to stabilize. Let esp = H(s)_k − H(s)_{k−1}, where H(s)_k is the return value of the policy obtained at the k-th iteration and H(s)_{k−1} is the return value obtained at the (k−1)-th iteration.
When esp is smaller than a preset threshold, the iteration is considered complete, and H(s)_k is taken as the return value corresponding to the policy.
Step four: determine, from the return value of each policy in the policy set A, the policy that obtains the maximum expected return.
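Steps one to three amount to a standard value iteration on the expected-return equation H(s) = U(s) + γ·ΣP(s, s′)H(s′), stopped when the change between successive iterations falls below a preset threshold. A minimal sketch, with states, rewards, and transition probabilities that are illustrative rather than taken from the patent:

```python
def iterate_returns(states, U, P, gamma=0.9, eps=1e-6, max_iter=1000):
    """Value iteration until the per-state change drops below eps.

    U: {state: reward}, P: {state: {next_state: probability}}
    """
    H = {s: 0.0 for s in states}                  # step one: initialize H(s)
    for _ in range(max_iter):                     # step three: iterate
        H_new = {
            s: U[s] + gamma * sum(p * H[s2] for s2, p in P[s].items())
            for s in states
        }
        # convergence test: largest change plays the role of esp
        if max(abs(H_new[s] - H[s]) for s in states) < eps:
            return H_new                          # converged return values
        H = H_new
    return H
```

For a self-absorbing state with reward 1 and γ = 0.5, the iterates converge to the geometric sum 1/(1 − 0.5) = 2, which is the stabilization behaviour described above.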
Next, the Markov decision process in order allocation is described in detail:
In the three cases shown in fig. 1D, time T is the current moment. When the decision is made at time T, the orders before time T have already been placed; there may be one or several of them, and fig. 1D shows only the case of several placed orders. As shown in the first case of fig. 1D, a single order may appear at some future moment; as shown in the second case of fig. 1D, a batch of orders (i.e., several orders) may also appear simultaneously at some future moment. This embodiment considers that the orders appearing over a future period may be a single order as in the first case of fig. 1D or a batch of orders as in the second case; for convenience of description, the future orders are represented as in the third case of fig. 1D.
Because logistics orders arrive at random, the conventional approach considers only the current order, i.e., it optimizes a single decision for a single order. The aim of the present invention is to maximize the cumulative resource utilization over a future period of time; that is, the method in the embodiment of the present invention can achieve a global optimum of resource utilization.
In this embodiment, the Markov decision process in order distribution is defined by a five-tuple comprising: the order distribution conditions of the dispatchers, a state transition probability matrix, a return function, a discount factor, and the strategy set. The order distribution conditions of the dispatchers include the order distribution conditions corresponding to all dispatchers in the strategy set determined in step S101; the state transition probability matrix represents the probability distribution of users placing mailing orders in the preset area; the return function is used to calculate the return value obtained when a dispatcher takes an action; and the discount factor is the discount coefficient of the return value obtained by the current action relative to the return value obtained by the previous action.
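As a rough sketch of how the five-tuple above might be carried around in code (all field names and types here are assumptions for exposition, not part of the embodiment):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class OrderMDP:
    """Five-tuple of the order-distribution Markov decision process."""
    order_distribution: Dict[str, list]   # orders already assigned to each dispatcher
    transition_matrix: List[List[float]]  # mailing-probability distribution over the preset area
    reward: Callable[[str, dict], float]  # return value of a dispatcher taking an action
    gamma: float                          # discount factor, 0 < gamma < 1
    policies: List[str]                   # strategy set: one candidate dispatcher per policy
```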
In this embodiment, the return value of an allocation policy represents the distance by which a dispatcher deviates from the first path if the order is allocated to that dispatcher, where the first path is the path corresponding to the orders already allocated to the dispatcher. A smaller return value means the dispatcher deviates less from the first path after accepting the order and accordingly spends less time completing it, so resource utilization is higher; a larger return value means a larger deviation from the first path, more time spent completing the order, and therefore lower resource utilization.
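The deviation-distance interpretation of the return value can be illustrated with a small sketch. The Euclidean metric and the cheapest-insertion rule are simplifying assumptions, since the embodiment does not fix a particular distance measure:

```python
import math

def _dist(a, b):
    """Straight-line distance between two (x, y) stops."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def deviation_cost(route, pickup):
    """Extra distance a dispatcher incurs by detouring from the first
    path (a list of (x, y) stops) to visit the pickup address; the
    cheapest insertion point is assumed."""
    best = float("inf")
    for i in range(len(route) - 1):
        a, b = route[i], route[i + 1]
        best = min(best, _dist(a, pickup) + _dist(pickup, b) - _dist(a, b))
    return best
```

A pickup lying on the existing route costs nothing; the farther it lies off the route, the larger the return value and the lower the resource utilization.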
Specifically, for each allocation strategy, the return value of the corresponding dispatcher transitioning from the current state to the next state is obtained from the return function, the discount factor, and the state transition probability matrix. That is, for each dispatcher, the distance by which the dispatcher would deviate from the original path if the order were allocated to that dispatcher is obtained from the dispatcher's return function, discount factor, and state transition probability matrix.
The next state, i.e., the state after the dispatcher accepts the order, is then updated to be the current state, and the computation is iterated until the return value corresponding to the allocation strategy converges. That is, the next order that may appear in the future is assigned to the dispatcher, and the resulting return value is again obtained from the return function, the discount factor, and the state transition probability matrix. This process is repeated until the return value corresponding to the dispatcher converges.
In the Markov decision process, the discount factor and the state transition probability matrix can be obtained from historical mailing data. In practical applications, they can be updated as the historical mailing data is continuously refreshed, so that the return value obtained for each distribution strategy is more accurate.
S103, distributing the order to a target dispatcher according to the return value.
Specifically, in this step, the target dispatcher is determined according to the return values corresponding to all the distribution strategies in the strategy set.
One possible implementation is to determine, according to the return value corresponding to each distribution strategy in the strategy set, the distribution strategy with the smallest return value as the target distribution strategy, that is, to allocate the order to the dispatcher corresponding to the distribution strategy with the smallest return value.
Another possible implementation is to determine N allocation policies as alternative allocation policies according to the return value corresponding to each allocation policy in the strategy set, and to select any one of the N alternatives as the target allocation policy; that is, the order is allocated to any one of the dispatchers corresponding to the first N allocation policies when the return values are sorted from small to large, where N is a positive integer, for example N = 3. It will be appreciated that the number of alternative allocation policies is less than or equal to the number of allocation policies in the strategy set.
The N alternative allocation policies in the second implementation can be determined as follows. In a first way, the return value corresponding to each allocation policy in the strategy set is compared with a preset return value, and each allocation policy whose return value is smaller than the preset return value is determined to be an alternative allocation policy, thereby determining the N alternatives. In a second way, the N top-ranked allocation policies, in order of return value from small to large, are determined to be the alternative allocation policies. Of course, the alternatives may also be determined in other ways; the two implementations shown above are merely exemplary and do not limit how the alternative allocation policies are determined.
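The two ways of picking the N alternatives can be sketched as follows, representing each allocation policy as a (dispatcher, return value) pair (an assumed encoding, for illustration only):

```python
def candidates_by_threshold(policies, preset_return):
    """First way: keep every policy whose return value is below a preset value."""
    return [p for p in policies if p[1] < preset_return]

def candidates_top_n(policies, n):
    """Second way: keep the N policies with the smallest return values."""
    return sorted(policies, key=lambda p: p[1])[:n]
```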
Optionally, after the target dispatcher is determined, the express task planning apparatus sends the order to a terminal device held by the target dispatcher; the target dispatcher can then view the detailed information of the order through the terminal device and autonomously choose whether to accept it.
In this embodiment, an order is obtained, a strategy set is determined according to the pickup address in the order, a return value corresponding to each strategy in the strategy set is then obtained according to the Markov decision process, and the order is allocated to the dispatcher with the smallest return value among the plurality of dispatchers. By combining the Markov decision process with express task planning, the most suitable dispatcher is matched to the order while satisfying the dispatcher's location constraint and the pickup time constraint, thereby improving the dispatchers' operating efficiency.
Next, detailed descriptions are given for two cases: the target dispatcher is the dispatcher corresponding to the distribution strategy with the smallest return value, and the target dispatcher is the dispatcher corresponding to any one of the first N distribution strategies arranged in the preset order.
Fig. 2 is a schematic flow chart of a second express task planning method provided by the present invention. As shown in fig. 2, the method of the present embodiment includes:
S201, obtaining an order and determining a strategy set according to the pickup address in the order.
S202, obtaining a return value corresponding to each distribution strategy in the strategy set according to the Markov decision process.
Steps S201 and S202 in the embodiment shown in fig. 2 are similar to steps S101 and S102 in the embodiment shown in fig. 1, respectively, and refer to the detailed description in the embodiment shown in fig. 1, which is not repeated herein.
And S203, allocating the order to a target dispatcher according to the return value, wherein the target dispatcher is the dispatcher corresponding to the distribution strategy with the smallest return value among the plurality of distribution strategies.
S204, judging whether the target dispatcher receives the order, if so, executing the step S205, and if not, executing the step S206.
S205, planning a second path of the target dispatcher according to the order and the target dispatcher's order allocation conditions.
In one possible implementation, the second path of the target dispatcher is re-planned according to the ant colony algorithm, the order, and the target dispatcher's order allocation conditions. In another possible implementation, the pickup address in the order is inserted into the first path corresponding to the target dispatcher, thereby obtaining the second path.
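The second, insertion-based implementation can be sketched as follows; the list-of-stops route representation and straight-line distance are simplifying assumptions:

```python
import math

def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def insert_pickup(route, pickup):
    """Build the second path by inserting the pickup stop into the first
    path at the position that adds the least extra travel distance."""
    best_i, best_detour = 1, float("inf")
    for i in range(1, len(route)):
        a, b = route[i - 1], route[i]
        detour = _dist(a, pickup) + _dist(pickup, b) - _dist(a, b)
        if detour < best_detour:
            best_i, best_detour = i, detour
    return route[:best_i] + [pickup] + route[best_i:]
```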
And S206, updating the target dispatcher to the dispatcher corresponding to the second-ranked distribution strategy in the preset order, and allocating the order to the updated target dispatcher.
Wherein, the preset sequence is the sequence of the return values from small to large.
And S207, judging whether the updated target dispatcher receives the order, if so, executing the step S205, and if not, executing the step S208.
And S208, updating the target dispatcher to the dispatcher corresponding to the third-ranked distribution strategy in the preset order, and allocating the order to the updated target dispatcher, wherein the preset order is the arrangement of return values from small to large.
S209, judging whether the updated target dispatcher receives the order, if so, executing the step S205, and if not, executing the step S210.
And S210, adding the order into an order grabbing pool.
The order-grabbing pool contains orders that have not been accepted by any dispatcher. Adding such orders to the pool allows more dispatchers to obtain the order information and select suitable orders according to their actual situation. In this way, the poor user experience of an order going unaccepted for a long time after it is placed can be avoided.
It should be noted that this embodiment is described taking the first 3 distribution strategies, in order of return value from small to large, as the alternative distribution strategies. Of course, in practical applications, the number of alternative distribution strategies may be larger or smaller.
In this embodiment, according to the order of the return values of the distribution strategies in the strategy set from small to large, the order is preferentially allocated to the dispatcher corresponding to the distribution strategy with the smallest return value; if that dispatcher does not accept the order, it is allocated to the dispatcher corresponding to the second-ranked distribution strategy. Further, if after multiple rounds of allocation no dispatcher has accepted the order, the order is added to the order-grabbing pool so that more dispatchers can obtain the order information, which improves the probability of the order being accepted and effectively improves the user experience.
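The preferential allocation with fallback described above can be sketched as a simple loop; the accepts() callback stands in for the dispatcher's confirmation on the terminal device and is an assumed interface:

```python
def dispatch(order, ranked_dispatchers, accepts, grab_pool):
    """Offer the order to dispatchers in ascending order of return value;
    if none of them accepts, push the order into the grab pool."""
    for dispatcher in ranked_dispatchers:
        if accepts(dispatcher, order):
            return dispatcher       # this dispatcher's second path is planned next
    grab_pool.append(order)         # let more dispatchers see the order
    return None
```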
Optionally, on the basis of the embodiment shown in fig. 2, if the reservation time of one of the target dispatcher's orders changes, the second path is updated. The reservation time may be the reserved delivery time of a delivery order of the target dispatcher, or the pickup time of a pickup order of the target dispatcher.
Fig. 3 is a schematic flow chart of a third express task planning method provided by the present invention. As shown in fig. 3, the method of this embodiment includes:
it should be noted that, in the embodiment shown in fig. 3, the target dispatcher in step S303 is a dispatcher corresponding to any one of the first N distribution policies arranged according to the preset sequence in the plurality of distribution policies.
S301, obtaining an order and determining a strategy set according to a pickup address in the order.
S302, obtaining a return value corresponding to each distribution strategy in the strategy set according to the Markov decision process.
Steps S301 and S302 in the embodiment shown in fig. 3 are similar to steps S101 and S102 in the embodiment shown in fig. 1, respectively, and refer to the detailed description in the embodiment shown in fig. 1, which is not repeated herein.
And S303, allocating the order to a target dispatcher according to the return value, wherein the target dispatcher is the dispatcher corresponding to any one of the first N distribution strategies arranged in the preset order among the plurality of distribution strategies.
S304, judging whether the target dispatcher accepts the order; if so, executing step S305, and if not, executing step S306.
S305, planning a second path of the target dispatcher according to the order and the target dispatcher's order allocation conditions.
In one possible implementation, the second path of the target dispatcher is re-planned according to the ant colony algorithm, the order, and the target dispatcher's order allocation conditions. In another possible implementation, the pickup address in the order is inserted into the first path corresponding to the target dispatcher, thereby obtaining the second path.
S306, allocating the order to the dispatcher corresponding to any one of the remaining N-1 distribution strategies, and updating the target dispatcher.
Specifically, since the order is not received by the dispatching member corresponding to the selected allocation policy in step S303, one allocation policy may be selected from the remaining N-1 allocation policies, and the order may be allocated to the dispatching member corresponding to the allocation policy.
In one possible implementation, any one of the remaining N-1 allocation policies is selected, and the order is allocated to the corresponding distributor of the allocation policy.
S307, judging whether the updated target dispatcher accepts the order; if so, executing step S305, and if not, executing step S308.
And S308, allocating the order to the dispatcher corresponding to any one of the remaining N-2 distribution strategies, and updating the target dispatcher.
Specifically, since the order is not received by the dispatching member corresponding to the selected distribution policy in step S303 and step S306, one distribution policy may be selected from the remaining N-2 distribution policies, and the order may be distributed to the dispatching member corresponding to the distribution policy.
In one possible implementation, any one of the remaining N-2 allocation policies is selected, and the order is allocated to the dispatcher corresponding to that allocation policy.
S309, judging whether the updated target dispatcher accepts the order; if so, executing step S305, and if not, executing step S310.
And S310, repeating the above order distribution process; if none of the dispatchers corresponding to the N distribution strategies accepts the order, adding the order to the order-grabbing pool.
Exemplarily, the strategy set includes 10 distribution strategies, and the first 3, arranged in order of return value from small to large, are the alternative distribution strategies, denoted distribution strategy 1, distribution strategy 2, and distribution strategy 3. First, the order is allocated to the dispatcher corresponding to any one of the 3 alternatives, for example the dispatcher corresponding to distribution strategy 1. If that dispatcher does not accept the order, the order is allocated to either of the remaining two alternatives, for example the dispatcher corresponding to distribution strategy 2. If that dispatcher still does not accept the order, the order is allocated to the dispatcher corresponding to distribution strategy 3; and if that dispatcher does not accept it either, the order is added to the order-grabbing pool so that more dispatchers can obtain the order information. If, during these 3 allocations, the dispatcher corresponding to any of distribution strategies 1, 2, or 3 accepts the order, that dispatcher's second path is planned according to the order and the dispatcher's order allocation conditions.
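The random-choice variant of this example can be sketched similarly; declining dispatchers are removed so each of the N alternatives is tried at most once (the accepts() callback is again an assumed interface):

```python
import random

def dispatch_random(order, alternatives, accepts, grab_pool, rng=random):
    """Offer the order to any one of the remaining alternatives each round;
    fall back to the grab pool once all N alternatives have declined."""
    remaining = list(alternatives)
    while remaining:
        dispatcher = rng.choice(remaining)
        if accepts(dispatcher, order):
            return dispatcher
        remaining.remove(dispatcher)   # try a different alternative next round
    grab_pool.append(order)
    return None
```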
In this embodiment, a plurality of alternative allocation policies (i.e., the top-ranked allocation policies in order of return value from small to large) are determined, and the order is allocated to the dispatcher corresponding to any one of them. If that dispatcher does not accept the order, it is allocated to the dispatcher corresponding to any one of the remaining N-1 allocation policies; if the order is still not accepted, it is allocated to the dispatcher corresponding to any one of the remaining N-2 allocation policies, and so on. Because every alternative satisfies the dispatcher's location constraint and the pickup time constraint, this not only improves the dispatchers' working efficiency but also increases the probability of the order being accepted, effectively improving the user experience. Further, if none of the dispatchers corresponding to the alternative allocation policies accepts the order, the order is added to the order-grabbing pool so that more dispatchers can obtain the order information, which further increases the acceptance probability and effectively improves the user experience.
Optionally, on the basis of the embodiment shown in fig. 3, if the reservation time of one of the target dispatcher's orders changes, the second path is updated. The reservation time may be the reserved delivery time of a delivery order of the target dispatcher, or the pickup time of a pickup order of the target dispatcher.
Fig. 4 is a flowchart of a fourth express task planning method provided by the present invention. As shown in fig. 4, a user initiates a mailing request to generate an order, and the express task planning apparatus obtains the order information, determines a target dispatcher using the express task planning method shown in fig. 2 or fig. 3, and performs a first dispatch. If the first target dispatcher does not accept the order, a next target dispatcher is determined; if that dispatcher does not accept the order either, a further target dispatcher is determined; and if the order is still not accepted, it is added to the order-grabbing pool.
By adopting the method in the embodiment of the present invention, the dispatchers' operating efficiency can be improved, the probability of the order being accepted is increased, and the user experience is effectively improved. Furthermore, when the order has been allocated multiple times and none of the dispatchers accepts it, the order is added to the order-grabbing pool so that more dispatchers can obtain the order information, further increasing the acceptance probability and improving the user experience.
Fig. 5 is a schematic structural diagram of a first express task planning device provided by the present invention. As shown in fig. 5, the apparatus 50 of the present embodiment includes: a first acquisition module 51, a second acquisition module 52 and an assignment module 53.
The first obtaining module 51 is configured to obtain an order, and determine a policy set according to a pickup address in the order, where the policy set includes a plurality of allocation policies, and each allocation policy is used to indicate that the order is allocated to a dispatcher in a preset range including the pickup address.
A second obtaining module 52, configured to obtain a return value corresponding to each strategy in the strategy set according to a Markov decision process, where the return value represents the distance by which a dispatcher deviates from a first path if the order is allocated to the dispatcher, and the first path is the path corresponding to the orders already allocated to the dispatcher.
Optionally, the Markov decision process described above is defined by a five-tuple comprising: the dispatchers' order distribution conditions, a state transition probability matrix, a return function, a discount factor, and the strategy set;
the state transition probability matrix represents the probability distribution of users placing mailing orders in the preset area; the return function is used to calculate the return value obtained when a dispatcher takes an action; and the discount factor is the discount coefficient of the return value obtained by the current action relative to the return value obtained by the previous action.
Optionally, the second obtaining module 52 is configured to obtain the return value corresponding to each distribution strategy in the strategy set by: for each distribution strategy, obtaining, according to the return function, the discount factor, and the state transition probability matrix, the return value of the corresponding dispatcher transitioning from the current state to the next state; and updating the next state to the current state, iterating the computation until the return value converges, and determining the converged value as the return value corresponding to the distribution strategy.
An assigning module 53, configured to assign the order to a target dispatching member according to the return value, where the target dispatching member is one of the plurality of dispatching members.
Optionally, the target dispatcher is the dispatcher corresponding to the distribution strategy with the smallest return value among the plurality of distribution strategies, or the target dispatcher is the dispatcher corresponding to any one of the first N distribution strategies arranged in a preset order, where the preset order is the arrangement of return values from small to large.
The apparatus of this embodiment may be used to implement the technical solution of the method embodiment shown in fig. 1B, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 6 is a schematic structural diagram of a second express task planning device provided by the present invention. As shown in fig. 6, the apparatus 60 of the present embodiment further includes, on the basis of the embodiment shown in fig. 5: a path planning module 54.
And a path planning module 54, configured to plan a second path of the target dispatcher according to the order and the target dispatcher's order allocation conditions after the target dispatcher accepts the order, where the second path is the path corresponding to all orders of the target dispatcher.
In one possible implementation, the path planning module 54 replans the second path of the target dispatcher according to the ant colony algorithm, the order and the order allocation of the target dispatcher.
In another possible implementation manner, the path planning module 54 inserts the pickup address into the first path corresponding to the target dispatcher according to the pickup address in the order to obtain the second path.
In some embodiments, after the allocation module 53 allocates the order to the target dispatcher, where the target dispatcher is the dispatcher corresponding to the distribution strategy with the smallest return value among the plurality of distribution strategies, the allocation module is further configured to: if the target dispatcher does not accept the order, update the target dispatcher to the dispatcher corresponding to the distribution strategy whose return value is second smallest; if that dispatcher does not accept the order, allocate the order to the next dispatcher in order of return value from small to large; and repeat this process, and if none of the dispatchers corresponding to the first N distribution strategies, in order of return value from small to large, accepts the order, add the order to the order-grabbing pool, where N is a positive integer.
In other embodiments, after the allocation module 53 allocates the order to the target dispatcher, where the target dispatcher is the dispatcher corresponding to any one of the first N distribution strategies arranged in the preset order among the plurality of distribution strategies, the allocation module is further configured to: if the target dispatcher does not accept the order, allocate the order to the dispatcher corresponding to any one of the remaining N-1 distribution strategies; and repeat this step, and if none of the dispatchers corresponding to the N distribution strategies accepts the order, add the order to the order-grabbing pool.
The apparatus of this embodiment may be used to implement the technical solutions of any method embodiments shown in fig. 2 and fig. 3, and the implementation principles and technical effects are similar, which are not described herein again.
Optionally, on the basis of the embodiment shown in fig. 6, the path planning module 54 is further configured to update the second path according to an order with a changed reservation time in the orders of the target consignee.
Fig. 7 is a schematic structural diagram of an electronic device according to a first embodiment of the present invention. As shown in fig. 7, the electronic apparatus 70 of the present embodiment includes: memory 71, processor 72 and computer programs.
Wherein the computer program is stored in the memory 71 and configured to be executed by the processor 72 to implement the express task planning method shown in fig. 1B and any one of fig. 2 to 4 in the embodiment of the present invention. The related description may be understood by referring to fig. 1B and the related description and effects corresponding to the steps in fig. 2 to fig. 4, which are not described herein again.
In the present embodiment, the memory 71 and the processor 72 are connected by a bus 73.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the method for planning an express task shown in fig. 1B and any one of fig. 2 to 4 in the embodiment of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of modules is merely a division of logical functions, and an actual implementation may have another division, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware form, and can also be realized in a form of hardware and a software functional module.
Program code for implementing the methods of the present invention may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Finally, it should be noted that although the subject matter has been described in language specific to structural features and/or methodological acts, the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims. While the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. An express delivery task planning method, characterized by comprising:
obtaining an order and determining a policy set according to a pickup address in the order, wherein the policy set comprises a plurality of allocation policies, and each allocation policy indicates that the order is allocated to one dispatcher within a preset range around the pickup address;
obtaining, according to a Markov decision process, a return value corresponding to each allocation policy in the policy set, wherein the return value represents the distance by which a dispatcher would deviate from a first path if the order were allocated to that dispatcher, and the first path is the path corresponding to the orders allocated to the dispatcher;
and allocating the order to a target dispatcher according to the return values, wherein the target dispatcher is one of the plurality of dispatchers.
2. The method according to claim 1, wherein the target dispatcher is the dispatcher corresponding to the allocation policy with the smallest return value among the plurality of allocation policies, or the target dispatcher is the dispatcher corresponding to any one of the first N allocation policies when the allocation policies are ranked in a preset order, wherein the preset order is ascending order of return value.
3. The method according to claim 1, wherein the Markov decision process is defined by a five-tuple comprising: a dispatcher's order allocation state, a state transition probability matrix, a reward function, a discount factor, and the policy set;
wherein the state transition probability matrix represents the probability that a user within a preset area sends a parcel;
the reward function is used to calculate the return value obtained when a dispatcher takes an action;
and the discount factor is the coefficient by which the return value of a dispatcher's current action is discounted relative to the return value of the previously taken action.
4. The method according to claim 3, wherein obtaining the return value corresponding to each allocation policy in the policy set according to the Markov decision process comprises:
for each allocation policy, obtaining, according to the reward function, the discount factor and the state transition probability matrix, the return value of the corresponding dispatcher in moving from the current state to the next state;
and taking the next state as the current state and iterating until the return value converges, the converged value being determined as the return value corresponding to that allocation policy.
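The iteration in claim 4 can be illustrated with a minimal sketch. This is not the patented implementation; it assumes, for illustration only, a finite state space, a fixed per-state reward vector `R` (the deviation distance), a transition matrix `P`, and a discount factor `gamma`, and it repeats the Bellman backup until the value estimate converges:

```python
import numpy as np

def policy_return_value(P, R, gamma, s0, tol=1e-6, max_iter=1000):
    """Estimate the expected discounted return of one allocation policy,
    in the spirit of claim 4: from the current state, accumulate reward,
    discount, transition to the next state, and iterate to convergence.

    P     : (S, S) state transition probability matrix
    R     : (S,)   reward (deviation distance) for each state
    gamma : discount factor in (0, 1)
    s0    : index of the current state
    """
    V = np.zeros(len(R))                 # value estimate per state
    for _ in range(max_iter):
        V_new = R + gamma * P @ V        # Bellman backup for a fixed policy
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new                    # converged
            break
        V = V_new                        # "take the next state as current"
    return V[s0]
```

For a fixed policy this converges geometrically (error shrinks by a factor of `gamma` per sweep), which is why the claim can speak of iterating "until the return value converges".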
5. The method according to claim 2, wherein the target dispatcher is the dispatcher corresponding to the allocation policy with the smallest return value among the plurality of allocation policies, and the method further comprises:
if the target dispatcher does not accept the order, allocating the order to the dispatcher corresponding to the allocation policy with the second-smallest return value;
if that dispatcher does not accept the order either, allocating the order to the next dispatcher in ascending order of return value;
and repeating the above process, and if none of the dispatchers corresponding to the first N allocation policies in ascending order of return value accepts the order, adding the order to an order-grabbing pool, wherein N is a positive integer.
6. The method according to claim 2, wherein the target dispatcher is the dispatcher corresponding to any one of the first N allocation policies ranked in the preset order, and the method further comprises:
if the target dispatcher does not accept the order, allocating the order to the dispatcher corresponding to any one of the remaining N-1 allocation policies;
and repeating this step, and if none of the dispatchers corresponding to the N allocation policies accepts the order, adding the order to an order-grabbing pool.
7. The method according to any one of claims 1 to 6, further comprising, after allocating the order to the target dispatcher:
if the target dispatcher accepts the order, planning a second path for the target dispatcher according to the order and the target dispatcher's order allocation state, wherein the second path is the path corresponding to all orders of the target dispatcher.
8. The method according to claim 7, further comprising, after allocating the order to the target dispatcher:
re-planning the second path of the target dispatcher according to an ant colony algorithm, the order, and the target dispatcher's order allocation state.
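Claim 8 names an ant colony algorithm for re-planning the route over all of a dispatcher's order locations. A minimal, toy-scale sketch of that class of algorithm is below; the parameter names (`alpha`, `beta`, `rho`, `Q`) and Euclidean distances are illustrative assumptions, not the patent's configuration:

```python
import math
import random

def aco_route(points, n_ants=20, n_iter=50, alpha=1.0, beta=2.0, rho=0.5, Q=1.0):
    """Toy ant colony optimization over a dispatcher's stops (claim 8).
    points: list of (x, y) tuples; the tour always starts at points[0]."""
    n = len(points)
    # pairwise distances; tiny epsilon avoids division by zero on the diagonal
    d = [[math.dist(points[i], points[j]) or 1e-9 for j in range(n)] for i in range(n)]
    tau = [[1.0] * n for _ in range(n)]           # pheromone levels
    best_tour, best_len = list(range(n)), float("inf")
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            tour, unvisited = [0], set(range(1, n))
            while unvisited:
                i = tour[-1]
                # probability ~ pheromone^alpha * (1/distance)^beta
                weights = [(j, (tau[i][j] ** alpha) * ((1.0 / d[i][j]) ** beta))
                           for j in unvisited]
                r, acc = random.random() * sum(w for _, w in weights), 0.0
                for j, w in weights:
                    acc += w
                    if acc >= r:
                        tour.append(j)
                        unvisited.discard(j)
                        break
            length = sum(d[tour[k]][tour[k + 1]] for k in range(n - 1))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # evaporate, then deposit pheromone along each ant's tour
        for i in range(n):
            for j in range(n):
                tau[i][j] *= (1 - rho)
        for tour, length in tours:
            for k in range(n - 1):
                tau[tour[k]][tour[k + 1]] += Q / length
    return [points[i] for i in best_tour]
```

Real deployments would use road-network travel times rather than straight-line distances and would tune the pheromone parameters; the sketch only shows the construct/evaporate/deposit loop that defines the method.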
9. The method according to claim 7, further comprising, after allocating the order to the target dispatcher:
inserting the pickup address in the order into the first path of the target dispatcher to obtain the second path.
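Claim 9's insertion step is naturally read as a cheapest-insertion update: place the new pickup at whichever position in the existing first path adds the least extra distance. A sketch under that assumption (Euclidean distance standing in for road distance, both being illustrative choices):

```python
import math

def insert_pickup(route, pickup):
    """Insert the pickup address into the dispatcher's first path at the
    position adding the least extra distance, yielding the second path
    (one plausible reading of claim 9). Points are (x, y) tuples."""
    d = math.dist
    best_pos, best_cost = 1, float("inf")
    for i in range(1, len(route)):
        # extra distance if pickup is placed between route[i-1] and route[i]
        extra = d(route[i - 1], pickup) + d(pickup, route[i]) - d(route[i - 1], route[i])
        if extra < best_cost:
            best_cost, best_pos = extra, i
    return route[:best_pos] + [pickup] + route[best_pos:]
```

This is an O(n) update per new order, which is why inserting into the existing first path (claim 9) is a cheap alternative to fully re-planning it (claim 8).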
10. The method according to claim 8 or 9, characterized in that the method further comprises:
updating the second path according to any order of the target dispatcher whose reserved pickup time has changed.
11. An express delivery task planning device, characterized by comprising:
a first obtaining module, configured to obtain an order and determine a policy set according to a pickup address in the order, wherein the policy set comprises a plurality of allocation policies, and each allocation policy indicates that the order is allocated to one dispatcher within a preset range around the pickup address;
a second obtaining module, configured to obtain, according to a Markov decision process, a return value corresponding to each allocation policy in the policy set, wherein the return value represents the distance by which a dispatcher would deviate from a first path if the order were allocated to that dispatcher, and the first path is the path corresponding to the orders allocated to the dispatcher;
and an allocation module, configured to allocate the order to a target dispatcher according to the return values, wherein the target dispatcher is one of the plurality of dispatchers.
CN201911007457.4A 2019-10-22 2019-10-22 Express delivery task planning method and device Active CN112700074B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911007457.4A CN112700074B (en) 2019-10-22 2019-10-22 Express delivery task planning method and device

Publications (2)

Publication Number Publication Date
CN112700074A true CN112700074A (en) 2021-04-23
CN112700074B CN112700074B (en) 2024-05-03

Family

ID=75504814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911007457.4A Active CN112700074B (en) 2019-10-22 2019-10-22 Express delivery task planning method and device

Country Status (1)

Country Link
CN (1) CN112700074B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7003475B1 (en) * 1999-05-07 2006-02-21 Medcohealth Solutions, Inc. Computer implemented resource allocation model and process to dynamically and optimally schedule an arbitrary number of resources subject to an arbitrary number of constraints in the managed care, health care and/or pharmacy industry
CN103514371A (en) * 2013-09-22 2014-01-15 宁波开世通信息科技有限公司 Measuring and risk evaluation method of executive capability of scheduled task
CN107832882A (en) * 2017-11-03 2018-03-23 上海交通大学 A kind of taxi based on markov decision process seeks objective policy recommendation method
CN108063830A (en) * 2018-01-26 2018-05-22 重庆邮电大学 A kind of network section dynamic resource allocation method based on MDP
CN108092891A (en) * 2017-12-07 2018-05-29 重庆邮电大学 A kind of data dispatching method based on markov decision process
CN108197871A (en) * 2018-01-19 2018-06-22 顺丰科技有限公司 The mission planning method and system that express delivery receipts are dispatched officers
CN109002960A (en) * 2018-06-12 2018-12-14 广东工业大学 It is a kind of based on the online order of scoring and path planning distribution and allocator
CN109377317A (en) * 2018-10-26 2019-02-22 天津五八到家科技有限公司 Data processing method and device
CN109409739A (en) * 2018-10-19 2019-03-01 南京大学 A kind of crowdsourcing platform method for allocating tasks based on part Observable markov decision process
CN109905335A (en) * 2019-03-06 2019-06-18 中南大学 A kind of cloud radio access network resource distribution method and system towards bullet train
CN110110871A (en) * 2018-02-01 2019-08-09 北京嘀嘀无限科技发展有限公司 A kind of method and system of Order splitting

Also Published As

Publication number Publication date
CN112700074B (en) 2024-05-03

Similar Documents

Publication Publication Date Title
Reyes et al. The meal delivery routing problem
CN113222305B (en) Order scheduling method, order scheduling device, storage medium and electronic equipment
US10474504B2 (en) Distributed node intra-group task scheduling method and system
US9020829B2 (en) Quality of service aware scheduling for composite web service workflows
Mishra et al. Dual-mode round-robin greedy search with fair factor algorithm for relief logistics scheduling
US20220156693A1 (en) Computerized system and method for developing optimized cargo transportation solutions
CN111428991B (en) Method and device for determining delivery vehicles
CN111950803A (en) Logistics object delivery time prediction method and device, electronic equipment and storage medium
CN105719221B (en) Path collaborative planning method and device for multiple tasks
Islam et al. An ant colony optimization algorithm for waste collection vehicle routing with time windows, driver rest period and multiple disposal facilities
CN113191619A (en) Emergency rescue material distribution and vehicle dispatching dynamic optimization method
CN113128744A (en) Distribution planning method and device
CN113592282A (en) Article distribution method and device
CN111044062B (en) Path planning and recommending method and device
US20150356491A1 (en) Workforce optimization by improved provision of job performance plan
Yuan et al. An enhanced genetic algorithm for unmanned aerial vehicle logistics scheduling
CN113283793A (en) Cargo collecting and dispatching method, system, equipment and storage medium
CN113222490A (en) Inventory allocation method and device
CN112785025B (en) Warehouse layout method and device
CN111260288B (en) Order management method, device, medium and electronic equipment
Anandasivam et al. A heuristic approach for capacity control in clouds
CN112784212B (en) Inventory optimization method and device
CN112700074A (en) Express task planning method and device
CN114862065B (en) Social work task planning method and device, electronic equipment and storage medium
Zhou et al. A domain‐of‐influence based pricing strategy for task assignment in crowdsourcing package delivery

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant