CN112700074B - Express delivery task planning method and device - Google Patents

Express delivery task planning method and device

Info

Publication number
CN112700074B
CN112700074B (application CN201911007457.4A)
Authority
CN
China
Prior art keywords
order
dispatcher
return value
receiver
allocation
Prior art date
Legal status
Active
Application number
CN201911007457.4A
Other languages
Chinese (zh)
Other versions
CN112700074A (en)
Inventor
刘宇航
王晗
郑欣欣
Current Assignee
Navinfo Co Ltd
Original Assignee
Navinfo Co Ltd
Priority date
Filing date
Publication date
Application filed by Navinfo Co Ltd
Priority to CN201911007457.4A
Publication of CN112700074A
Application granted
Publication of CN112700074B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group
    • G06Q10/063114 Status monitoring or status determination for a person or group
    • G06Q10/06316 Sequencing of tasks or work
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping
    • G06Q10/0835 Relationships between shipper or supplier and carriers
    • G06Q10/08355 Routing methods

Abstract

An embodiment of the invention provides a method and device for planning express delivery tasks. An order is first acquired and a policy set is determined according to the pickup address in the order; then, according to a Markov decision process and the policy set, a return value is obtained for each policy in the set, and the order is allocated to the courier with the smallest return value among the candidate couriers. By combining the Markov decision process with express-task planning, the order is matched to a suitable courier while the couriers' position and time constraints are satisfied, improving courier efficiency.

Description

Express delivery task planning method and device
Technical Field
The invention relates to the field of logistics technology, and in particular to a method and device for planning express delivery tasks.
Background
With the rapid development of internet technology and e-commerce, the operating model of the logistics industry has also changed dramatically: express self-pickup points and smart parcel lockers have appeared, and people can now send and receive parcels by themselves, which greatly eases daily life. Moreover, as mobile applications (APPs) continue to proliferate, more and more of them relate to logistics, and by installing the relevant APP a user can place shipping orders, track shipments, and so on without leaving home.
In a logistics system, the couriers who collect and deliver parcels are a vital link. For parcel collection, the traditional approach is to allocate the order to the courier responsible for the pickup address according to that address; the courier then chooses a route based on experience and contacts the customer at the time reserved in the order to collect the parcel.
With this approach, courier efficiency is low.
Disclosure of Invention
Embodiments of the invention provide a method and device for planning express delivery tasks, which are used to improve courier efficiency.
In a first aspect, an embodiment of the present invention provides a method for planning an express delivery task, comprising:
acquiring an order and determining a policy set according to the pickup address in the order, wherein the policy set comprises a plurality of allocation policies and each allocation policy indicates allocating the order to one courier within a preset range around the pickup address;
obtaining, according to a Markov decision process, a return value for each allocation policy in the policy set, wherein the return value represents the distance by which a courier would deviate from a first path if the order were allocated to that courier, the first path being the path corresponding to the orders already allocated to that courier;
and allocating the order to a target courier according to the return values, wherein the target courier is one of the plurality of couriers.
Optionally, the target courier is the courier with the smallest return value among the plurality of couriers.
Optionally, the Markov decision process consists of a five-tuple comprising: the couriers' order allocations, a state transition probability matrix, a return function, a discount factor, and the policy set;
the state transition probability matrix represents the probability of users mailing parcels within a preset area;
the return function is used to compute the return value obtained when a courier takes an action;
and the discount factor is the coefficient discounting the return value obtained by an action relative to the return value obtained by the previous action.
Optionally, the obtaining, according to a Markov decision process, of a return value for each allocation policy in the policy set comprises:
obtaining, according to the return function, the discount factor, and the state transition probability matrix, the return value of the courier's transition from the current state to the next state;
and updating the next state to be the current state and iterating the computation until the return value converges, the converged value being determined as the return value of the allocation policy.
Optionally, the method further comprises:
if the courier with the smallest return value does not accept the order, offering the order to the next courier in ascending order of return value, and if the courier with the second-smallest return value accepts the order, determining that courier to be the target courier;
if the courier with the second-smallest return value does not accept the order either, offering the order to the next courier, again in ascending order of return value;
and repeating this process; if none of the first N couriers in ascending order of return value accepts the order, adding the order to the grab pool.
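The cascading offer logic above can be sketched as follows. This is an illustrative sketch only: the function name, the `(courier_id, return_value)` pair shape, and the `accepts` callback are assumptions made for the example, not part of the patented method.

```python
def allocate_order(couriers, accepts, max_offers):
    """Offer an order to couriers in ascending order of return value.

    couriers: list of (courier_id, return_value) pairs (hypothetical shape).
    accepts: callable courier_id -> bool, whether that courier takes the order.
    max_offers: N, the number of couriers to try before giving up.
    Returns the accepting courier's id, or None (order goes to the grab pool).
    """
    ranked = sorted(couriers, key=lambda c: c[1])  # smallest return value first
    for courier_id, _ in ranked[:max_offers]:
        if accepts(courier_id):
            return courier_id
    return None  # none of the first N accepted: add the order to the grab pool
```

In practice the `accepts` callback would stand in for the round trip of pushing the order to the courier's terminal and waiting for a confirmation or timeout.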
Optionally, after the allocating of the order to the target courier, the method further comprises:
replanning a second path for the target courier according to an ant colony algorithm, the order, and the target courier's existing order allocation.
Optionally, after the allocating of the order to the target courier, the method further comprises:
inserting the pickup address of the order into the first path corresponding to the target courier to obtain a second path.
Optionally, the method further comprises:
and updating the second path according to any order among the target courier's orders whose reserved time has changed.
In a second aspect, an embodiment of the present invention provides a planning apparatus for express delivery tasks, the apparatus comprising:
a first acquisition module, configured to acquire an order and determine a policy set according to the pickup address in the order, the policy set comprising a plurality of allocation policies, each indicating allocation of the order to one courier within a preset range around the pickup address;
a second acquisition module, configured to obtain, according to a Markov decision process, a return value for each policy in the policy set, the return value representing the distance by which a courier would deviate from a first path if the order were allocated to that courier, the first path being the path corresponding to the orders already allocated to that courier;
and an allocation module, configured to allocate the order to a target courier according to the return values, the target courier being one of the plurality of couriers.
In a third aspect, an embodiment of the present invention further provides an electronic device, comprising: a memory, a processor, and computer program instructions;
wherein the memory is configured to store the computer program instructions;
and the processor executes the computer program instructions to perform the method of the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a readable storage medium, including: a program;
the program, when executed by a processor, performs the method of the first aspect.
The embodiments of the invention thus provide a method and device for planning express delivery tasks, in which an order is first acquired, a policy set is determined according to the pickup address in the order, a return value is then obtained for each policy in the set according to a Markov decision process and the policy set, and the order is allocated to the courier with the smallest return value among the candidate couriers. By combining the Markov decision process with express-task planning, the order is matched to a suitable courier while the couriers' position and time constraints are satisfied, improving courier efficiency.
Drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from them without creative effort.
FIG. 1A is a flow chart of a planning method for an express task provided by the invention;
FIG. 1B is a schematic flow chart of a first embodiment of a method for planning an express task according to the present invention;
FIG. 1C is a schematic diagram of a Markov reward process according to the present invention;
FIG. 1D is a schematic diagram of how order factors change during decision optimization in the present invention;
Fig. 2 is a schematic flow chart of a second embodiment of a method for planning an express task according to the present invention;
Fig. 3 is a schematic flow chart of a third embodiment of a method for planning an express task according to the present invention;
fig. 4 is a flow chart of a fourth embodiment of a method for planning an express task according to the present invention;
fig. 5 is a schematic structural diagram of a first embodiment of a planning apparatus for an express task according to the present invention;
fig. 6 is a schematic structural diagram of a second embodiment of a planning apparatus for an express task according to the present invention;
fig. 7 is a schematic structural diagram of a first embodiment of an electronic device according to the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
In a logistics system, the couriers who collect and deliver parcels are a vital link. In particular, for parcel collection, the conventional approach is to allocate the order, according to its pickup address, to the courier of the area to which the order belongs; the courier then chooses a route based on experience and contacts the customer at the pickup time reserved in the order to collect the parcel.
With this approach, orders are allocated to the couriers of areas divided up by the logistics company, without considering whether those couriers are the most suitable; courier efficiency is low and resource utilization cannot be maximized. The embodiments of the invention therefore provide a method for planning express delivery tasks that aims to solve these problems in the prior art, improve courier efficiency, and maximize resource utilization.
Referring to FIG. 1A, the planning method of the embodiment of the present invention takes which courier receives the order as the decision variable, and takes the courier's position and the pickup time in the order as constraints, so that the courier who accepts the order deviates least from the original driving path or spends the least time on the pickup, improving courier efficiency and maximizing resource utilization.
The method for planning express delivery tasks provided by the embodiments of the invention is described in detail below through several specific embodiments.
Fig. 1B is a flowchart of a first embodiment of the method for planning an express delivery task according to the present invention. The method may be executed by the planning apparatus for express delivery tasks provided by the embodiments of the invention, and the apparatus may be implemented in any combination of software and/or hardware.
By way of example, the planning apparatus for express delivery tasks may be an electronic device such as a terminal device, a computer system, or a server, operable with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with such electronic devices include, but are not limited to: personal computer systems, server computer systems, handheld or laptop devices, microprocessor-, CPU-, or GPU-based systems, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
In this embodiment, the planning apparatus for express delivery tasks is taken as the execution subject by way of example.
As shown in fig. 1B, the method of the present embodiment includes:
S101: acquire an order, and determine a policy set according to the pickup address in the order.
Specifically, when a user needs to send a parcel, the user can place an order through a client, generating an order that may include the pickup address, pickup time, pickup contact name, pickup contact phone number, and similar information. After obtaining the order, the planning apparatus determines, according to the pickup address, a plurality of couriers within a preset range around the pickup address, thereby generating the policy set; that is, the policy set comprises a plurality of allocation policies, each indicating allocation of the order to one of those couriers. The policy set in this embodiment thus reflects the candidate allocation targets of the order.
For example, the planning apparatus may determine the policy set in the following ways:
In one possible implementation, the planning apparatus determines, according to the pickup address in the order, the couriers located within a circle centred on the pickup address with a preset radius, and generates the policy set from them. For example, if the longitude and latitude of the pickup address are (A, B), the apparatus may determine the couriers whose current positions lie within a 3 km radius of (A, B) and generate the policy set accordingly.
In another possible implementation, the planning apparatus determines, according to the pickup address, the couriers of the operation area containing the pickup address and of at least one adjacent operation area, and generates the policy set from them. Specifically, logistics operations are usually divided into operation areas by geographic location, each area corresponding to one or more couriers. The apparatus determines the operation area to which the pickup address belongs and that area's couriers, determines the couriers of at least one adjacent operation area, and generates the policy set from the couriers so determined.
For example, suppose the planning apparatus determines from the pickup address that it belongs to operation area A, which corresponds to courier a and courier b, and that the adjacent areas are operation area B, corresponding to courier c, and operation area C, corresponding to courier d and courier e. The policy set may then be generated from couriers a, b, c, d, and e, or alternatively from couriers a, b, and c only.
It should be noted that the above two ways of determining the policy set are merely exemplary; in practice the policy set may be determined in other ways, which the present invention does not limit.
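As an illustration of the first, radius-based way of building the policy set, the sketch below assumes couriers are keyed by an id with (latitude, longitude) positions and uses the haversine great-circle distance; the function names, data shapes, and the 3 km default are assumptions made for the example.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))  # mean Earth radius 6371 km

def policy_set(pickup, courier_positions, radius_km=3.0):
    """One allocation policy (candidate courier) per courier within
    radius_km of the pickup address."""
    return [cid for cid, pos in courier_positions.items()
            if haversine_km(pickup, pos) <= radius_km]
```

The second, area-based way would replace the distance test with a lookup of the courier's operation area and its neighbours.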
S102: obtain a return value for each allocation policy in the policy set according to the Markov decision process.
To make the technical solution shown in the embodiments of the present invention clearer, the Markov decision process is first described in detail here:
Markov process:
A Markov process is a class of stochastic processes in which the future state depends only on the state at the current moment and not on the states at historical moments.
Let {X(t), t ∈ T} be a stochastic process, where X(t) is the state of the process at time t and E is the state space. If for any t1 < t2 < … < tn < t and any x1, x2, …, xn, x ∈ E, the conditional distribution function of the random variable X(t) given X(t1) = x1, …, X(tn) = xn depends only on X(tn) = xn and is independent of X(t1) = x1, …, X(t(n-1)) = x(n-1), that is, the conditional distribution function satisfies
F(x, t | xn, x(n-1), …, x2, x1; tn, t(n-1), …, t2, t1) = F(x, t | xn; tn),
then this property is called the Markov property, and a stochastic process satisfying it is called a Markov process.
A stochastic process that has the Markov property and whose index set and state space are both discrete is called a Markov chain. A Markov chain is defined by its state transition probabilities: the probability that the random variable moves from state s_i at time i to state s_j at the next time j, expressed by the formula:
P(i → j) = P_ij = P(X_{t+1} = s_j | X_t = s_i)
where P_ij is the probability of transitioning from state s_i to state s_j; S is the state set, s_i the state at time i, s_j the state at time j, and both s_i and s_j belong to S.
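A minimal numeric illustration of a state transition probability matrix (the two states and their probabilities are invented for the example): row i holds P(X_{t+1} = s_j | X_t = s_i), so each row must sum to 1.

```python
# Each row i of P holds P(X_{t+1} = s_j | X_t = s_i); rows sum to 1.
P = [
    [0.7, 0.3],  # transitions out of state s0
    [0.4, 0.6],  # transitions out of state s1
]

def step_distribution(P, dist):
    """Propagate a probability distribution over states one step forward:
    dist'[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
```

Starting surely in s0 (`dist = [1.0, 0.0]`), one step yields exactly the first row of P.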
Markov reward process:
A Markov reward process may be defined by a four-tuple (S, P, U, γ).
S represents the state set, with s_i ∈ S the state of step i.
P represents the state transition probability; specifically, P describes the probability distribution over successor states after action a ∈ A is taken in state s_i ∈ S, where A is the policy set and a is one policy in it. For example, the probability of transitioning to state s_j when policy (i.e., action) a is executed in state s_i may be expressed as P(s_j | s_i, a).
It can be understood that, in this embodiment, executing an allocation policy in the policy set is executing an action.
U represents the reward: if the pair (s_i, a) transitions to the next state s_j, the reward may be noted U(s_j | s_i, a). If the next state s_j corresponding to (s_i, a) is unique, the return function may also be denoted U(s_i, a), where (s_i, a) indicates that action a is performed in state s_i and the state transitions to s_j.
γ represents the discount factor, that is, the discount applied to each return relative to the previous return; through it, the effect of executing some policy in the future on the current state can be attenuated.
In the Markov reward process, the expected return of a state is formulated as:
H(s) = U(s) + γ Σ_{s'} P(s, s') H(s')
where H(s) represents the return value of state s, γ is the discount factor applied to the return value of state s' relative to state s, P(s, s') is the probability of transitioning from state s to state s', and H(s') is the return value of state s'.
The Markov reward process is illustrated here with a specific example. As shown in FIG. 1C, the process contains six nodes, node 0 through node 5; each node is in a state, and the states of different nodes may be the same or different. Taking node 1 as the starting point (node 1 represents the state at the current moment), one policy from the policy set is executed, after which node 1 transitions from the current state to the next state, i.e., to the next node. As shown in FIG. 1C, by executing a certain policy node 1 may transition from its current state to the state of node 2 or of node 3, and executing the policy yields a corresponding return value. Similarly, node 3 may transition to the state of node 4 or of node 5 by executing a policy, again with a corresponding return value.
As shown in FIG. 1C, the return value of node 1 is 25.9: it is the sum of the reward 20 from node 1's own return function, the product of node 2's reward 5 with its discount factor 0.1, and the product of node 3's reward 6 with its discount factor 0.9 (20 + 0.5 + 5.4 = 25.9).
Similarly, if the current state is node 3, its return value is 13.6: the sum of node 3's own reward 6, the product of node 4's reward 2 with its discount factor 0.2, and the product of node 5's reward 9 with its discount factor 0.8 (6 + 0.4 + 7.2 = 13.6).
In the embodiments of the present application, when a node computes the return generated by executing each policy, the discount factor attenuates the influence of future steps on the current one.
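The arithmetic of the FIG. 1C example can be checked with a small helper (the function name and argument shapes are illustrative, not from the patent):

```python
def one_step_return(u, successors):
    """One-step expected return as in the FIG. 1C example: the node's own
    reward u plus each successor's reward weighted by its discount factor.

    successors: list of (discount_factor, successor_reward) pairs.
    """
    return u + sum(g * r for g, r in successors)

h1 = one_step_return(20, [(0.1, 5), (0.9, 6)])  # node 1: 20 + 0.5 + 5.4
h3 = one_step_return(6, [(0.2, 2), (0.8, 9)])   # node 3: 6 + 0.4 + 7.2
```

Both values reproduce the figure: h1 is 25.9 and h3 is 13.6.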
Markov decision process:
Adding a policy set A to the Markov reward process yields the Markov decision problem. Solving the Markov decision problem means finding the policy that obtains the greatest expected benefit; in the embodiments of the invention, solving the Markov decision problem means finding the most suitable courier.
In the Markov decision process, iterative computation is performed for each policy a in policy set A to obtain the return value of each policy, and the policy that obtains the greatest expected benefit is then determined from those return values.
Specifically, in the Markov decision process, the return value of each policy may be obtained as follows:
Step one: initialize the return functions U(s) and H(s) of all policies;
Step two: for each policy, evaluate the current policy with the current return function U(s) to obtain the return value of each policy;
Step three: iterate for each policy until its return value converges. Letting k be the number of iterations, the return value is computed as
H(s)_k = U(s) + γ Σ_{s'} P(s, s') H(s')_{k-1}
which, unrolled over the k iterations, weights future rewards by the discount factor γ at the first iteration, γ² at the second, …, and γ^k at the kth.
The physical meaning is that k future cases are taken into account; in this embodiment, these are k orders that may arrive in the future.
Since γ < 1, γ^k tends to 0 as the number of iterations increases, and the value of H(s) stabilizes. Let esp = H(s)_k − H(s)_{k-1}, where H(s)_k is the return value of the policy after the kth iteration and H(s)_{k-1} the value after the (k−1)th iteration.
When esp falls below a preset threshold, the iteration is considered complete, and H(s)_k at that point is the return value of the policy.
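Steps one to three amount to the following iteration. This is a generic value-iteration sketch under the assumption of state-indexed rewards U and a transition matrix P; it is not the patent's exact implementation, and the names are illustrative.

```python
def value_iteration(U, P, gamma, eps=1e-6, max_iter=1000):
    """Iterate H(s) = U(s) + gamma * sum_{s'} P(s, s') * H(s') until the
    largest change (the quantity esp in the text) drops below eps.

    U: reward per state; P: row-stochastic transition matrix; 0 <= gamma < 1.
    """
    n = len(U)
    H = [0.0] * n                       # step one: initialize H(s)
    for _ in range(max_iter):
        H_new = [U[s] + gamma * sum(P[s][t] * H[t] for t in range(n))
                 for s in range(n)]     # steps two and three: re-evaluate
        esp = max(abs(a - b) for a, b in zip(H_new, H))
        H = H_new
        if esp < eps:                   # converged
            break
    return H
```

For a single self-looping state with reward 1 and gamma = 0.5, the iteration converges to the geometric series 1 / (1 − 0.5) = 2.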
Step four: determine the policy that obtains the greatest expected benefit according to the return value of each policy in policy set A.
Next, the Markov decision process in order allocation is described in detail:
Among the three cases shown in FIG. 1D, time T is the current moment. When the decision is made at time T, the orders before time T have already been placed; there may be one or more of them, and FIG. 1D shows only the multi-order case. As in the first case in FIG. 1D, a single order may arrive at some future moment; as in the second case in FIG. 1D, a batch of orders (i.e., multiple orders) may arrive at some future moment. In this embodiment, the orders that may arrive over a future period may therefore be a single order, as in the first case in FIG. 1D, or a batch of orders, as in the second case; for convenience of description, all future orders are depicted together as in the third case in FIG. 1D.
Because logistics orders occur randomly, the traditional approach considers only the single order currently occurring; that is, it performs only single-decision optimization for a single order. The purpose of the invention is to maximize the accumulated resource utilization over a future period of time; that is, the method in the embodiment of the invention can achieve a globally optimal resource utilization.
In this embodiment, the Markov decision process in order allocation is defined by a five-tuple, including: the order allocation situations of the dispatchers, the state transition probability matrix, the return function, the discount factor, and the policy set. The order allocation situations of the dispatchers include the order allocation situations corresponding to all dispatchers in the policy set determined in step S101; the state transition probability matrix represents the probability distribution of users mailing in a preset area; the return function is used for calculating the return value obtained when a dispatcher takes an action; and the discount factor is the discount coefficient of the return value obtained by the current action relative to the return value obtained by the previous action.
In this embodiment, the return value of an allocation policy represents the distance by which the dispatcher deviates from the first path when the order is allocated to that dispatcher, the first path being the path corresponding to the orders already allocated to the dispatcher. The smaller the return value, the smaller the distance the dispatcher deviates from the first path after accepting the order; correspondingly, the less time the dispatcher spends completing the order and the higher the resource utilization. The larger the return value, the larger the distance the dispatcher deviates from the first path after accepting the order; correspondingly, the more time the dispatcher spends completing the order and the lower the resource utilization.
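The return value as a deviation distance can be illustrated by measuring the extra distance a dispatcher would travel if the new pickup point were inserted at the best position of the first path. The Euclidean metric and all names below are illustrative assumptions; a real system would use road-network distances.

```python
import math

def detour_cost(route, pickup):
    # A sketch of the return value: the smallest extra distance added to the
    # first path (a list of (x, y) stops) when the new pickup point is
    # inserted between two consecutive stops. Smaller value => smaller
    # deviation => higher resource utilization.
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    best = float("inf")
    for i in range(len(route) - 1):
        extra = (dist(route[i], pickup) + dist(pickup, route[i + 1])
                 - dist(route[i], route[i + 1]))
        best = min(best, extra)
    return best
```

A pickup lying on the existing path costs nothing extra; one off the path costs the length of the detour.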
Specifically, for each allocation policy, the return value for the corresponding dispatcher transitioning from the current state to the next state is obtained according to the return function, the discount factor and the state transition probability matrix. That is, for each dispatcher, assuming the order is allocated to that dispatcher, the distance by which the dispatcher deviates from the original path is obtained from the corresponding return function, discount factor and state transition probability matrix.
The next state, i.e., the state after the dispatcher accepts the order, is then updated to be the current state, and the calculation is iterated until the return value corresponding to the allocation policy converges. That is, each next order that may occur in the future is in turn assumed to be allocated to the dispatcher, and the return value obtained by the dispatcher for accepting it is again computed from the return function, the discount factor and the state transition probability matrix. This process is repeated until the return value corresponding to the dispatcher converges.
In the Markov decision process, the discount factor and the state transition probability matrix can be obtained from historical order data, and in practical application they can be updated as the historical order data is continuously updated, so that the obtained return value corresponding to each allocation policy is more accurate.
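The iterative computation just described (evaluate the return value from the return function, discount factor and state transition probability matrix, update the state, and repeat until convergence) follows the standard policy-evaluation recursion H = r + γ·P·H. Below is a minimal sketch with a hypothetical state space; in practice P and r would come from the historical data mentioned above.

```python
def policy_return(P, r, gamma=0.9, esp=1e-8):
    # Iterative policy evaluation: H <- r + gamma * P @ H, repeated until
    # the change between iterations (the "esp" test in the text) drops
    # below the threshold. P is the state transition probability matrix
    # (rows sum to 1), r the immediate return of each state, gamma the
    # discount factor.
    n = len(r)
    H = [0.0] * n
    while True:
        H_next = [r[i] + gamma * sum(P[i][j] * H[j] for j in range(n))
                  for i in range(n)]
        if max(abs(a - b) for a, b in zip(H_next, H)) < esp:
            return H_next
        H = H_next
```

For a symmetric two-state example with r = [1, 1] and gamma = 0.5, the fixed point is H = 1 + 0.5·H, i.e. H = 2 in each state.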
S103, distributing the order to a target receiver according to the return value.
Specifically, in this step, the target receiver is determined according to the return values corresponding to all the allocation policies in the policy set.
In one possible implementation, according to the return value corresponding to each allocation policy in the policy set, the allocation policy with the smallest return value is determined as the target allocation policy, that is, the order is allocated to the dispatcher corresponding to the allocation policy with the smallest return value.
In another possible implementation, N allocation policies in the policy set are determined as alternative allocation policies according to the return value corresponding to each allocation policy, and any one of the N alternative allocation policies is selected as the target allocation policy; that is, the order is allocated to any one of the dispatchers corresponding to the first N allocation policies when the return values are sorted from small to large, where N is a positive integer, for example N = 3. It is understood that the number of alternative allocation policies is less than or equal to the number of allocation policies in the policy set.
For the second implementation, the N alternative allocation policies may be determined as follows. In a first manner, the return value corresponding to each allocation policy in the policy set is compared with a preset return value, and the allocation policies whose return values are smaller than the preset return value are determined as the alternative allocation policies, thereby determining the N alternative policies. In a second manner, the first N allocation policies in the policy set, sorted by return value from small to large, are determined as the alternative allocation policies. Of course, the alternative allocation policies may also be determined in other ways; the two manners shown above are merely exemplary and do not limit how the alternative allocation policies are determined.
Optionally, after determining the target dispatcher, the planning apparatus for the express task sends the order to a terminal device held by the target dispatcher; the target dispatcher can check the detailed information of the order through the terminal device and autonomously decide whether to accept the order according to his or her own wishes.
In this embodiment, an order is acquired, a policy set is determined according to the pickup address in the order, a return value corresponding to each policy in the policy set is then obtained according to the Markov decision process and the policy set, and the order is allocated to the dispatcher with the smallest return value among the plurality of dispatchers. By combining the Markov decision process with the planning of the express task, the most suitable dispatcher is matched with the order while the dispatcher's position constraint and time constraint are satisfied, thereby improving the dispatcher's working efficiency.
Next, detailed descriptions are given respectively for the case in which the target dispatcher corresponds to the allocation policy with the smallest return value among the allocation policies, and the case in which the target dispatcher corresponds to any one of the first N allocation policies arranged in the preset order.
Fig. 2 is a schematic flow chart of a second embodiment of a method for planning an express task according to the present invention. As shown in fig. 2, the method of the present embodiment includes:
s201, acquiring an order, and determining a strategy set according to a pick-up address in the order.
S202, according to a Markov decision process, obtaining a return value corresponding to each allocation strategy in the strategy set.
Steps S201 and S202 in the embodiment shown in fig. 2 are similar to steps S101 and S102 in the embodiment shown in fig. 1, respectively, and reference may be made to the detailed description in the embodiment shown in fig. 1, which is not repeated here.
And S203, distributing the order to a target dispatcher according to the return value, wherein the target dispatcher is the dispatcher corresponding to the allocation policy with the smallest return value among the plurality of allocation policies.
S204, judging whether the target receiver-dispatcher receives the order, if so, executing step S205, and if not, executing step S206.
S205, planning a second path of the target receiver according to the order and the order distribution condition of the target receiver.
In one possible implementation, the second path of the target dispatcher is re-planned according to an ant colony algorithm, the order, and the order allocation situation of the target dispatcher. In another possible implementation, the order is inserted into the first path corresponding to the target dispatcher according to the pickup address in the order, to obtain the second path.
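The second implementation (inserting the order's pickup address into the first path to obtain the second path) can be sketched as a cheapest-insertion step. Euclidean distance and list-of-coordinates routes are illustrative assumptions; a real system would use road-network distances.

```python
import math

def insert_pickup(first_path, pickup):
    # Build the second path by inserting the pickup point at the position
    # that lengthens the first path the least. (The alternative in the text
    # re-plans the whole route, e.g. with an ant colony algorithm.)
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    best_i, best_extra = 1, float("inf")
    for i in range(len(first_path) - 1):
        extra = (dist(first_path[i], pickup) + dist(pickup, first_path[i + 1])
                 - dist(first_path[i], first_path[i + 1]))
        if extra < best_extra:
            best_i, best_extra = i + 1, extra
    return first_path[:best_i] + [pickup] + first_path[best_i:]
```

The pickup point is placed in the gap of the route where the detour is smallest, leaving the rest of the first path unchanged.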
S206, updating the target dispatcher to be the dispatcher corresponding to the second-ranked allocation policy in the preset order, and allocating the order to the updated target dispatcher.
The preset sequence is the sequence of the return value from small to large.
S207, judging whether the updated target receiver-dispatcher receives the order, if yes, executing step S205, and if not, executing step S208.
And S208, updating the target dispatcher to be the dispatcher corresponding to the third-ranked allocation policy in the preset order, and allocating the order to the updated target dispatcher, wherein the preset order is the order of the return values from small to large.
S209, judging whether the updated target receiver-dispatcher receives the order, if yes, executing step S205, and if not, executing step S210.
S210, adding the order to the order-grabbing pool.
The order-grabbing pool includes orders that have not been accepted by any dispatcher. Adding such orders to the order-grabbing pool allows more dispatchers to obtain the order information and select suitable orders according to their actual situation. In this way, the poor user experience caused by an order remaining unaccepted for a long time after the user places it can be avoided.
In this embodiment, taking the first 3 allocation policies, in order of return value from small to large, as the alternative allocation policies is used as an example. Of course, in practical application, the number of alternative allocation policies may be larger or smaller.
In this embodiment, according to the order of the return values of the allocation policies in the policy set from small to large, the order is preferentially allocated to the dispatcher corresponding to the allocation policy with the smallest return value. If that dispatcher does not accept the order, the order is allocated to the dispatcher corresponding to the second-ranked allocation policy; if that dispatcher still does not accept it, the order is allocated to the dispatcher corresponding to the next allocation policy. By repeating this process, a suitable dispatcher in the policy set is selected to undertake the order, which not only satisfies the dispatchers' position and pickup-time constraints and improves their working efficiency, but also increases the probability that the order is accepted and effectively improves user experience. Further, if the order has been allocated several times and still no dispatcher accepts it, the order is added to the order-grabbing pool, so that more dispatchers can obtain the order information, further increasing the probability that the order is accepted and effectively improving user experience.
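The sequential allocation flow of this embodiment (offer the order down the ranking of return values, then fall back to the order-grabbing pool) can be sketched as follows. The callback-based interface and all names are illustrative assumptions.

```python
def dispatch(order, candidates, offer, grab_pool):
    # candidates: dispatcher ids already sorted by return value, ascending.
    # offer(dispatcher, order) -> True if that dispatcher accepts the order.
    # If no candidate accepts, the order goes to the order-grabbing pool so
    # that more dispatchers can obtain the order information.
    for dispatcher in candidates:
        if offer(dispatcher, order):
            return dispatcher
    grab_pool.append(order)
    return None
```

Each refusal moves the offer to the next-ranked dispatcher, matching steps S203 through S210.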
Optionally, on the basis of the embodiment shown in fig. 2, if the reservation time of an order among the orders of the target dispatcher changes, the second path is updated, where the reservation time may be the reservation time of a dispatch order of the target dispatcher, or the pickup time in a pickup order of the target dispatcher.
Fig. 3 is a schematic flowchart of a third embodiment of the method for planning an express task provided by the present invention. As shown in fig. 3, the method of this embodiment includes:
It should be noted that, in the embodiment shown in fig. 3, the target dispatcher in step S303 is a dispatcher corresponding to any one of the first N distribution policies arranged according to the preset sequence in the plurality of distribution policies.
S301, acquiring an order, and determining a strategy set according to a pick-up address in the order.
S302, obtaining a return value corresponding to each allocation strategy in the strategy set according to the Markov decision process.
Steps S301 and S302 in the embodiment shown in fig. 3 are similar to steps S101 and S102 in the embodiment shown in fig. 1, respectively, and reference may be made to the detailed description in the embodiment shown in fig. 1, which is not repeated here.
S303, distributing the order to a target dispatcher according to the return value, wherein the target dispatcher is a dispatcher corresponding to any one of the first N distribution strategies arranged according to a preset sequence in the distribution strategies.
S304, judging whether the target receiver-dispatcher receives the order, if so, executing step S305, and if not, executing step S306.
S305, planning a second path of the target receiver according to the order and the order distribution condition of the target receiver.
In one possible implementation, the second path of the target dispatcher is re-planned according to an ant colony algorithm, the order, and the order allocation situation of the target dispatcher. In another possible implementation, the order is inserted into the first path corresponding to the target dispatcher according to the pickup address in the order, to obtain the second path.
S306, distributing the order to the dispatcher corresponding to any one of the remaining N-1 allocation policies, and updating the target dispatcher.
Specifically, since the order is not received by the receiver corresponding to the selected allocation policy in step S303, one allocation policy may be selected from the remaining N-1 allocation policies, and the order may be allocated to the receiver corresponding to the allocation policy.
One possible implementation way is to select any allocation policy from the remaining N-1 allocation policies, and allocate the order to the dispatcher corresponding to the allocation policy.
S307, judging whether the updated target receiver receives the order, if yes, executing step S305, and if not, executing step S308.
S308, distributing the order to the dispatcher corresponding to any one of the remaining N-2 allocation policies, and updating the target dispatcher.
Specifically, since the order is not received by the receiver corresponding to the selected allocation policy in step S303 and step S306, one allocation policy may be selected from the remaining N-2 allocation policies, and the order may be allocated to the receiver corresponding to the allocation policy.
One possible implementation way is to select any allocation policy from the remaining N-2 allocation policies, and allocate the order to the dispatcher corresponding to the allocation policy.
S309, judging whether the updated target receiver receives the order, if yes, executing step S305, and if not, executing step S310.
And S310, repeating the above order allocation process; if none of the dispatchers corresponding to the N allocation policies accepts the order, the order is added to the order-grabbing pool.
Illustratively, the policy set includes 10 allocation policies, and the first 3 allocation policies, arranged in order of return value from small to large, are denoted allocation policy 1, allocation policy 2 and allocation policy 3. First, the order is allocated to the dispatcher corresponding to any one of the 3 allocation policies, for example to the dispatcher corresponding to allocation policy 1. If the dispatcher corresponding to allocation policy 1 does not accept the order, the order is allocated to either of the remaining two allocation policies, for example to the dispatcher corresponding to allocation policy 2. If the dispatcher corresponding to allocation policy 2 still does not accept the order, the order is allocated to the dispatcher corresponding to allocation policy 3; and if the dispatcher corresponding to allocation policy 3 does not accept the order, the order is added to the order-grabbing pool, so that more dispatchers can obtain the order information. During the three allocations, once the dispatcher corresponding to any one of allocation policies 1, 2 and 3 accepts the order, the second path of that dispatcher is planned according to the order and that dispatcher's order allocation situation.
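The random-candidate flow of this embodiment (offer the order to any of the N candidates, then retry among the remaining N-1, N-2, …, and finally fall back to the order-grabbing pool) can be sketched as follows. The callback interface and names are illustrative assumptions.

```python
import random

def dispatch_random(order, candidates, offer, grab_pool):
    # candidates: the N alternative allocation policies' dispatcher ids.
    # offer(dispatcher, order) -> True if that dispatcher accepts the order.
    # Pick any candidate at random; on refusal, retry among the remaining
    # N-1, N-2, ... candidates, each chosen without replacement.
    remaining = list(candidates)
    while remaining:
        dispatcher = remaining.pop(random.randrange(len(remaining)))
        if offer(dispatcher, order):
            return dispatcher
    grab_pool.append(order)
    return None
```

With candidates ["a", "b", "c"] and only "c" willing to accept, the loop always ends at "c" regardless of the random order tried.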
In this embodiment, a plurality of alternative allocation policies are determined (i.e., the N top-ranked allocation policies when sorted by return value from small to large), and the order is allocated to the dispatcher corresponding to any one of them. If that dispatcher does not accept the order, the order is allocated to the dispatcher corresponding to any one of the remaining N-1 allocation policies; if that dispatcher still does not accept it, the order is allocated to the dispatcher corresponding to any one of the remaining N-2 allocation policies, and so on. By repeating this allocation process, the order is allocated among the dispatchers corresponding to the alternative allocation policies, which satisfies the dispatchers' position and pickup-time constraints, improves their working efficiency, increases the probability that the order is accepted, and effectively improves user experience. Further, if none of the dispatchers corresponding to the alternative allocation policies accepts the order, the order is added to the order-grabbing pool, so that more dispatchers can obtain the order information, further increasing the probability that the order is accepted and effectively improving user experience.
Optionally, on the basis of the embodiment shown in fig. 3, if the reservation time of an order among the orders of the target dispatcher changes, the second path is updated, where the reservation time may be the reservation time of a dispatch order of the target dispatcher, or the pickup time in a pickup order of the target dispatcher.
Fig. 4 is a flowchart of a fourth embodiment of the method for planning an express task provided by the present invention. As shown in fig. 4, a user sends a mailing request to generate an order, and the planning apparatus for the express task obtains the order information. A target dispatcher is determined by the planning method of the embodiment shown in fig. 2 or fig. 3, and the order is dispatched for the first time. If the first target dispatcher does not accept the order, the next target dispatcher is determined; if that target dispatcher still does not accept the order, the order is added to the order-grabbing pool.
By the method provided by the embodiment of the invention, the working efficiency of dispatchers can be improved, the probability that an order is accepted is increased, and user experience is effectively improved. Further, if the order is allocated multiple times and no dispatcher accepts it, the order is added to the order-grabbing pool, so that more dispatchers can obtain the order information, further increasing the probability that the order is accepted and effectively improving user experience.
Fig. 5 is a schematic structural diagram of a first embodiment of a planning apparatus for an express task according to the present invention. As shown in fig. 5, the apparatus 50 of the present embodiment includes: a first acquisition module 51, a second acquisition module 52 and a distribution module 53.
The first obtaining module 51 is configured to obtain an order, and determine a policy set according to a pickup address in the order, where the policy set includes a plurality of allocation policies, and each allocation policy is configured to instruct allocation of the order to a dispatcher within a preset range including the pickup address.
A second obtaining module 52, configured to obtain a return value corresponding to each policy in the policy set according to a markov decision process, where the return value represents a distance that an order is allocated to a receiving and dispatching person and the receiving and dispatching person deviates from a first path, and the first path is a path corresponding to the order that has been allocated to the receiving and dispatching person.
Optionally, the Markov decision process described above is defined by a five-tuple, the five-tuple comprising: the order allocation situations of the dispatchers, the state transition probability matrix, the return function, the discount factor and the policy set;
The state transition probability matrix represents the probability of user mailing in a preset area; the return function is used for calculating a return value obtained by taking action by a receiver; the discount factor is a discount coefficient of the return value obtained by the action of the receiver relative to the return value obtained by the last action.
Optionally, the second obtaining module 52 is configured to obtain a return value corresponding to each allocation policy in the policy set by: for each allocation strategy, obtaining a return value of a dispatcher corresponding to the allocation strategy from a current state to a next state according to the return function, the discount factor and the state transition probability matrix; and updating the next state to be the current state, performing iterative calculation until the return value converges, and determining the return value as the return value corresponding to the allocation strategy.
And an allocation module 53, configured to allocate the order to a target dispatcher according to the return value, where the target dispatcher is one dispatcher of the plurality of dispatchers.
Optionally, the target dispatcher is a dispatcher corresponding to a distribution strategy with the smallest return value in the plurality of distribution strategies, or the target dispatcher is a dispatcher corresponding to any one of the first N distribution strategies arranged according to a preset sequence in the plurality of distribution strategies, wherein the preset sequence is the sequence of the return values from small to large.
The device of the present embodiment may be used to implement the technical solution of the method embodiment shown in fig. 1B, and its implementation principle and technical effects are similar, and are not described here again.
Fig. 6 is a schematic structural diagram of a second embodiment of the express task planning apparatus provided by the present invention. As shown in fig. 6, the apparatus 60 of this embodiment further includes, on the basis of the embodiment shown in fig. 5: the path planning module 54.
And the path planning module 54 is configured to plan, after the target dispatcher receives the order, a second path of the target dispatcher according to the order and the order allocation situation of the target dispatcher, where the second path is a path corresponding to all orders of the target dispatcher.
In one possible implementation, the path planning module 54 re-plans the second path of the target dispatcher according to an ant colony algorithm, the order, and the order distribution of the target dispatcher.
In another possible implementation manner, the path planning module 54 inserts the pick-up address into the first path corresponding to the target receiver according to the pick-up address in the order, and obtains the second path.
In some embodiments, after the allocation module 53 allocates the order to the target dispatcher according to the return value, the target dispatcher being the dispatcher corresponding to the allocation policy with the smallest return value among the allocation policies, the allocation module is further configured to: if the target dispatcher does not accept the order, update the target dispatcher to the dispatcher corresponding to the allocation policy with the next smallest return value, in the order of return values from small to large; if that dispatcher does not accept the order, allocate the order to the next dispatcher in the order of return values from small to large; and repeat this process, and if none of the dispatchers corresponding to the first N allocation policies, in the order of return values from small to large, accepts the order, add the order to the order-grabbing pool, where N is a positive integer.
In other embodiments, after the allocation module 53 allocates the order to the target dispatcher according to the return value, the target dispatcher being the dispatcher corresponding to any one of the first N allocation policies arranged in the preset order among the plurality of allocation policies, the allocation module is further configured to: if the target dispatcher does not accept the order, allocate the order to the dispatcher corresponding to any one of the remaining N-1 allocation policies; and repeat this step, and if none of the dispatchers corresponding to the N allocation policies accepts the order, add the order to the order-grabbing pool.
The device of this embodiment may be used to implement the technical solution of any of the method embodiments shown in fig. 2 and fig. 3, and its implementation principle and technical effects are similar, and are not repeated here.
Optionally, based on the embodiment shown in fig. 6, the path planning module 54 is further configured to update the second path according to an order, among the orders of the target dispatcher, whose reservation time has changed.
Fig. 7 is a schematic structural diagram of a first embodiment of an electronic device according to the present invention. As shown in fig. 7, the electronic apparatus 70 of the present embodiment includes: memory 71, processor 72 and computer program.
The computer program is stored in the memory 71 and is configured to be executed by the processor 72 to implement the method for planning an express task shown in any one of the embodiments of fig. 1B and fig. 2 to fig. 4 in the embodiment of the present invention. The description may be correspondingly understood with reference to fig. 1B and the description and effects corresponding to the steps of fig. 2 to fig. 4, which are not repeated herein.
In this embodiment, the memory 71 and the processor 72 are connected through a bus 73.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to implement the method for planning the express task shown in any one of fig. 1B and fig. 2 to fig. 4 in the embodiment of the invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of modules is merely a logical function division, and there may be additional divisions of actual implementation, e.g., multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in hardware plus software functional modules.
Program code for carrying out methods of the present invention may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Moreover, although operations are depicted in a particular order, this should be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Finally, it should be noted that although the subject matter has been described in language specific to structural features and/or methodological acts, the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above; rather, the specific features and acts described above are merely exemplary forms of implementing the claims. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and that such modifications and substitutions do not depart from the spirit of the invention.

Claims (8)

1. A method for planning express delivery tasks, characterized by comprising the following steps:
acquiring an order and determining a policy set according to a pickup address in the order, wherein the policy set comprises a plurality of allocation policies, and each allocation policy indicates that the order is allocated to one dispatcher within a preset range around the pickup address;
obtaining, according to a Markov decision process, a return value corresponding to each allocation policy in the policy set, wherein the return value represents the distance by which a dispatcher deviates from a first path when the order is allocated to that dispatcher, and the first path is the path corresponding to the orders already allocated to the dispatcher;
allocating the order to a target dispatcher according to the return values, wherein the target dispatcher is one of the plurality of dispatchers;
wherein the target dispatcher is the dispatcher corresponding to the allocation policy with the smallest return value among the plurality of allocation policies, or the dispatcher corresponding to any one of the first N allocation policies arranged in a preset order, the preset order being ascending order of return value;
wherein the Markov decision process is defined by a five-tuple comprising: the order allocation status of the dispatchers, a state transition probability matrix, a return function, a discount factor, and the policy set;
the state transition probability matrix represents the probability of users mailing parcels within a preset area;
the return function is used to calculate the return value obtained when a dispatcher takes an action;
the discount factor is the coefficient by which the return value of the current action is discounted relative to the return value of the previous action;
wherein the obtaining, according to a Markov decision process, a return value corresponding to each allocation policy in the policy set comprises:
for each allocation policy, obtaining, according to the return function, the discount factor, and the state transition probability matrix, the return value of the corresponding dispatcher for transitioning from a current state to a next state;
and updating the next state to be the current state, iterating the calculation until the return value converges, and determining the converged value as the return value corresponding to the allocation policy.
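The iterative computation described in claim 1 is, in structure, a standard discounted value-iteration loop. The sketch below is an illustrative reconstruction under that assumption, not the patented implementation; the `reward` vector, transition matrix `P`, tolerance, and iteration cap are all placeholders.

```python
def policy_return_value(reward, P, gamma, tol=1e-6, max_iter=1000):
    """Iterate the discounted return of one allocation policy until it converges.

    reward : per-state immediate return (here, the detour distance cost)
    P      : state transition probability matrix (rows sum to 1)
    gamma  : discount factor applied to the next state's return
    """
    n = len(reward)
    v = [0.0] * n
    for _ in range(max_iter):
        # current return plus the discounted, probability-weighted next-state return
        v_next = [
            reward[i] + gamma * sum(P[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(v_next, v)) < tol:  # converged
            return v_next
        v = v_next
    return v
```

With a discount factor below 1 the update is a contraction, so the loop converges to a fixed point regardless of the starting values, matching the "iterate until the return value converges" step of the claim.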
2. The method of claim 1, wherein the target dispatcher is the dispatcher corresponding to the allocation policy with the smallest return value among the plurality of allocation policies, the method further comprising:
if the target dispatcher does not accept the order, updating the target dispatcher to the dispatcher corresponding to the allocation policy with the second smallest return value;
if the dispatcher with the second smallest return value does not accept the order, allocating the order to the next dispatcher in ascending order of return value;
and repeating the above process, and if none of the dispatchers corresponding to the first N allocation policies in ascending order of return value accepts the order, adding the order to an order-grabbing pool, where N is a positive integer.
3. The method of claim 1, wherein the target dispatcher is the dispatcher corresponding to any one of the first N allocation policies arranged in the preset order among the plurality of allocation policies, the method further comprising:
if the target dispatcher does not accept the order, allocating the order to the dispatcher corresponding to any one of the remaining N-1 allocation policies;
and repeating this step, and if none of the dispatchers corresponding to the N allocation policies accepts the order, adding the order to the order-grabbing pool.
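The fallback behavior of claims 2 and 3 amounts to offering the order down a ranked list of dispatchers and falling through to the order-grabbing pool when all N decline. A minimal sketch, where `accepts` is a hypothetical callback standing in for the dispatcher's accept/decline decision:

```python
def dispatch_order(order, ranked_dispatchers, n, grab_pool, accepts):
    """Offer the order to the first n dispatchers in ascending return-value order.

    ranked_dispatchers : dispatcher ids sorted by ascending return value
    accepts            : callable(dispatcher, order) -> bool acceptance decision
    Appends the order to the order-grabbing pool if all n dispatchers decline.
    """
    for dispatcher in ranked_dispatchers[:n]:
        if accepts(dispatcher, order):
            return dispatcher  # first acceptor becomes the target dispatcher
    grab_pool.append(order)  # nobody accepted: open the order for grabbing
    return None
```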
4. The method according to any one of claims 1 to 3, further comprising, after the allocating the order to the target dispatcher:
if the target dispatcher accepts the order, planning a second path for the target dispatcher according to the order and the order allocation status of the target dispatcher, wherein the second path is the path corresponding to all orders of the target dispatcher.
5. The method of claim 4, further comprising, after the order is allocated to the target dispatcher:
re-planning the second path of the target dispatcher according to an ant colony algorithm, the order, and the order allocation status of the target dispatcher.
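Claim 5's re-planning step can be illustrated with a basic ant colony system over the dispatcher's stops. This is a generic textbook sketch, not the patent's specific algorithm; `points`, the `dist` callback, and all tuning parameters are assumptions for illustration.

```python
import random

def ant_colony_route(points, dist, n_ants=20, n_iter=50, alpha=1.0, beta=2.0, rho=0.5):
    """Re-plan the visiting order of a dispatcher's stops with a basic ant colony system.

    points : stops on the dispatcher's route; the route starts at points[0]
    dist   : callable(a, b) -> travel distance between two stops
    """
    n = len(points)
    tau = [[1.0] * n for _ in range(n)]  # pheromone trails, initially uniform
    best, best_len = list(range(n)), float("inf")
    for _ in range(n_iter):
        for _ in range(n_ants):
            tour = [0]
            unvisited = set(range(1, n))
            while unvisited:
                i = tour[-1]
                cand = list(unvisited)
                # desirability = pheromone^alpha * (1/distance)^beta
                weights = [
                    (tau[i][j] ** alpha)
                    * ((1.0 / (dist(points[i], points[j]) or 1e-9)) ** beta)
                    for j in cand
                ]
                j = random.choices(cand, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            length = sum(dist(points[tour[k]], points[tour[k + 1]]) for k in range(n - 1))
            if length < best_len:
                best, best_len = tour, length
        # evaporate all trails, then reinforce the best tour found so far
        tau = [[(1.0 - rho) * t for t in row] for row in tau]
        for k in range(n - 1):
            tau[best[k]][best[k + 1]] += 1.0 / (best_len or 1e-9)
    return [points[i] for i in best]
```

The pheromone reinforcement biases later ants toward edges of short tours, which is what lets the colony improve the route across iterations.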
6. The method of claim 5, further comprising, after the order is allocated to the target dispatcher:
inserting the pickup address into the first path corresponding to the target dispatcher according to the pickup address in the order, so as to obtain the second path.
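Claim 6's insertion of the pickup address into the existing first path is, in effect, a cheapest-insertion heuristic: try every slot and keep the one with the smallest added detour. A sketch under that assumption, with `dist` as a hypothetical distance callback:

```python
def insert_pickup(path, pickup, dist):
    """Insert a new pickup point into an existing route at the cheapest position.

    path   : ordered list of stops on the dispatcher's current (first) path
    pickup : the new order's pickup point
    dist   : callable(a, b) -> travel distance between two points
    Returns the second path containing the pickup with minimal added detour.
    """
    if not path:
        return [pickup]
    best_pos, best_cost = 0, float("inf")
    # try every insertion slot, including before the first and after the last stop
    for i in range(len(path) + 1):
        prev_leg = dist(path[i - 1], pickup) if i > 0 else 0.0
        next_leg = dist(pickup, path[i]) if i < len(path) else 0.0
        removed = dist(path[i - 1], path[i]) if 0 < i < len(path) else 0.0
        cost = prev_leg + next_leg - removed  # net detour added by this slot
        if cost < best_cost:
            best_pos, best_cost = i, cost
    return path[:best_pos] + [pickup] + path[best_pos:]
```

Compared with the full re-planning of claim 5, this keeps the relative order of the existing stops and only pays the cost of one new detour, which is why it is cheap enough to run on every incoming order.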
7. The method according to claim 5 or 6, further comprising:
updating the second path according to any order, among the orders of the target dispatcher, whose reserved pickup time has changed.
8. A device for planning express delivery tasks, characterized by comprising:
a first acquisition module, configured to acquire an order and determine a policy set according to a pickup address in the order, wherein the policy set comprises a plurality of allocation policies, and each allocation policy indicates that the order is allocated to one dispatcher within a preset range around the pickup address;
a second acquisition module, configured to obtain, according to a Markov decision process, a return value corresponding to each allocation policy in the policy set, wherein the return value represents the distance by which a dispatcher deviates from a first path when the order is allocated to that dispatcher, and the first path is the path corresponding to the orders already allocated to the dispatcher;
an allocation module, configured to allocate the order to a target dispatcher according to the return values, wherein the target dispatcher is one of the plurality of dispatchers;
wherein the target dispatcher is the dispatcher corresponding to the allocation policy with the smallest return value among the plurality of allocation policies, or the dispatcher corresponding to any one of the first N allocation policies arranged in a preset order, the preset order being ascending order of return value;
the Markov decision process is defined by a five-tuple comprising: the order allocation status of the dispatchers, a state transition probability matrix, a return function, a discount factor, and the policy set; the state transition probability matrix represents the probability of users mailing parcels within a preset area; the return function is used to calculate the return value obtained when a dispatcher takes an action; the discount factor is the coefficient by which the return value of the current action is discounted relative to the return value of the previous action;
the second acquisition module is specifically configured to obtain, for each allocation policy, according to the return function, the discount factor, and the state transition probability matrix, the return value of the corresponding dispatcher for transitioning from a current state to a next state; and to update the next state to be the current state, iterate the calculation until the return value converges, and determine the converged value as the return value corresponding to the allocation policy.
CN201911007457.4A 2019-10-22 2019-10-22 Express delivery task planning method and device Active CN112700074B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911007457.4A CN112700074B (en) 2019-10-22 2019-10-22 Express delivery task planning method and device


Publications (2)

Publication Number Publication Date
CN112700074A CN112700074A (en) 2021-04-23
CN112700074B true CN112700074B (en) 2024-05-03

Family

ID=75504814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911007457.4A Active CN112700074B (en) 2019-10-22 2019-10-22 Express delivery task planning method and device

Country Status (1)

Country Link
CN (1) CN112700074B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7003475B1 (en) * 1999-05-07 2006-02-21 Medcohealth Solutions, Inc. Computer implemented resource allocation model and process to dynamically and optimally schedule an arbitrary number of resources subject to an arbitrary number of constraints in the managed care, health care and/or pharmacy industry
CN103514371A (en) * 2013-09-22 2014-01-15 宁波开世通信息科技有限公司 Measuring and risk evaluation method of executive capability of scheduled task
CN107832882A (en) * 2017-11-03 2018-03-23 上海交通大学 A kind of taxi based on markov decision process seeks objective policy recommendation method
CN108063830A (en) * 2018-01-26 2018-05-22 重庆邮电大学 A kind of network section dynamic resource allocation method based on MDP
CN108092891A (en) * 2017-12-07 2018-05-29 重庆邮电大学 A kind of data dispatching method based on markov decision process
CN108197871A (en) * 2018-01-19 2018-06-22 顺丰科技有限公司 The mission planning method and system that express delivery receipts are dispatched officers
CN109002960A (en) * 2018-06-12 2018-12-14 广东工业大学 It is a kind of based on the online order of scoring and path planning distribution and allocator
CN109377317A (en) * 2018-10-26 2019-02-22 天津五八到家科技有限公司 Data processing method and device
CN109409739A (en) * 2018-10-19 2019-03-01 南京大学 A kind of crowdsourcing platform method for allocating tasks based on part Observable markov decision process
CN109905335A (en) * 2019-03-06 2019-06-18 中南大学 A kind of cloud radio access network resource distribution method and system towards bullet train
CN110110871A (en) * 2018-02-01 2019-08-09 北京嘀嘀无限科技发展有限公司 A kind of method and system of Order splitting


Also Published As

Publication number Publication date
CN112700074A (en) 2021-04-23

Similar Documents

Publication Publication Date Title
US10474504B2 (en) Distributed node intra-group task scheduling method and system
CN113222305B (en) Order scheduling method, order scheduling device, storage medium and electronic equipment
CN111466063A (en) Energy storage management and control method, system, computer equipment and storage medium
WO2020122966A1 (en) System and method for ride order dispatching
CN113037877B (en) Optimization method for time-space data and resource scheduling under cloud edge architecture
Shi et al. Memory-based ant colony system approach for multi-source data associated dynamic electric vehicle dispatch optimization
US11332031B2 (en) Locating optimal charge stations
CN103699443A (en) Task distributing method and scanner
US11790289B2 (en) Systems and methods for managing dynamic transportation networks using simulated future scenarios
CN111044062B (en) Path planning and recommending method and device
US20170012467A1 (en) Power interchange management system and power interchange management method
Lin et al. Vshare: A wireless social network aided vehicle sharing system using hierarchical cloud architecture
Gong et al. Computation offloading-based task scheduling in the vehicular communication environment for computation-intensive vehicular tasks
CN113052467B (en) Shared vehicle scheduling method and device based on operation and maintenance cost
CN111967804B (en) Active power distribution scheduling system based on mobile charging equipment
CN113283793A (en) Cargo collecting and dispatching method, system, equipment and storage medium
CN112700074B (en) Express delivery task planning method and device
Hu et al. Minimizing the number of mobile chargers to keep large-scale WRSNs working perpetually
Zhou et al. A domain‐of‐influence based pricing strategy for task assignment in crowdsourcing package delivery
CN112200366B (en) Load prediction method and device, electronic equipment and readable storage medium
CN114399228A (en) Task scheduling method and device, electronic equipment and medium
WO2020248220A1 (en) Reinforcement learning method for incentive policy based on historic data trajectory construction
CN113822486B (en) Vehicle path planning method, device and system based on column generation algorithm
US20240028990A1 (en) Multi-depot vehicle scheduling
Vahedi et al. Heterogeneous task allocation in mobile crowd sensing using a modified approximate policy approach

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant