CN115933554A - Path planning method, path planning model and training method for selection model


Info

Publication number
CN115933554A
Authority
CN
China
Prior art keywords: path, path planning, node, planning information, vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211490812.XA
Other languages
Chinese (zh)
Inventor
曹宜超
丁建辉
陈珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu China Co Ltd
Original Assignee
Baidu China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu China Co Ltd filed Critical Baidu China Co Ltd
Priority to CN202211490812.XA
Publication of CN115933554A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a path planning method, a path planning model, and a training method for a selection model, and relates to the field of computer technology, in particular to artificial intelligence, intelligent transportation, and deep learning. The path planning method is implemented as follows: dividing a first path set corresponding to first path planning information to obtain a plurality of second path sets; selecting a target path set from the plurality of second path sets; re-planning paths according to second path planning information corresponding to the target path set to obtain an optimized third path set; and updating the first path set with the third path set. According to the scheme of the embodiments of the disclosure, a subset whose path cost has larger room for improvement is selected from the subsets contained in the first path set, and the corresponding local path planning problem is solved again, so that the current path set is optimized, the quality of the path set is improved, and the path planning speed is increased.

Description

Path planning method, path planning model and training method for selection model
Technical Field
The present disclosure relates to the field of computer technology, and more particularly to the field of artificial intelligence, intelligent transportation, and deep learning technology.
Background
In the field of logistics distribution, the core vehicle routing problem is how to serve spatially dispersed customers at minimal cost. Optimizing distribution paths improves both working efficiency and working quality. In the prior art, when the number of customers is large, path planning is too slow to meet practical requirements.
Disclosure of Invention
The present disclosure provides a path planning method, training methods for a selection model and for a path planning model, corresponding apparatuses, and a storage medium.
According to an aspect of the present disclosure, there is provided a path planning method, including:
dividing a first path set corresponding to the first path planning information to obtain a plurality of second path sets; wherein the second path set is a subset of the first path set;
selecting a target path set from the plurality of second path sets;
re-planning the path according to second path planning information corresponding to the target path set to obtain an optimized third path set; the second path planning information comprises partial information of the first path planning information;
and updating the first path set with the third path set.
According to another aspect of the present disclosure, there is provided a training method of selecting a model, including:
determining first parameters of all path subsets contained in a sample path set, wherein the sample path set has corresponding first path planning information;
determining a second parameter of each path subset by using the first selection model according to each path subset and the corresponding second path planning information; the second path planning information corresponding to the path subset comprises partial content of the first path planning information of the sample path set to which the path subset belongs; and
training the first selection model according to the difference between the first parameter and the second parameter to obtain a trained second selection model.
According to another aspect of the present disclosure, there is provided a training method of a path planning model, including:
inputting the path planning information sample into an encoder of a first path planning model to obtain a coding vector corresponding to the path planning information sample, wherein the coding vector comprises a characterization vector of at least one node in the path planning information sample;
and iteratively executing the following steps until each second path included in the training path set is obtained: inputting the characterization vector of the node included in the second path into a decoder of the first path planning model to obtain a decoding vector of the second path; fusing the decoding vector of the second path and the coding vector corresponding to the path planning information sample based on an attention mechanism to obtain a next node in the second path; and
training the first path planning model by using a gradient ascent method and a cost function of the training path set to obtain a trained second path planning model.
According to another aspect of the present disclosure, there is provided a path planning apparatus including:
the dividing module is used for dividing a first path set corresponding to the first path planning information to obtain a plurality of second path sets; wherein the second set of paths is a subset of the first set of paths;
the selection module is used for selecting a target path set from the plurality of second path sets;
the optimization module is used for re-planning the paths according to the second path planning information corresponding to the target path set to obtain an optimized third path set; the second path planning information comprises partial information of the first path planning information;
and the updating module is used for updating the first path set by utilizing the third path set.
According to another aspect of the present disclosure, there is provided a training apparatus for selecting a model, including:
the first determining module is used for determining first parameters of all path subsets contained in a sample path set, and the sample path set has corresponding first path planning information;
a second determining module, configured to determine, according to each path subset and second path planning information corresponding to the path subset, a second parameter of each path subset by using the first selection model; the second path planning information corresponding to the path subset comprises partial content of the first path planning information of the sample path set to which the path subset belongs; and
the first training module is used for training the first selection model according to the difference between the first parameter and the second parameter to obtain a trained second selection model.
According to another aspect of the present disclosure, there is provided a training apparatus for a path planning model, including:
the input module is used for inputting the path planning information sample into an encoder of the first path planning model to obtain a coding vector corresponding to the path planning information sample, wherein the coding vector comprises a characterization vector of at least one node in the path planning information sample;
the iteration module is used for iteratively executing the following steps until each second path included in the training path set is obtained: inputting the characterization vector of the node included in the second path into a decoder of the first path planning model to obtain a decoding vector of the second path; fusing the decoding vector of the second path and the coding vector corresponding to the path planning information sample based on an attention mechanism to obtain a next node in the second path;
and the second training module is used for training the first path planning model by using a gradient ascent method and a cost function of the training path set to obtain a trained second path planning model.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present disclosure.
According to the scheme of the embodiments of the disclosure, a subset whose path cost has larger room for improvement is selected from the subsets contained in the first path set, and the corresponding local path planning problem is solved again, so that the current path set is optimized, the quality of the path set is improved, and the path planning speed is increased.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic flow diagram of a path planning method according to an embodiment of the present disclosure;
FIG. 2 is an architectural diagram of a path planning model according to an embodiment of the present disclosure;
FIG. 3 is a flow diagram of divide and conquer iterative solution according to an embodiment of the disclosure;
FIG. 4A is a schematic diagram of a path before fine tuning according to an embodiment of the present disclosure;
FIG. 4B is a schematic diagram of a path after fine tuning according to an embodiment of the present disclosure;
FIG. 5 is a schematic overall flow chart diagram of a path planning method according to an embodiment of the present disclosure;
FIG. 6 is a schematic flow diagram of a training method for selecting a model according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a training method of selecting a model according to an embodiment of the present disclosure;
FIG. 8 is a flow chart diagram of a method of training a path planning model according to an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a path planning apparatus according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a structure of a training apparatus for selecting a model according to an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of a training apparatus for a path planning model according to an embodiment of the present disclosure;
FIG. 12 is a block diagram of an electronic device used to implement an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the related art, how to perform logistics distribution faster and more intelligently has become an urgent problem in current practical applications. Taking the capacitated vehicle routing problem (CVRP) as an example, there are a number of customers, each with position coordinates and its own demand, and the coordinates of the distribution station are known; the total cost (total mileage or other costs) needs to be minimized on the premise that the planned distribution routes satisfy all customer demands. At present, open-source solvers such as OR-Tools and LKH-3 are generally used to solve the CVRP. These solvers can obtain a good result for a small-scale CVRP within an acceptable time, for example a good feasible solution (close to the optimal solution) of a CVRP with 100 customers within 300 seconds. However, when the number of customers becomes large, for example a CVRP involving 1000 customers, these solvers may take several hours to reach a good solution, which is unacceptable in real-world scenarios.
In order to at least partially solve one or more of the above problems and other potential problems, embodiments of the present disclosure provide a path planning method, a path planning model, and a training method for a selection model. With the technical solution of the embodiments of the present disclosure, a good solution of a large-scale vehicle routing problem can be obtained quickly, and the approach can be extended to large-scale CVRP problems and VRPTW problems (Vehicle Routing Problem with Time Windows).
Fig. 1 is a schematic flow chart of a path planning method according to an embodiment of the present disclosure, as shown in fig. 1, the method at least includes the following steps:
s101, dividing a first path set corresponding to first path planning information to obtain a plurality of second path sets; wherein the second set of paths is a subset of the first set of paths.
In the embodiment of the present disclosure, the path planning information may be understood as a path planning task, a path planning condition, a path planning problem, and the like. In one example, taking a logistics distribution scenario as an example, the first path planning information may include customer information, vehicle information, distribution station (warehouse) information, and the like. The customer information may include location information, demand information, for each customer. The demand information may further include the volume of the goods to be delivered, the time window during which the goods may be received, etc. The vehicle information may include capacity information of each delivery vehicle. The delivery station information may include location information of the delivery station.
In one example, the first path planning information may describe a CVRP problem. After the various types of information are digitized, the CVRP problem may be, specifically: there are 1000 customers with coordinates [[1203.76, 76.78], ..., [1243.98, 34.62]] and corresponding demands [27, 87, ..., 91], the warehouse coordinates are [1287.12, 66.98], and there are n delivery vehicles, each with a capacity of 8000.
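For illustration only (this code is not part of the original disclosure), such an instance could be held in a simple data structure; the field names and the concrete number of vehicles below are assumptions:

```python
# Hypothetical in-memory representation of the CVRP instance described above.
# Field names and the number of vehicles are illustrative assumptions.
cvrp_instance = {
    "customers": [
        {"coord": (1203.76, 76.78), "demand": 27},
        # ... 997 more customers ...
        {"coord": (1243.98, 34.62), "demand": 91},
    ],
    "depot": (1287.12, 66.98),      # warehouse coordinates
    "num_vehicles": 50,             # "n" delivery vehicles (value assumed)
    "vehicle_capacity": 8000,       # identical capacity for every vehicle
}
```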
The first path set may include a set of multiple paths passing through the locations of the customers, and the paths in the first path set may together cover all customers that the first path planning information requires to be served. Generally, a delivery vehicle has limited capacity, and after it has delivered the goods loaded on it, it needs to return to the distribution station for reloading. One path may therefore be: the delivery vehicle starts from the starting point and returns to the starting point after delivery is completed. The vehicle serves the customers on the path in order along the path, and each customer is represented as a node on the path.
The second set of paths is a subset of the first set of paths, and the second set of paths may include partial paths of the first set of paths. There may or may not be partial intersection between different second sets of paths.
And S102, selecting a target path set from the plurality of second path sets.
In the embodiment of the present disclosure, the target path set may be the path set that needs further processing. Selecting a target path set from the plurality of second path sets means selecting, from the second path sets, the one that can improve distribution efficiency and reduce distribution cost to the greatest extent. That is, a larger improvement can be obtained by optimizing the target path set. The specific selection method is described later.
And S103, re-planning the paths according to the second path planning information corresponding to the target path set to obtain an optimized third path set. Wherein the second path planning information includes partial information of the first path planning information.
In an embodiment of the present disclosure, the second path planning information may be a subset of the first path planning information. If the first path planning information is regarded as a path planning problem, the second path planning information may be regarded as a sub-problem of that problem. Each second path set may have corresponding second path planning information. Path planning can be performed again on the second path planning information corresponding to the target path set, that is, on the sub-problem, so that a third path set better than the target path set is obtained in a shorter time. Any path planning method may be used for the re-planning, which is not limited here.
And S104, updating the first path set by using the third path set.
In the embodiment of the present disclosure, the original target path set in the first path set is replaced with the third path set, so that the quality of the first path set is improved and part of the distribution cost is reduced. The distribution cost, also referred to simply as the cost, is typically determined by the length of the distribution paths.
In the embodiment of the present disclosure, S101 to S104 may be executed once or iterated multiple times. A specific number of iterations or a stopping condition may be set. For example, after the first path set is updated each time, S101 to S104 are repeated based on the updated first path set: the updated first path set is divided to obtain second path sets, a target path set is selected from them, path planning is performed again, and the optimized third path set is used to update the first path set. Through multiple iterations, a sub-problem is selected from the path planning problem and solved each time. The large problem is thus handled sub-problem by sub-problem in a divide-and-conquer manner, and each iteration improves the solution of one sub-problem, so that the quality of the overall solution improves over the iterations and a better feasible solution of the path planning problem is obtained.
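The iterative procedure of S101 to S104 can be sketched as follows. This is a simplified illustration only: `split_into_subsets`, `current_cost`, `solve_subproblem`, and `replace_routes` are placeholder helpers assumed for the sketch, not names from the disclosure.

```python
def optimize_routes(problem, routes, select_model, max_iters=100):
    """Sketch of the divide-and-conquer iteration over S101-S104."""
    for _ in range(max_iters):
        subsets = split_into_subsets(routes)                     # S101
        target = max(                                            # S102: subset with
            subsets,                                             # largest predicted gain
            key=lambda s: current_cost(s) - select_model.predict_cost(s, problem),
        )
        improved = solve_subproblem(target, problem)             # S103: re-plan locally
        routes = replace_routes(routes, target, improved)        # S104: update
    return routes
```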
According to the scheme of the embodiment of the disclosure, a subset capable of reducing the path cost to a greater extent is selected from a plurality of subsets contained in the first path set, and the local path planning problem corresponding to the subset is solved again, so that the optimization of the current path set is realized, and a better feasible solution for the path planning problem is obtained. Improving the quality of the feasible solution can improve the delivery efficiency in the delivery scene. Meanwhile, the number of clients corresponding to the subproblems is smaller, the solving speed is higher, and a better and feasible solution can be quickly obtained.
In a possible implementation manner, the path planning method according to the embodiment of the present disclosure includes steps S101 to S104, and the method may further include the steps of:
s201, initial path planning is carried out according to the first path planning information to obtain an initial first path set.
In the embodiment of the present disclosure, S201 may precede S101. Path planning is performed on the path planning problem represented by the first path planning information to obtain an initial first path set, that is, an initial solution. No particular quality is required of the initial solution as long as it is feasible, so the initial solution may also be understood as a general feasible solution. In one example, the general feasible solution may be generated using a pre-trained path planning model. It should be noted that various path planning methods in the related art may also be used to obtain the general feasible solution.
According to the scheme of the embodiment of the disclosure, a general feasible solution is generated for the first path planning information to obtain an initial first path set for subsequent gradual optimization, so that the path planning efficiency is higher.
In a possible implementation manner, the step S201 performs initial path planning according to the first path planning information to obtain an initial first path set, and further includes the steps of:
and S2011, inputting the first path planning information into an encoder of the path planning model to obtain a coding vector corresponding to the first path planning information, wherein the coding vector comprises a characterization vector of at least one node in the first path planning information.
In the embodiment of the present disclosure, as shown in the architecture diagram of the path planning model in fig. 2, the model is a neural network based on the Transformer structure and contains several encoders and decoders. The path planning problem represented by the first path planning information is converted into a vector by the encoding input layer and then input into the encoder. Specifically, the encoding input layer vectorizes the position information and demand information of the node represented by each customer.
S2012, iteratively executing the following steps until each first path included in the first path set is obtained:
and inputting the characterization vectors of the nodes included in the first path into a decoder of the path planning model to obtain the decoding vectors of the first path.
And fusing the decoding vector of the first path and the coding vector corresponding to the planning information of the first path based on an attention mechanism to obtain the next node in the first path.
In the embodiment of the disclosure, the encoder encodes the whole path planning problem into a vector, which is input into the decoder so that a feasible solution of the problem can be decoded. Since decoding is performed cyclically, the decoder resolves one node of a path in each cycle. The input layer of the decoder receives the vectors of the customer nodes decoded so far, that is, the decoding vector of the existing first path. By fusing this decoding vector with the encoding vector of the entire path planning problem based on an attention mechanism, the model can output which node is the next node of the current first path in the feasible solution. Specifically, the decoder outputs a probability for each node; following the greedy (beam search) idea, the node with the highest probability is selected as the next node each time, and continued iterative decoding finally generates paths covering every node, yielding a general feasible solution.
According to the scheme of the embodiment of the disclosure, the pre-trained model is used to generate the initial solution. The model is trained offline and used for online prediction, so that once a path planning problem is given online it can be solved quickly to obtain the initial first path set.
In a possible implementation manner, the step S2011 inputs the first path planning information into an encoder of the path planning model, so as to obtain an encoding vector corresponding to the first path planning information, and the method further includes the steps of:
the first path plan information is converted into a first vector through an input layer of an encoder.
And sequentially processing the first vector through a multi-head attention mechanism layer, a first residual error and normalization layer, a feedforward network layer, a second residual error and a normalization layer of the encoder to obtain a characterization vector of at least one node in the first path planning information.
In the embodiment of the disclosure, after the path planning problem is converted into the first vector through the encoding input layer, the first vector sequentially passes through the multi-head attention mechanism layer, the first residual and normalization layer, the feedforward network layer, the second residual and the normalization layer, and then the characterization vector of at least one node is obtained. If the layer structure of the encoder in the path planning model changes, the encoding process can also change accordingly.
According to the scheme of the embodiment of the disclosure, the path planning problem is encoded by using the encoder, and the vector representing the problem is obtained based on the attention mechanism, so that the quality of the obtained initial solution is improved. The quality of the initial solution has certain influence on the result of the iterative solution, so that the quality of a better feasible solution can be improved.
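The disclosure does not give code; purely as an illustrative sketch, one encoder block of the kind described above could look like the following in PyTorch (all dimensions are assumptions):

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """Multi-head attention -> residual + normalization -> feed-forward ->
    residual + normalization, as described above. Sizes are assumptions."""

    def __init__(self, d_model: int = 128, n_heads: int = 8, d_ff: int = 512):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, nodes, d_model)
        h, _ = self.attn(x, x, x)
        x = self.norm1(x + h)                   # first residual and normalization layer
        return self.norm2(x + self.ff(x))       # second residual and normalization layer

# The encoding input layer maps each node's (x, y, demand) to the model dimension.
encode_input = nn.Linear(3, 128)
```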
In a possible implementation manner, the step S2011 inputs the first path planning information into an encoder of the path planning model to obtain a coding vector corresponding to the first path planning information, and further includes the steps of:
and splicing the characterization vectors of the plurality of nodes in the first path planning information through the full connection layer to obtain the coding vector corresponding to the first path planning information.
In the embodiment of the present disclosure, after the characterization vector of each node is obtained, in order to obtain a characterization of all customers as a whole, the characterization vectors of all nodes are concatenated and passed through a fully connected layer, which yields the characterization vector of the entire path planning problem, that is, of the first path planning information.
According to the scheme of the embodiment of the disclosure, the vectors of all the nodes are spliced through the full-connection layer to obtain the characterization vector of the first path planning information, so that the features extracted by the model are more comprehensive.
In a possible implementation manner, the step S102 selects a target path set from the plurality of second path sets, and further includes the steps of:
s301, inputting the plurality of second path sets and second path planning information corresponding to the plurality of second path sets into a selection model, and predicting respective optimization parameters of the plurality of second path sets.
In the embodiment of the present disclosure, the optimization parameter may be understood as an estimate used to evaluate the effect of optimizing each second path set. Taking the cost as an example of the optimization parameter, the post-optimization cost of each second path set can be predicted by a selection model trained in advance. The cost may be determined mainly by the length of the paths; it may also take into account the number of traffic lights on the path, the traffic flow of the road, and other factors.
S302, determining a second path set with the maximum difference between the optimization parameters in the second path sets and the initial parameters of the first path set as a target path set.
In the embodiment of the present disclosure, the initial parameter may be the original, unoptimized cost (for example, the total length) of the paths in each second path set. The difference between the optimized cost predicted by the selection model and the initial cost is computed, and the second path set with the largest difference is determined as the target path set.
According to the scheme of the embodiment of the disclosure, the optimization parameter of each second path set is predicted by the selection model and compared with the initial parameter, and subsequent path optimization is performed on the target path set with the largest difference, so that each optimization improves the quality of the feasible solution to the greatest extent and reduces the length of the distribution paths.
In one possible embodiment, the plurality of second path sets includes: N adjacent paths in the first path set, where N is an integer greater than 1 and less than M, and M is the number of paths in the first path set.
In an embodiment of the present disclosure, each second path set includes N adjacent paths in the first path set. Taking M = 5 and N = 3 as an example, the first path set includes 5 paths, and each path together with the two paths nearest to it forms a second path set, so there are 5 different second path sets.
In one example, the divide and conquer iterative solution flow shown in fig. 3 includes the following steps:
step (a): a path set is obtained, and the path set may be an initial first path set or an updated first path set. In the figure, the five-pointed star is a distribution station, and the circle is a node of each path.
Step (b): the center point of each path in the set of paths is determined (diamond in the figure).
Step (c): for each path, 2 paths adjacent to the path are selected as a second set of paths.
Step (d): a target path set is determined by the selection model.
Step (e): the target path set is re-solved to obtain an optimized third path set.
Step (f): the first path set is updated with the third path set, and the updated first path set is returned to step (b) for the next iteration.
According to the scheme of the embodiment of the disclosure, the complexity of the sub-problem of the path planning problem can be controlled by setting the parameter N, so that the solving speed and quality of the sub-problem can be adjusted conveniently.
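As an illustrative sketch of steps (b) and (c) above (assuming that routes are lists of node indices into a coordinate array; the helper name is not from the disclosure):

```python
import numpy as np

def build_second_path_sets(routes, coords, n=2):
    """For each route, form a candidate subset containing the route itself and
    its n nearest neighboring routes, measured between route center points."""
    centers = np.array([coords[r].mean(axis=0) for r in routes])   # step (b)
    subsets = []
    for i in range(len(routes)):
        dists = np.linalg.norm(centers - centers[i], axis=1)
        nearest = np.argsort(dists)[: n + 1]          # the route plus n neighbors
        subsets.append([routes[j] for j in nearest])  # step (c)
    return subsets
```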
In a possible implementation manner, step S103 of re-planning paths according to the second path planning information corresponding to the target path set to obtain an optimized third path set further includes the steps of:
and extracting the information of each node to be processed from the paths included in the target path set.
In the embodiment of the present disclosure, the node information to be processed is path planning information corresponding to each node in the target path set, and includes position information and demand information of the node.
And re-planning the path of each node information to be processed by using an open source solver according to the second path planning information corresponding to the target path set to obtain an optimized third path set.
In the embodiment of the disclosure, the open source solver may be a solver such as OR-Tools or LKH-3.
According to the scheme of the embodiment of the disclosure, the selected sub-problem is solved by using the open source solver, and as the number of clients involved in the sub-problem is small, the solving speed is high, the first path set can be optimized rapidly with low cost, and a better and feasible solution is obtained.
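For reference, solving the extracted sub-problem with OR-Tools could look roughly like the following; this is a generic capacitated-routing sketch, not code from the disclosure, and it assumes an integer distance matrix computed from the to-be-processed node information:

```python
from ortools.constraint_solver import pywrapcp, routing_enums_pb2

def solve_subproblem(dist_matrix, demands, vehicle_capacities, depot=0, time_limit_s=10):
    """Solve a small CVRP sub-problem built from the target path set's nodes."""
    manager = pywrapcp.RoutingIndexManager(len(dist_matrix), len(vehicle_capacities), depot)
    routing = pywrapcp.RoutingModel(manager)

    def distance_cb(i, j):
        return dist_matrix[manager.IndexToNode(i)][manager.IndexToNode(j)]

    transit = routing.RegisterTransitCallback(distance_cb)
    routing.SetArcCostEvaluatorOfAllVehicles(transit)

    def demand_cb(i):
        return demands[manager.IndexToNode(i)]

    demand = routing.RegisterUnaryTransitCallback(demand_cb)
    routing.AddDimensionWithVehicleCapacity(demand, 0, vehicle_capacities, True, "Capacity")

    params = pywrapcp.DefaultRoutingSearchParameters()
    params.first_solution_strategy = routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC
    params.time_limit.FromSeconds(time_limit_s)
    return routing.SolveWithParameters(params)
```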
In a possible implementation manner, the path planning method according to the embodiment of the present disclosure includes steps S101 to S104, and the method may further include the steps of:
s401, determining a path to be adjusted according to the distance between the nodes of each path in the first path set.
S402, adjusting the nodes and/or the node sequence through which the paths to be adjusted pass to obtain an adjusted fourth path set.
In the embodiment of the present disclosure, a path to be adjusted may be understood as a path in the path set that can still be further optimized. For example, some nodes in one of the paths of the path set may be far from the other nodes of that path but close to another path, which can significantly degrade the quality of the solution. Such nodes may therefore be transferred from their original path to the other, closer path.
It should be noted that the path to be adjusted may be generated in the initial feasible solution, or may be generated in the better feasible solution after optimization. That is, the path set to be adjusted may exist in the first path set before the update or may exist in the first path set after the update with the third path set.
According to the scheme of the embodiment of the disclosure, the optimized first path set is further adjusted, so that the better feasible solution can be further optimized, and the obtained fourth path set is closer to the solution of the optimal solution or becomes the optimal solution.
In a possible implementation manner, the step S401 determines a path to be adjusted according to a distance between nodes of each path in the first path set, and further includes the steps of:
s4011, dividing all nodes on each path according to the distance between each node of each path in the first path set to obtain a plurality of cluster nodes and the central point of each cluster node.
In the embodiment of the present disclosure, the average distance between adjacent nodes on each path may be calculated; when the distance between two consecutive nodes on a path exceeds this average by more than a preset threshold, the two nodes are divided into different clusters, so that the nodes on each path are divided into a plurality of clusters. After the division is completed, the center point of each cluster of nodes is determined according to the positions of the nodes in the cluster.
S4012, determining the central point of each path contained in the target path set.
The center point of the path may be determined from the center points of the positions of all nodes on the path.
S4013, determining the path to be adjusted according to the distance between the center point of each cluster node and the center point of each path.
In the embodiment of the present disclosure, as shown in fig. 4A, the cluster of nodes in the boxed area on path 4 is far from the center point of path 4, on which it is located, but close to the center point of path 1. Therefore, whether the cluster of nodes needs to be adjusted can be determined by calculating the center point of the cluster and then calculating its distance to the center point of every path. Specifically, if adding the cluster of nodes to another path satisfies the relevant conditions and the adjusted cost is smaller than before, the cluster may be moved into path 1, which is closest to it; in this case, the path on which the cluster of nodes is located is determined as the path to be adjusted. The relevant conditions may include whether the time window requirements of the customers in the cluster are met, whether the capacity of the delivery vehicle would be exceeded after the adjustment, and so on. The result of adjusting the boxed area is shown in fig. 4B.
According to the scheme of the embodiment of the disclosure, the path to be adjusted can be determined through the distance between the center point of each cluster node on each path and the center point of each path, so that further fine adjustment and correction of the path set are realized, and the superiority of the final result is ensured.
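An illustrative sketch of S4011 and S4012 follows; the threshold handling is an assumption, since the disclosure only requires that unusually large gaps split a path into clusters:

```python
import numpy as np

def split_into_clusters(route, coords, threshold=2.0):
    """S4011: start a new cluster when the gap between consecutive nodes on a
    route exceeds the route's average gap by the given factor (value assumed)."""
    gaps = [np.linalg.norm(coords[a] - coords[b]) for a, b in zip(route, route[1:])]
    avg = float(np.mean(gaps))
    clusters, current = [], [route[0]]
    for node, gap in zip(route[1:], gaps):
        if gap > threshold * avg:
            clusters.append(current)
            current = []
        current.append(node)
    clusters.append(current)
    cluster_centers = [coords[c].mean(axis=0) for c in clusters]
    route_center = coords[route].mean(axis=0)      # S4012: center point of the path
    return clusters, cluster_centers, route_center
```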
In a possible implementation manner, step S4013 determines a path to be adjusted according to a distance between a center point of each cluster node and a center point of each path, and further includes the steps of:
and determining a first distance between the center point of each cluster node and the center point of the path where the cluster node is located.
And determining second distances between the center point of each cluster node and the center points of other paths.
And under the condition that the first distance of any cluster node is greater than any second distance of the corresponding cluster node, determining the cluster node as a node to be adjusted, wherein the path where the node to be adjusted is located is the path to be adjusted.
In one example, assume that the path set contains paths A, B, and C, and the nodes on the three paths are divided into 3, 2, and 4 clusters respectively, denoted A1, A2, A3, B1, B2, C1, C2, C3, C4. The first distance of each cluster is the distance from its center point to the center point of the path it belongs to; for clusters A1, A2, and A3 this is the distance to the center point of path A, and likewise for the B and C paths. The second distances of each cluster are the distances from its center point to the center points of the other paths; for clusters A1, A2, and A3 these are the distances to the centers of paths B and C, and likewise for the B and C paths. If the first distance of a cluster is greater than any of its second distances, that is, the cluster is farther from the center of its own path than from the center of some other path, the cluster is closer to another path and may need to be adjusted. That cluster is therefore determined as the node to be adjusted, and the path on which it is located is the path to be adjusted.
According to the scheme of the embodiment of the disclosure, whether each cluster node is closer to other paths except for the path where the cluster node is located is determined according to the first distance and the second distance of each cluster node, so that the path can be corrected.
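The first-distance/second-distance comparison can be sketched as follows, continuing the assumptions of the previous sketch:

```python
import numpy as np

def find_nodes_to_adjust(cluster_centers_per_route, route_centers):
    """Flag clusters that lie closer to another route's center than to their own;
    returns (route index, cluster index, nearest other route index) triples."""
    route_centers = np.asarray(route_centers)
    to_adjust = []
    for i, centers in enumerate(cluster_centers_per_route):
        for j, c in enumerate(centers):
            first = np.linalg.norm(c - route_centers[i])          # own route
            second = np.linalg.norm(route_centers - c, axis=1)    # all routes
            second[i] = np.inf                                    # exclude own route
            if first > second.min():
                to_adjust.append((i, j, int(second.argmin())))
    return to_adjust
```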
In a possible implementation manner, the step S402 adjusts nodes and/or node sequences through which the paths to be adjusted pass, to obtain an adjusted fourth path set, and further includes the steps of:
and determining a path with the minimum second distance with the node to be adjusted as a target path.
And deleting the nodes to be adjusted from the paths to be adjusted and adding the nodes to be adjusted into the target path.
And re-determining the node sequence of the target path and the path to be adjusted to obtain an adjusted fourth path set.
In the embodiment of the present disclosure, the node order of the target path and of the path to be adjusted is re-determined. It may be determined according to the distance between the center point of each cluster of nodes on the target path and the center point of the node to be adjusted: the two clusters on the target path that are closest to the node to be adjusted are selected, and the node to be adjusted is inserted between these two clusters in their existing order. Alternatively, path planning may be performed again according to the path planning information corresponding to the target path and to the path to be adjusted, respectively, so as to determine the node order.
According to the scheme of the embodiment of the disclosure, the quality of a better feasible solution is further improved by re-determining the nodes and the node sequence of the path to be adjusted.
In a possible implementation manner, the path planning method according to the embodiment of the present disclosure includes step S201, steps S101 to S104, and steps S401 to S402. The overall flow is shown in fig. 5. The path planning method may be executed by the following modules: an initial solution generation module, a divide-and-conquer iterative solving module, and a solution fine-tuning module. For example, for a path planning problem, steps S201 and S2011 to S2012 are executed by the initial solution generation module (the path planning model) to obtain a general feasible solution. The general feasible solution is input into the divide-and-conquer iterative solving module, which executes steps S101 to S104 and S301 to S302 to obtain a better feasible solution. The better feasible solution is input into the solution fine-tuning module, which executes steps S401 to S402 and S4011 to S4013 to obtain the optimal solution or a solution close to it.
In an example, taking the path planning information as a CVRP problem, a specific CVRP problem may be: there are 1000 customers with coordinates [[1203.76, 76.78], ..., [1243.98, 34.62]] and corresponding demands [27, 87, ..., 91], the warehouse coordinates are [1287.12, 66.98], and there are n vehicles, each with a capacity of 8000. First, this problem is input into the initial solution generation module, which generates an initial general feasible solution according to the problem and its type; this solution has a certain influence on the result of the subsequent iterative solving, so a pre-trained path planning model is used to generate the initial solution. After the initial general feasible solution is generated, the solution and the current problem are input into the iterative solving module to iteratively solve the large-scale VRP problem, and a better feasible solution is output, which can already serve as the final output. However, practice shows that this better feasible solution still has room for optimization, so a heuristic algorithm is used to fine-tune it so that it becomes closer to, or equal to, the optimal solution. The method uses offline training with online prediction, so that once a problem is given online it can be solved quickly and a better solution output; compared with an open source solver, the solving speed is faster and the effect is better.
The initial solution generation module, the divide-and-conquer iterative solving module, and the solution fine-tuning module are introduced in detail below:
(1) Initial solution generation module
A good initial solution provides a good starting point for the subsequent divide-and-conquer iterative solving, accelerating solution generation and improving solution quality. Therefore, an initial solution generation module is added before iteratively solving the large-scale VRP problem. The initial solution generation module is mainly an improvement on the Transformer, and the overall structure of the model is shown in fig. 2.
Model structure: the whole model comprises two parts, several encoders and several decoders. As in a conventional Transformer, a multi-layer encoder is used, and each encoder consists of an attention mechanism layer, a feedforward neural network layer, and residual and normalization layers. The input path planning problem is converted into vectors by an encoding input layer and then input into the encoder; the vectors pass sequentially through the multi-head attention mechanism, a residual and normalization layer, the feedforward network layer, and another residual and normalization layer to obtain the characterization vector of each customer node. In order to obtain a characterization of all customers as a whole, the characterization vectors of all customers are concatenated and passed through a fully connected neural network layer, which yields the characterization vector of the whole problem. The encoder thus encodes the whole path planning problem into a vector, which is then input into the decoder so that a feasible solution of the problem can be decoded. Since decoding is performed cyclically, the decoder resolves one node of a feasible-solution path in each cycle, and the input layer of the decoder receives the vectors of the customer nodes decoded so far, that is, the nodes already included in the existing path. By fusing the decoding vector with the encoding vector of the whole problem through an attention mechanism, the model can output the next node of the current path. The criterion for determining the next node is the probability output by the decoder: the decoder outputs a probability for each customer node, and the node with the highest probability is selected as the next node each time following the greedy (beam search) idea, so that iterative decoding finally generates a feasible-solution path.
Model training: the initial solution generation model is trained by reinforcement learning. For example, 1 million path planning problems with 1000 customers each are randomly constructed when the model is trained offline. During reinforcement learning training, one problem is randomly selected each time and encoded and decoded to obtain a feasible solution, the cost of that feasible solution is calculated, and the network is then updated according to this cost using a gradient ascent method. Since a smaller cost is better, passing -cost to the model as the training objective lets the model learn to reduce the cost as much as possible, until the model converges and training ends.
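An illustrative PyTorch-style sketch of one such update follows; the model interface and the `route_cost` helper are assumptions, since the disclosure only specifies that -cost drives a gradient-ascent update:

```python
import torch

def reinforce_step(model, problem, optimizer):
    """One reinforcement-learning update: decode a feasible solution, compute its
    cost, and update the network so that the expected cost decreases
    (equivalently, gradient ascent on the reward -cost)."""
    routes, log_prob = model(problem)            # assumed to return decode log-prob
    cost = route_cost(routes, problem)           # placeholder cost function
    loss = (cost.detach() * log_prob).mean()     # REINFORCE surrogate objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(cost.mean())
```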
Model prediction: after model training is completed, initial solution generation for related problems can be performed online. After a new path planning problem is input, the model quickly outputs a feasible solution as the initial solution of the problem.
(2) Divide-and-conquer iterative solving module
For a large-scale path planning problem (for example, 1000 client nodes), an initial solution of the problem can be obtained through the initial solution generation module, but because there are many client nodes in the problem, an initial solution generated by a pure end-to-end model may be far from an optimal solution and cannot meet actual business requirements. It is therefore necessary to further solve the problem based on this initial solution using a divide and conquer iterative solution module.
Divide-and-conquer iterative solving: for a specific path planning problem P, the obtained feasible solution X contains k paths. A trained sub-problem selector f_θ is used to select a region S of the feasible solution X, where S contains some of the k paths. The region S corresponds to a sub-problem of the path planning problem P, and the trained f_θ needs to select a region S whose re-solving can improve the overall solution to a greater extent. The nodes of all paths in S are then extracted, and S is solved with an open source solver to obtain a solution Xs' of the sub-problem. Xs' then replaces the corresponding part of the original feasible solution X, giving a new feasible solution X1 whose quality is improved to a certain extent. The above operations are performed iteratively until a certain number of steps is reached or the optimal solution is obtained. How the sub-problems are selected affects the solution quality of the final problem, so the embodiment of the present disclosure trains a sub-problem selection model using a Transformer-based encoder module to select the sub-problem at each iteration step. The structure of the sub-problem selection model is shown in fig. 7.
Training of the sub-problem selection model: to solve path planning problems with up to 1000 customer nodes, 5000 path planning problems with 1000 customer nodes each can be randomly generated. Each of the 5000 problems is then solved for 3,000 steps with the iterative solving method, solving both the whole problem and the sub-problems inside it, while recording the initial cost of each sub-problem S and the cost of S in the better solution. Since the selection model has not been trained at this point, sub-problems are selected randomly. Solving these 5000 problems yields 1 million training samples, which are then used to train the sub-problem selection model. The node coordinate information and demand information of a sub-problem are input into the model, pass sequentially through 6 Transformer encoder layers, a fully connected layer, and an average pooling layer, and the model outputs a cost that fits the current sub-problem. The model is thus trained to fit the cost of a sub-problem under a better solution.
Prediction of the model: in the prediction stage, the sub-problem selection model predicts a cost for each sub-problem. The difference between the predicted cost and the cost of the sub-problem in the current solution is computed, and the sub-problem with the largest difference is the one whose re-solving reduces the cost of the whole problem the most. That sub-problem is solved with an open source solver, and its solution replaces the corresponding part of the original solution; iteration continues until a better solution is obtained or the set number of iteration steps is reached. Each iteration selects the sub-problem that reduces the cost the most and solves it in a divide-and-conquer manner, so the cost drops quickly and the solution quality improves.
(3) Solution fine-tuning module
A better solution can be obtained after the divide-and-conquer iterative solving module solves the large-scale path planning problem, but in actual business scenarios it is found that this better solution can often be further optimized. For example, a small number of customer nodes in one of the paths of the solution may be far from most of the customer points of that path but close to some other path, which significantly degrades the quality of the solution. As shown in the box of fig. 4A, these customer nodes are far from the center point of the path they belong to and close to the center points of other paths. Therefore, after the center point of this small group of customer nodes is calculated, its distances to the center points of all paths in the current solution are calculated respectively. If adding these customer points to another path satisfies the relevant conditions and the cost of the whole solution becomes smaller than before, they are moved into the path closest to them; the result of adjusting the boxed region is shown in fig. 4B.
By this heuristic fine-tuning method, the better solution generated by the model can be further optimized into a solution that is closer to, or is, the optimal solution, ensuring the superiority of the final result.
The path planning method provided by the embodiment of the present disclosure continuously optimizes the solution through iteration under the cooperation of the three modules. When handling a large-scale path planning problem, it obtains a better solution in a shorter time, and the time spent reaching a solution of the same quality on business data is reduced by a factor of more than two.
Fig. 6 is a flow chart of a training method for selecting a model according to an embodiment of the disclosure, as shown in fig. 6, the method at least includes the following steps:
s601, determining first parameters of each path subset contained in a sample path set, wherein the sample path set has corresponding first path planning information.
S602, according to each path subset and the corresponding second path planning information thereof, determining second parameters of each path subset by using the first selection model. The second path planning information corresponding to the path subset comprises partial content of the first path planning information of the sample path set to which the path subset belongs.
S603, training the first selection model according to the difference value between the first parameter and the second parameter to obtain a trained second selection model.
In the embodiment of the present disclosure, the first parameter and the second parameter may be understood as the cost of a path, which is generally determined by the path length. The first parameter is the actual cost after optimization, that is, the true value. The second parameter is the optimized cost predicted by the model, that is, the predicted value. The first selection model may be trained based on the difference between the predicted value and the true value, so that the trained second selection model can accurately predict the optimized cost of each path subset.
As shown in fig. 7, the selection model adopts a Transformer-structure-based encoder and comprises Transformer encoder layers, a fully connected layer, and an average pooling layer. In one example, the number of Transformer encoder layers is 6.
The number of customer nodes of the sample path sets may be comparable to the number of customers of the path planning problems to be solved online. In one example, the samples may be generated specifically for path planning problems with 1000 customer nodes: 5000 path planning problems with 1000 customer nodes each are randomly generated, and each of the 5000 problems is solved for 3,000 steps with the iterative divide-and-conquer solving method, obtaining better solutions of the whole problems and of their internal sub-problems, while the initial cost of each sub-problem S and the cost of S in the better feasible solution are recorded. At this point model training has not been completed, so sub-problems are selected randomly; solving the 5000 problems yields at least hundreds of thousands of training samples, which are then used as sample path sets to train the sub-problem selection model.
According to the scheme of the embodiment of the disclosure, the selection model predicts the optimized cost of each path subset according to its corresponding second path planning information, so that, by comparison with the original cost, the path subset whose optimization reduces the cost the most can be determined; this speeds up the optimization of the path set and yields a better feasible solution more quickly.
In a possible implementation manner, step S602 of determining, according to each path subset and its corresponding second path planning information, the second parameter of each path subset by using the first selection model further includes:
and processing each path subset and its corresponding second path planning information sequentially through the Transformer encoder layers, the full connection layer and the average pooling layer of the first selection model to obtain the second parameter of the second path planning information.
In the embodiment of the disclosure, each path subset and its corresponding second path planning information, that is, the node position information and the demand information of each sub-problem, are input into the model, and the cost fitting the current sub-problem is output after sequentially passing through the 6 Transformer encoder layers, the full connection layer and the average pooling layer. Each Transformer encoder layer comprises an encoding input layer, a full connection layer and a QKV multi-head attention layer.
According to the scheme of the embodiment of the disclosure, the cost of the current problem can be fitted more accurately through the Transformer encoder layers, the full connection layer and the average pooling layer; the feature extraction module of the model is changed from cumulative averaging to concatenation followed by 2 full connection layers, so the extracted features are more comprehensive.
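As a non-authoritative sketch of the structure just described (Transformer encoder layers, then a full connection layer, then average pooling), a hypothetical PyTorch fragment might look as follows; the class name, dimensions and input layout are assumptions rather than the disclosed implementation.

```python
import torch
import torch.nn as nn

class SubsetCostModel(nn.Module):
    """Hypothetical selection-model sketch: Transformer encoder layers,
    a full connection layer, and an average pooling step, in that order."""

    def __init__(self, feat_dim=3, d_model=128, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)      # encoding input: (x, y, demand) -> d_model
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                           dim_feedforward=512, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.fc = nn.Linear(d_model, 1)                # full connection layer: per-node score

    def forward(self, subset_features):
        # subset_features: (batch, num_nodes, feat_dim) built from second path planning info
        h = self.encoder(self.embed(subset_features))  # (batch, num_nodes, d_model)
        per_node = self.fc(h).squeeze(-1)              # (batch, num_nodes)
        return per_node.mean(dim=1)                    # average pooling -> fitted cost, (batch,)
```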
Fig. 8 is a flowchart illustrating a training method of a path planning model according to an embodiment of the disclosure. As shown in Fig. 8, the method includes at least the following steps:
S801, inputting the path planning information sample into an encoder of the first path planning model to obtain a coding vector corresponding to the path planning information sample, wherein the coding vector comprises a characterization vector of at least one node in the path planning information sample.
The following steps S802 and S803 are iteratively executed until each path included in the training path set is obtained:
S802, inputting the characterization vector of the node of each existing path in the training path set into a decoder of the first path planning model to obtain a decoding vector of the existing path.
And S803, fusing the decoding vector of the existing path and the coding vector corresponding to the path planning information sample based on the attention mechanism to obtain the next node of the existing path.
S804, training the first path planning model by using a gradient ascent method and a cost function of the training path set to obtain a trained second path planning model.
In the embodiment of the disclosure, the path planning model comprises a plurality of encoders and a plurality of decoders. Each encoder is composed of a multi-head attention mechanism layer, a first residual and normalization layer, a feed-forward neural network layer and a second residual and normalization layer. The input path planning problem is first converted into a vector by an encoding input layer and then input into the encoder; the encoding input layer vectorizes the coordinate information and the demand information of each customer as a node. The vector sequentially passes through the multi-head attention mechanism layer, the first residual and normalization layer, the feed-forward network layer and the second residual and normalization layer to obtain a characterization vector of each node. In order to obtain a characterization vector of the whole problem, the characterization vectors of all the customers are concatenated and passed through a fully connected neural network layer.
The encoder encodes the whole path planning problem; once the problem is encoded into a vector, it can be input into the decoder, which decodes a feasible solution of the problem. Since decoding is performed in a loop and the decoder parses out one node of the feasible solution path at a time, the input of the decoder is the vectors of all previously decoded customer nodes, i.e., the vectors of all nodes on the existing path. By fusing these decoded vectors with the encoded vector of the whole problem through an attention mechanism, the model outputs the next node of the feasible solution path. The criterion for determining the next node is the probability output by the decoder: the decoder outputs a probability for each customer node, and, following the greedy (beam search) idea, the node with the highest probability is selected as the next node. Decoding iterates in this way until a complete feasible solution path is generated.
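A minimal sketch of this decoding loop, assuming the encoder output and a decoder callable already exist, is given below; greedy_decode, the decoder interface and the masking scheme are hypothetical illustrations of the idea, not the disclosed implementation.

```python
import torch

@torch.no_grad()
def greedy_decode(encoder_out, decoder, num_nodes, depot=0):
    # encoder_out: (num_nodes, d_model) characterization vectors of the whole problem
    # decoder: hypothetical callable mapping (existing-path vectors, encoder_out) -> per-node scores
    route = [depot]
    visited = torch.zeros(num_nodes, dtype=torch.bool)
    visited[depot] = True
    while not bool(visited.all()):
        partial = encoder_out[route]                          # vectors of all nodes on the existing path
        scores = decoder(partial, encoder_out)                # attention-based fusion, one score per node
        scores = scores.masked_fill(visited, float("-inf"))   # already-visited nodes cannot be chosen
        next_node = int(scores.argmax())                      # greedy: highest-probability node is next
        route.append(next_node)
        visited[next_node] = True
    return route
```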
The path planning model is trained by reinforcement learning. In one example, the goal of model training is to solve path planning problems with 1000 customers, and 1,000,000 path planning problems with 1000 customers each are randomly constructed for training the path planning model. In each step of reinforcement learning training, one problem is randomly selected and encoded and decoded to obtain a feasible solution, the cost of the feasible solution (generally the length of the path) is calculated, and the network model is then updated according to the cost by a gradient ascent method. The update follows the principle that the smaller the cost, the better; therefore the cost is passed to the model as the loss of a cost function so that the model learns to reduce the cost as much as possible, and training ends when the model converges.
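Under assumptions, the reinforcement-learning update described above can be sketched as a REINFORCE-style step in which the cost of a sampled solution plays the role of a negative reward; model.sample is a hypothetical interface, route_length is a simplified helper, and the baseline term is an optional variance-reduction detail not stated in the disclosure.

```python
import torch

def route_length(coords, route):
    # total Euclidean length of the route, returning to the start (depot)
    total = 0.0
    for a, b in zip(route, route[1:] + route[:1]):
        total += float(torch.dist(coords[a], coords[b]))
    return total

def reinforce_step(model, optimizer, problem, baseline=0.0):
    # model.sample is assumed to return a feasible route and the summed
    # log-probability of the nodes chosen while decoding that route.
    route, log_prob = model.sample(problem)
    cost = route_length(problem["coords"], route)   # cost of the feasible solution (path length)
    loss = (cost - baseline) * log_prob             # minimizing this ascends the negative cost
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return cost
```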
According to the scheme of the embodiment of the disclosure, after model training is completed, initial feasible solutions for related problems can be generated online: for a new path planning problem, the model can quickly output a high-quality feasible solution as the initial solution of the problem. A higher-quality initial feasible solution provides a good starting point for the subsequent iterative divide-and-conquer solving, which both accelerates the generation of solutions and improves their quality. The model produces the initial feasible solution faster, and at higher quality, than a solver.
In a possible implementation manner, step S801 of inputting the path planning information sample into the encoder of the first path planning model to obtain the coding vector corresponding to the path planning information sample further includes the steps of:
The path planning information sample is converted into a first vector through an encoding input layer of the encoder.
And sequentially processing the first vector through a multi-head attention mechanism layer, a first residual and normalization layer, a feed-forward network layer and a second residual and normalization layer of the encoder to obtain a characterization vector of at least one node in the path planning information sample.
According to the scheme of the embodiment of the disclosure, the path planning problem is encoded by using the encoder, and the vector representing the problem is obtained based on the attention mechanism, so that the quality of the obtained initial feasible solution is improved.
In a possible implementation manner, step S801 of inputting the path planning information sample into the encoder of the first path planning model to obtain the coding vector corresponding to the path planning information sample further includes the steps of:
and splicing the characterization vectors of a plurality of nodes in the path planning information sample through the full connection layer to obtain the coding vector corresponding to the path planning information sample.
According to the scheme of the embodiment of the disclosure, the characterization vectors of a plurality of nodes are spliced through the full connection layer, the extracted features are more comprehensive, and the improvement of the solution quality is facilitated.
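As an illustrative sketch only of the encoder path described above (encoding input layer, multi-head attention, first residual and normalization, feed-forward network, second residual and normalization, then concatenation through a full connection layer), the following PyTorch fragment shows the data flow; all class names and sizes are assumptions.

```python
import torch
import torch.nn as nn

class ProblemEncoderLayer(nn.Module):
    """Hypothetical encoder layer: multi-head attention, first residual + normalization,
    feed-forward network, second residual + normalization."""

    def __init__(self, d_model=128, nhead=8, d_ff=512):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (batch, num_nodes, d_model)
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)                   # first residual and normalization
        return self.norm2(x + self.ffn(x))      # second residual and normalization

class ProblemEncoder(nn.Module):
    """Node features (coordinates, demand) are embedded, encoded, then the node vectors
    are concatenated and projected to a single problem-level vector."""

    def __init__(self, feat_dim=3, d_model=128, num_nodes=1000):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)           # encoding input layer
        self.layer = ProblemEncoderLayer(d_model)
        self.fc = nn.Linear(num_nodes * d_model, d_model)   # full connection over concatenated vectors

    def forward(self, nodes):                   # nodes: (batch, num_nodes, feat_dim)
        h = self.layer(self.embed(nodes))       # per-node characterization vectors
        problem_vec = self.fc(h.flatten(1))     # concatenation + full connection -> problem vector
        return h, problem_vec
```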
Fig. 9 is a schematic structural diagram of a path planning apparatus according to an embodiment of the disclosure, and as shown in fig. 9, the apparatus 900 at least includes:
the dividing module 910 is configured to divide a first path set corresponding to the first path planning information to obtain a plurality of second path sets. Wherein the second set of paths is a subset of the first set of paths.
A selecting module 920, configured to select a target path set from the plurality of second path sets.
The optimizing module 930 is configured to perform path planning again according to the second path planning information corresponding to the target path set, so as to obtain an optimized third path set. Wherein the second path planning information includes partial information of the first path planning information.
An updating module 940 for updating the first path set with the third path set.
In one possible embodiment, the apparatus 900 further comprises:
and the initial path planning module is used for planning an initial path according to the first path planning information to obtain an initial first path set.
In one possible embodiment, the initial path planning module comprises:
The encoding sub-module is used for inputting the first path planning information into an encoder of the path planning model to obtain a coding vector corresponding to the first path planning information, wherein the coding vector comprises a characterization vector of at least one node in the first path planning information.
The iteration submodule is used for iteratively executing the following steps until each first path included in the first path set is obtained: and inputting the characterization vectors of the nodes included in the first path into a decoder of the path planning model to obtain the decoding vectors of the first path. And fusing the decoding vector of the first path and the coding vector corresponding to the planning information of the first path based on an attention mechanism to obtain a next node in the first path.
In one possible embodiment, the coding submodule is configured to:
the first path planning information is converted into a first vector through an input layer of an encoder.
And sequentially processing the first vector through a multi-head attention mechanism layer, a first residual error and normalization layer, a feedforward network layer, a second residual error and a normalization layer of the encoder to obtain a characterization vector of at least one node in the first path planning information.
In one possible embodiment, the encoding submodule is further configured to:
and splicing the characterization vectors of the plurality of nodes in the first path planning information through the full connection layer to obtain the coding vector corresponding to the first path planning information.
In one possible implementation, the selection module 920 is configured to:
and inputting the plurality of second path sets and second path planning information corresponding to the plurality of second path sets into a selection model, and predicting respective optimization parameters of the plurality of second path sets.
And determining the second path set with the largest difference value between the optimization parameters in the plurality of second path sets and the initial parameters of the first path set as a target path set.
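One plausible reading of this selection step, sketched only for illustration (choose_target_subset, the cost bookkeeping and the per-subset comparison are hypothetical), is to score each candidate subset with the selection model and keep the one whose predicted optimization improves most on its current cost:

```python
def choose_target_subset(selection_model, subsets, subset_infos, current_costs):
    # subsets:       candidate second path sets
    # subset_infos:  second path planning information for each subset (model input)
    # current_costs: current cost of each subset within the first path set
    best_idx, best_gain = 0, float("-inf")
    for i, info in enumerate(subset_infos):
        predicted = float(selection_model(info))   # predicted optimization parameter
        gain = current_costs[i] - predicted        # expected cost reduction for this subset
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return subsets[best_idx]                       # subset with the largest difference is the target
```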
In one possible embodiment, the plurality of second path sets includes: n adjacent paths in the first path set. Wherein, N is an integer greater than 1, and N is less than M, M is the path quantity in the first path set.
In one possible implementation, the optimization module 930 is configured to:
and extracting the information of each node to be processed from the paths included in the target path set.
And re-planning the path of each node information to be processed by using an open source solver according to the second path planning information corresponding to the target path set to obtain an optimized third path set.
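The disclosure does not name a specific open-source solver; purely as one example, the re-planning of the extracted nodes could be handed to Google OR-Tools roughly as follows (the node extraction, distance-matrix construction and constraint set are assumptions).

```python
from ortools.constraint_solver import pywrapcp, routing_enums_pb2

def replan_subset(dist_matrix, demands, vehicle_capacities, depot=0, time_limit_s=5):
    # dist_matrix and demands must hold integers; one entry per extracted node to be processed
    n, num_vehicles = len(dist_matrix), len(vehicle_capacities)
    manager = pywrapcp.RoutingIndexManager(n, num_vehicles, depot)
    routing = pywrapcp.RoutingModel(manager)

    def distance_cb(i, j):
        return dist_matrix[manager.IndexToNode(i)][manager.IndexToNode(j)]
    routing.SetArcCostEvaluatorOfAllVehicles(routing.RegisterTransitCallback(distance_cb))

    def demand_cb(i):
        return demands[manager.IndexToNode(i)]
    routing.AddDimensionWithVehicleCapacity(
        routing.RegisterUnaryTransitCallback(demand_cb), 0, vehicle_capacities, True, "Capacity")

    params = pywrapcp.DefaultRoutingSearchParameters()
    params.first_solution_strategy = routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC
    params.time_limit.FromSeconds(time_limit_s)
    solution = routing.SolveWithParameters(params)
    if solution is None:
        return []                                  # no feasible re-planning found within the limit
    routes = []
    for v in range(num_vehicles):
        idx, route = routing.Start(v), []
        while not routing.IsEnd(idx):
            route.append(manager.IndexToNode(idx))
            idx = solution.Value(routing.NextVar(idx))
        routes.append(route)
    return routes
```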
In one possible implementation, the apparatus 900 further includes an adjustment module, which includes:
and the path determining submodule is used for determining the path to be adjusted according to the distance between the nodes of each path in the first path set.
And the adjusting submodule is used for adjusting the nodes and/or the node sequence through which the path to be adjusted passes to obtain an adjusted fourth path set.
In one possible implementation, the path determination submodule is configured to:
and dividing all the nodes on each path according to the distance between the nodes of each path in the first path set to obtain a plurality of cluster nodes and the central point of each cluster node.
The center point of each path contained in the target path set is determined.
And determining the path to be adjusted according to the distance between the center point of each cluster node and the center point of each path.
In one possible implementation, the path determination submodule is further configured to:
and determining a first distance between the center point of each cluster node and the center point of the path where the cluster node is located.
And determining a second distance between the central point of each cluster node and the central points of other paths.
And under the condition that the first distance of any cluster node is greater than any second distance of the corresponding cluster node, determining the cluster node as a node to be adjusted, wherein the path where the node to be adjusted is located is a path to be adjusted.
In one possible embodiment, the adjustment submodule is configured to:
and determining a path with the minimum second distance with the node to be adjusted as a target path.
And deleting the nodes to be adjusted from the paths to be adjusted and adding the nodes to be adjusted into the target path.
And re-determining the node sequence of the target path and the path to be adjusted to obtain an adjusted fourth path set.
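A simplified sketch of this adjustment, treating each node as its own cluster for brevity (the disclosure clusters nodes first and compares cluster-node center points), is given below; adjust_paths and the NumPy representation are hypothetical, and re-ordering of the affected paths is left to the caller.

```python
import numpy as np

def adjust_paths(paths, coords):
    # paths:  list of routes, each a list of node indices
    # coords: array-like mapping node index -> (x, y) coordinate
    centers = [np.mean([coords[n] for n in p], axis=0) for p in paths]   # path center points
    adjusted = [list(p) for p in paths]
    for pi, path in enumerate(paths):
        for node in path:
            first = np.linalg.norm(np.asarray(coords[node]) - centers[pi])   # distance to own path center
            second = [(np.linalg.norm(np.asarray(coords[node]) - centers[pj]), pj)
                      for pj in range(len(paths)) if pj != pi]
            d, target = min(second)                                          # closest other path
            if first > d:                                                    # node to be adjusted
                adjusted[pi].remove(node)
                adjusted[target].append(node)                                # node order re-determined later
    return adjusted
```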
For a description of specific functions and examples of each module and sub-module of the apparatus in the embodiment of the present disclosure, reference may be made to the description of corresponding steps in the foregoing method embodiments, and details are not repeated here.
FIG. 10 is a schematic diagram of a structure of a training apparatus for selecting a model according to an embodiment of the present disclosure. As shown in fig. 10, the apparatus 1000 includes at least:
the first determining module 1010 is configured to determine first parameters of each path subset included in a sample path set, where the sample path set has corresponding first path planning information.
A second determining module 1020, configured to determine, according to each path subset and the corresponding second path planning information, a second parameter of each path subset by using the first selection model. The second path planning information corresponding to the path subset comprises partial content of the first path planning information of the sample path set to which the path subset belongs.
The first training module 1030 is configured to train the first selection model according to a difference between the first parameter and the second parameter, so as to obtain a trained second selection model.
In a possible implementation, the second determining module 1020 is further configured to:
and processing each path subset and its corresponding second path planning information sequentially through the Transformer encoder layers, the full connection layer and the average pooling layer of the first selection model to obtain the second parameter of the second path planning information.
For a description of specific functions and examples of each module and sub-module of the apparatus in the embodiment of the present disclosure, reference may be made to the description of corresponding steps in the foregoing method embodiments, and details are not repeated here.
Fig. 11 is a schematic structural diagram of a training apparatus of a path planning model according to an embodiment of the present disclosure. As shown in fig. 11, the apparatus 1100 includes at least:
The input module 1110 is configured to input the path planning information sample into an encoder of the first path planning model, to obtain a coding vector corresponding to the path planning information sample, where the coding vector includes a characterization vector of at least one node in the path planning information sample.
An iteration module 1120, configured to iteratively perform the following steps until each second path included in the training path set is obtained: inputting the characterization vectors of the nodes included in the second path into a decoder of the first path planning model to obtain a decoding vector of the second path; and fusing the decoding vector of the second path and the coding vector corresponding to the path planning information sample based on an attention mechanism to obtain the next node in the second path.
The second training module 1130 is configured to train the first path planning model by using a gradient ascent method and a cost function of the training path set, so as to obtain a trained second path planning model.
In one possible embodiment, the input module is further configured to: convert the path planning information sample into a first vector through an encoding input layer of the encoder.
And sequentially processing the first vector through a multi-head attention mechanism layer, a first residual error and normalization layer, a feedforward network layer, a second residual error and a normalization layer of the encoder to obtain a characterization vector of at least one node in the path planning information sample.
In one possible implementation, the input module 1110 is further configured to: and splicing the characterization vectors of a plurality of nodes in the path planning information sample through the full connection layer to obtain the coding vector corresponding to the path planning information sample.
For a description of specific functions and examples of each module and sub-module of the apparatus in the embodiment of the present disclosure, reference may be made to the description of corresponding steps in the foregoing method embodiments, and details are not repeated here.
In the technical scheme of the disclosure, the acquisition, storage, application and the like of the personal information of the related user all accord with the regulations of related laws and regulations, and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 12 shows a schematic block diagram of an example electronic device 1200, which can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the apparatus 1200 includes a computing unit 1201 which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for the operation of the device 1200 can also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to bus 1204.
Various components in the device 1200 are connected to the I/O interface 1205 including: an input unit 1206 such as a keyboard, a mouse, or the like; an output unit 1207 such as various types of displays, speakers, and the like; a storage unit 1208, such as a magnetic disk, optical disk, or the like; and a communication unit 1209 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the device 1200 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1201 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 1201 performs the respective methods and processes described above, such as a path planning method, a training method of a selection model, or a training method of a path planning model. For example, in some embodiments, the path planning method, the training method of the selection model, or the training method of the path planning model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into RAM 1203 and executed by computing unit 1201, one or more steps of the path planning method, the training method of the selection model, or the training method of the path planning model described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured in any other suitable way (e.g., by means of firmware) to perform a path planning method, a training method of a selection model, or a training method of a path planning model.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combining a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (34)

1. A path planning method, comprising:
dividing a first path set corresponding to the first path planning information to obtain a plurality of second path sets; wherein the second set of paths is a subset of the first set of paths;
selecting a target path set from the plurality of second path sets;
re-planning the path according to second path planning information corresponding to the target path set to obtain an optimized third path set; wherein the second path planning information includes partial information of the first path planning information;
updating the first set of paths with the third set of paths.
2. The method of claim 1, further comprising:
and carrying out initial path planning according to the first path planning information to obtain an initial first path set.
3. The method according to claim 2, wherein the performing initial path planning according to the first path planning information to obtain the initial first path set comprises:
inputting the first path planning information into an encoder of a path planning model to obtain a coding vector corresponding to the first path planning information, wherein the coding vector comprises a characterization vector of at least one node in the first path planning information;
iteratively executing the following steps until each first path included in the first path set is obtained: inputting the characterization vector of the node included in the first path into a decoder of the path planning model to obtain a decoding vector of the first path; and fusing the decoding vector of the first path and the coding vector corresponding to the planning information of the first path based on an attention mechanism to obtain a next node in the first path.
4. The method of claim 3, inputting the first path planning information into an encoder of a path planning model, and obtaining a coding vector corresponding to the first path planning information, comprising:
converting the first path planning information into a first vector through an input layer of the encoder;
and sequentially processing the first vector through a multi-head attention mechanism layer, a first residual error and normalization layer, a feedforward network layer, a second residual error and normalization layer of the encoder to obtain a characterization vector of at least one node in the first path planning information.
5. The method of claim 4, inputting the first path planning information into an encoder of a path planning model to obtain a coding vector corresponding to the first path planning information, further comprising:
and splicing the characterization vectors of the plurality of nodes in the first path planning information through a full connection layer to obtain a coding vector corresponding to the first path planning information.
6. The method of claim 1, wherein said selecting a target path set from said plurality of second path sets comprises:
inputting the plurality of second path sets and second path planning information corresponding to the plurality of second path sets into a selection model, and predicting respective optimization parameters of the plurality of second path sets;
and determining a second path set with the largest difference between the optimization parameters in the plurality of second path sets and the initial parameters of the first path set as the target path set.
7. The method of claim 6, wherein the plurality of second path sets comprises: n adjacent paths in the first path set; and N is an integer greater than 1 and less than M, and M is the number of paths in the first path set.
8. The method according to claim 1, wherein the re-planning the path according to the second path planning information corresponding to the target path set to obtain an optimized third path set includes:
extracting information of each node to be processed from the paths included in the target path set;
and re-planning paths of the information of each node to be processed by using an open source solver according to second path planning information corresponding to the target path set to obtain the optimized third path set.
9. The method of claim 1, further comprising:
determining a path to be adjusted according to the distance between nodes of each path in the first path set;
and adjusting the nodes and/or the node sequence passed by the path to be adjusted to obtain an adjusted fourth path set.
10. The method of claim 9, wherein determining the path to be adjusted according to the distance between the nodes of each path in the first set of paths comprises:
dividing all nodes on each path according to the distance between each node of each path in the first path set to obtain a plurality of cluster nodes and the central point of each cluster node;
determining the central point of each path contained in the target path set;
and determining the path to be adjusted according to the distance between the central point of each cluster node and the central point of each path.
11. The method according to claim 10, wherein the determining the path to be adjusted according to the distance between the center point of each cluster node and the center point of each path comprises:
determining a first distance between the center point of each cluster node and the center point of the path where the cluster node is located;
determining a second distance between the central point of each cluster node and the central point of each other path;
and under the condition that the first distance of any cluster node is greater than any second distance corresponding to the cluster node, determining the cluster node as a node to be adjusted, wherein the path where the node to be adjusted is located is the path to be adjusted.
12. The method according to claim 11, wherein the adjusting the nodes and/or the node sequence through which the path to be adjusted passes to obtain an adjusted fourth path set includes:
determining a path with the minimum second distance with the node to be adjusted as a target path;
deleting the node to be adjusted from the path to be adjusted and adding the node to be adjusted into the target path;
and re-determining the node sequence of the target path and the path to be adjusted to obtain an adjusted fourth path set.
13. A training method of selecting a model, comprising:
determining first parameters of each path subset contained in a sample path set, wherein the sample path set has corresponding first path planning information;
determining a second parameter of each path subset by using a first selection model according to each path subset and corresponding second path planning information thereof; the second path planning information corresponding to the path subset comprises partial content of the first path planning information of the sample path set to which the path subset belongs; and
and training the first selection model according to the difference value between the first parameter and the second parameter to obtain a trained second selection model.
14. The method of claim 13, determining second parameters for each subset of paths using a first selection model based on the each subset of paths and its corresponding second path planning information, comprising:
and processing each path subset and the corresponding second path planning information sequentially through the Transformer encoder layer, the full connection layer and the average pooling layer of the first selection model to obtain a second parameter of the second path planning information.
15. A method for training a path planning model comprises the following steps:
inputting a path planning information sample into an encoder of a first path planning model to obtain a coding vector corresponding to the path planning information sample, wherein the coding vector comprises a characterization vector of at least one node in the path planning information sample;
iteratively executing the following steps until obtaining each second path included in the training path set: inputting the characterization vector of the node included in the second path into a decoder of the first path planning model to obtain a decoding vector of the second path; fusing the decoding vector of the second path and the coding vector corresponding to the path planning information sample based on an attention mechanism to obtain a next node in the second path; and
and training the first path planning model by using a gradient ascent method and the cost function of the training path set to obtain a trained second path planning model.
16. The method of claim 15, wherein the inputting the path planning information samples into an encoder of the first path planning model to obtain a corresponding code vector of the path planning information samples comprises:
converting the path planning information sample into a first vector through an input layer of the encoder;
and sequentially processing the first vector through a multi-head attention mechanism layer, a first residual error and normalization layer, a feed-forward network layer, a second residual error and normalization layer of the encoder to obtain a characterization vector of at least one node in the path planning information sample.
17. The method of claim 16, wherein inputting the path planning information samples into an encoder of the first path planning model to obtain a corresponding code vector of the path planning information samples, further comprises:
and splicing the characterization vectors of a plurality of nodes in the path planning information sample through a full connection layer to obtain a coding vector corresponding to the path planning information sample.
18. A path planner, comprising:
the dividing module is used for dividing a first path set corresponding to the first path planning information to obtain a plurality of second path sets; wherein the second set of paths is a subset of the first set of paths;
a selection module, configured to select a target path set from the plurality of second path sets;
the optimization module is used for re-planning the paths according to the second path planning information corresponding to the target path set to obtain an optimized third path set; wherein the second path planning information includes partial information of the first path planning information; and
an update module to update the first set of paths with the third set of paths.
19. The apparatus of claim 18, further comprising:
and the initial path planning module is used for planning an initial path according to the first path planning information to obtain an initial first path set.
20. The apparatus of claim 19, wherein the initial path planning module comprises:
the encoding submodule is used for inputting the first path planning information into an encoder of a path planning model to obtain an encoding vector corresponding to the first path planning information, wherein the encoding vector comprises a representation vector of at least one node in the first path planning information;
an iteration submodule, configured to iteratively execute the following steps until each first path included in the first path set is obtained: inputting the characterization vector of the node included in the first path into a decoder of the path planning model to obtain a decoding vector of the first path; and fusing the decoding vector of the first path and the coding vector corresponding to the planning information of the first path based on an attention mechanism to obtain a next node in the first path.
21. The apparatus of claim 20, the encoding submodule to:
converting the first path planning information into a first vector through an input layer of the encoder;
and sequentially processing the first vector through a multi-head attention mechanism layer, a first residual error and normalization layer, a feedforward network layer, a second residual error and normalization layer of the encoder to obtain a characterization vector of at least one node in the first path planning information.
22. The apparatus of claim 21, the encoding sub-module further to:
and splicing the characterization vectors of the plurality of nodes in the first path planning information through a full connection layer to obtain a coding vector corresponding to the first path planning information.
23. The apparatus of claim 18, wherein the selection module is to:
inputting the plurality of second path sets and second path planning information corresponding to the plurality of second path sets into a selection model, and predicting respective optimization parameters of the plurality of second path sets;
and determining a second path set with the largest difference between the optimization parameters in the plurality of second path sets and the initial parameters of the first path set as the target path set.
24. The apparatus of claim 23, wherein the plurality of second path sets comprise: N adjacent paths in the first path set; wherein N is an integer greater than 1, N is less than M, and M is the number of paths in the first path set.
25. The apparatus of claim 18, wherein the optimization module is to:
extracting information of each node to be processed from the paths included in the target path set;
and re-planning paths of the information of each node to be processed by using an open source solver according to second path planning information corresponding to the target path set to obtain the optimized third path set.
26. The apparatus of claim 18, further comprising: an adjustment module, the adjustment module comprising:
the path determining submodule is used for determining a path to be adjusted according to the distance between nodes of each path in the first path set;
and the adjusting submodule is used for adjusting the nodes and/or the node sequence passed by the path to be adjusted to obtain an adjusted fourth path set.
27. The apparatus of claim 26, the path determination sub-module to:
dividing all nodes on each path according to the distance between each node of each path in the first path set to obtain a plurality of cluster nodes and the central point of each cluster node;
determining the central point of each path contained in the target path set;
and determining the path to be adjusted according to the distance between the central point of each cluster node and the central point of each path.
28. The apparatus of claim 27, wherein the path determination sub-module is further configured to:
determining a first distance between the center point of each cluster node and the center point of the path where the cluster node is located;
determining a second distance between the central point of each cluster node and the central point of each other path;
and under the condition that the first distance of any cluster node is greater than any second distance corresponding to the cluster node, determining the cluster node as a node to be adjusted, wherein the path where the node to be adjusted is located is the path to be adjusted.
29. The apparatus of claim 28, wherein the adjustment submodule is to:
determining a path with the minimum second distance with the node to be adjusted as a target path;
deleting the node to be adjusted from the path to be adjusted and adding the node to be adjusted into the target path;
and re-determining the node sequence of the target path and the path to be adjusted to obtain an adjusted fourth path set.
30. A training apparatus for selecting a model, comprising:
a first determining module, configured to determine first parameters of path subsets included in a sample path set, where the sample path set has corresponding first path planning information;
a second determining module, configured to determine, according to each of the path subsets and second path planning information corresponding to the path subset, a second parameter of each of the path subsets by using the first selection model; the second path planning information corresponding to the path subset comprises partial content of the first path planning information of the sample path set to which the path subset belongs; and
and the first training module is used for training the first selection model according to the difference value between the first parameter and the second parameter so as to obtain a trained second selection model.
31. A training apparatus for a path planning model, comprising:
an input module, configured to input a path planning information sample into an encoder of a first path planning model to obtain a coding vector corresponding to the path planning information sample, wherein the coding vector comprises a characterization vector of at least one node in the path planning information sample;
the iteration module is used for iteratively executing the following steps until each second path included in the training path set is obtained: inputting the characterization vector of the node included in the second path into a decoder of the first path planning model to obtain a decoding vector of the second path; fusing the decoding vector of the second path and the coding vector corresponding to the path planning information sample based on an attention mechanism to obtain a next node in the second path; and
and the second training module is used for training the first path planning model by utilizing a gradient ascent method and the cost function of the training path set to obtain a trained second path planning model.
32. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-17.
33. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-17.
34. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-17.
CN202211490812.XA 2022-11-25 2022-11-25 Path planning method, path planning model and training method for selection model Pending CN115933554A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211490812.XA CN115933554A (en) 2022-11-25 2022-11-25 Path planning method, path planning model and training method for selection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211490812.XA CN115933554A (en) 2022-11-25 2022-11-25 Path planning method, path planning model and training method for selection model

Publications (1)

Publication Number Publication Date
CN115933554A true CN115933554A (en) 2023-04-07

Family

ID=86648254

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211490812.XA Pending CN115933554A (en) 2022-11-25 2022-11-25 Path planning method, path planning model and training method for selection model

Country Status (1)

Country Link
CN (1) CN115933554A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116627789A (en) * 2023-07-19 2023-08-22 支付宝(杭州)信息技术有限公司 Model detection method and device, electronic equipment and storage medium
CN116627789B (en) * 2023-07-19 2023-11-03 支付宝(杭州)信息技术有限公司 Model detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination