CN117873089A - Multi-mobile robot cooperation path planning method based on clustering PPO algorithm - Google Patents

Multi-mobile robot cooperation path planning method based on clustering PPO algorithm

Info

Publication number
CN117873089A
Authority
CN
China
Prior art keywords
mobile robot
clustering
network
updating
path
Prior art date
Legal status
Granted
Application number
CN202410036441.0A
Other languages
Chinese (zh)
Other versions
CN117873089B (en)
Inventor
李骏
李马兵
夏鹏程
曾振平
于霄
Current Assignee
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202410036441.0A priority Critical patent/CN117873089B/en
Publication of CN117873089A publication Critical patent/CN117873089A/en
Application granted granted Critical
Publication of CN117873089B publication Critical patent/CN117873089B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses a path planning method for the cooperation of multiple mobile robots based on a clustering PPO algorithm, which comprises the following steps: S1, collecting the position information of all targets, and cleaning and standardizing the target data; S2, allocating target nodes using a K-means clustering algorithm; S3, optimizing the path of each mobile robot using a PPO algorithm; and S4, updating the policy network using the PPO algorithm, and stopping training when the policy network is stable or the preset number of iterations is reached. Combining a clustering algorithm with a deep reinforcement learning algorithm offers a new approach to path planning and has significant practical value for improving warehousing efficiency and reducing transportation costs.

Description

Multi-mobile robot cooperation path planning method based on clustering PPO algorithm
Technical Field
The invention relates to the technical field of warehouse logistics in an industrial field, in particular to a path planning method for cooperation of multiple mobile robots based on a clustering PPO algorithm.
Background
In the field of warehouse logistics at industrial sites, path planning for mobile robots can improve production efficiency and enhance the flexibility of a manufacturing system. In a given working environment, path planning must simultaneously consider avoiding obstacles, minimizing movement time, reducing energy consumption, and ensuring the coordination and safety of the robot system.
As the degree of industrial automation rises, more and more mobile robots are deployed in the same working area. The path planning system must therefore plan a path for each mobile robot while balancing the task load among the robots and minimizing the maximum task completion time.
Traditional path planning methods such as exact algorithms and heuristic algorithms suffer from high computational complexity, are difficult to apply to large-scale scenarios, and struggle to handle load balancing between mobile robots and tasks.
Disclosure of Invention
The invention aims to provide a path planning method for the cooperation of multiple mobile robots based on a clustering PPO algorithm, wherein a K-means clustering algorithm is used to allocate targets to the mobile robots, and a PPO algorithm is then used to optimize the path of each mobile robot. With this method, a solution close to the minimum of the maximum task completion time can be found while solving efficiency is maintained. Combining a clustering algorithm with a deep reinforcement learning algorithm provides a new approach to path planning and is expected to play an important role in practical applications.
In order to achieve the above purpose, the present invention provides a path planning method for cooperation of multiple mobile robots based on a clustering PPO algorithm, comprising the following steps:
S1, collecting the position information of all targets, and cleaning and standardizing the target data;
S2, performing target node allocation using a K-means clustering algorithm;
S3, optimizing the path of each mobile robot using a PPO algorithm;
and S4, updating the policy network using the PPO algorithm, and stopping training when the policy network is stable or the preset number of iterations is reached.
Preferably, in step S1, the collected position information of all the targets is expressed as:
$s=(x_1,x_2,\ldots,x_K)$;
the target data are cleaned, any missing or abnormal values are processed, and the target coordinates are standardized; the target node index set is then obtained, expressed as:
$\{0,1,2,\ldots,N\}$;
wherein node 0 represents the start point and the end point of each mobile robot.
Preferably, the step S2 specifically includes the following steps:
selecting a certain number of mobile robots, and setting the index set of the mobile robots as $\mathcal{M}=\{1,2,\ldots,M\}$;
S21, initializing a cluster center, distributing target nodes to each mobile robot by using a K-means clustering algorithm, and updating the cluster center according to a distribution result, wherein the cluster center is expressed as:
$C(i)=\arg\min_{j\in\{1,2,\ldots,K\}}\mathrm{distance}(x_i,y_j)$
wherein $C(i)$ is the index of the cluster center to which target $i$ is assigned, $x_i$ is the coordinate of target $i$, $y_j$ is the coordinate of cluster center $j$, and $\mathrm{distance}(\cdot)$ is the Euclidean distance between two points;
S22, updating the clustering result: the cluster center is recalculated as the average position of all data points in the cluster for which each mobile robot is responsible:
$L_j=\frac{1}{|S_j|}\sum_{x_i\in S_j} x_i$
wherein $L_j$ is the coordinate of the cluster center, $S_j$ is the set of targets assigned to the mobile robot, and $|S_j|$ is the number of nodes in set $S_j$;
allocation matrix u= [ U ] nm ]Is an N M matrix, where u nm Representing the allocation of the nth destination node to the mth mobile robot, the allocation function is represented as:
S23, repeating steps S21 and S22, stopping the update when the within-cluster sum of squared errors converges below a threshold or the preset number of iterations is reached, and checking the balance of the target allocation.
Preferably, in step S3, optimizing the path of each robot by the PPO algorithm specifically comprises:
initializing a policy network for each mobile robot, generating a path for each mobile robot by updating the policy network, and calculating the return of each path according to the moving distance or cost of the mobile robot.
Preferably, in step S3, the action space of mobile robot $m$ is denoted $A_m$, the state space $S_m$, and the reward function $r_m$; the action space of mobile robot $m$ at time step $t$ is expressed as:
$A_m(t)=\{s_{km}\mid s_{km}\in V_m,\ k=t+1\}$
the state space of mobile robot $m$ is built from its assigned target node set
$V_m=\{n\mid u_{nm}=1\}$,
i.e., the target nodes accessed by mobile robot $m$, $m\in\mathcal{M}$,
the reward function of mobile robot $m$ is determined by the distance from the node at the current time step $t$ to the node at the next time step $t+1$, together with a repeated-access penalty; defining $\pi_m(t)$ as the access policy of the mobile robot, $r_m$ is expressed as:
$r_m=-\mathrm{distance}(\pi_m(t),\pi_m(t+1))-\lambda r_{\mathrm{collision}}(t)$;
the cumulative reward function $R_m$ of mobile robot $m$ is then expressed as:
$R_m=\sum_{t=1}^{T} r_m(t)$,
wherein $\mathrm{distance}$ denotes the distance between two nodes, $\lambda$ is the weight coefficient of the repeated-access penalty, and $r_{\mathrm{collision}}$ is the repeated-access penalty.
Preferably, step S4 comprises the steps of:
S41, limiting the magnitude of the policy gradient update by proximal clipping, combined with the advantage function and importance sampling; the objective function is expressed as:
$L^{\mathrm{CLIP}}(\theta)=\hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\hat{A}_t\right)\right],\qquad r_t(\theta)=\frac{\pi_\theta(a_t\mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t\mid s_t)}$
wherein $\pi_\theta$ denotes the current policy, $\pi_{\theta_{\mathrm{old}}}$ the old policy, $\hat{A}_t$ the advantage function, and $\epsilon$ a hyperparameter controlling the clipping strength;
S42, updating the parameters of the neural networks by the PPO algorithm: initializing the policy network $\pi_\theta(a_t\mid s_t)$ and the value network $V_\phi(s_t)$, wherein $\theta$ and $\phi$ respectively denote the parameters of the policy network and the value network;
updating the parameters of the policy network by gradient ascent:
$\theta\leftarrow\theta+\alpha\nabla_\theta L^{\mathrm{CLIP}}(\theta)$;
updating the parameters of the value network by gradient descent:
$\phi\leftarrow\phi-\beta\nabla_\phi\,\hat{\mathbb{E}}_t\!\left[\big(V_\phi(s_t)-R_t\big)^2\right];$
Iterative updating is repeated until the mobile robots reach the maximum number of training episodes, and training stops; when the cumulative reward of the mobile robots converges to its maximum value, the optimal node-traversal path policy has been learned.
Therefore, the path planning method for cooperation of the multiple mobile robots based on the clustering PPO algorithm has the following beneficial effects:
(1) Through the clustering PPO algorithm, the invention obtains a good balance between stability and sample efficiency by limiting the step size of the policy update, and can rapidly and effectively allocate targets to the mobile robots, thereby ensuring that the task burden of each mobile robot is balanced and reducing the overall moving distance.
(2) According to the invention, the data set is divided into a pre-designated number of clusters (K) by the K-means clustering method, and samples are distributed to the nearest clusters by iterative optimization, so that the K-means clustering algorithm can be used for efficiently distributing targets to different mobile robots, and the task load balance of the mobile robots is ensured.
(3) The training method of the invention avoids large fluctuations when updating the policy and ensures the stability of the learning process, which is particularly important for complex path planning problems because of the large number of variables and constraints involved.
(4) The invention can adapt to the path planning problems of different scales and complexity, and can maintain good performance even under the condition of more targets or more mobile robots.
(5) According to the invention, the target allocation and the path optimization are processed separately, so that the calculation cost is reduced, and the solution of the large-scale path planning problem becomes more feasible.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is an overall flow chart of an embodiment of a path planning method for multi-mobile robot collaboration based on a clustered PPO algorithm of the present invention;
FIG. 2 is a graph of clustering partition results of target nodes of an embodiment of a path planning method for multi-mobile robot cooperation based on a clustering PPO algorithm;
FIG. 3 is an optimal path diagram of each industrial mobile robot of an embodiment of a path planning method for multi-mobile robot cooperation based on a clustered PPO algorithm of the present invention;
FIG. 4 is a training result diagram of an embodiment of a path planning method for multi-mobile robot cooperation based on PPO algorithm of the present invention;
fig. 5 is a training result diagram of an embodiment of a path planning method of multi-mobile robot cooperation based on a clustering PPO algorithm of the present invention.
Detailed Description
The technical scheme of the invention is further described below through the attached drawings and the embodiments.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs.
Considering complex target environments and congestion, the method combines a clustering algorithm with a deep reinforcement learning algorithm and applies them to the path planning problem of multiple industrial mobile robots, so as to optimize the robots' path selection and improve overall transportation efficiency. The goal is to find a set of paths that, subject to the given constraints, minimizes the maximum task completion time of the mobile robots.
As shown in fig. 1, the path planning method for multi-mobile robot cooperation based on the clustering PPO algorithm comprises the following steps:
S1, collecting the position information of all targets, including coordinate data, and expressing the collected position information of all targets as:
$s=(x_1,x_2,\ldots,x_K)$;
the target data are cleaned, any missing or abnormal values are processed, and the target coordinates are standardized to ensure that the data are on the same scale; the target node index set is then obtained, expressed as:
$\{0,1,2,\ldots,N\}$;
wherein node 0 represents the start point and the end point of each mobile robot.
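The patent gives no reference code; purely as a minimal Python sketch of step S1, assuming that standardization means z-score normalization and that missing values appear as NaN rows (the function name and these details are illustrative assumptions, not part of the patent):

```python
import numpy as np

def preprocess_targets(raw_coords):
    """Clean and standardize target coordinates s = (x_1, ..., x_K)."""
    coords = np.asarray(raw_coords, dtype=float)
    coords = coords[~np.isnan(coords).any(axis=1)]   # drop rows with missing values
    mean, std = coords.mean(axis=0), coords.std(axis=0)
    normalized = (coords - mean) / np.where(std > 0, std, 1.0)  # same scale per axis
    # Indices 1..K for the targets; index 0 is reserved for the common
    # start/end point of every mobile robot.
    node_index = np.arange(1, len(normalized) + 1)
    return normalized, node_index
```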
S2, as shown in FIG. 2, the differently colored areas represent the area for which each industrial mobile robot is responsible; target node allocation is performed using the K-means clustering algorithm. Step S2 specifically comprises the following steps:
selecting a certain number of mobile robots, and setting the index set of the mobile robots as $\mathcal{M}=\{1,2,\ldots,M\}$;
S21, initializing a cluster center, distributing target nodes to each mobile robot by using a K-means clustering algorithm, and updating the cluster center according to a distribution result, wherein the cluster center is expressed as:
$C(i)=\arg\min_{j\in\{1,2,\ldots,K\}}\mathrm{distance}(x_i,y_j)$
wherein $C(i)$ is the index of the cluster center to which target $i$ is assigned, $x_i$ is the coordinate of target $i$, $y_j$ is the coordinate of cluster center $j$, and $\mathrm{distance}(\cdot)$ is the Euclidean distance between two points;
S22, updating the clustering result: the cluster center is recalculated as the average position of all data points in the cluster for which each mobile robot is responsible:
$L_j=\frac{1}{|S_j|}\sum_{x_i\in S_j} x_i$
wherein $L_j$ is the coordinate of the cluster center, $S_j$ is the set of targets assigned to the mobile robot, and $|S_j|$ is the number of nodes in set $S_j$;
allocation matrix u= [ U ] nm ]Is an N M matrix, where u nm Representing the allocation of the nth destination node to the mth mobile robot, the allocation function is represented as:
S23, repeating steps S21 and S22 and continuously iterating the allocation process, stopping the update when the within-cluster sum of squared errors converges below a threshold or the preset number of iterations is reached, and checking the balance of the target allocation to ensure that no mobile robot is overloaded or underloaded.
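As an illustration of steps S21 to S23, the following Python sketch performs the allocation with scikit-learn's KMeans and builds the allocation matrix U; the balance check and all names are illustrative assumptions rather than the patent's own implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def allocate_targets(coords, num_robots, max_iter=300, tol=1e-4):
    """Assign each target node to one of M mobile robots via K-means."""
    km = KMeans(n_clusters=num_robots, max_iter=max_iter, tol=tol, n_init=10)
    labels = km.fit_predict(coords)          # C(i): cluster index for target i
    # Allocation matrix U (N x M): u[n, m] = 1 if node n goes to robot m.
    U = np.zeros((len(coords), num_robots), dtype=int)
    U[np.arange(len(coords)), labels] = 1
    loads = U.sum(axis=0)                    # crude balance check on cluster sizes
    if loads.max() > 2 * loads.mean():
        print("warning: unbalanced target allocation:", loads)
    return U, km.cluster_centers_
```

scikit-learn's KMeans already iterates the S21/S22 assignment and centroid updates internally until the inertia (the within-cluster sum of squared errors) converges below tol or max_iter is reached.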
S3, optimizing the path of each mobile robot by using a PPO algorithm;
a policy network is initialized for each mobile robot, a path is generated for each mobile robot by updating the policy network, and the return of each path is calculated according to the moving distance or cost of the mobile robot.
The action space of mobile robot $m$ is denoted $A_m$, the state space $S_m$, and the reward function $r_m$. The action space of mobile robot $m$ at time step $t$ is expressed as:
$A_m(t)=\{s_{km}\mid s_{km}\in V_m,\ k=t+1\}$
That is, mobile robot $m$ can only pick nodes from its own target set: at time step $t$, the action set of mobile robot $m$ consists of the target nodes it may access at time step $t+1$. Note that $k=t+1$ here, since the mobile robot is assumed to already be at target $s_{km}$ at time step $t$ and ready to move to the next target.
The state space of mobile robot $m$ is built from its assigned target node set
$V_m=\{n\mid u_{nm}=1\}$,
i.e., the target nodes accessed by mobile robot $m$, $m\in\mathcal{M}$.
The reward function of mobile robot $m$ is determined by the distance from the node at the current time step $t$ to the node at the next time step $t+1$, together with a repeated-access penalty; defining $\pi_m(t)$ as the access policy of the mobile robot, $r_m$ is expressed as:
$r_m=-\mathrm{distance}(\pi_m(t),\pi_m(t+1))-\lambda r_{\mathrm{collision}}(t)$;
the cumulative reward function $R_m$ of mobile robot $m$ is then expressed as:
$R_m=\sum_{t=1}^{T} r_m(t)$,
wherein $\mathrm{distance}$ denotes the distance between two nodes, $\lambda$ is the weight coefficient of the repeated-access penalty, and $r_{\mathrm{collision}}$ is the repeated-access penalty.
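A minimal Python sketch of this reward, assuming Euclidean node distances and a unit repeated-access penalty (both are illustrative assumptions; the patent does not fix these values):

```python
import numpy as np

def step_reward(pos_t, pos_t1, already_visited, lam=1.0):
    """r_m = -distance(pi_m(t), pi_m(t+1)) - lambda * r_collision(t)."""
    dist = float(np.linalg.norm(np.asarray(pos_t1) - np.asarray(pos_t)))
    r_collision = 1.0 if already_visited else 0.0   # penalty for revisiting a node
    return -dist - lam * r_collision

# The cumulative reward R_m is the sum of step_reward over the episode's time slots.
```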
S4, updating the policy network by the PPO algorithm, and stopping training when the policy network is stable or the preset number of iterations is reached;
Each mobile robot initializes its own policy network and value network and exchanges information with the environment; the optimal action is estimated from the policy network, and a state value is output by the value network.
The policy network requires the output of the value network to calculate the advantage function, while the value network requires the data generated by the policy network to calculate the state value. This dependency means that the performance of the two networks affects each other during training.
The policy network and the value network each comprise an input layer, two hidden layers, and an output layer. The input layer takes the state of the mobile robot as input; its dimension depends on the size of the mobile robot's state space. The policy network and the value network share the two hidden layers, which are fully connected layers with ReLU activation functions between layers, and then branch into their respective output layers; the number of neurons in the hidden layers is customized according to the complexity of the problem. The policy network outputs a softmax layer whose size equals the size of the action space, representing the probability distribution over actions. The value network outputs a scalar representing the state value estimate, thereby completing the construction of the mobile robot's policy network and value network.
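As one possible reading of this architecture, the following PyTorch sketch builds a shared-trunk network with a softmax policy head and a scalar value head; the hidden width of 128 is an assumed value, since the patent leaves the neuron count to be customized per problem:

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Policy and value heads on two shared fully connected hidden layers."""

    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(                   # shared hidden layers, ReLU
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, action_dim)  # softmax over actions
        self.value_head = nn.Linear(hidden, 1)            # scalar state-value estimate

    def forward(self, state):
        h = self.trunk(state)
        action_probs = torch.softmax(self.policy_head(h), dim=-1)
        state_value = self.value_head(h)
        return action_probs, state_value
```

Sharing the trunk mirrors the description above: the two hidden layers are common, and the policy and value outputs branch from them.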
The specific implementation steps are as follows:
S41, limiting the magnitude of the policy gradient update by proximal clipping, combined with the advantage function and importance sampling; the objective function is expressed as:
$L^{\mathrm{CLIP}}(\theta)=\hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\hat{A}_t\right)\right],\qquad r_t(\theta)=\frac{\pi_\theta(a_t\mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t\mid s_t)}$
wherein $\pi_\theta$ denotes the current policy, $\pi_{\theta_{\mathrm{old}}}$ the old policy, $\hat{A}_t$ the advantage function, and $\epsilon$ a hyperparameter controlling the clipping strength;
S42, the core of the PPO algorithm is updating the parameters of the neural networks: initialize the policy network $\pi_\theta(a_t\mid s_t)$ and the value network $V_\phi(s_t)$, wherein $\theta$ and $\phi$ respectively denote the parameters of the policy network and the value network;
updating the parameters of the policy network by gradient ascent:
$\theta\leftarrow\theta+\alpha\nabla_\theta L^{\mathrm{CLIP}}(\theta)$;
updating the parameters of the value network by gradient descent:
$\phi\leftarrow\phi-\beta\nabla_\phi\,\hat{\mathbb{E}}_t\!\left[\big(V_\phi(s_t)-R_t\big)^2\right].$
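A sketch of S41/S42 in PyTorch using the standard clipped surrogate, assuming the ActorCritic sketch above and precomputed, detached old log-probabilities, returns, and advantages; the epoch count and loss weighting are illustrative:

```python
import torch

def ppo_update(net, optimizer, states, actions, old_log_probs, returns,
               advantages, eps=0.2, value_coef=0.5, epochs=4):
    """Clipped-surrogate ascent for the policy, squared-error descent for the value."""
    for _ in range(epochs):
        probs, values = net(states)
        dist = torch.distributions.Categorical(probs)
        ratio = torch.exp(dist.log_prob(actions) - old_log_probs)  # importance weight
        clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps)
        policy_loss = -torch.min(ratio * advantages,
                                 clipped * advantages).mean()      # -L^CLIP
        value_loss = (values.squeeze(-1) - returns).pow(2).mean()
        loss = policy_loss + value_coef * value_loss  # minimizing ascends the surrogate
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```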
Iterative updating is repeated until the mobile robot reaches the maximum number of training episodes, at which point training stops. As shown in fig. 3, the differently colored paths represent the movement path of each industrial mobile robot; when the cumulative reward of the mobile robot converges to its maximum value, the optimal node-traversal path policy has been learned.
Examples
A Python simulation is carried out on a computer running the Windows 11 operating system; the specific scenario is as follows:
representing path topology states of target clusters as graphsWherein-> Representing node set,/->Edge e is the edge set i,j Epsilon represents the path of the mobile robot from target node i to target node j.
According to the cluster partition model, the topology of the graph $G$ changes dynamically. After the partition is completed, each mobile robot has its own target node set, expressed as $V_m$. Time is discretized into time slots $t\in\{1,2,\ldots,T\}$; in each time slot, each mobile robot selects one target node from its own node set and saves it in its own history information, and a node can be selected only once.
After the target data nodes are cleaned, normalized and clustered, they enter the reinforcement learning module for training; in the subsequent simulation, when the mobile robots reach the maximum number of training episodes, the simulation ends. The specific simulation parameters are shown in Table 1:
Table 1. Main simulation parameters
Target area: 100 m × 100 m
Number of target nodes: 50
Number of mobile robots: 5
Maximum number of training episodes: 10000
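Purely to show how the sketches above fit the Table 1 scenario, a hypothetical wiring follows; it assumes preprocess_targets, allocate_targets, and ActorCritic from the earlier sketches are in scope, the dimensioning is illustrative, and the rollout loop is only outlined in comments:

```python
import numpy as np
import torch

rng = np.random.default_rng(0)
raw = rng.uniform(0.0, 100.0, size=(50, 2))           # 50 targets in a 100 m x 100 m area
coords, node_index = preprocess_targets(raw)          # S1: clean and normalize
U, centers = allocate_targets(coords, num_robots=5)   # S2: K-means allocation

num_nodes = coords.shape[0]                           # illustrative dimensioning
nets = [ActorCritic(state_dim=num_nodes, action_dim=num_nodes) for _ in range(5)]
optims = [torch.optim.Adam(net.parameters(), lr=3e-4) for net in nets]
# S3/S4: per episode, roll each robot out over its cluster V_m, score each
# step with step_reward(), then call ppo_update() on the collected batch,
# repeating until the maximum number of training episodes (10000) is reached.
```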
As shown in fig. 4 and fig. 5, the average cumulative reward is relatively low at first because the initial model has not yet learned sufficiently. As training progresses, the average cumulative reward continues to rise and eventually stabilizes. More specifically, comparison shows that the clustering-based PPO algorithm outperforms the plain PPO algorithm in both convergence speed and the final stable value of the average cumulative reward.
Therefore, the path planning method for the cooperation of multiple mobile robots based on the clustering PPO algorithm, applied to the field of warehouse logistics at industrial sites, can reduce the network training cost, ensure the stability of the learning process, and reduce the overall task completion time.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the invention. Although the invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that modifications or equivalent substitutions may be made to the technical solution of the invention without departing from its spirit and scope.

Claims (6)

1. A path planning method for the cooperation of multiple mobile robots based on a clustering PPO algorithm, characterized by comprising the following steps:
S1, collecting the position information of all targets, and cleaning and standardizing the target data;
S2, performing target node allocation using a K-means clustering algorithm;
S3, optimizing the path of each mobile robot using a PPO algorithm;
and S4, updating the policy network using the PPO algorithm, and stopping training when the policy network is stable or the preset number of iterations is reached.
2. The path planning method for cooperation of multiple mobile robots based on clustering PPO algorithm according to claim 1, wherein: in step S1, the collected position information of all the targets is expressed as:
$s=(x_1,x_2,\ldots,x_K)$;
the target data are cleaned, any missing or abnormal values are processed, and the target coordinates are standardized; the target node index set is then obtained, expressed as:
$\{0,1,2,\ldots,N\}$;
wherein node 0 represents the start point and the end point of each mobile robot.
3. The path planning method for cooperation of multiple mobile robots based on the clustering PPO algorithm according to claim 1, wherein step S2 specifically comprises the steps of:
selecting a certain number of mobile robots, and setting the index set of the mobile robots as $\mathcal{M}=\{1,2,\ldots,M\}$;
S21, initializing a cluster center, distributing target nodes to each mobile robot by using a K-means clustering algorithm, and updating the cluster center according to a distribution result, wherein the cluster center is expressed as:
$C(i)=\arg\min_{j\in\{1,2,\ldots,K\}}\mathrm{distance}(x_i,y_j)$
wherein $C(i)$ is the index of the cluster center to which target $i$ is assigned, $x_i$ is the coordinate of target $i$, $y_j$ is the coordinate of cluster center $j$, and $\mathrm{distance}(\cdot)$ is the Euclidean distance between two points;
S22, updating the clustering result: the cluster center is recalculated as the average position of all data points in the cluster for which each mobile robot is responsible:
$L_j=\frac{1}{|S_j|}\sum_{x_i\in S_j} x_i$
wherein $L_j$ is the coordinate of the cluster center, $S_j$ is the set of targets assigned to the mobile robot, and $|S_j|$ is the number of nodes in set $S_j$;
allocation matrix u= [ U ] nm ]Is an N M matrix, where u nm Representing the allocation of the nth destination node to the mth mobile robot, the allocation function is represented as:
S23, repeating steps S21 and S22, stopping the update when the within-cluster sum of squared errors converges below a threshold or the preset number of iterations is reached, and checking the balance of the target allocation.
4. The path planning method for the cooperation of multiple mobile robots based on the clustering PPO algorithm according to claim 1, wherein in step S3, optimizing the path of each robot by the PPO algorithm specifically comprises:
initializing a policy network for each mobile robot, generating a path for each mobile robot by updating the policy network, and calculating the return of each path according to the moving distance or cost of the mobile robot.
5. The path planning method for the cooperation of multiple mobile robots based on the clustering PPO algorithm according to claim 4, wherein: in step S3, the action space of mobile robot $m$ is denoted $A_m$, the state space $S_m$, and the reward function $r_m$; the action space of mobile robot $m$ at time step $t$ is expressed as:
$A_m(t)=\{s_{km}\mid s_{km}\in V_m,\ k=t+1\}$
the state space of mobile robot $m$ is built from its assigned target node set
$V_m=\{n\mid u_{nm}=1\}$,
i.e., the target nodes accessed by mobile robot $m$, $m\in\mathcal{M}$,
the reward function of mobile robot $m$ is determined by the distance from the node at the current time step $t$ to the node at the next time step $t+1$, together with a repeated-access penalty; defining $\pi_m(t)$ as the access policy of the mobile robot, $r_m$ is expressed as:
$r_m=-\mathrm{distance}(\pi_m(t),\pi_m(t+1))-\lambda r_{\mathrm{collision}}(t)$;
the cumulative reward function $R_m$ of mobile robot $m$ is then expressed as:
$R_m=\sum_{t=1}^{T} r_m(t)$,
wherein $\mathrm{distance}$ denotes the distance between two nodes, $\lambda$ is the weight coefficient of the repeated-access penalty, and $r_{\mathrm{collision}}$ is the repeated-access penalty.
6. The path planning method for cooperation of multiple mobile robots based on the clustering PPO algorithm according to claim 1, wherein step S4 comprises the steps of:
S41, limiting the magnitude of the policy gradient update by proximal clipping, combined with the advantage function and importance sampling; the objective function is expressed as:
$L^{\mathrm{CLIP}}(\theta)=\hat{\mathbb{E}}_t\!\left[\min\!\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\hat{A}_t\right)\right],\qquad r_t(\theta)=\frac{\pi_\theta(a_t\mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t\mid s_t)}$
wherein $\pi_\theta$ denotes the current policy, $\pi_{\theta_{\mathrm{old}}}$ the old policy, $\hat{A}_t$ the advantage function, and $\epsilon$ a hyperparameter controlling the clipping strength;
S42, updating the parameters of the neural networks by the PPO algorithm: initializing the policy network $\pi_\theta(a_t\mid s_t)$ and the value network $V_\phi(s_t)$, wherein $\theta$ and $\phi$ respectively denote the parameters of the policy network and the value network;
updating the parameters of the policy network by gradient ascent:
$\theta\leftarrow\theta+\alpha\nabla_\theta L^{\mathrm{CLIP}}(\theta)$;
updating the parameters of the value network by gradient descent:
$\phi\leftarrow\phi-\beta\nabla_\phi\,\hat{\mathbb{E}}_t\!\left[\big(V_\phi(s_t)-R_t\big)^2\right];$
iterative updating is repeated until the mobile robots reach the maximum number of training episodes, and training stops; when the cumulative reward of the mobile robots converges to its maximum value, the optimal node-traversal path policy has been learned.
CN202410036441.0A 2024-01-10 2024-01-10 Multi-mobile robot cooperation path planning method based on clustering PPO algorithm Active CN117873089B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410036441.0A CN117873089B (en) 2024-01-10 2024-01-10 Multi-mobile robot cooperation path planning method based on clustering PPO algorithm


Publications (2)

Publication Number Publication Date
CN117873089A (en) 2024-04-12
CN117873089B CN117873089B (en) 2024-08-02

Family

ID=90592859

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410036441.0A Active CN117873089B (en) 2024-01-10 2024-01-10 Multi-mobile robot cooperation path planning method based on clustering PPO algorithm

Country Status (1)

Country Link
CN (1) CN117873089B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210341904A1 (en) * 2020-04-30 2021-11-04 Robert Bosch Gmbh Device and method for controlling a robot
CN113281993A (en) * 2021-05-11 2021-08-20 北京理工大学 Greedy K-mean self-organizing neural network multi-robot path planning method
CN114237235A (en) * 2021-12-02 2022-03-25 之江实验室 Mobile robot obstacle avoidance method based on deep reinforcement learning
CN114905510A (en) * 2022-04-29 2022-08-16 南京邮电大学 Robot action method based on adaptive near-end optimization
CN115373415A (en) * 2022-07-26 2022-11-22 西安电子科技大学 Unmanned aerial vehicle intelligent navigation method based on deep reinforcement learning
CN117289691A (en) * 2023-04-12 2023-12-26 西交利物浦大学 Training method for path planning agent for reinforcement learning in navigation scene
CN116858241A (en) * 2023-07-01 2023-10-10 中国人民解放军空军勤务学院 Application method of mobile robot in reinforcement learning of matching network
CN117213497A (en) * 2023-10-10 2023-12-12 北京理工大学 AGV global path planning method based on deep reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TONG Liang; WANG Zhun: "Research on the Application of Reinforcement Learning in Robot Path Planning", Computer Simulation (计算机仿真), vol. 30, no. 12, 15 December 2013 (2013-12-15), pages 351-364 *

Also Published As

Publication number Publication date
CN117873089B (en) 2024-08-02

Similar Documents

Publication Publication Date Title
Tang et al. A novel hierarchical soft actor-critic algorithm for multi-logistics robots task allocation
Kamoshida et al. Acquisition of automated guided vehicle route planning policy using deep reinforcement learning
CN106651086B (en) Automatic stereoscopic warehouse scheduling method considering assembly process
CN113051815B (en) Agile imaging satellite task planning method based on independent pointer network
Chen et al. Research on an improved ant colony algorithm fusion with genetic algorithm for route planning
CN115145285A (en) Multi-point goods taking and delivering optimal path planning method and system for storage AGV
Feng et al. Flexible job shop scheduling based on deep reinforcement learning
Fuji et al. Deep multi-agent reinforcement learning using dnn-weight evolution to optimize supply chain performance
CN117669992B (en) Intelligent storage multi-mobile robot-oriented real-time two-stage scheduling method and system
Lin et al. Development of new features of ant colony optimization for flowshop scheduling
CN117873089B (en) Multi-mobile robot cooperation path planning method based on clustering PPO algorithm
Deng et al. Solving the Food-Energy-Water Nexus Problem via Intelligent Optimization Algorithms
CN117495052A (en) Multi-agricultural machine multi-task scheduling method driven by reinforcement learning and genetic algorithm fusion
CN117075634A (en) Power distribution network multi-unmanned aerial vehicle scheduling inspection method and device based on improved ant colony algorithm
CN112486185A (en) Path planning method based on ant colony and VO algorithm in unknown environment
CN116797116A (en) Reinforced learning road network load balancing scheduling method based on improved reward and punishment mechanism
CN117361013A (en) Multi-machine shelf storage scheduling method based on deep reinforcement learning
CN115755801A (en) SQP-CS-based ship building workshop process optimization method and system
Botzheim et al. Genetic and bacterial programming for B-spline neural networks design
Yu et al. A novel automated guided vehicle (AGV) remote path planning based on RLACA algorithm in 5G environment
Zhao Multiple-Agent Task Allocation Algorithm Utilizing Ant Colony Optimization.
Chreim et al. AI-agent-based modeling for Supervision a System of Systems for Mushroom Harvesting
Saeheaw et al. Application of Ant colony optimization for Multi-objective Production Problems
Ma et al. Improved DRL-based energy-efficient UAV control for maximum lifecycle
CN116718198B (en) Unmanned aerial vehicle cluster path planning method and system based on time sequence knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant