CN114815834A - A dynamic path planning method for mobile agents in a stage environment - Google Patents

A dynamic path planning method for mobile agents in a stage environment

Info

Publication number
CN114815834A
CN114815834A (application CN202210465123.7A)
Authority
CN
China
Prior art keywords
mobile agent
dynamic
obstacles
state
obstacle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210465123.7A
Other languages
Chinese (zh)
Other versions
CN114815834B (en)
Inventor
刘安东
张柏鑫
倪洪杰
曹瀚仁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202210465123.7A
Publication of CN114815834A
Application granted
Publication of CN114815834B
Legal status: Active

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D 1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D 1/02 Control of position or course in two dimensions
    • G05D 1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D 1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D 1/0223 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D 1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D 1/02 Control of position or course in two dimensions
    • G05D 1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D 1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D 1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D 1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D 1/02 Control of position or course in two dimensions
    • G05D 1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D 1/0276 Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a dynamic path planning method for a mobile agent in a stage environment, belonging to the technical field of intelligent robot path planning. The method first obtains information about the obstacles around the mobile agent by constructing a global map and classifies the obstacles into dynamic and static obstacles; it then builds a local map, encodes the dynamic obstacle information through an LSTM network, and computes the importance of each dynamic obstacle through a social attention mechanism to achieve better obstacle avoidance. A new reward function is constructed to handle the different avoidance requirements of dynamic and static obstacles, thereby solving the path planning problem of the mobile agent in a complex stage environment. A new experience pool update method is proposed to improve the convergence speed of network training, and simulation experiments demonstrate the superiority of the algorithm, giving the method high practical value.

Description

A dynamic path planning method for mobile agents in a stage environment

Technical Field

The present invention relates to the technical field of intelligent robot path planning, and in particular to a dynamic path planning method for a mobile agent in a stage environment.

Background Art

To meet the diverse cultural service needs at the grassroots level, the activity spaces of small grassroots cultural service complexes must host cultural performances, meetings, exhibitions and reading, folk activities, and other cultural services. Cultural facilities configured by separate category cannot meet residents' expectations for comprehensive cultural services. Small cultural service complexes address this problem well; their focus is the construction of multi-functional activity spaces that integrate folk activities, exhibitions, meetings, reading, and other functions into a single venue. This both satisfies the diversity and self-organization of grassroots rural culture and explores a new model that meets the requirements of public cultural services in China's new rural areas.

To reduce the waste of land resources and improve space utilization, various intelligent mobile bodies are needed to assist in the rapid combination and switching of functional spaces. Because the service spaces differ, the facilities and usage requirements within each functional space differ as well. To meet the "one hall, many uses" requirement of a small cultural complex, intelligent mobile bodies are often needed to rapidly combine and switch between multiple functional spaces and to perform dynamic path planning in the crowded environment of the small complex, so that a single space can serve multiple cultural needs.

The space of a small cultural complex is a typical environment in which people, machines, and objects coexist. During functional space switching, multiple pieces of equipment in the space must plan and execute trajectories in the presence of other equipment to realize the switching of spatial functions. Quickly avoiding dynamic and static obstacles and reaching the target point during switching therefore requires a dynamic path planning algorithm to control the intelligent mobile body, and because the environment inside the cultural complex is crowded, the requirements placed on the algorithm are high.

Traditional dynamic path planning algorithms rely on rapid sensor refresh to perceive surrounding obstacles; the planned path changes as the dynamic obstacles change, producing detours or unnatural trajectories. Such algorithms cannot predict the motion trends of surrounding dynamic obstacles and lack adaptability. For these reasons, a dynamic path planning method that can distinguish dynamic from static obstacles and predict the trends of dynamic obstacles is urgently needed.

Summary of the Invention

In view of the problems in the prior art, the purpose of the present invention is to provide a dynamic path planning method for a mobile agent in a stage environment. On the basis of a deep reinforcement learning method, the method designs a new Markov decision process and network structure: a social attention mechanism is introduced to assign attention scores to dynamic obstacles; a long short-term memory neural network is used to solve the problem that the input dimension of a feedforward network is not fixed; a new reward function is constructed to handle the different avoidance requirements of dynamic and static obstacles; and a new experience pool update method is proposed to improve the convergence speed of network training, so that the mobile agent can distinguish dynamic from static obstacles and predict the motion trends of dynamic obstacles. To achieve the above purpose, the technical solution adopted by the present invention is as follows.

A dynamic path planning method for a mobile agent in a stage environment comprises the following steps:

1) Establish a simulation environment model of the mobile agent and of the dynamic and static obstacles based on the gym library.
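As an illustrative sketch only (the patent discloses no source code), step 1) could be realized with the gym API roughly as follows; the class name StageEnv, the obstacle counts, and the state layout are assumptions taken from the state definitions given later in this description:

```python
import gym
import numpy as np
from gym import spaces

class StageEnv(gym.Env):
    """Hypothetical sketch of the stage simulation: one mobile agent plus
    dynamic and static obstacles. step(), reward and obstacle dynamics
    are omitted from this sketch."""

    def __init__(self, n_dynamic=5, n_static=3):
        super().__init__()
        self.n_dynamic, self.n_static = n_dynamic, n_static
        # 90 discrete (linear velocity, angular velocity) combinations, see step 2)
        self.action_space = spaces.Discrete(90)
        # flattened joint state u_t = [S_T, S_S, S_D]
        obs_dim = 9 + 3 * n_static + 6 * n_dynamic
        self.observation_space = spaces.Box(-np.inf, np.inf, (obs_dim,), np.float32)

    def reset(self):
        # agent state S_T = [Px, Py, Gx, Gy, Vx, Vy, theta, r, Vpref]
        self.agent = np.array([0.0, -10.0, 0.0, 7.5, 0.0, 0.0, 0.0, 0.3, 1.0])
        # dynamic obstacles S_D = [Px, Py, Vx, Vy, r, Vpref]; static S_S = [Px, Py, r]
        self.dynamic = np.random.uniform(-5.0, 5.0, (self.n_dynamic, 6))
        self.static = np.random.uniform(-5.0, 5.0, (self.n_static, 3))
        return self._joint_state()

    def _joint_state(self):
        return np.concatenate(
            [self.agent, self.static.ravel(), self.dynamic.ravel()]).astype(np.float32)
```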

2) Design a Markov decision process: design the state space S, the action space A, the transition probability P, the reward R, and the discount factor γ.

3) Design the neural network structure.

4) Use the optimal reciprocal collision avoidance (ORCA) algorithm to initialize the network parameters through imitation-learning pre-training; after imitation learning ends, optimize the network parameters by training on the actual interaction of the mobile agent in the simulation environment.

5) Train the neural network with the adaptive moment estimation method (Adam) to obtain the optimal value function:

V*(u_t) = ∑ γ^(Δt·V_pref) · P(u_t, a_t)

6) Set the optimal policy by maximizing the cumulative return:

π*(u_t) = argmax_{a_t} [ R(u_t, a_t) + γ^(Δt·V_pref) ∫ P(u_t, u_{t+Δt} | a_t) · V*(u_{t+Δt}) du_{t+Δt} ]

where u_t denotes the current joint state of the mobile agent and the obstacles, a_t denotes the action space set, γ denotes the decay factor, Δt denotes the time interval between two actions, V_pref denotes the preferred speed, V* denotes the optimal value function, P denotes the state transition function, R denotes the reward function, and u_{t+Δt} denotes the joint state at the next moment.

7) Select the action a_t at the current moment according to the optimal policy until the mobile agent reaches the goal.
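Steps 5) to 7) amount to a one-step lookahead over the discrete action set: propagate the joint state under each candidate action, score the successor state with the value network, and act greedily. A minimal sketch, assuming a trained value network `value_net` that accepts the flat joint state and a propagation helper `propagate` standing in for the transition model P (both names are assumptions):

```python
import torch

def select_action(u_t, actions, value_net, propagate, reward_fn,
                  gamma=0.9, dt=0.25, v_pref=1.0):
    """Greedy one-step lookahead:
    a_t = argmax_a [ R(u_t, a) + gamma**(dt * v_pref) * V*(u') ]."""
    best_a, best_q = None, -float("inf")
    for a in actions:
        u_next = propagate(u_t, a, dt)   # approximate next joint state under a
        with torch.no_grad():
            v = value_net(torch.as_tensor(u_next, dtype=torch.float32)).item()
        q = reward_fn(u_t, a) + gamma ** (dt * v_pref) * v
        if q > best_q:
            best_q, best_a = q, a
    return best_a
```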

Further, in step 1), the mobile agent and the dynamic obstacles are set as circles with a radius of 0.3 m, while the static obstacles are defined as circles with radii between 0.5 m and 1 m or as quadrilaterals with areas between 1 m² and 1.5 m².

Further, in step 2), the state space S is set as follows: the state of a dynamic obstacle is S_D = [P_x, P_y, V_x, V_y, r, V_pref]; the state of a static obstacle is S_S = [P_x, P_y, r]; the state of the mobile agent is S_T = [P_x, P_y, G_x, G_y, V_x, V_y, θ, r, V_pref]; and the joint state is u_t = [S_T, S_S, S_D]. Here (P_x, P_y) is the current position of the mobile agent or of an obstacle, (G_x, G_y) is the position of the set goal point, θ is the heading angle of the mobile agent, r is the radius of the mobile agent or obstacle, V_pref is the preferred speed of the mobile agent, and (V_x, V_y) is the moving velocity of the mobile agent or of a dynamic obstacle.

The action space A consists of linear and angular velocities. To satisfy the kinematic constraints, the angular velocity is divided into 18 equal parts over the interval [-π/4, π/4], and the linear velocity follows an exponential function of x (given in the patent as a formula image, not reproduced here); taking x = 1, 2, 3, 4, 5 yields five smoothly varying linear velocities, so the action space contains 90 action combinations in total.
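The 18 × 5 = 90 discrete actions can be enumerated as follows; since the patent's velocity formula is only given as an image, the exponential spacing shown here is the form commonly used in this family of methods and is an assumption:

```python
import numpy as np

def build_action_space(v_pref=1.0):
    # 18 angular velocities equally spaced over [-pi/4, pi/4]
    rotations = np.linspace(-np.pi / 4, np.pi / 4, 18)
    # 5 smoothly varying linear speeds, exponentially spaced in (0, v_pref]
    speeds = [(np.exp(x / 5.0) - 1.0) / (np.e - 1.0) * v_pref for x in range(1, 6)]
    return [(v, w) for w in rotations for v in speeds]

actions = build_action_space()
assert len(actions) == 90   # 18 angular x 5 linear combinations
```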

The transition probability P is approximated by a trajectory prediction model.

The reward R is set as a piecewise function (given in the patent as a formula image, not reproduced here), where G_{x,y} is the position information of the goal point, P_{x,y} is the current position information of the mobile agent, d_s is the distance between the mobile agent and a static obstacle, and d_d is the distance between the mobile agent and a dynamic obstacle; the discount factor γ is set to 0.9.
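Because the reward image is not reproduced, the sketch below only illustrates the structure this paragraph implies: a goal bonus keyed on the distance between G_{x,y} and P_{x,y}, and separate penalty branches keyed on d_s and d_d. Every numeric constant here is an illustrative placeholder, not the patent's actual reward:

```python
import numpy as np

def reward(G, P, d_s, d_d, goal_tol=0.3):
    """Structural sketch of R; all thresholds and values are placeholders."""
    if np.linalg.norm(np.asarray(G) - np.asarray(P)) < goal_tol:
        return 1.0                      # goal reached
    if d_s <= 0.0 or d_d <= 0.0:
        return -0.25                    # collision with any obstacle
    r = 0.0
    if d_d < 0.2:                       # discomfort zone around dynamic obstacles
        r += -0.1 + d_d / 2.0
    if d_s < 0.1:                       # smaller clearance margin for static obstacles
        r += -0.05 + d_s / 4.0
    return r
```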

Further, the network structure in step 3) consists of the following modules. 1. Input layer: the input layer receives the joint state u_t = [S_T, S_S, S_D] from the above steps. 2. Long short-term memory (LSTM) module: the LSTM module sorts the obstacles around the mobile agent and fixes the output dimension of the network layer. 3. Social attention mechanism: the social attention module analyzes the probability of collision between the mobile agent and the surrounding dynamic obstacles and presents it in the form of scores. 4. Output layer: the output layer outputs the optimal value function V*(u_t) through a weighted linear combination of the network parameters.

Further, the network operation flow in step 3) is as follows: first, the state information of the mobile agent and the obstacles is fed into the network; the obstacles are then divided into dynamic and static obstacles according to the state information; the states of the mobile agent and of the dynamic obstacles are input into the LSTM module and then into the social attention module; the processed states, the obtained interaction features, and the static obstacle states are input into two fully connected layers; finally, an activation function normalizes the result to obtain the optimal value function.
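A compact PyTorch sketch of the module chain described above (an LSTM over the dynamic-obstacle states, a social-attention scoring step, concatenation with the agent and static-obstacle states, two fully connected layers, and a value output); all layer widths are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ValueNetwork(nn.Module):
    def __init__(self, agent_dim=9, dyn_dim=6, static_dim=3, n_static=3,
                 hidden=64, attn=32):
        super().__init__()
        self.lstm = nn.LSTM(dyn_dim, hidden, batch_first=True)
        # social attention: one score per dynamic obstacle
        self.attn_score = nn.Sequential(
            nn.Linear(hidden, attn), nn.ReLU(), nn.Linear(attn, 1))
        self.fc = nn.Sequential(
            nn.Linear(agent_dim + hidden + n_static * static_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.out = nn.Linear(hidden, 1)

    def forward(self, s_agent, s_dynamic, s_static):
        # s_dynamic: (batch, n_dynamic, dyn_dim), assumed sorted, e.g. by distance
        h, _ = self.lstm(s_dynamic)                    # per-obstacle features
        scores = F.softmax(self.attn_score(h), dim=1)  # attention scores
        crowd = (scores * h).sum(dim=1)                # weighted interaction feature
        x = torch.cat([s_agent, crowd, s_static.flatten(1)], dim=1)
        return self.out(self.fc(x))                    # V*(u_t)
```

In this sketch the attention weights play the role of the collision-probability scores, and the LSTM hidden size fixes the output dimension regardless of how many dynamic obstacles are present.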

Further, in step 4), when the mobile agent interacts in the simulation environment, the current state information, action information, and reward information are stored in the experience pool as one experience. When the pool reaches its maximum capacity, new experiences replace stored experiences with low rewards, which raises the probability of selecting good experiences and improves the convergence speed of the network. During each episode of interaction, the current episode ends when the mobile agent hits an obstacle or exceeds the maximum running time of a single episode. The experiences are then used to update the network parameters through gradient back-propagation.
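A sketch of the replacement rule just described: once the pool is full, an incoming experience overwrites the stored experience with the lowest reward rather than the oldest one (the tuple layout is an assumption):

```python
import numpy as np

class ExperiencePool:
    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.data = []   # entries: (state, action, reward, next_state, done)

    def push(self, exp):
        if len(self.data) < self.capacity:
            self.data.append(exp)
            return
        worst = min(range(len(self.data)), key=lambda i: self.data[i][2])
        if exp[2] > self.data[worst][2]:   # replace only lower-reward entries
            self.data[worst] = exp

    def sample(self, batch_size):
        idx = np.random.randint(0, len(self.data), size=batch_size)
        return [self.data[i] for i in idx]
```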

The beneficial effects of the present invention are as follows: a new Markov decision process is designed to adapt to the complex obstacle situation in the complex space; a new network structure is designed to classify and process dynamic and static obstacles and to predict dynamic obstacles; a new reward function is designed to handle different obstacle situations; a new experience pool update method is proposed to improve the training efficiency of the neural network; and imitation learning is used to pre-train the network and improve its convergence speed. The mobile agent can thus perform efficient dynamic path planning within the complex space.

Description of the Drawings

Fig. 1 is a flow chart of the method implementation according to an embodiment of the present invention;

Fig. 2 is a diagram of the network structure in an embodiment of the present invention;

Fig. 3 is a diagram of the simulation results in an embodiment of the present invention;

Fig. 4 is a graph of the total reward during network training in an embodiment of the present invention.

Detailed Description of the Embodiments

The present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.

The purpose of the present invention is to provide a dynamic path planning method for a mobile agent in a stage environment. On the basis of a deep reinforcement learning method, the method designs a new Markov decision process and network structure: a social attention mechanism is introduced to assign attention scores to dynamic obstacles; a long short-term memory neural network solves the problem that the input dimension of a feedforward network is not fixed; a new reward function handles the different avoidance requirements of dynamic and static obstacles; and a new experience pool update method improves the convergence speed of network training, so that the mobile agent can distinguish dynamic from static obstacles and predict the motion trends of dynamic obstacles.

In this embodiment, the simulation environment is shown in Fig. 3. The range of the planning map is 10 × 10, the starting point of the path planning is (0, -10), and the target point is (0, 7.5). The positions of the static obstacles are randomly distributed, and the static obstacles are rectangles or squares; the dynamic obstacles are circles with a radius of 0.5.
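Under the StageEnv sketch given earlier, this embodiment's configuration would be instantiated roughly as follows (the obstacle counts are assumptions; the patent leaves them open):

```python
env = StageEnv(n_dynamic=5, n_static=3)
u_0 = env.reset()   # start (0, -10), goal (0, 7.5), as in Fig. 3
```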

A dynamic path planning algorithm for a mobile agent in a stage environment comprises the following specific steps:

1) Establish a simulation environment model of the mobile agent and of the dynamic and static obstacles based on the gym library.

2) Design a Markov decision process: design the state space S, the action space A, the transition probability P, the reward R, and the discount factor γ.

3) Design the neural network structure.

4) Use the optimal reciprocal collision avoidance (ORCA) algorithm to initialize the network parameters through 3000 episodes of imitation-learning pre-training; then optimize the network parameters by training on the actual interaction of the mobile agent in the simulation environment.
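A minimal sketch of this imitation-learning pre-training, assuming a gym-style environment with step(), an `orca_policy(u_t)` expert helper (ORCA implementations exist, e.g. in RVO2 bindings, but this wrapper is an assumption), and a value network that accepts the flat joint state:

```python
import torch

def pretrain_with_orca(env, value_net, orca_policy, episodes=3000,
                       gamma=0.9, dt=0.25, v_pref=1.0, lr=1e-3):
    """Roll out the ORCA expert, then regress the value network onto the
    discounted returns of the demonstrated states (step 5 uses Adam)."""
    opt = torch.optim.Adam(value_net.parameters(), lr=lr)
    for _ in range(episodes):
        u, done, traj = env.reset(), False, []
        while not done:
            u_next, r, done, _ = env.step(orca_policy(u))   # follow the expert
            traj.append((u, r))
            u = u_next
        ret = 0.0
        for u_t, r in reversed(traj):       # discounted return along the trajectory
            ret = r + gamma ** (dt * v_pref) * ret
            v = value_net(torch.as_tensor(u_t, dtype=torch.float32))
            loss = (v - ret).pow(2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
```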

5) Train the neural network with the adaptive moment estimation method (Adam) to obtain the optimal value function:

V*(u_t) = ∑ γ^(Δt·V_pref) · P(u_t, a_t)

6) Set the optimal policy by maximizing the cumulative return:

π*(u_t) = argmax_{a_t} [ R(u_t, a_t) + γ^(Δt·V_pref) ∫ P(u_t, u_{t+Δt} | a_t) · V*(u_{t+Δt}) du_{t+Δt} ]

where u_t denotes the current joint state of the mobile agent and the obstacles, a_t denotes the action space set, γ denotes the decay factor, Δt denotes the time interval between two actions, V_pref denotes the preferred speed, V* denotes the optimal value function, P denotes the state transition function, R denotes the reward function, and u_{t+Δt} denotes the joint state at the next moment.

7) Select the action a_t at the current moment according to the optimal policy until the mobile agent reaches the goal.

The above is a preferred embodiment of the present invention. Any changes made according to the technical solution of the present invention fall within the protection scope of the present invention, provided that the resulting functional effects do not exceed the scope of the technical solution of the present invention.

Claims (6)

1. A dynamic path planning method for a mobile agent in a stage environment, characterized by comprising the following steps:
1) establishing a simulation environment model of the mobile agent and of dynamic and static obstacles based on the gym library;
2) designing a Markov decision process, the Markov decision process being represented by the quintuple <S, A, P, R, γ>; designing the state space S, the action space A, the transition probability P, the reward R, and the discount factor γ;
3) designing the neural network structure;
4) using the optimal reciprocal collision avoidance algorithm ORCA to initialize the network parameters through imitation-learning pre-training; after the imitation learning ends, optimizing the network parameters by training on the actual interaction of the mobile agent in the simulation environment;
5) training the neural network with the adaptive moment estimation method Adam to obtain the optimal value function:
V*(u_t) = ∑ γ^(Δt·V_pref) · P(u_t, a_t)
6) setting the optimal policy by maximizing the cumulative return:
π*(u_t) = argmax_{a_t} [ R(u_t, a_t) + γ^(Δt·V_pref) ∫ P(u_t, u_{t+Δt} | a_t) · V*(u_{t+Δt}) du_{t+Δt} ]
where u_t denotes the current joint state of the mobile agent and the obstacles, a_t denotes the action space set, γ denotes the decay factor, Δt denotes the time interval between two actions, V_pref denotes the preferred speed, V* denotes the optimal value function, P denotes the state transition function, R denotes the reward function, and u_{t+Δt} denotes the joint state at the next moment;
7) selecting the action a_t at the current moment according to the optimal policy until the mobile agent reaches the goal.
2. The dynamic path planning method for a mobile agent in a stage environment according to claim 1, characterized in that in step 1) the simulation environment model is established based on the gym library, the mobile agent and the dynamic obstacles being set as circles with a radius of 0.3 m, and the static obstacles being defined as circles with radii between 0.5 m and 1 m or as quadrilaterals with areas between 1 m² and 1.5 m².
3. The dynamic path planning method for a mobile agent in a stage environment according to claim 1, characterized in that in step 2) the state space S is set such that the state of a dynamic obstacle is S_D = [P_x, P_y, V_x, V_y, r, V_pref], the state of a static obstacle is S_S = [P_x, P_y, r], the state of the mobile agent is S_T = [P_x, P_y, G_x, G_y, V_x, V_y, θ, r, V_pref], and the joint state is u_t = [S_T, S_S, S_D]; where (P_x, P_y) is the current position of the mobile agent or of an obstacle, (G_x, G_y) is the position of the set goal point, θ is the heading angle of the mobile agent, r is the radius of the mobile agent or obstacle, V_pref is the preferred speed of the mobile agent, and (V_x, V_y) is the moving velocity of the mobile agent or of a dynamic obstacle;
the action space A consists of linear and angular velocities; to satisfy the kinematic constraints, the angular velocity is divided into 18 equal parts over the interval [-π/4, π/4], and the linear velocity follows an exponential function
(given in the patent as a formula image, not reproduced here) of x; taking x = 1, 2, 3, 4, 5 yields five smoothly varying linear velocities, and the action space contains 90 action combinations in total;
the transition probability P transfers the state through the actual interaction of the mobile agent in the simulation environment; the reward R is set as:
a piecewise function (given in the patent as a formula image, not reproduced here),
where G_{x,y} is the position information of the goal point, P_{x,y} is the current position information of the mobile agent, d_s is the distance between the mobile agent and a static obstacle, and d_d is the distance between the mobile agent and a dynamic obstacle.
4. The dynamic path planning method for a mobile agent in a stage environment according to claim 1, characterized in that the neural network structure in step 3) comprises an input layer, a long short-term memory (LSTM) module, a social attention mechanism, and an output layer; the input layer receives the joint state u_t = [S_T, S_S, S_D] from the above steps; the LSTM module sorts the obstacles around the mobile agent and fixes the output dimension of the network layer; the social attention module analyzes the probability of collision between the mobile agent and the surrounding dynamic obstacles and presents it in the form of scores; the output layer outputs the optimal value function V*(u_t) through a weighted linear combination of the network parameters.
5. The dynamic path planning method for a mobile agent in a stage environment according to claim 1, characterized in that the network operation flow in step 3) is as follows: first, the state information of the mobile agent and the obstacles is input into the neural network structure; the obstacles are then divided into dynamic and static obstacles according to the state information; the states of the mobile agent and of the dynamic obstacles are input into the LSTM module and then into the social attention module; the processed states, the obtained interaction features, and the static obstacle states are input into two fully connected layers; finally, they are normalized through an activation function to obtain the optimal value function.
6. The dynamic path planning method for a mobile agent in a stage environment according to claim 1, characterized in that in step 4), when the mobile agent interacts in the simulation environment, the current state information, action information, and reward information are stored in the experience pool as one experience; importance is assigned to each experience through the TD-error, which is the difference between the value function of an action at a certain moment and the optimal value function of the current network, a larger difference indicating a poorer current experience; by defining:
P_t = (|δ_t| + ε)^α
where P_t is the probability of selecting the current experience, α and ε are constants, δ_t is the TD-error, and ε prevents an experience from no longer being replayed once its TD-error reaches 0;
experiences are thus assigned different selection probabilities, which raises the probability of selecting excellent experiences and improves the convergence speed of the network; during each episode of interaction, the current episode ends when the mobile agent hits an obstacle or exceeds the maximum running time of a single episode; the experiences are then used to update the network parameters through gradient back-propagation.
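Claim 6's prioritization can be sketched as follows: each stored experience is sampled with probability proportional to P_t = (|δ_t| + ε)^α, normalized over the pool (the values of α and ε are illustrative):

```python
import numpy as np

def sampling_probabilities(td_errors, alpha=0.6, eps=1e-2):
    """P_t = (|delta_t| + eps) ** alpha, normalized over the pool."""
    p = (np.abs(np.asarray(td_errors, dtype=np.float64)) + eps) ** alpha
    return p / p.sum()

def sample_batch(pool, td_errors, batch_size):
    probs = sampling_probabilities(td_errors)
    idx = np.random.choice(len(pool), size=batch_size, p=probs)
    return [pool[i] for i in idx]
```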
CN202210465123.7A 2022-04-29 2022-04-29 Dynamic path planning method for mobile intelligent body in stage environment Active CN114815834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210465123.7A CN114815834B (en) 2022-04-29 2022-04-29 Dynamic path planning method for mobile intelligent body in stage environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210465123.7A CN114815834B (en) 2022-04-29 2022-04-29 Dynamic path planning method for mobile intelligent body in stage environment

Publications (2)

Publication Number Publication Date
CN114815834A true CN114815834A (en) 2022-07-29
CN114815834B CN114815834B (en) 2024-11-29

Family

ID=82509534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210465123.7A Active CN114815834B (en) 2022-04-29 2022-04-29 Dynamic path planning method for mobile intelligent body in stage environment

Country Status (1)

Country Link
CN (1) CN114815834B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116090688A (en) * 2023-04-10 2023-05-09 中国人民解放军国防科技大学 Moving Target Traversal Access Sequence Planning Method Based on Improved Pointer Network
CN118394109A (en) * 2024-06-26 2024-07-26 烟台中飞海装科技有限公司 Simulated countermeasure training method based on multi-agent reinforcement learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110632931A (en) * 2019-10-09 2019-12-31 哈尔滨工程大学 Collision avoidance planning method for mobile robot based on deep reinforcement learning in dynamic environment
CN112666939A (en) * 2020-12-09 2021-04-16 深圳先进技术研究院 Robot path planning algorithm based on deep reinforcement learning
CN113341958A (en) * 2021-05-21 2021-09-03 西北工业大学 Multi-agent reinforcement learning movement planning method with mixed experience
CN113342047A (en) * 2021-06-23 2021-09-03 大连大学 Unmanned aerial vehicle path planning method for improving artificial potential field method based on obstacle position prediction in unknown environment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110632931A (en) * 2019-10-09 2019-12-31 哈尔滨工程大学 Collision avoidance planning method for mobile robot based on deep reinforcement learning in dynamic environment
CN112666939A (en) * 2020-12-09 2021-04-16 深圳先进技术研究院 Robot path planning algorithm based on deep reinforcement learning
CN113341958A (en) * 2021-05-21 2021-09-03 西北工业大学 Multi-agent reinforcement learning movement planning method with mixed experience
CN113342047A (en) * 2021-06-23 2021-09-03 大连大学 Unmanned aerial vehicle path planning method for improving artificial potential field method based on obstacle position prediction in unknown environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈旿 et al.: "A Multi-Agent Cooperative Information Consistency Algorithm", Acta Aeronautica et Astronautica Sinica (航空学报), vol. 38, no. 12, 25 December 2017 (2017-12-25) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116090688A (en) * 2023-04-10 2023-05-09 中国人民解放军国防科技大学 Moving Target Traversal Access Sequence Planning Method Based on Improved Pointer Network
CN118394109A (en) * 2024-06-26 2024-07-26 烟台中飞海装科技有限公司 Simulated countermeasure training method based on multi-agent reinforcement learning

Also Published As

Publication number Publication date
CN114815834B (en) 2024-11-29

Similar Documents

Publication Publication Date Title
Zhao et al. The experience-memory Q-learning algorithm for robot path planning in unknown environment
Shah et al. Long-distance path planning for unmanned surface vehicles in complex marine environment
CN114397896B (en) A Dynamic Path Planning Method Based on Improved Particle Swarm Optimization Algorithm
Sundarraj et al. Route planning for an autonomous robotic vehicle employing a weight-controlled particle swarm-optimized Dijkstra algorithm
Lv et al. Blind travel prediction based on obstacle avoidance in indoor scene
CN111611749B (en) Simulation method and system for automatic guidance of indoor crowd evacuation based on RNN
CN114815834A (en) A dynamic path planning method for mobile agents in a stage environment
CN112799386A (en) Robot Path Planning Method Based on Artificial Potential Field and Reinforcement Learning
Raheem et al. Development of a* algorithm for robot path planning based on modified probabilistic roadmap and artificial potential field
Chang et al. Interpretable fuzzy logic control for multirobot coordination in a cluttered environment
Lamouik et al. Deep neural network dynamic traffic routing system for vehicles
CN117289691A (en) Training method for path planning agent for reinforcement learning in navigation scene
CN114089751A (en) A Path Planning Method for Mobile Robots Based on Improved DDPG Algorithm
CN118348975A (en) Path planning method, amphibious unmanned platform, storage medium and program product
CN113391633A (en) Urban environment-oriented mobile robot fusion path planning method
Ou et al. Hybrid path planning based on adaptive visibility graph initialization and edge computing for mobile robots
Lei et al. Digital twin‐based multi‐objective autonomous vehicle navigation approach as applied in infrastructure construction
Xue et al. Multi-agent path planning based on MPC and DDPG
Jiang et al. Fuzzy neural network based dynamic path planning
CN115202357A (en) An autonomous mapping method based on spiking neural network
Wang et al. A mapless navigation method based on deep reinforcement learning and path planning
CN119289981A (en) A mobile robot path planning method based on SAC algorithm
Kodagoda et al. Socially aware path planning for mobile robots
Hliwa et al. Optimal path planning of mobile robot using hybrid tabu search-firefly algorithm
Tran et al. Mobile robot planner with low-cost cameras using deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant