CN116700258B - Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning - Google Patents

Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning

Info

Publication number
CN116700258B
CN116700258B (application CN202310692024.7A; published earlier as CN116700258A)
Authority
CN
China
Prior art keywords
state
potential field
target point
apf
action
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310692024.7A
Other languages
Chinese (zh)
Other versions
CN116700258A (en)
Inventor
杨泽远
杨刚
陶发展
何晓鹏
熊心和
匡海军
宋兵
汪洋
杨一博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wanjitai Technology Group Digital City Technology Co ltd
Original Assignee
Wanjitai Technology Group Digital City Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wanjitai Technology Group Digital City Technology Co ltd filed Critical Wanjitai Technology Group Digital City Technology Co ltd
Priority to CN202310692024.7A priority Critical patent/CN116700258B/en
Publication of CN116700258A publication Critical patent/CN116700258A/en
Application granted granted Critical
Publication of CN116700258B publication Critical patent/CN116700258B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 — Road transport of goods or passengers
    • Y02T10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T10/40 — Engine management systems

Landscapes

  • Feedback Control In General (AREA)

Abstract

The application discloses an intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning. Aiming at the "local minimum trap" problem of the traditional artificial potential field method, the prior environment is virtualized into a total artificial potential field through APF weighting, and a suitable learning value is calculated. The algorithm solves the path planning problem in both known and unknown environments. Combining the Q-learning method with the APF method overcomes the defects of the classical Q-learning method, such as slow learning speed, long time consumption, and the inability to learn in both known and unknown environments, and effectively prevents the traditional artificial potential field method from falling into local optima. Simulation results show that the APF reinforcement learning algorithm improves the learning speed, path length and path smoothness of path planning, and can avoid moving obstacles and smoothly reach the destination.

Description

Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning
Technical Field
The application relates to the technical field of automatic driving, in particular to an intelligent vehicle path planning method based on an artificial potential field method and reinforcement learning.
Background
Path planning is one of the key technologies of automatic driving and a necessary condition for the safe driving of an autonomous vehicle. The artificial potential field (APF) method is a relatively mature, real-time planning method in the path planning field. Its working principle is to abstract the environment information into an attractive field function and a repulsive field function, and to plan a collision-free path from the starting point to the target point through the resultant field function. The traditional artificial potential field method cannot guarantee the efficiency of path planning because of two major defects: the vehicle may fall into local minima, and under certain conditions the target may be unreachable.
To address the problems of the traditional artificial potential field in intelligent vehicle path planning, existing work establishes a road-boundary repulsive potential field to limit the drivable area of the vehicle; improves the obstacle repulsive potential field function to solve the local-optimum and unreachable-target problems; and establishes a velocity repulsive potential field to describe the obstacle information more accurately, so as to avoid collisions between the vehicle and obstacles. Bei Shaoyi et al. proposed an unmanned vehicle local path planning method based on an improved artificial potential field method (CN 202111673751.6): a distance threshold is set in the attractive function and a repulsive-field adjusting factor is added to the repulsive function; when the unmanned vehicle falls into a local optimum, a strategy of escaping the local optimum with a small steering angle is adopted on the premise of satisfying the vehicle's steering constraints, but this increases the computational load and complexity of the algorithm. Another invention discloses an intelligent vehicle path planning method that improves the artificial potential field algorithm (CN 202011307835.3): when the intelligent vehicle falls into a local minimum point, the step length and the trajectory moving out of the influence area are adjusted automatically according to the size of the influence area of the local minimum point so as to escape automatically, but the computational load of the overall trajectory planning is excessive and the planned path is not smooth enough.
The reinforcement learning algorithm is a learning method that does not require prior knowledge. It emphasizes interactive learning with the environment: it accumulates rewards by continually attempting actions and receiving environmental feedback, and then selects the best action to complete the path planning task. The most widely used reinforcement learning algorithm in the path planning field is the Q-learning algorithm, a value-based reinforcement learning algorithm that tries an action in a particular state and updates the Q value based on the immediate reward or penalty it receives and its estimate of the value of the resulting state. By repeatedly trying all actions in all states, the action that obtains the maximum benefit is then selected according to the Q value.
Disclosure of Invention
The application provides an intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning, in which a reinforcement learning algorithm is added to the traditional artificial potential field method and APF weighting is used to guide Q-learning, thereby overcoming problems such as the local minimum of the traditional potential field method, accelerating the convergence of classical Q-learning, and solving the path planning problem of intelligent vehicles in complex environments.
In order to achieve the above purpose, the application adopts the following technical scheme: an intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning, comprising the following steps:
Step S1: acquiring an environment image and establishing a grid map;
Step S2: defining the reinforcement learning state space and action space;
Step S3: initializing the algorithm parameters;
Step S4: randomly selecting an initial state from the state set;
Step S5: constructing an artificial potential field, wherein the potential field is formed by superposing an attractive potential field and a repulsive potential field;
Step S6: in the action space, selecting an action by the APF weighting or ε-greedy strategy;
Step S7: executing the current action in state s to obtain a new state s_{t+1} and a reward r;
Step S8: updating the Q value;
Step S9: selecting the action with the maximum Q value in each step to obtain the optimal path;
Step S10: repeating steps S6, S7 and S8 until a certain number of steps or a convergence condition is reached;
Step S11: generating a path from the starting point q_0 to the target point q_f, transmitting the finally generated path to the control center of the intelligent vehicle, and driving the intelligent vehicle along the path.
Further, the specific operation of step S1 is as follows: an environment image is obtained by the intelligent vehicle's camera and divided into 25 × 25 grid cells, and the environment model is built by the grid method. If an obstacle is found in a cell, the cell is defined as an obstacle position through which the intelligent vehicle cannot pass; if the target point is found in a cell, that cell is set as the target position, i.e., the final position the intelligent vehicle is to reach; the other cells are defined as obstacle-free cells through which the intelligent vehicle can pass.
Further, the specific operation of step S2 is as follows: the reinforcement learning state space is defined as the current position coordinate and the previous position coordinate of the agent, and the action space consists of the four actions up, down, left and right; after each action is executed, the agent moves one grid cell in the corresponding direction.
Further, the algorithm parameters in step S3 include the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the greedy factor (decision rate) ζ ∈ (0, 1), the maximum number of iterations, and the reward function r; all Q values are initialized to 0.
Further, the specific operation of step S5 is as follows: construct the attractive potential field of the target point according to the positions of the obstacle and the target point,
where U_att(q) is the attractive field generated by the target point at position q, k_att is the attraction coefficient of the target point, q is the position coordinate, and q_f is the coordinate of the target point.
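A standard quadratic attractive potential consistent with these definitions (an assumed form, given here for illustration only) is:
U_att(q) = (1/2) · k_att · ‖q − q_f‖²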
Construct the repulsive potential field of the obstacle,
where U_rep(q) is the repulsive potential field generated by the obstacle, k_rep is the repulsion coefficient of the obstacle, ρ denotes the limit distance of the repulsive field's influence, and ρ_0 denotes the shortest distance from the current position to the obstacle.
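A commonly used repulsive potential consistent with these definitions (an assumed form, given here for illustration only) is:
U_rep(q) = (1/2) · k_rep · (1/ρ_0 − 1/ρ)² if ρ_0 ≤ ρ, and U_rep(q) = 0 if ρ_0 > ρ.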
The total artificial potential field comprises two terms, the attractive potential function and the repulsive potential function. The total artificial potential field U(q) is then the sum of these two potential functions:
U(q) = U_att(q) + U_rep(q)
Further, the specific operation of step S6 is as follows: a uniform random number is generated; if the uniform random number is greater than the decision rate ζ ∈ (0, 1), the APF weighting process is performed; otherwise, the explore-exploit method is performed.
For APF weighting, a Moore neighborhood is employed; it is defined on a two-dimensional square lattice and consists of a central cell (here, the current intelligent vehicle state s_t) and the eight cells surrounding it.
The probability assigned to each cell adjacent to s_t is inversely proportional to its total artificial potential field. The neighbor cell with the lowest total artificial potential field has the highest probability of being selected as s_{t+1}, while the neighbor cell with the highest total artificial potential field has the lowest probability of being selected as s_{t+1}.
The probability p_i (i = 1, …, k) of selecting each neighboring cell is calculated, where k is the number of neighboring cells; this probability determines which region is selected.
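One formulation consistent with the inverse proportionality described above (an assumption; the exact expression used by the application may differ) is:
p_i = (1 / U(q_i)) / Σ_{j=1..k} (1 / U(q_j)), i = 1, …, k,
where q_i denotes the position of the i-th neighboring cell.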
A standard (unit) APF weighting function σ is then calculated,
where i = 1, …, k, σ: R^k → (0, 1)^k, and p = (p_1, …, p_k) ∈ R^k.
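Given the stated signature σ: R^k → (0, 1)^k, one standard choice (an assumption, not necessarily the function used by the application) is the softmax normalization:
σ(p)_i = exp(p_i) / Σ_{j=1..k} exp(p_j), i = 1, …, k.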
The cumulative probabilities obtained from the APF weighting function are used to select action a_t in state s_t. First, the probabilities contained in p are sorted. Then, a random number between zero and one is generated. Starting from the top of the list, the first neighbor cell whose cumulative probability is greater than the random number is selected as state s_{t+1}.
If the APF weighting process is not performed, the explore-exploit method is performed: a uniform random number is generated; if the uniform random number is greater than the decision rate ζ, the exploitation process is performed; otherwise, the exploration process is performed.
Further, the specific operation of step S7 is as follows: once action a_t has been selected by the APF weighting, exploitation, or exploration process, action a_t is performed and the reward r is received, reaching a new state s_{t+1}.
Further, the specific operation of updating the Q value in step S8 is as follows: the agent selects action a in state s according to step S6, reaches state s', and obtains the immediate reward r; the update of the Q value can be expressed as:
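Q(s, a) ← Q(s, a) + α · (r + γ · max_{a'} Q(s', a') − Q(s, a))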
wherein the learning rate α is empirically set to 0.3, the discount coefficient γ is set to 0.8, the decision rate ζ is set to 0.2, and the reward function is set as follows:
Further, the specific operation of step S10 is as follows:
The new state s_{t+1} is assigned to the current state s_t to continue the iterative process. If s_{t+1} is the terminal state, the current round of iteration is completed; otherwise, go to step S6 until the target point is reached.
Further, the method for generating the path in step S11 is as follows:
The target point index i is initialized to 0 and the starting point q_0 is assigned to the current state s_t; then the iterative path-generation process is performed. While the next state has not reached the target point, the index i is incremented by 1. Environment verification is also performed, and if the environment has changed, the environment information Q_G is updated. The best action a_t in the current state s_t is selected using the learning value Q_{m×n}. Once action a_t has been selected, it is performed. Subsequently, the new state s_{t+1} is calculated from the equation s_{t+1} = f(s_t, a_t). Finally, the new state s_{t+1} is assigned to the current state s_t, and the current state is assigned to the target point q_i. If the current state s_t is ultimately equal to the target point q_f, the algorithm returns the collision-free path between the starting point q_0 and the target point q_f.
The application provides an intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning. Aiming at the "local minimum trap" problem of the traditional artificial potential field method, the prior environment is virtualized into a total artificial potential field through APF weighting, and a suitable learning value is calculated. The algorithm solves the path planning problem in both known and unknown environments. Combining the Q-learning method with the APF method overcomes the defects of the classical Q-learning method, such as slow learning speed, long time consumption, and the inability to learn in both known and unknown environments, and effectively prevents the traditional artificial potential field method from falling into local optima. Simulation results show that the APF reinforcement learning algorithm improves the learning speed, path length and path smoothness of path planning, and can avoid moving obstacles and smoothly reach the destination.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description, serve to explain the principles of the application.
The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is an attractive potential field diagram of the artificial potential field;
FIG. 3 is a repulsive potential field diagram of an artificial potential field;
FIG. 4 is a total potential field diagram of an artificial potential field;
FIG. 5 is an overall frame diagram of the proposed new algorithm of the present invention;
FIG. 6 is a graph of a path planned by the improved algorithm of the present invention;
FIG. 7 is a chart of the number of steps in each iteration of the algorithm.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Example 1
Referring to FIG. 1, the invention relates to an intelligent vehicle path planning method based on an artificial potential field method and reinforcement learning, which comprises the following specific implementation steps:
Step S1: acquiring an environment image and establishing a grid map;
Step S2: defining the reinforcement learning state space and action space;
Step S3: initializing the algorithm parameters;
Step S4: randomly selecting an initial state from the state set;
Step S5: constructing an artificial potential field, wherein the potential field is formed by superposing an attractive potential field and a repulsive potential field;
Step S6: in the action space, selecting an action by the APF weighting or ε-greedy strategy;
Step S7: executing the current action in state s to obtain a new state s_{t+1} and a reward r;
Step S8: updating the Q value;
Step S9: selecting the action with the maximum Q value in each step to obtain the optimal path;
Step S10: repeating steps S6, S7 and S8 until a certain number of steps or a convergence condition is reached;
Step S11: generating a path from the starting point q_0 to the target point q_f, transmitting the finally generated path to the control center of the intelligent vehicle, and driving the intelligent vehicle along the path.
Step S1 is specifically as follows: an environment image is obtained by the intelligent vehicle's camera and divided into 25 × 25 grid cells, and the environment model is built by the grid method. If an obstacle is found in a cell, the cell is defined as an obstacle position through which the intelligent vehicle cannot pass; if the target point is found in a cell, that cell is set as the target position, i.e., the final position the intelligent vehicle is to reach; the other cells are defined as obstacle-free cells through which the intelligent vehicle can pass.
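For illustration only, the following minimal Python sketch shows how such a grid map could be built from per-cell detection results; the predicates cell_contains_obstacle and cell_contains_target are hypothetical placeholders for the camera-based detection described above.

import numpy as np

FREE, OBSTACLE, TARGET = 0, 1, 2
GRID_SIZE = 25  # the 25 x 25 grid described in step S1

def build_grid_map(cell_contains_obstacle, cell_contains_target):
    # cell_contains_obstacle(row, col) and cell_contains_target(row, col) are
    # hypothetical predicates standing in for the camera-based detection.
    grid = np.full((GRID_SIZE, GRID_SIZE), FREE, dtype=np.int8)
    for row in range(GRID_SIZE):
        for col in range(GRID_SIZE):
            if cell_contains_obstacle(row, col):
                grid[row, col] = OBSTACLE   # cell the vehicle cannot pass through
            elif cell_contains_target(row, col):
                grid[row, col] = TARGET     # final position to be reached
    return grid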
Step S2 defines the reinforcement learning state space as the current position coordinate and the previous position coordinate of the agent, and the action space as the four actions up, down, left and right; after each action is executed, the agent moves one grid cell in the corresponding direction.
The algorithm parameters in step S3 include the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the greedy factor (decision rate) ζ ∈ (0, 1), the maximum number of iterations, and the reward function r; all Q values are initialized to 0.
The specific operation of step S5 is as follows: the main idea of the artificial potential field (APF) method is to establish an attractive potential field around the target point and a repulsive potential field around the obstacles. With this idea, the APF method employs attractive and repulsive components to draw the intelligent vehicle toward its target while keeping it away from obstacles.
Thus, the total artificial potential field comprises two terms, the attractive potential function and the repulsive potential function. The total artificial potential field U(q) is then the sum of these two potential functions:
U(q) = U_att(q) + U_rep(q)
As shown in FIG. 2, the attractive potential field of the target point is constructed according to the positions of the obstacle and the target point,
where U_att(q) is the attractive field generated by the target point at position q, k_att is the attraction coefficient of the target point, q is the position coordinate, and q_f is the coordinate of the target point.
As shown in FIG. 3, the repulsive potential field of the obstacle is constructed,
where U_rep(q) is the repulsive potential field generated by the obstacle, k_rep is the repulsion coefficient of the obstacle, ρ denotes the limit distance of the repulsive field's influence, and ρ_0 denotes the shortest distance from the current position to the obstacle.
As shown in FIG. 4, the total artificial potential field is formed from the attractive potential field and the repulsive potential field.
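For illustration, a minimal Python sketch of the total potential computation is given below, using the assumed quadratic attractive form and an assumed FIRAS-style repulsive form consistent with the definitions above; the coefficients k_att and k_rep and the influence distance rho are free parameters, not values specified by the application.

import numpy as np

def attractive_potential(q, q_f, k_att=1.0):
    # U_att(q) = 0.5 * k_att * ||q - q_f||^2  (assumed quadratic form)
    return 0.5 * k_att * float(np.sum((np.asarray(q, float) - np.asarray(q_f, float)) ** 2))

def repulsive_potential(q, obstacles, k_rep=1.0, rho=3.0):
    # rho is the limit distance of the repulsive influence; rho_0 is the
    # shortest distance from the current position q to any obstacle.
    if not obstacles:
        return 0.0
    q = np.asarray(q, dtype=float)
    rho_0 = min(float(np.linalg.norm(q - np.asarray(obs, float))) for obs in obstacles)
    if rho_0 > rho:
        return 0.0                 # outside the repulsive influence range
    if rho_0 == 0.0:
        return float("inf")        # on an obstacle cell
    return 0.5 * k_rep * (1.0 / rho_0 - 1.0 / rho) ** 2

def total_potential(q, q_f, obstacles, k_att=1.0, k_rep=1.0, rho=3.0):
    # U(q) = U_att(q) + U_rep(q)
    return attractive_potential(q, q_f, k_att) + repulsive_potential(q, obstacles, k_rep, rho)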
As shown in FIG. 5, the action selection of the improved algorithm includes the following steps:
A uniform random number is generated. If the uniform random number is greater than the decision rate ζ ∈ (0, 1), the APF weighting process is performed; otherwise, the explore-exploit method is performed.
For APF weighting, a Moore neighborhood is employed; it is defined on a two-dimensional square lattice and consists of a central cell (here, the current intelligent vehicle state s_t) and the eight cells surrounding it.
The probability assigned to each cell adjacent to s_t is inversely proportional to its total artificial potential field. The neighbor cell with the lowest total artificial potential field has the highest probability of being selected as s_{t+1}, while the neighbor cell with the highest total artificial potential field has the lowest probability of being selected as s_{t+1}.
The probability p_i (i = 1, …, k) of selecting each neighboring cell is calculated, where k is the number of neighboring cells; this probability determines which region is selected.
A standard (unit) APF weighting function σ is then calculated,
where i = 1, …, k, σ: R^k → (0, 1)^k, and p = (p_1, …, p_k) ∈ R^k.
The cumulative probabilities obtained from the APF weighting function are used to select action a_t in state s_t. First, the probabilities contained in p are sorted. Then, a random number between zero and one is generated. Starting from the top of the list, the first neighbor cell whose cumulative probability is greater than the random number is selected as state s_{t+1}.
If the APF weighting process is not performed, the explore-exploit method is performed: a uniform random number is generated; if the uniform random number is greater than the decision rate ζ, the exploitation process is performed; otherwise, the exploration process is performed.
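For illustration, a minimal Python sketch of this action-selection rule is given below, under the assumptions stated earlier (inverse-potential probabilities p_i and a softmax weighting σ); the candidate cells are restricted here to the four action neighbors of the defined action space rather than the full eight-cell Moore neighborhood, and potential(cell) stands for the total artificial potential field U(q) of a cell (for example, the total_potential function sketched above).

import numpy as np

ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def apf_weighted_action(state, potential, rng=np.random.default_rng()):
    # potential(cell) -> U(q) of that cell.
    names = list(ACTIONS)
    neighbors = [(state[0] + dr, state[1] + dc) for dr, dc in ACTIONS.values()]
    u = np.array([potential(n) for n in neighbors], dtype=float)
    p = 1.0 / (u + 1e-6)                 # p_i inversely proportional to U (assumed form; 1e-6 avoids division by zero)
    p /= p.sum()
    sigma = np.exp(p) / np.exp(p).sum()  # sigma: assumed softmax weighting
    order = np.argsort(-sigma)           # sort probabilities, highest first
    cumulative = np.cumsum(sigma[order])
    idx = int(np.searchsorted(cumulative, rng.uniform()))
    idx = min(idx, len(names) - 1)       # guard against floating-point round-off
    return names[order[idx]]             # first cell whose cumulative probability exceeds the random number

def explore_exploit_action(Q, state, zeta, rng=np.random.default_rng()):
    # Exploitation when the random number exceeds the decision rate zeta, else exploration.
    names = list(ACTIONS)
    if rng.uniform() > zeta:
        return max(names, key=lambda a: Q.get((state, a), 0.0))   # exploitation: greedy on Q
    return names[int(rng.integers(len(names)))]                   # exploration: random action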
Then, once action a_t has been selected by the APF weighting, exploitation, or exploration process, action a_t is performed and the reward r is received, reaching a new state s_{t+1}. The Q table is updated according to the formula Q(s, a) ← Q(s, a) + α(r + γ max_{a'} Q(s', a') − Q(s, a)).
Wherein the learning rate α is empirically set to 0.3, the discount coefficient γ is set to 0.8, the decision rate ζ is set to 0.2, and the reward function is set as follows:
The new state s_{t+1} is assigned to the current state s_t to continue the iterative process. If s_{t+1} is the terminal state, the current round of iteration is completed; otherwise, go to step S6 until the target point is reached.
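For illustration, a minimal Python sketch of this Q-table update over a dictionary-based Q table is given below, with α = 0.3 and γ = 0.8 as stated above; the reward r is assumed to be supplied by the reward function defined in the application.

def q_update(Q, s, a, r, s_next, actions, alpha=0.3, gamma=0.8):
    # Tabular Q-learning update:
    # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_{a'} Q(s', a') - Q(s, a))
    best_next = max(Q.get((s_next, a_next), 0.0) for a_next in actions)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))
    return Q[(s, a)]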
The method for generating the path in step S11 is as follows:
The target point index i is initialized to 0 and the starting point q_0 is assigned to the current state s_t; then the iterative path-generation process is performed. While the next state has not reached the target point, the index i is incremented by 1. Environment verification is also performed, and if the environment has changed, the environment information Q_G is updated. The best action a_t in the current state s_t is selected using the learning value Q_{m×n}. Once action a_t has been selected, it is performed. Subsequently, the new state s_{t+1} is calculated from the equation s_{t+1} = f(s_t, a_t). Finally, the new state s_{t+1} is assigned to the current state s_t, and the current state is assigned to the target point q_i. If the current state s_t is ultimately equal to the target point q_f, the algorithm returns the collision-free path between the starting point q_0 and the target point q_f.
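For illustration, a minimal Python sketch of the path-generation loop is given below; best_action, transition (the function f), environment_changed and update_environment are hypothetical placeholders for the operations described above.

def generate_path(q_0, q_f, best_action, transition,
                  environment_changed=lambda: False, update_environment=lambda: None,
                  max_steps=10_000):
    # Follow the learned greedy policy from q_0 until q_f is reached.
    path = [q_0]
    s = q_0
    i = 0
    while s != q_f and i < max_steps:
        i += 1                                # index i increases while the target is not reached
        if environment_changed():             # environment verification
            update_environment()              # refresh the environment information Q_G
        a = best_action(s)                    # best action under the learned values Q_{m x n}
        s = transition(s, a)                  # s_{t+1} = f(s_t, a_t)
        path.append(s)
    return path if s == q_f else None         # collision-free path from q_0 to q_f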
Using this method with the above parameter settings, the optimal driving path of the intelligent vehicle for this example is obtained, as shown in FIG. 6.
FIG. 7 shows that, through training, the number of steps the intelligent vehicle needs to find the optimal path decreases, i.e., its efficiency becomes higher and higher.
The artificial potential field combined with reinforcement learning algorithm provided by the invention not only effectively overcomes the defects of the traditional artificial potential field method, but also further accelerates the convergence rate of the algorithm and improves its efficiency.

Claims (1)

1. An intelligent vehicle path planning method based on an artificial potential field method and reinforcement learning is characterized by comprising the following steps:
Step S1: acquiring an environment image and establishing a grid map;
Step S2: defining the reinforcement learning state space and action space;
Step S3: initializing the algorithm parameters;
Step S4: randomly selecting an initial state from the state set;
Step S5: constructing an artificial potential field, wherein the potential field is formed by superposing an attractive potential field and a repulsive potential field;
Step S6: in the action space, selecting an action by the APF weighting or ε-greedy strategy;
Step S7: executing the current action in state s to obtain a new state s_{t+1} and a reward r;
Step S8: updating the Q value;
Step S9: selecting the action with the maximum Q value in each step to obtain the optimal path;
Step S10: repeating steps S6, S7 and S8 until a certain number of steps or a convergence condition is reached;
Step S11: generating a path from the starting point q_0 to the target point q_f, transmitting the finally generated path to the control center of the intelligent vehicle, and driving the intelligent vehicle along the path;
The specific operation of step S1 is as follows: an environment image is obtained by the intelligent vehicle's camera and divided into 25 × 25 grid cells, and the environment model is built by the grid method; if an obstacle is found in a cell, the cell is defined as an obstacle position through which the intelligent vehicle cannot pass; if the target point is found in a cell, that cell is set as the target position, i.e., the final position the intelligent vehicle is to reach; the other cells are defined as obstacle-free cells through which the intelligent vehicle can pass;
The specific operation of step S2 is as follows: the reinforcement learning state space is defined as the current position coordinate and the previous position coordinate of the agent, and the action space as the four actions up, down, left and right; after each action is executed, the agent moves one grid cell in the corresponding direction;
The algorithm parameters in step S3 include the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the greedy factor (decision rate) ζ ∈ (0, 1), the maximum number of iterations, and the reward function r; all Q values are initialized to 0;
The specific operation of step S5 is as follows: the attractive potential field of the target point is constructed according to the positions of the obstacle and the target point,
where U_att(q) is the attractive field generated by the target point at position q, k_att is the attraction coefficient of the target point, q is the position coordinate, and q_f is the coordinate of the target point;
the repulsive potential field of the obstacle is constructed,
where U_rep(q) is the repulsive potential field generated by the obstacle, k_rep is the repulsion coefficient of the obstacle, ρ denotes the limit distance of the repulsive field's influence, and ρ_0 denotes the shortest distance from the current position to the obstacle;
the total artificial potential field comprises two terms, the attractive potential function and the repulsive potential function; the total artificial potential field U(q) is then the sum of these two potential functions:
U(q) = U_att(q) + U_rep(q)
The specific operation of step S6 is as follows: a uniform random number is generated; if the uniform random number is greater than the decision rate ζ ∈ (0, 1), the APF weighting process is performed; otherwise, the explore-exploit method is performed;
for APF weighting, a Moore neighborhood is employed; it is defined on a two-dimensional square lattice and consists of a central cell (here, the current intelligent vehicle state s_t) and the eight cells surrounding it;
the probability assigned to each cell adjacent to s_t is inversely proportional to its total artificial potential field; the neighbor cell with the lowest total artificial potential field has the highest probability of being selected as s_{t+1}, while the neighbor cell with the highest total artificial potential field has the lowest probability of being selected as s_{t+1};
the probability p_i (i = 1, …, k) of selecting each neighboring cell is calculated, where k is the number of neighboring cells; this probability determines which region is selected;
a standard (unit) APF weighting function σ is then calculated,
where i = 1, …, k, σ: R^k → (0, 1)^k, and p = (p_1, …, p_k) ∈ R^k;
the cumulative probabilities obtained from the APF weighting function are used to select action a_t in state s_t; first, the probabilities contained in p are sorted; then, a random number between zero and one is generated; starting from the top of the list, the first neighbor cell whose cumulative probability is greater than the random number is selected as state s_{t+1};
if the APF weighting process is not performed, the explore-exploit method is performed: a uniform random number is generated; if the uniform random number is greater than the decision rate ζ, the exploitation process is performed; otherwise, the exploration process is performed;
The specific operation of step S7 is as follows: once action a_t has been selected by the APF weighting, exploitation, or exploration process, action a_t is performed and the reward r is received, reaching a new state s_{t+1};
the specific operation of updating the Q value in step S8 is as follows: the agent selects action a in state s according to step S6, reaches state s', and obtains the immediate reward r; the update of the Q value can be expressed as:
wherein the learning rate α is empirically set to 0.3, the discount coefficient γ is set to 0.8, the decision rate ζ is set to 0.2, and the reward function is set as follows:
the specific operation of step S10 is as follows:
the new state s_{t+1} is assigned to the current state s_t to continue the iterative process; if s_{t+1} is the terminal state, the current round of iteration is completed; otherwise, go to step S6 until the target point is reached;
the method for generating the path in step S11 is as follows:
the target point index i is initialized to 0 and the starting point q_0 is assigned to the current state s_t, and then the iterative path-generation process is performed; while the next state has not reached the target point, the index i is incremented by 1; environment verification is also performed, and if the environment has changed, the environment information Q_G is updated; the best action a_t in the current state s_t is selected using the learning value Q_{m×n}; once action a_t has been selected, the action is performed; subsequently, the new state s_{t+1} is calculated from the equation s_{t+1} = f(s_t, a_t); finally, the new state s_{t+1} is assigned to the current state s_t, and the current state is assigned to the target point q_i; if the current state s_t is ultimately equal to the target point q_f, the algorithm returns the collision-free path between the starting point q_0 and the target point q_f.
CN202310692024.7A 2023-06-13 2023-06-13 Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning Active CN116700258B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310692024.7A CN116700258B (en) 2023-06-13 2023-06-13 Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310692024.7A CN116700258B (en) 2023-06-13 2023-06-13 Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning

Publications (2)

Publication Number Publication Date
CN116700258A CN116700258A (en) 2023-09-05
CN116700258B true CN116700258B (en) 2024-05-03

Family

ID=87827236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310692024.7A Active CN116700258B (en) 2023-06-13 2023-06-13 Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning

Country Status (1)

Country Link
CN (1) CN116700258B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0850548A (en) * 1994-08-05 1996-02-20 Nikon Corp Method and device for learning path
CN112799386A (en) * 2019-10-25 2021-05-14 中国科学院沈阳自动化研究所 Robot path planning method based on artificial potential field and reinforcement learning
CN110919659A (en) * 2019-12-24 2020-03-27 哈尔滨工程大学 Robot control method based on DDGPES
CN111515932A (en) * 2020-04-23 2020-08-11 东华大学 Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning
CN112344944A (en) * 2020-11-24 2021-02-09 湖北汽车工业学院 Reinforced learning path planning method introducing artificial potential field
CN112964272A (en) * 2021-03-16 2021-06-15 湖北汽车工业学院 Improved Dyna-Q learning path planning algorithm
WO2022229657A1 (en) * 2021-04-30 2022-11-03 Cambridge Enterprise Limited Method and system for robot navigation in unknown environments
CN115344046A (en) * 2022-08-19 2022-11-15 南京信息工程大学 Mobile robot path planning based on improved deep Q network algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Path Planning of Coastal Ships Based on Optimized DQN Reward Function; Guo Siyu, et al.; Journal of Marine Science and Engineering; 2021-02-18; Vol. 9, No. 2; 210 *
Research and Implementation of a Smart Travel Route Planning Algorithm Based on Reinforcement Learning; Xu Ke (许珂); China Master's Theses Full-text Database, Information Science and Technology; 2021-05-15 (No. 5); pp. 22-34 *

Also Published As

Publication number Publication date
CN116700258A (en) 2023-09-05

Similar Documents

Publication Publication Date Title
CN107272679B (en) Path planning method based on improved ant colony algorithm
CN109164810B (en) Robot self-adaptive dynamic path planning method based on ant colony-clustering algorithm
CN109945881B (en) Mobile robot path planning method based on ant colony algorithm
CN111260027B (en) Intelligent agent automatic decision-making method based on reinforcement learning
CN111896006B (en) Path planning method and system based on reinforcement learning and heuristic search
CN110991972B (en) Cargo transportation system based on multi-agent reinforcement learning
CN110852448A (en) Cooperative intelligent agent learning method based on multi-intelligent agent reinforcement learning
CN110471444A (en) UAV Intelligent barrier-avoiding method based on autonomous learning
CN114489059B (en) Path planning method based on D3QN-PER mobile robot
CN112550314B (en) Embedded optimization type control method suitable for unmanned driving, driving control module and automatic driving control system thereof
CN112327923A (en) Multi-unmanned aerial vehicle collaborative path planning method
CN110389591A (en) A kind of paths planning method based on DBQ algorithm
CN116451934B (en) Multi-unmanned aerial vehicle edge calculation path optimization and dependent task scheduling optimization method and system
CN114089776B (en) Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
CN113467481B (en) Path planning method based on improved Sarsa algorithm
CN113110101B (en) Production line mobile robot gathering type recovery and warehousing simulation method and system
CN113382060B (en) Unmanned aerial vehicle track optimization method and system in Internet of things data collection
CN113377131B (en) Method for acquiring unmanned aerial vehicle collected data track by using reinforcement learning
CN112595326A (en) Improved Q-learning path planning algorithm with fusion of priori knowledge
CN116700258B (en) Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning
Han et al. Multi-uav automatic dynamic obstacle avoidance with experience-shared a2c
CN115344046A (en) Mobile robot path planning based on improved deep Q network algorithm
CN113985870A (en) Path planning method based on meta reinforcement learning
Yu et al. A novel automated guided vehicle (AGV) remote path planning based on RLACA algorithm in 5G environment
CN117880858B (en) Multi-unmanned aerial vehicle track optimization and power control method based on communication learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20240408

Address after: 530022, Room 501, Room 502, Room 503, Room 505, 5th Floor, Building 2, Nanning International Tourism Center, No. 111 Fengling North Road, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region

Applicant after: Wanjitai Technology Group Digital City Technology Co.,Ltd.

Country or region after: China

Address before: No. 11-3 Chongqing International Student Entrepreneurship Park, Erlang Kecheng Road, Jiulongpo District, Chongqing, 400000

Applicant before: CHONGQING RONGGUAN TECHNOLOGY Co.,Ltd.

Country or region before: China

GR01 Patent grant
GR01 Patent grant