CN116700258A - Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning - Google Patents


Info

Publication number
CN116700258A
CN116700258A (application CN202310692024.7A)
Authority
CN
China
Prior art keywords
potential field
state
intelligent vehicle
artificial potential
target point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310692024.7A
Other languages
Chinese (zh)
Other versions
CN116700258B (en)
Inventor
杨泽远
杨刚
陶发展
何晓鹏
熊心和
匡海军
宋兵
汪洋
杨一博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wanjitai Technology Group Digital City Technology Co ltd
Original Assignee
Chongqing Rongguan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Rongguan Technology Co ltd filed Critical Chongqing Rongguan Technology Co ltd
Priority to CN202310692024.7A priority Critical patent/CN116700258B/en
Publication of CN116700258A publication Critical patent/CN116700258A/en
Application granted granted Critical
Publication of CN116700258B publication Critical patent/CN116700258B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Landscapes

  • Feedback Control In General (AREA)

Abstract

The application discloses an intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning. To address the "local-minimum trap" of the traditional artificial potential field method, the prior environment is virtualized into a total artificial potential field, and APF weighting is used to compute suitable learning values. The algorithm solves the path planning problem in both known and unknown environments. Combining the Q-learning method with the APF method overcomes the drawbacks of classical Q-learning, such as slow learning speed, long run time, and difficulty learning in both known and unknown environments, and effectively prevents the traditional artificial potential field method from falling into local optima. Simulation results show that the APF reinforcement learning algorithm improves the learning speed, path length, and path smoothness of path planning, and enables the vehicle to avoid moving obstacles and reach its destination smoothly.

Description

Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning
Technical Field
The application relates to the technical field of automatic driving, in particular to an intelligent vehicle path planning method based on an artificial potential field method and reinforcement learning.
Background
Path planning is one of the key technologies of automatic driving and a prerequisite for the safe driving of an autonomous vehicle. The artificial potential field (APF) method is a relatively mature, real-time planning method in the path planning field. Its working principle is to abstract environment information into an attractive field function and a repulsive field function, and to plan a collision-free path from the starting point to the target point through the combined force field function. The traditional artificial potential field method cannot guarantee path planning efficiency because of two major defects: the vehicle may suffer from the local-minimum problem, and under certain conditions the target may be unreachable.
To address the problems of the traditional artificial potential field in intelligent vehicle path planning, prior work has established a road-boundary repulsive potential field to limit the vehicle's driving area; improved the obstacle repulsive potential field function to solve the local-optimum and unreachable-target problems; and established a velocity repulsive potential field to describe obstacle information more accurately and thereby avoid collisions between the vehicle and obstacles. Bei Shaoyi et al. invented a local path planning method for unmanned vehicles based on an improved artificial potential field method (CN202111673751.6): a distance threshold is set in the attraction function and a repulsive-field adjustment factor is added to the repulsion function; when the unmanned vehicle falls into a local optimum, a strategy of escaping it with a small steering angle is adopted under the vehicle's steering constraints, but this increases the computation and complexity of the algorithm. Another disclosed intelligent vehicle path planning method with an improved artificial potential field algorithm (CN202011307835.3) automatically adjusts the step length and the trajectory out of the influence area according to the size of the local-minimum influence area, so that the intelligent vehicle escapes automatically when it falls into a local minimum, but the computation for the overall trajectory planning is excessive and the planned path is not smooth enough.
The reinforcement learning algorithm is a learning method that requires no prior knowledge. It emphasizes interactive learning with the environment: it accumulates rewards by continually attempting actions and receiving environmental feedback, then selects the best action to complete the path planning task. The most widely used reinforcement learning algorithm in the path planning field is the Q-learning algorithm, a value-based reinforcement learning algorithm. It attempts an action in a particular state and updates the Q value based on the immediate reward or penalty it receives and its estimate of the value of the resulting state. After repeatedly trying all actions in all states, the action with the maximum benefit is selected according to the Q values.
Disclosure of Invention
The application provides an intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning, in which a reinforcement learning algorithm is added to the traditional artificial potential field method and APF weighting is used to guide Q-learning. This overcomes the local-minimum and related problems of the traditional potential field method, accelerates the convergence speed of classical Q-learning, and solves the path planning problem of an intelligent vehicle in a complex environment.
In order to achieve the above purpose, the application adopts the following technical scheme: an intelligent vehicle path planning method based on an artificial potential field method and reinforcement learning comprises the following steps:
step S1: acquiring an environment image and establishing a grid map;
step S2: defining a reinforcement learning state space and an action space;
step S3: initializing algorithm parameters;
step S4: randomly selecting an initial state from a state set;
step S5: constructing an artificial potential field, wherein the potential field is formed by overlapping an attractive potential field and a repulsive potential field;
step S6: in the action space, selecting actions by adopting an APF weighting or epsilon-greedy strategy;
step S7: executing the current action in state s to obtain a new state s_{t+1} and a reward r;
step S8: updating the Q value;
step S9: selecting the action with the maximum Q value in each step to obtain an optimal path;
step S10: repeating steps S6, S7 and S8 until a certain number of steps or a convergence condition is reached;
step S11: generating a path from the starting point q_0 to the target point q_f, transmitting the finally generated path to the control center of the intelligent vehicle, and driving the intelligent vehicle along the path.
Further, the specific operation of step S1 is as follows: an environment image is obtained by the intelligent vehicle's camera and divided into a 25 × 25 grid, and an environment model is built using the grid method. If an obstacle is found in a grid cell, that cell is defined as an obstacle position through which the intelligent vehicle cannot pass; if the target point is found in a grid cell, the target position is set at that cell, the target point being the final position the intelligent vehicle must reach; all other cells are defined as obstacle-free cells through which the intelligent vehicle can pass.
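A minimal sketch of the step S1 grid-map construction (the cell codes, helper name, and example obstacle coordinates are assumptions for illustration, not taken from the patent):

```python
import numpy as np

FREE, OBSTACLE, TARGET = 0, 1, 2  # assumed cell codes

def build_grid_map(size=25, obstacles=(), target=(24, 24)):
    """Build a size x size occupancy grid: obstacle cells are impassable,
    the target cell is the goal, all other cells are free."""
    grid = np.full((size, size), FREE, dtype=int)
    for r, c in obstacles:
        grid[r, c] = OBSTACLE
    grid[target] = TARGET
    return grid

grid = build_grid_map(obstacles=[(5, 5), (5, 6), (10, 12)])
print(grid.shape)  # (25, 25)
```

In use, the obstacle list would come from segmenting the camera image rather than being hard-coded.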
Further, the specific operation of step S2 is as follows: the state space of reinforcement learning is defined as the current position coordinate and the previous position coordinate of the agent, and the action space consists of the actions in the four directions up, down, left and right; after each action is executed, the agent moves one grid cell in the corresponding direction.
Further, the algorithm parameters in step S3 include the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the greedy factor ζ ∈ (0, 1), the maximum number of iterations, and the reward function r; all Q values are initialized to 0.
Further, the specific operation of step S5 is as follows: constructing the attractive potential field of the target point according to the positions of the obstacle and the target point:

U_att(q) = (1/2) k_att ‖q − q_f‖²

where U_att(q) is the attractive potential field generated by the target point at position q, k_att is the attraction coefficient of the target point, q is the position coordinate, and q_f is the coordinate of the target point.
Constructing the repulsive potential field of the obstacle:

U_rep(q) = (1/2) k_rep (1/ρ₀ − 1/ρ)²  if ρ₀ ≤ ρ;  U_rep(q) = 0  if ρ₀ > ρ

where U_rep(q) is the repulsive potential field generated by the obstacle, k_rep is the repulsion coefficient of the obstacle, ρ represents the limit distance of influence of the repulsive potential field, and ρ₀ represents the shortest distance from the current position to the obstacle.
The total artificial potential field comprises two terms, the attractive potential function and the repulsive potential function; the total artificial potential field U(q) is the sum of these two potential functions:

U(q) = U_att(q) + U_rep(q)
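As an illustration of step S5, a minimal sketch of the potential-field computation (the (1/2)-factor forms are the standard APF potentials, assumed here because the patent's equation images are not reproduced in this text; the coefficient values are placeholders):

```python
import numpy as np

def attractive_potential(q, q_f, k_att=1.0):
    """U_att(q) = 1/2 * k_att * ||q - q_f||^2"""
    q, q_f = np.asarray(q, float), np.asarray(q_f, float)
    return 0.5 * k_att * np.sum((q - q_f) ** 2)

def repulsive_potential(q, obstacles, rho=3.0, k_rep=1.0):
    """U_rep(q) = 1/2 * k_rep * (1/rho0 - 1/rho)^2 inside the influence
    distance rho, 0 outside; rho0 is the distance to the nearest obstacle."""
    q = np.asarray(q, float)
    rho0 = min(np.linalg.norm(q - np.asarray(ob, float)) for ob in obstacles)
    if rho0 > rho:
        return 0.0            # outside the obstacle's influence distance
    if rho0 == 0.0:
        return float("inf")   # on the obstacle itself
    return 0.5 * k_rep * (1.0 / rho0 - 1.0 / rho) ** 2

def total_potential(q, q_f, obstacles, k_att=1.0, **kw):
    """U(q) = U_att(q) + U_rep(q)"""
    return attractive_potential(q, q_f, k_att) + repulsive_potential(q, obstacles, **kw)
```

Evaluating `total_potential` over every cell of the grid map yields the total artificial potential field used by the APF weighting below.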
further, the specific operation of step S6 is as follows: generating uniform random numbers. If the uniform random number is greater than the decision rate ζ ε (0, 1), then APF weighting process would be performed. Otherwise, the explore-utilize method will be performed.
For APF weighting, a Moore neighborhood is used; it is defined on a two-dimensional square lattice and consists of a central cell (here, the current intelligent vehicle state s_t) and the eight cells surrounding it.
The probability assigned to each neighboring cell of s_t is inversely proportional to its total artificial potential field: the neighboring cell with the lowest total artificial potential field has the highest probability of being assigned to s_{t+1}, while the neighboring cell with the highest total artificial potential field has the lowest probability.
The probability p_i (i = 1, …, k) of selecting each region is calculated, where k is the number of neighboring cells:

p_i = U(q_i)⁻¹ / Σ_{j=1}^{k} U(q_j)⁻¹
A standard (unit) APF weighting function σ: R^k → (0, 1)^k is then calculated, where p = (p_1, …, p_k) ∈ R^k and i = 1, …, k.
The cumulative probability obtained from the APF weighting function is used to select action a_t in state s_t. First, the probabilities contained in p are sorted. Then, a random number between zero and one is generated. Starting from the top of the list, the first neighboring cell whose cumulative probability exceeds the random number is selected as state s_{t+1}.
If the APF weighting process is not performed, the exploration-exploitation method is performed: a uniform random number is generated; if it is greater than the decision rate ζ, the exploitation process is performed; otherwise, the exploration process is performed.
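The APF-weighted selection above can be sketched as follows (a hedged illustration: the inverse-potential normalization and the small epsilon guard are assumptions consistent with the stated "inversely proportional" rule, not taken verbatim from the patent):

```python
import random

def apf_weighted_choice(neighbor_potentials, rng=random.random):
    """Assign each of the k neighboring cells a probability inversely
    proportional to its total artificial potential, then sample the next
    state by cumulative probability, scanning from the highest probability."""
    inv = [1.0 / (u + 1e-9) for u in neighbor_potentials]  # lower U -> higher p
    total = sum(inv)
    probs = [v / total for v in inv]
    # sort cells by probability, highest first, as described in the text
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    r, cum = rng(), 0.0
    for i in order:
        cum += probs[i]
        if cum > r:
            return i  # index of the chosen neighboring cell
    return order[-1]
```

With a Moore neighborhood, `neighbor_potentials` would hold the total potential U(q) of the eight cells around s_t, and the returned index identifies s_{t+1}.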
Further, the specific operation of step S7 is as follows: once action a_t is selected by the APF weighting, exploitation, or exploration procedure, action a_t is executed, the reward r is received, and the new state s_{t+1} is reached.
Further, the specific operation of updating the Q value in step S8 is as follows: according to S6, action a is selected based on state s, state s' is reached, and the immediate reward r is obtained; the Q value update can be expressed as:

Q(s, a) ← Q(s, a) + α (r + γ max_{a'} Q(s', a') − Q(s, a))

where the learning rate α is empirically set to 0.3, the discount factor γ is set to 0.8, and the decision rate ζ is set to 0.2; the reward function is set as follows:
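A minimal sketch of the step S8 update with the α = 0.3 and γ = 0.8 values stated above (the dict-of-dicts Q-table layout and the toy states are assumptions for illustration; the patent's concrete reward table is not reproduced in this text):

```python
ALPHA, GAMMA = 0.3, 0.8  # learning rate and discount factor from the patent

def q_update(Q, s, a, r, s_next, alpha=ALPHA, gamma=GAMMA):
    """Tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    best_next = max(Q[s_next].values()) if Q.get(s_next) else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]

# toy Q table as a dict of dicts: state -> {action: value}
Q = {"s0": {"right": 0.0}, "s1": {"right": 5.0}}
q_update(Q, "s0", "right", -1, "s1")  # 0.3 * (-1 + 0.8*5 - 0) ≈ 0.9
```

A typical reward shape for this kind of grid task would penalize each step, penalize obstacle cells heavily, and reward reaching the target; the exact values here are placeholders.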
further, the specific operation of step S10 is as follows:
The new state s_{t+1} is assigned to the current state s_t to continue the iterative process; if s_{t+1} is the terminal state, the current round of iteration is complete and the procedure returns to step S6 until the target point is reached.
Further, the method for generating the path in the step S11 is as follows:
The index i of the target point is initialized to 0, and the starting point q_0 is assigned to the current state s_t; an iterative path-generation process follows. While the next state has not reached the target point, the index i is incremented by 1. Environment verification is then performed; if the environment has changed, the environment information Q_G is updated. Using the learned values Q_{m×n}, the optimal action a_t for the current state s_t is selected. Once action a_t has been selected, it is executed. Subsequently, the new state s_{t+1} is computed using the equation s_{t+1} = f(s_t, a_t). Finally, the new state s_{t+1} is assigned to the current state s_t, and the current state is assigned to the target point q_i. When the current state s_t finally equals the target point q_f, the algorithm returns the collision-free path between the starting point q_0 and the target point q_f.
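The path-generation loop of step S11 can be sketched as follows (a hedged illustration: `best_action`, `step`, and the toy 1-D example are assumptions, and the environment re-verification of Q_G is omitted):

```python
def extract_path(q0, q_f, best_action, step, max_len=1000):
    """Greedy path generation: from q0, repeatedly take the action with
    the maximum learned Q value until the target q_f is reached."""
    path, s = [q0], q0
    while s != q_f and len(path) < max_len:
        a = best_action(s)   # argmax_a Q(s, a) over the learned table
        s = step(s, a)       # s_{t+1} = f(s_t, a_t)
        path.append(s)
    return path

# toy 1-D example: states 0..4, the learned policy always moves right
path = extract_path(0, 4, best_action=lambda s: +1, step=lambda s, a: s + a)
print(path)  # [0, 1, 2, 3, 4]
```

The resulting state sequence is what would be transmitted to the vehicle's control center.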
The application provides an intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning. To address the "local-minimum trap" of the traditional artificial potential field method, the prior environment is virtualized into a total artificial potential field, and APF weighting is used to compute suitable learning values. The algorithm solves the path planning problem in both known and unknown environments. Combining the Q-learning method with the APF method overcomes the drawbacks of classical Q-learning, such as slow learning speed, long run time, and difficulty learning in both known and unknown environments, and effectively prevents the traditional artificial potential field method from falling into local optima. Simulation results show that the APF reinforcement learning algorithm improves the learning speed, path length, and path smoothness of path planning, and enables the vehicle to avoid moving obstacles and reach its destination smoothly.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description, serve to explain the principles of the application.
The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of the method of the present application;
FIG. 2 is a drawing of a potential field of an artificial potential field;
FIG. 3 is a repulsive potential field diagram of an artificial potential field;
FIG. 4 is a total potential field diagram of an artificial potential field;
FIG. 5 is an overall frame diagram of the proposed new algorithm of the present application;
FIG. 6 is a graph of a path planned by the improved algorithm of the present application;
FIG. 7 is a chart of the number of steps taken in each iteration of the algorithm.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Example 1
Referring to fig. 1, the application relates to an intelligent vehicle path planning method based on an artificial potential field method and reinforcement learning, which comprises the following specific implementation steps:
step S1: acquiring an environment image and establishing a grid map;
step S2: defining a reinforcement learning state space and an action space;
step S3: initializing algorithm parameters;
step S4: randomly selecting an initial state from a state set;
step S5: constructing an artificial potential field, wherein the potential field is formed by overlapping an attractive potential field and a repulsive potential field;
step S6: in the action space, selecting actions by adopting an APF weighting or epsilon-greedy strategy;
step S7: executing the current action in state s to obtain a new state s_{t+1} and a reward r;
step S8: updating the Q value;
step S9: selecting the action with the maximum Q value in each step to obtain an optimal path;
step S10: repeating steps S6, S7 and S8 until a certain number of steps or a convergence condition is reached;
step S11: generating a path from the starting point q_0 to the target point q_f, transmitting the finally generated path to the control center of the intelligent vehicle, and driving the intelligent vehicle along the path.
Step S1 specifically comprises the following: an environment image is obtained by the intelligent vehicle's camera and divided into a 25 × 25 grid, and an environment model is built using the grid method. If an obstacle is found in a grid cell, that cell is defined as an obstacle position through which the intelligent vehicle cannot pass; if the target point is found in a grid cell, the target position is set at that cell, the target point being the final position the intelligent vehicle must reach; all other cells are defined as obstacle-free cells through which the intelligent vehicle can pass.
Step S2 defines the state space of reinforcement learning as the current position coordinate and the previous position coordinate of the agent, and the action space as the actions in the four directions up, down, left and right; after each action is executed, the agent moves one grid cell in the corresponding direction.
The algorithm parameters in step S3 comprise the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the greedy factor ζ ∈ (0, 1), the maximum number of iterations, and the reward function r; all Q values are initialized to 0.
The specific operation of step S5 is as follows: the main idea of the artificial potential field (APF) method is to establish an attractive potential field around the target point and a repulsive potential field around the obstacles. With this idea, the APF method uses attraction and repulsion components to attract the intelligent vehicle to its target while keeping it away from obstacles.
Thus, the total artificial potential field comprises two terms, the attractive potential function and the repulsive potential function; the total artificial potential field U(q) is the sum of these two potential functions:

U(q) = U_att(q) + U_rep(q)
As shown in FIG. 2, the attractive potential field of the target point is constructed according to the positions of the obstacle and the target point:

U_att(q) = (1/2) k_att ‖q − q_f‖²

where U_att(q) is the attractive potential field generated by the target point at position q, k_att is the attraction coefficient of the target point, q is the position coordinate, and q_f is the coordinate of the target point.
As shown in FIG. 3, the repulsive potential field of the obstacle is constructed:

U_rep(q) = (1/2) k_rep (1/ρ₀ − 1/ρ)²  if ρ₀ ≤ ρ;  U_rep(q) = 0  if ρ₀ > ρ

where U_rep(q) is the repulsive potential field generated by the obstacle, k_rep is the repulsion coefficient of the obstacle, ρ represents the limit distance of influence of the repulsive potential field, and ρ₀ represents the shortest distance from the current position to the obstacle.
As shown in FIG. 4, the total artificial potential field is formed from the attractive potential field and the repulsive potential field.
As shown in FIG. 5, the action selection of the improved algorithm includes the following steps:
generating uniform random numbers. If the uniform random number is greater than the decision rate ζ ε (0, 1), then APF weighting process would be performed. Otherwise, the explore-utilize method will be performed.
For APF weighting, a Moore neighborhood is used; it is defined on a two-dimensional square lattice and consists of a central cell (here, the current intelligent vehicle state s_t) and the eight cells surrounding it.
The probability assigned to each neighboring cell of s_t is inversely proportional to its total artificial potential field: the neighboring cell with the lowest total artificial potential field has the highest probability of being assigned to s_{t+1}, while the neighboring cell with the highest total artificial potential field has the lowest probability.
The probability p_i (i = 1, …, k) of selecting each region is calculated, where k is the number of neighboring cells:

p_i = U(q_i)⁻¹ / Σ_{j=1}^{k} U(q_j)⁻¹
A standard (unit) APF weighting function σ: R^k → (0, 1)^k is then calculated, where p = (p_1, …, p_k) ∈ R^k and i = 1, …, k.
The cumulative probability obtained from the APF weighting function is used to select action a_t in state s_t. First, the probabilities contained in p are sorted. Then, a random number between zero and one is generated. Starting from the top of the list, the first neighboring cell whose cumulative probability exceeds the random number is selected as state s_{t+1}.
If the APF weighting process is not performed, the exploration-exploitation method is performed: a uniform random number is generated; if it is greater than the decision rate ζ, the exploitation process is performed; otherwise, the exploration process is performed.
Subsequently, once action a_t is selected by the APF weighting, exploitation, or exploration procedure, action a_t is executed, the reward r is received, and the new state s_{t+1} is reached. The Q table is then updated according to the formula Q(s, a) ← Q(s, a) + α (r + γ max_{a'} Q(s', a') − Q(s, a)).
Here the learning rate α is empirically set to 0.3, the discount factor γ is set to 0.8, and the decision rate ζ is set to 0.2; the reward function is set as follows:
The new state s_{t+1} is assigned to the current state s_t to continue the iterative process; if s_{t+1} is the terminal state, the current round of iteration is complete and the procedure returns to step S6 until the target point is reached.
The method for generating the path in step S11 is as follows:
The index i of the target point is initialized to 0, and the starting point q_0 is assigned to the current state s_t; an iterative path-generation process follows. While the next state has not reached the target point, the index i is incremented by 1. Environment verification is then performed; if the environment has changed, the environment information Q_G is updated. Using the learned values Q_{m×n}, the optimal action a_t for the current state s_t is selected. Once action a_t has been selected, it is executed. Subsequently, the new state s_{t+1} is computed using the equation s_{t+1} = f(s_t, a_t). Finally, the new state s_{t+1} is assigned to the current state s_t, and the current state is assigned to the target point q_i. When the current state s_t finally equals the target point q_f, the algorithm returns the collision-free path between the starting point q_0 and the target point q_f.
Using this method with the parameter settings above, the optimal driving path of the intelligent vehicle for this example is obtained, as shown in FIG. 6.
FIG. 7 shows that, through training, the intelligent vehicle finds the optimal path in fewer and fewer steps, i.e., with increasing efficiency.
The combined artificial potential field and reinforcement learning algorithm provided by the application not only effectively overcomes the defects of the traditional artificial potential field method, but also accelerates the convergence of the algorithm and improves its efficiency.

Claims (10)

1. An intelligent vehicle path planning method based on an artificial potential field method and reinforcement learning is characterized by comprising the following steps:
step S1: acquiring an environment image and establishing a grid map;
step S2: defining a reinforcement learning state space and an action space;
step S3: initializing algorithm parameters;
step S4: randomly selecting an initial state from a state set;
step S5: constructing an artificial potential field, wherein the potential field is formed by overlapping an attractive potential field and a repulsive potential field;
step S6: in the action space, selecting actions by adopting an APF weighting or epsilon-greedy strategy;
step S7: executing the current action in state s to obtain a new state s_{t+1} and a reward r;
step S8: updating the Q value;
step S9: selecting the action with the maximum Q value in each step to obtain an optimal path;
step S10: repeating steps S6, S7 and S8 until a certain number of steps or a convergence condition is reached;
step S11: generating a path from the starting point q_0 to the target point q_f, transmitting the finally generated path to the control center of the intelligent vehicle, and driving the intelligent vehicle along the path.
2. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 1, wherein the specific operation of step S1 is as follows: an environment image is obtained by the intelligent vehicle's camera and divided into a 25 × 25 grid, and an environment model is built using the grid method; if an obstacle is found in a grid cell, that cell is defined as an obstacle position through which the intelligent vehicle cannot pass; if the target point is found in a grid cell, the target position is set at that cell, the target point being the final position the intelligent vehicle must reach; all other cells are defined as obstacle-free cells through which the intelligent vehicle can pass.
3. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 2, wherein the specific operation of step S2 is as follows: the state space of reinforcement learning is defined as the current position coordinate and the previous position coordinate of the agent, the action space consists of the actions in the four directions up, down, left and right, and after each action is executed the agent moves one grid cell in the corresponding direction.
4. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 3, wherein the algorithm parameters in step S3 include the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the greedy factor ζ ∈ (0, 1), the maximum number of iterations, and the reward function r; all Q values are initialized to 0.
5. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 4, wherein the specific operation of step S5 is as follows: constructing the attractive potential field of the target point according to the positions of the obstacle and the target point:

U_att(q) = (1/2) k_att ‖q − q_f‖²

where U_att(q) is the attractive potential field generated by the target point at position q, k_att is the attraction coefficient of the target point, q is the position coordinate, and q_f is the coordinate of the target point;
constructing the repulsive potential field of the obstacle:

U_rep(q) = (1/2) k_rep (1/ρ₀ − 1/ρ)²  if ρ₀ ≤ ρ;  U_rep(q) = 0  if ρ₀ > ρ

where U_rep(q) is the repulsive potential field generated by the obstacle, k_rep is the repulsion coefficient of the obstacle, ρ represents the limit distance of influence of the repulsive potential field, and ρ₀ represents the shortest distance from the current position to the obstacle; the total artificial potential field comprises two terms, the attractive potential function and the repulsive potential function, and the total artificial potential field U(q) is the sum of these two potential functions:

U(q) = U_att(q) + U_rep(q).
6. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 5, wherein the specific operation of step S6 is as follows: generating a uniform random number; if the uniform random number is greater than the decision rate ζ ∈ (0, 1), performing the APF weighting process; otherwise, performing the exploration-exploitation method;
for APF weighting, a Moore neighborhood is used, which is defined on a two-dimensional square lattice and consists of a central cell (here, the current intelligent vehicle state s_t) and the eight cells surrounding it;
the probability assigned to each neighboring cell of s_t is inversely proportional to its total artificial potential field, the neighboring cell with the lowest total artificial potential field having the highest probability of being assigned to s_{t+1} and the neighboring cell with the highest total artificial potential field having the lowest probability;
calculating the probability p_i (i = 1, …, k) of selecting each region, where k is the number of neighboring cells:

p_i = U(q_i)⁻¹ / Σ_{j=1}^{k} U(q_j)⁻¹;
calculating a standard (unit) APF weighting function σ: R^k → (0, 1)^k, where p = (p_1, …, p_k) ∈ R^k and i = 1, …, k;
The cumulative probability obtained by the APF weighting function is used to select state s t Action a in (a) t First, the probabilities contained by p are ordered and then a random number between zero and one is generated, starting from the top of the list, the first neighbor cell with a cumulative probability greater than the random number is selected for state s t+1
If the APF weighting process is not executed, the exploration-exploitation method is performed: generate a uniform random number; if the random number is greater than the decision rate ξ, the exploitation process is executed, otherwise the exploration process is executed.
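The APF-weighted action selection of step S6 — probability inversely proportional to each neighbouring cell's total artificial potential field, with selection by cumulative probability — can be sketched as follows; the inverse-potential normalisation stands in for the patent's weighting function σ, which is not reproduced in the text:

```python
import random

def apf_weighted_choice(potentials, rng=random):
    """Pick a Moore-neighbour cell with probability inversely proportional to
    its total artificial potential field, using the cumulative-probability
    scheme described above. `potentials` maps each of the k neighbouring
    cells to its U value; inverse-potential normalisation is an assumption
    standing in for the patent's sigma function."""
    # Inverse weights: the lowest-potential cell gets the largest probability.
    inv = {cell: 1.0 / (1e-9 + u) for cell, u in potentials.items()}
    total = sum(inv.values())
    probs = {cell: w / total for cell, w in inv.items()}
    # Sort by probability (top of the list first), then walk the cumulative
    # sum until it exceeds a uniform random number in (0, 1).
    r = rng.random()
    cum = 0.0
    for cell, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        cum += p
        if cum > r:
            return cell
    return cell  # fallback against floating-point round-off
```

Over many draws, the cell with the lowest potential is selected most often, while higher-potential cells are still reachable, preserving exploration.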
7. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 6, wherein the specific operation of step S7 is as follows: once action a_t has been selected by the APF weighting, exploitation, or exploration procedure, action a_t is executed, the reward r is received, and the new state s_{t+1} is reached.
8. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 7, wherein the updating of the Q value in step S8 specifically comprises the following operations: according to S6, an action a is selected based on state s, state s' is reached, and an immediate reward r is obtained; the update of the Q value can be expressed mathematically as:
Q(s, a) ← Q(s, a) + α[r + γ max_{a'} Q(s', a') - Q(s, a)]
wherein the learning rate α is empirically set to 0.3, the discount coefficient γ is set to 0.8, and the decision rate ξ is set to 0.2; the reward function is set as follows:
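With α = 0.3 and γ = 0.8 as quoted above, the tabular Q-value update of step S8 can be sketched as follows; the reward function itself is claim-specific and is simply passed in as r:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.3, gamma=0.8):
    """One tabular Q-learning update with the learning rate and discount
    coefficient quoted above. Q is a dict keyed by (state, action);
    unseen pairs default to 0.0."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

For example, starting from an empty table, a single reward of 1.0 yields Q(s, a) = 0.3 · (1.0 + 0.8 · 0 - 0) = 0.3.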
9. the intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 8, wherein the specific operation of step S10 is as follows:
The new state s_{t+1} is assigned to the current state s_t to continue the iterative process; if s_{t+1} is the terminal state, the current round of iteration is completed; otherwise, return to step S6 until the target point is reached.
10. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 9, wherein the method for generating the path in step S11 is as follows:
initialize the target-point index i to 0 and assign the starting point q_0 to the current state s_t, then carry out the iterative path-generation process: while the next state has not reached the target point, increment the index i by 1 and perform environment verification; if the environment has changed, update the environment information Q_G; using the learned value table Q_{m×n}, select the optimal action a_t for the current state s_t; once action a_t has been selected, execute it, then compute the new state s_{t+1} using the equation s_{t+1} = f(s_t, a_t); finally, assign the new state s_{t+1} to the current state s_t and assign the current state to the target point q_i. If the current state s_t eventually equals the target point q_f, the algorithm returns a collision-free path between the starting point q_0 and the target point q_f.
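The path-generation loop of step S11 — repeatedly selecting the best-valued action from the learned Q-table until the target point is reached — can be sketched as follows; the transition function `step` plays the role of s_{t+1} = f(s_t, a_t), and the environment re-verification against Q_G is omitted for brevity:

```python
def extract_path(Q, start, goal, step, actions, max_steps=10000):
    """Greedy path extraction from a learned Q-table, as in step S11.
    Q is a dict keyed by (state, action); `step(s, a)` computes the
    next state; returns the state sequence q_0 ... q_f, or None if the
    goal is not reached within the step budget."""
    s, path = start, [start]
    for _ in range(max_steps):
        if s == goal:
            return path  # collision-free path from start to target point
        # Select the optimal action a_t for the current state s_t.
        a = max(actions, key=lambda act: Q.get((s, act), 0.0))
        s = step(s, a)
        path.append(s)
    return None  # target point not reached within the step budget
```

On a one-dimensional corridor where the table favours moving toward the goal, this recovers the shortest state sequence step by step.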
CN202310692024.7A 2023-06-13 2023-06-13 Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning Active CN116700258B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310692024.7A CN116700258B (en) 2023-06-13 2023-06-13 Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning


Publications (2)

Publication Number Publication Date
CN116700258A true CN116700258A (en) 2023-09-05
CN116700258B CN116700258B (en) 2024-05-03

Family

ID=87827236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310692024.7A Active CN116700258B (en) 2023-06-13 2023-06-13 Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning

Country Status (1)

Country Link
CN (1) CN116700258B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0850548A (en) * 1994-08-05 1996-02-20 Nikon Corp Method and device for learning path
CN110919659A (en) * 2019-12-24 2020-03-27 哈尔滨工程大学 Robot control method based on DDGPES
CN111515932A (en) * 2020-04-23 2020-08-11 东华大学 Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning
CN112344944A (en) * 2020-11-24 2021-02-09 湖北汽车工业学院 Reinforced learning path planning method introducing artificial potential field
CN112799386A (en) * 2019-10-25 2021-05-14 中国科学院沈阳自动化研究所 Robot path planning method based on artificial potential field and reinforcement learning
CN112964272A (en) * 2021-03-16 2021-06-15 湖北汽车工业学院 Improved Dyna-Q learning path planning algorithm
WO2022229657A1 (en) * 2021-04-30 2022-11-03 Cambridge Enterprise Limited Method and system for robot navigation in unknown environments
CN115344046A (en) * 2022-08-19 2022-11-15 南京信息工程大学 Mobile robot path planning based on improved deep Q network algorithm


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUO, SIYU, et al.: "Path Planning of Coastal Ships Based on Optimized DQN Reward Function", JOURNAL OF MARINE SCIENCE AND ENGINEERING, vol. 9, no. 2, 18 February 2021 (2021-02-18), page 210 *
XU, KE: "Research and Implementation of a Smart Travel Route Planning Algorithm Based on Reinforcement Learning", CHINA MASTER'S THESES FULL-TEXT DATABASE, INFORMATION SCIENCE AND TECHNOLOGY, no. 5, 15 May 2021 (2021-05-15), pages 22-34 *


Similar Documents

Publication Publication Date Title
CN109164810B (en) Robot self-adaptive dynamic path planning method based on ant colony-clustering algorithm
CN109945881B (en) Mobile robot path planning method based on ant colony algorithm
CN108919641B (en) Unmanned aerial vehicle flight path planning method based on improved goblet sea squirt algorithm
CN112362066B (en) Path planning method based on improved deep reinforcement learning
CN111896006B (en) Path planning method and system based on reinforcement learning and heuristic search
CN110991972B (en) Cargo transportation system based on multi-agent reinforcement learning
CN110084375B (en) Multi-agent collaboration framework based on deep reinforcement learning
CN111260027B (en) Intelligent agent automatic decision-making method based on reinforcement learning
CN110471444A (en) UAV Intelligent barrier-avoiding method based on autonomous learning
CN112550314B (en) Embedded optimization type control method suitable for unmanned driving, driving control module and automatic driving control system thereof
CN112327923A (en) Multi-unmanned aerial vehicle collaborative path planning method
CN115509239B (en) Unmanned vehicle route planning method based on air-ground information sharing
CN113032904A (en) Model construction method, task allocation method, device, equipment and medium
CN113467481B (en) Path planning method based on improved Sarsa algorithm
CN114089776B (en) Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
CN113780576A (en) Cooperative multi-agent reinforcement learning method based on reward self-adaptive distribution
CN112613608A (en) Reinforced learning method and related device
CN115629607A (en) Reinforced learning path planning method integrating historical information
CN113110101B (en) Production line mobile robot gathering type recovery and warehousing simulation method and system
CN113377131B (en) Method for acquiring unmanned aerial vehicle collected data track by using reinforcement learning
CN112595326A (en) Improved Q-learning path planning algorithm with fusion of priori knowledge
CN116700258B (en) Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning
CN116796844A (en) M2 GPI-based unmanned aerial vehicle one-to-one chase game method
CN114610024B (en) Multi-agent collaborative searching energy-saving method for mountain land
CN116400726A (en) Rotor unmanned aerial vehicle escape method and system based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240408

Address after: 530022, Room 501, Room 502, Room 503, Room 505, 5th Floor, Building 2, Nanning International Tourism Center, No. 111 Fengling North Road, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region

Applicant after: Wanjitai Technology Group Digital City Technology Co.,Ltd.

Country or region after: China

Address before: No. 11-3 Chongqing International Student Entrepreneurship Park, Erlang Kecheng Road, Jiulongpo District, Chongqing, 400000

Applicant before: CHONGQING RONGGUAN TECHNOLOGY Co.,Ltd.

Country or region before: China

GR01 Patent grant
GR01 Patent grant