CN116700258A - Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning - Google Patents


Info

Publication number
CN116700258A
CN116700258A (application CN202310692024.7A)
Authority
CN
China
Prior art keywords
potential field
state
intelligent vehicle
artificial potential
target point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310692024.7A
Other languages
Chinese (zh)
Other versions
CN116700258B (en)
Inventor
杨泽远
杨刚
陶发展
何晓鹏
熊心和
匡海军
宋兵
汪洋
杨一博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wanjitai Technology Group Digital City Technology Co ltd
Original Assignee
Chongqing Rongguan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Rongguan Technology Co ltd filed Critical Chongqing Rongguan Technology Co ltd
Priority to CN202310692024.7A priority Critical patent/CN116700258B/en
Publication of CN116700258A publication Critical patent/CN116700258A/en
Application granted granted Critical
Publication of CN116700258B publication Critical patent/CN116700258B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Landscapes

  • Feedback Control In General (AREA)

Abstract

The application discloses an intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning. To address the "local-minimum trap" of the traditional artificial potential field method, the prior environment is virtualized into a total artificial potential field, and APF weighting is used to compute suitable learning values. The algorithm solves the path planning problem in both known and unknown environments. Combining the Q-learning method with the APF method overcomes the drawbacks of classical Q-learning, such as slow learning speed, long run time, and difficulty learning in both known and unknown environments, and effectively prevents the traditional artificial potential field method from falling into local optima. Simulation results show that the APF reinforcement learning algorithm improves the learning speed, path length, and path smoothness of path planning, and enables the vehicle to avoid moving obstacles and reach its destination smoothly.

Description

Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning
Technical Field
The application relates to the technical field of automatic driving, in particular to an intelligent vehicle path planning method based on an artificial potential field method and reinforcement learning.
Background
Path planning is one of the key technologies of automatic driving and a prerequisite for the safe driving of an autonomous vehicle. The artificial potential field (APF) method is a relatively mature, real-time planning method in the path planning field. Its working principle is to abstract environment information into an attractive field function and a repulsive field function, and to plan a collision-free path from the starting point to the target point through the combined force field function. The traditional artificial potential field method cannot guarantee path planning efficiency because of two major defects: the vehicle may suffer from the local-minimum problem, and under certain conditions the target may be unreachable.
To address the problems of the traditional artificial potential field in intelligent vehicle path planning, prior work has established a road-boundary repulsive potential field to limit the vehicle's driving area; improved the obstacle repulsive potential field function to solve the local-optimum and unreachable-target problems; and established a velocity repulsive potential field to describe obstacle information more accurately and thereby avoid collisions between the vehicle and obstacles. Bei Shaoyi et al. invented a local path planning method for unmanned vehicles based on an improved artificial potential field method (CN202111673751.6): a distance threshold is set in the attraction function and a repulsive-field adjustment factor is added to the repulsion function; when the unmanned vehicle falls into a local optimum, a strategy of escaping it with a small steering angle is adopted under the vehicle's steering constraints, but this increases the computation and complexity of the algorithm. Another disclosed intelligent vehicle path planning method with an improved artificial potential field algorithm (CN202011307835.3) automatically adjusts the step length and the trajectory out of the influence area according to the size of the local-minimum influence area, so that the intelligent vehicle escapes automatically when it falls into a local minimum, but the computation for the overall trajectory planning is excessive and the planned path is not smooth enough.
The reinforcement learning algorithm is a learning method that requires no prior knowledge. It emphasizes interactive learning with the environment: it accumulates rewards by continually attempting actions and receiving environmental feedback, then selects the best action to complete the path planning task. The most widely used reinforcement learning algorithm in the path planning field is the Q-learning algorithm, a value-based reinforcement learning algorithm. It attempts an action in a particular state and updates the Q value based on the immediate reward or penalty it receives and its estimate of the value of the resulting state. After repeatedly trying all actions in all states, the action with the maximum benefit is selected according to the Q values.
Disclosure of Invention
The application provides an intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning, in which a reinforcement learning algorithm is added to the traditional artificial potential field method and APF weighting is used to guide Q-learning. This overcomes the local-minimum and related problems of the traditional potential field method, accelerates the convergence speed of classical Q-learning, and solves the path planning problem of an intelligent vehicle in a complex environment.
In order to achieve the above purpose, the application adopts the following technical scheme: an intelligent vehicle path planning method based on an artificial potential field method and reinforcement learning comprises the following steps:
step S1: acquiring an environment image and establishing a grid map;
step S2: defining a reinforcement learning state space and an action space;
step S3: initializing algorithm parameters;
step S4: randomly selecting an initial state from a state set;
step S5: constructing an artificial potential field, wherein the potential field is formed by overlapping an attractive potential field and a repulsive potential field;
step S6: in the action space, selecting actions by adopting an APF weighting or epsilon-greedy strategy;
step S7: executing the current action in state s to obtain a new state s_{t+1} and a reward r;
step S8: updating the Q value;
step S9: selecting the action with the maximum Q value in each step to obtain an optimal path;
step S10: repeating steps S6, S7 and S8 until a certain number of steps or a convergence condition is reached;
step S11: generating a path from the starting point q_0 to the target point q_f, transmitting the finally generated path to the control center of the intelligent vehicle, and driving the intelligent vehicle along the path.
Further, the specific operation of step S1 is as follows: an environment image is obtained by the intelligent vehicle's camera and divided into a 25 × 25 grid, and an environment model is built using the grid method. If an obstacle is found in a grid cell, that cell is defined as an obstacle position through which the intelligent vehicle cannot pass; if the target point is found in a grid cell, the target position is set at that cell, the target point being the final position the intelligent vehicle must reach; all other cells are defined as obstacle-free cells through which the intelligent vehicle can pass.
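A minimal sketch of the step S1 grid-map construction (the cell codes, helper name, and example obstacle coordinates are assumptions for illustration, not taken from the patent):

```python
import numpy as np

FREE, OBSTACLE, TARGET = 0, 1, 2  # assumed cell codes

def build_grid_map(size=25, obstacles=(), target=(24, 24)):
    """Build a size x size occupancy grid: obstacle cells are impassable,
    the target cell is the goal, all other cells are free."""
    grid = np.full((size, size), FREE, dtype=int)
    for r, c in obstacles:
        grid[r, c] = OBSTACLE
    grid[target] = TARGET
    return grid

grid = build_grid_map(obstacles=[(5, 5), (5, 6), (10, 12)])
print(grid.shape)  # (25, 25)
```

In use, the obstacle list would come from segmenting the camera image rather than being hard-coded.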
Further, the specific operation of step S2 is as follows: the state space of reinforcement learning is defined as the current position coordinate and the previous position coordinate of the agent, and the action space consists of the actions in the four directions up, down, left and right; after each action is executed, the agent moves one grid cell in the corresponding direction.
Further, the algorithm parameters in step S3 include the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the greedy factor ζ ∈ (0, 1), the maximum number of iterations, and the reward function r; all Q values are initialized to 0.
Further, the specific operation of step S5 is as follows: constructing the attractive potential field of the target point according to the positions of the obstacle and the target point:

U_att(q) = (1/2) k_att ‖q − q_f‖²

where U_att(q) is the attractive potential field generated by the target point at position q, k_att is the attraction coefficient of the target point, q is the position coordinate, and q_f is the coordinate of the target point.
Constructing the repulsive potential field of the obstacle:

U_rep(q) = (1/2) k_rep (1/ρ₀ − 1/ρ)²  if ρ₀ ≤ ρ;  U_rep(q) = 0  if ρ₀ > ρ

where U_rep(q) is the repulsive potential field generated by the obstacle, k_rep is the repulsion coefficient of the obstacle, ρ represents the limit distance of influence of the repulsive potential field, and ρ₀ represents the shortest distance from the current position to the obstacle.
The total artificial potential field comprises two terms, the attractive potential function and the repulsive potential function; the total artificial potential field U(q) is the sum of these two potential functions:

U(q) = U_att(q) + U_rep(q)
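As an illustration of step S5, a minimal sketch of the potential-field computation (the (1/2)-factor forms are the standard APF potentials, assumed here because the patent's equation images are not reproduced in this text; the coefficient values are placeholders):

```python
import numpy as np

def attractive_potential(q, q_f, k_att=1.0):
    """U_att(q) = 1/2 * k_att * ||q - q_f||^2"""
    q, q_f = np.asarray(q, float), np.asarray(q_f, float)
    return 0.5 * k_att * np.sum((q - q_f) ** 2)

def repulsive_potential(q, obstacles, rho=3.0, k_rep=1.0):
    """U_rep(q) = 1/2 * k_rep * (1/rho0 - 1/rho)^2 inside the influence
    distance rho, 0 outside; rho0 is the distance to the nearest obstacle."""
    q = np.asarray(q, float)
    rho0 = min(np.linalg.norm(q - np.asarray(ob, float)) for ob in obstacles)
    if rho0 > rho:
        return 0.0            # outside the obstacle's influence distance
    if rho0 == 0.0:
        return float("inf")   # on the obstacle itself
    return 0.5 * k_rep * (1.0 / rho0 - 1.0 / rho) ** 2

def total_potential(q, q_f, obstacles, k_att=1.0, **kw):
    """U(q) = U_att(q) + U_rep(q)"""
    return attractive_potential(q, q_f, k_att) + repulsive_potential(q, obstacles, **kw)
```

Evaluating `total_potential` over every cell of the grid map yields the total artificial potential field used by the APF weighting below.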
further, the specific operation of step S6 is as follows: generating uniform random numbers. If the uniform random number is greater than the decision rate ζ ε (0, 1), then APF weighting process would be performed. Otherwise, the explore-utilize method will be performed.
For APF weighting, a Moore neighborhood is used; it is defined on a two-dimensional square lattice and consists of a central cell (here, the current intelligent vehicle state s_t) and the eight cells surrounding it.
The probability assigned to each neighboring cell of s_t is inversely proportional to its total artificial potential field: the neighboring cell with the lowest total artificial potential field has the highest probability of being assigned to s_{t+1}, while the neighboring cell with the highest total artificial potential field has the lowest probability.
The probability p_i (i = 1, …, k) of selecting each region is calculated, where k is the number of neighboring cells:

p_i = U(q_i)⁻¹ / Σ_{j=1}^{k} U(q_j)⁻¹
A standard (unit) APF weighting function σ: R^k → (0, 1)^k is then calculated, where p = (p_1, …, p_k) ∈ R^k and i = 1, …, k.
The cumulative probability obtained from the APF weighting function is used to select action a_t in state s_t. First, the probabilities contained in p are sorted. Then, a random number between zero and one is generated. Starting from the top of the list, the first neighboring cell whose cumulative probability exceeds the random number is selected as state s_{t+1}.
If the APF weighting process is not performed, the exploration-exploitation method is performed: a uniform random number is generated; if it is greater than the decision rate ζ, the exploitation process is performed; otherwise, the exploration process is performed.
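The APF-weighted selection above can be sketched as follows (a hedged illustration: the inverse-potential normalization and the small epsilon guard are assumptions consistent with the stated "inversely proportional" rule, not taken verbatim from the patent):

```python
import random

def apf_weighted_choice(neighbor_potentials, rng=random.random):
    """Assign each of the k neighboring cells a probability inversely
    proportional to its total artificial potential, then sample the next
    state by cumulative probability, scanning from the highest probability."""
    inv = [1.0 / (u + 1e-9) for u in neighbor_potentials]  # lower U -> higher p
    total = sum(inv)
    probs = [v / total for v in inv]
    # sort cells by probability, highest first, as described in the text
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    r, cum = rng(), 0.0
    for i in order:
        cum += probs[i]
        if cum > r:
            return i  # index of the chosen neighboring cell
    return order[-1]
```

With a Moore neighborhood, `neighbor_potentials` would hold the total potential U(q) of the eight cells around s_t, and the returned index identifies s_{t+1}.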
Further, the specific operation of step S7 is as follows: once action a_t is selected by the APF weighting, exploitation, or exploration procedure, action a_t is executed, the reward r is received, and the new state s_{t+1} is reached.
Further, the specific operation of updating the Q value in step S8 is as follows: according to S6, action a is selected based on state s, state s' is reached, and the immediate reward r is obtained; the Q value update can be expressed as:

Q(s, a) ← Q(s, a) + α (r + γ max_{a'} Q(s', a') − Q(s, a))

where the learning rate α is empirically set to 0.3, the discount factor γ is set to 0.8, and the decision rate ζ is set to 0.2; the reward function is set as follows:
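A minimal sketch of the step S8 update with the α = 0.3 and γ = 0.8 values stated above (the dict-of-dicts Q-table layout and the toy states are assumptions for illustration; the patent's concrete reward table is not reproduced in this text):

```python
ALPHA, GAMMA = 0.3, 0.8  # learning rate and discount factor from the patent

def q_update(Q, s, a, r, s_next, alpha=ALPHA, gamma=GAMMA):
    """Tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    best_next = max(Q[s_next].values()) if Q.get(s_next) else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]

# toy Q table as a dict of dicts: state -> {action: value}
Q = {"s0": {"right": 0.0}, "s1": {"right": 5.0}}
q_update(Q, "s0", "right", -1, "s1")  # 0.3 * (-1 + 0.8*5 - 0) ≈ 0.9
```

A typical reward shape for this kind of grid task would penalize each step, penalize obstacle cells heavily, and reward reaching the target; the exact values here are placeholders.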
further, the specific operation of step S10 is as follows:
The new state s_{t+1} is assigned to the current state s_t to continue the iterative process; if s_{t+1} is the terminal state, the current round of iteration is complete and the procedure returns to step S6 until the target point is reached.
Further, the method for generating the path in the step S11 is as follows:
The index i of the target point is initialized to 0, and the starting point q_0 is assigned to the current state s_t; an iterative path-generation process follows. While the next state has not reached the target point, the index i is incremented by 1. Environment verification is then performed; if the environment has changed, the environment information Q_G is updated. Using the learned values Q_{m×n}, the optimal action a_t for the current state s_t is selected. Once action a_t has been selected, it is executed. Subsequently, the new state s_{t+1} is computed using the equation s_{t+1} = f(s_t, a_t). Finally, the new state s_{t+1} is assigned to the current state s_t, and the current state is assigned to the target point q_i. When the current state s_t finally equals the target point q_f, the algorithm returns the collision-free path between the starting point q_0 and the target point q_f.
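The path-generation loop of step S11 can be sketched as follows (a hedged illustration: `best_action`, `step`, and the toy 1-D example are assumptions, and the environment re-verification of Q_G is omitted):

```python
def extract_path(q0, q_f, best_action, step, max_len=1000):
    """Greedy path generation: from q0, repeatedly take the action with
    the maximum learned Q value until the target q_f is reached."""
    path, s = [q0], q0
    while s != q_f and len(path) < max_len:
        a = best_action(s)   # argmax_a Q(s, a) over the learned table
        s = step(s, a)       # s_{t+1} = f(s_t, a_t)
        path.append(s)
    return path

# toy 1-D example: states 0..4, the learned policy always moves right
path = extract_path(0, 4, best_action=lambda s: +1, step=lambda s, a: s + a)
print(path)  # [0, 1, 2, 3, 4]
```

The resulting state sequence is what would be transmitted to the vehicle's control center.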
The application provides an intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning. To address the "local-minimum trap" of the traditional artificial potential field method, the prior environment is virtualized into a total artificial potential field, and APF weighting is used to compute suitable learning values. The algorithm solves the path planning problem in both known and unknown environments. Combining the Q-learning method with the APF method overcomes the drawbacks of classical Q-learning, such as slow learning speed, long run time, and difficulty learning in both known and unknown environments, and effectively prevents the traditional artificial potential field method from falling into local optima. Simulation results show that the APF reinforcement learning algorithm improves the learning speed, path length, and path smoothness of path planning, and enables the vehicle to avoid moving obstacles and reach its destination smoothly.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description, serve to explain the principles of the application.
The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flow chart of the method of the present application;
FIG. 2 is a drawing of a potential field of an artificial potential field;
FIG. 3 is a repulsive potential field diagram of an artificial potential field;
FIG. 4 is a total potential field diagram of an artificial potential field;
FIG. 5 is an overall frame diagram of the proposed new algorithm of the present application;
FIG. 6 is a graph of a path planned by the improved algorithm of the present application;
FIG. 7 is a chart of the number of steps taken in each iteration of the algorithm.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Example 1
Referring to fig. 1, the application relates to an intelligent vehicle path planning method based on an artificial potential field method and reinforcement learning, which comprises the following specific implementation steps:
step S1: acquiring an environment image and establishing a grid map;
step S2: defining a reinforcement learning state space and an action space;
step S3: initializing algorithm parameters;
step S4: randomly selecting an initial state from a state set;
step S5: constructing an artificial potential field, wherein the potential field is formed by overlapping an attractive potential field and a repulsive potential field;
step S6: in the action space, selecting actions by adopting an APF weighting or epsilon-greedy strategy;
step S7: executing the current action in state s to obtain a new state s_{t+1} and a reward r;
step S8: updating the Q value;
step S9: selecting the action with the maximum Q value in each step to obtain an optimal path;
step S10: repeating steps S6, S7 and S8 until a certain number of steps or a convergence condition is reached;
step S11: generating a path from the starting point q_0 to the target point q_f, transmitting the finally generated path to the control center of the intelligent vehicle, and driving the intelligent vehicle along the path.
Step S1 specifically comprises the following: an environment image is obtained by the intelligent vehicle's camera and divided into a 25 × 25 grid, and an environment model is built using the grid method. If an obstacle is found in a grid cell, that cell is defined as an obstacle position through which the intelligent vehicle cannot pass; if the target point is found in a grid cell, the target position is set at that cell, the target point being the final position the intelligent vehicle must reach; all other cells are defined as obstacle-free cells through which the intelligent vehicle can pass.
Step S2 defines the state space of reinforcement learning as the current position coordinate and the previous position coordinate of the agent, and the action space as the actions in the four directions up, down, left and right; after each action is executed, the agent moves one grid cell in the corresponding direction.
The algorithm parameters in step S3 comprise the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the greedy factor ζ ∈ (0, 1), the maximum number of iterations, and the reward function r; all Q values are initialized to 0.
The specific operation of step S5 is as follows: the main idea of the artificial potential field (APF) method is to establish an attractive potential field around the target point and a repulsive potential field around the obstacles. With this idea, the APF method uses attraction and repulsion components to attract the intelligent vehicle to its target while keeping it away from obstacles.
Thus, the total artificial potential field comprises two terms, the attractive potential function and the repulsive potential function; the total artificial potential field U(q) is the sum of these two potential functions:

U(q) = U_att(q) + U_rep(q)
As shown in FIG. 2, the attractive potential field of the target point is constructed according to the positions of the obstacle and the target point:

U_att(q) = (1/2) k_att ‖q − q_f‖²

where U_att(q) is the attractive potential field generated by the target point at position q, k_att is the attraction coefficient of the target point, q is the position coordinate, and q_f is the coordinate of the target point.
As shown in FIG. 3, the repulsive potential field of the obstacle is constructed:

U_rep(q) = (1/2) k_rep (1/ρ₀ − 1/ρ)²  if ρ₀ ≤ ρ;  U_rep(q) = 0  if ρ₀ > ρ

where U_rep(q) is the repulsive potential field generated by the obstacle, k_rep is the repulsion coefficient of the obstacle, ρ represents the limit distance of influence of the repulsive potential field, and ρ₀ represents the shortest distance from the current position to the obstacle.
As shown in FIG. 4, the total artificial potential field is formed from the attractive potential field and the repulsive potential field.
As shown in FIG. 5, the action selection of the improved algorithm includes the following steps:
generating uniform random numbers. If the uniform random number is greater than the decision rate ζ ε (0, 1), then APF weighting process would be performed. Otherwise, the explore-utilize method will be performed.
For APF weighting, a Moore neighborhood is used; it is defined on a two-dimensional square lattice and consists of a central cell (here, the current intelligent vehicle state s_t) and the eight cells surrounding it.
The probability assigned to each neighboring cell of s_t is inversely proportional to its total artificial potential field: the neighboring cell with the lowest total artificial potential field has the highest probability of being assigned to s_{t+1}, while the neighboring cell with the highest total artificial potential field has the lowest probability.
The probability p_i (i = 1, …, k) of selecting each region is calculated, where k is the number of neighboring cells:

p_i = U(q_i)⁻¹ / Σ_{j=1}^{k} U(q_j)⁻¹
A standard (unit) APF weighting function σ: R^k → (0, 1)^k is then calculated, where p = (p_1, …, p_k) ∈ R^k and i = 1, …, k.
The cumulative probability obtained from the APF weighting function is used to select action a_t in state s_t. First, the probabilities contained in p are sorted. Then, a random number between zero and one is generated. Starting from the top of the list, the first neighboring cell whose cumulative probability exceeds the random number is selected as state s_{t+1}.
If the APF weighting process is not performed, the exploration-exploitation method is performed: a uniform random number is generated; if it is greater than the decision rate ζ, the exploitation process is performed; otherwise, the exploration process is performed.
Subsequently, once action a_t is selected by the APF weighting, exploitation, or exploration procedure, action a_t is executed, the reward r is received, and the new state s_{t+1} is reached. The Q table is then updated according to the formula Q(s, a) ← Q(s, a) + α (r + γ max_{a'} Q(s', a') − Q(s, a)).
Here the learning rate α is empirically set to 0.3, the discount factor γ is set to 0.8, and the decision rate ζ is set to 0.2; the reward function is set as follows:
The new state s_{t+1} is assigned to the current state s_t to continue the iterative process; if s_{t+1} is the terminal state, the current round of iteration is complete and the procedure returns to step S6 until the target point is reached.
The method for generating the path in step S11 is as follows:
The index i of the target point is initialized to 0, and the starting point q_0 is assigned to the current state s_t; an iterative path-generation process follows. While the next state has not reached the target point, the index i is incremented by 1. Environment verification is then performed; if the environment has changed, the environment information Q_G is updated. Using the learned values Q_{m×n}, the optimal action a_t for the current state s_t is selected. Once action a_t has been selected, it is executed. Subsequently, the new state s_{t+1} is computed using the equation s_{t+1} = f(s_t, a_t). Finally, the new state s_{t+1} is assigned to the current state s_t, and the current state is assigned to the target point q_i. When the current state s_t finally equals the target point q_f, the algorithm returns the collision-free path between the starting point q_0 and the target point q_f.
Using this method with the parameter settings above, the optimal driving path of the intelligent vehicle for this example is obtained, as shown in FIG. 6.
FIG. 7 shows that, through training, the intelligent vehicle finds the optimal path in fewer and fewer steps, i.e., with increasing efficiency.
The combined artificial potential field and reinforcement learning algorithm provided by the application not only effectively overcomes the defects of the traditional artificial potential field method, but also accelerates the convergence of the algorithm and improves its efficiency.

Claims (10)

1. An intelligent vehicle path planning method based on an artificial potential field method and reinforcement learning is characterized by comprising the following steps:
step S1: acquiring an environment image and establishing a grid map;
step S2: defining a reinforcement learning state space and an action space;
step S3: initializing algorithm parameters;
step S4: randomly selecting an initial state from a state set;
step S5: constructing an artificial potential field, wherein the potential field is formed by overlapping an attractive potential field and a repulsive potential field;
step S6: in the action space, selecting actions by adopting an APF weighting or epsilon-greedy strategy;
step S7: executing the current action in state s to obtain a new state s_{t+1} and a reward r;
step S8: updating the Q value;
step S9: selecting the action with the maximum Q value in each step to obtain an optimal path;
step S10: repeating steps S6, S7 and S8 until a certain number of steps or a convergence condition is reached;
step S11: generating a path from the starting point q_0 to the target point q_f, transmitting the finally generated path to the control center of the intelligent vehicle, and driving the intelligent vehicle along the path.
2. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 1, wherein the specific operation of step S1 is as follows: an environment image is obtained by the intelligent vehicle's camera and divided into a 25 × 25 grid, and an environment model is built using the grid method; if an obstacle is found in a grid cell, that cell is defined as an obstacle position through which the intelligent vehicle cannot pass; if the target point is found in a grid cell, the target position is set at that cell, the target point being the final position the intelligent vehicle must reach; all other cells are defined as obstacle-free cells through which the intelligent vehicle can pass.
3. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 2, wherein the specific operation of step S2 is as follows: the state space of reinforcement learning is defined as the current position coordinate and the previous position coordinate of the agent, the action space consists of the actions in the four directions up, down, left and right, and after each action is executed the agent moves one grid cell in the corresponding direction.
4. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 3, wherein the algorithm parameters in step S3 include the learning rate α ∈ (0, 1), the discount factor γ ∈ (0, 1), the greedy factor ζ ∈ (0, 1), the maximum number of iterations, and the reward function r; all Q values are initialized to 0.
5. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 4, wherein the specific operation of step S5 is as follows: constructing the attractive potential field of the target point according to the positions of the obstacle and the target point:

U_att(q) = (1/2) k_att ‖q − q_f‖²

where U_att(q) is the attractive potential field generated by the target point at position q, k_att is the attraction coefficient of the target point, q is the position coordinate, and q_f is the coordinate of the target point;
constructing the repulsive potential field of the obstacle:

U_rep(q) = (1/2) k_rep (1/ρ₀ − 1/ρ)²  if ρ₀ ≤ ρ;  U_rep(q) = 0  if ρ₀ > ρ

where U_rep(q) is the repulsive potential field generated by the obstacle, k_rep is the repulsion coefficient of the obstacle, ρ represents the limit distance of influence of the repulsive potential field, and ρ₀ represents the shortest distance from the current position to the obstacle; the total artificial potential field comprises two terms, the attractive potential function and the repulsive potential function, and the total artificial potential field U(q) is the sum of these two potential functions:

U(q) = U_att(q) + U_rep(q).
6. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 5, wherein the specific operation of step S6 is as follows: generating a uniform random number; if the uniform random number is greater than the decision rate ζ ∈ (0, 1), performing the APF weighting process; otherwise, performing the exploration-exploitation method;
for APF weighting, a Moore neighborhood is used, which is defined on a two-dimensional square lattice and consists of a central cell (here, the current intelligent vehicle state s_t) and the eight cells surrounding it;
the probability assigned to each neighboring cell of s_t is inversely proportional to its total artificial potential field, the neighboring cell with the lowest total artificial potential field having the highest probability of being assigned to s_{t+1} and the neighboring cell with the highest total artificial potential field having the lowest probability;
calculating the probability p_i (i = 1, …, k) of selecting each region, where k is the number of neighboring cells:

p_i = U(q_i)⁻¹ / Σ_{j=1}^{k} U(q_j)⁻¹;
calculating a standard (unit) APF weighting function σ: R^k → (0, 1)^k, where p = (p_1, …, p_k) ∈ R^k and i = 1, …, k;
The cumulative probability obtained by the APF weighting function is used to select state s t Action a in (a) t First, the probabilities contained by p are ordered and then a random number between zero and one is generated, starting from the top of the list, the first neighbor cell with a cumulative probability greater than the random number is selected for state s t+1
If the APF weighting process is not executed, the exploration-exploitation method is performed: generate a uniform random number; if the random number is greater than the decision rate ξ, the exploitation process is executed, otherwise the exploration process is executed.
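The APF-weighted action selection of step S6 — probability inversely proportional to each neighbouring cell's total artificial potential field, with selection by cumulative probability — can be sketched as follows; the inverse-potential normalisation stands in for the patent's weighting function σ, which is not reproduced in the text:

```python
import random

def apf_weighted_choice(potentials, rng=random):
    """Pick a Moore-neighbour cell with probability inversely proportional to
    its total artificial potential field, using the cumulative-probability
    scheme described above. `potentials` maps each of the k neighbouring
    cells to its U value; inverse-potential normalisation is an assumption
    standing in for the patent's sigma function."""
    # Inverse weights: the lowest-potential cell gets the largest probability.
    inv = {cell: 1.0 / (1e-9 + u) for cell, u in potentials.items()}
    total = sum(inv.values())
    probs = {cell: w / total for cell, w in inv.items()}
    # Sort by probability (top of the list first), then walk the cumulative
    # sum until it exceeds a uniform random number in (0, 1).
    r = rng.random()
    cum = 0.0
    for cell, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        cum += p
        if cum > r:
            return cell
    return cell  # fallback against floating-point round-off
```

Over many draws, the cell with the lowest potential is selected most often, while higher-potential cells are still reachable, preserving exploration.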
7. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 6, wherein the specific operation of step S7 is as follows: once action a_t has been selected by the APF weighting, exploitation, or exploration procedure, action a_t is executed, the reward r is received, and the new state s_{t+1} is reached.
8. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 7, wherein the updating of the Q value in step S8 specifically comprises the following operations: according to S6, an action a is selected based on state s, state s' is reached, and an immediate reward r is obtained; the update of the Q value can be expressed mathematically as:
Q(s, a) ← Q(s, a) + α[r + γ max_{a'} Q(s', a') - Q(s, a)]
wherein the learning rate α is empirically set to 0.3, the discount coefficient γ is set to 0.8, and the decision rate ξ is set to 0.2; the reward function is set as follows:
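With α = 0.3 and γ = 0.8 as quoted above, the tabular Q-value update of step S8 can be sketched as follows; the reward function itself is claim-specific and is simply passed in as r:

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.3, gamma=0.8):
    """One tabular Q-learning update with the learning rate and discount
    coefficient quoted above. Q is a dict keyed by (state, action);
    unseen pairs default to 0.0."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]
```

For example, starting from an empty table, a single reward of 1.0 yields Q(s, a) = 0.3 · (1.0 + 0.8 · 0 - 0) = 0.3.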
9. the intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 8, wherein the specific operation of step S10 is as follows:
The new state s_{t+1} is assigned to the current state s_t to continue the iterative process; if s_{t+1} is the terminal state, the current round of iteration is completed; otherwise, return to step S6 until the target point is reached.
10. The intelligent vehicle path planning method based on the artificial potential field method and reinforcement learning according to claim 9, wherein the method for generating the path in step S11 is as follows:
initialize the target-point index i to 0 and assign the starting point q_0 to the current state s_t, then carry out the iterative path-generation process: while the next state has not reached the target point, increment the index i by 1 and perform environment verification; if the environment has changed, update the environment information Q_G; using the learned value table Q_{m×n}, select the optimal action a_t for the current state s_t; once action a_t has been selected, execute it, then compute the new state s_{t+1} using the equation s_{t+1} = f(s_t, a_t); finally, assign the new state s_{t+1} to the current state s_t and assign the current state to the target point q_i. If the current state s_t eventually equals the target point q_f, the algorithm returns a collision-free path between the starting point q_0 and the target point q_f.
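The path-generation loop of step S11 — repeatedly selecting the best-valued action from the learned Q-table until the target point is reached — can be sketched as follows; the transition function `step` plays the role of s_{t+1} = f(s_t, a_t), and the environment re-verification against Q_G is omitted for brevity:

```python
def extract_path(Q, start, goal, step, actions, max_steps=10000):
    """Greedy path extraction from a learned Q-table, as in step S11.
    Q is a dict keyed by (state, action); `step(s, a)` computes the
    next state; returns the state sequence q_0 ... q_f, or None if the
    goal is not reached within the step budget."""
    s, path = start, [start]
    for _ in range(max_steps):
        if s == goal:
            return path  # collision-free path from start to target point
        # Select the optimal action a_t for the current state s_t.
        a = max(actions, key=lambda act: Q.get((s, act), 0.0))
        s = step(s, a)
        path.append(s)
    return None  # target point not reached within the step budget
```

On a one-dimensional corridor where the table favours moving toward the goal, this recovers the shortest state sequence step by step.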
CN202310692024.7A 2023-06-13 2023-06-13 Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning Active CN116700258B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310692024.7A CN116700258B (en) 2023-06-13 2023-06-13 Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning


Publications (2)

Publication Number Publication Date
CN116700258A true CN116700258A (en) 2023-09-05
CN116700258B CN116700258B (en) 2024-05-03

Family

ID=87827236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310692024.7A Active CN116700258B (en) 2023-06-13 2023-06-13 Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning

Country Status (1)

Country Link
CN (1) CN116700258B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0850548A (en) * 1994-08-05 1996-02-20 Nikon Corp Method and device for learning path
CN110919659A (en) * 2019-12-24 2020-03-27 哈尔滨工程大学 Robot control method based on DDGPES
CN111515932A (en) * 2020-04-23 2020-08-11 东华大学 Man-machine co-fusion assembly line implementation method based on artificial potential field and reinforcement learning
CN112344944A (en) * 2020-11-24 2021-02-09 湖北汽车工业学院 Reinforced learning path planning method introducing artificial potential field
CN112799386A (en) * 2019-10-25 2021-05-14 中国科学院沈阳自动化研究所 Robot path planning method based on artificial potential field and reinforcement learning
CN112964272A (en) * 2021-03-16 2021-06-15 湖北汽车工业学院 Improved Dyna-Q learning path planning algorithm
WO2022229657A1 (en) * 2021-04-30 2022-11-03 Cambridge Enterprise Limited Method and system for robot navigation in unknown environments
CN115344046A (en) * 2022-08-19 2022-11-15 南京信息工程大学 Mobile robot path planning based on improved deep Q network algorithm


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUO, SIYU, et al.: "Path Planning of Coastal Ships Based on Optimized DQN Reward Function", JOURNAL OF MARINE SCIENCE AND ENGINEERING, vol. 9, no. 2, 18 February 2021 (2021-02-18), page 210 *
XU, KE: "Research and Implementation of a Smart Travel Route Planning Algorithm Based on Reinforcement Learning", CHINA MASTER'S THESES FULL-TEXT DATABASE, INFORMATION SCIENCE AND TECHNOLOGY, no. 5, 15 May 2021 (2021-05-15), pages 22-34 *


Similar Documents

Publication Publication Date Title
CN109164810B (en) Robot self-adaptive dynamic path planning method based on ant colony-clustering algorithm
CN109945881B (en) Mobile robot path planning method based on ant colony algorithm
CN108919641B (en) Unmanned aerial vehicle flight path planning method based on improved goblet sea squirt algorithm
CN112362066B (en) Path planning method based on improved deep reinforcement learning
CN111896006B (en) Path planning method and system based on reinforcement learning and heuristic search
CN110991972B (en) Cargo transportation system based on multi-agent reinforcement learning
CN110084375B (en) Multi-agent collaboration framework based on deep reinforcement learning
CN111260027B (en) Intelligent agent automatic decision-making method based on reinforcement learning
CN110471444A (en) UAV Intelligent barrier-avoiding method based on autonomous learning
CN112550314B (en) Embedded optimization type control method suitable for unmanned driving, driving control module and automatic driving control system thereof
CN112327923A (en) Multi-unmanned aerial vehicle collaborative path planning method
CN115509239B (en) Unmanned vehicle route planning method based on air-ground information sharing
CN113032904A (en) Model construction method, task allocation method, device, equipment and medium
CN113467481B (en) Path planning method based on improved Sarsa algorithm
CN114089776B (en) Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
CN113780576A (en) Cooperative multi-agent reinforcement learning method based on reward self-adaptive distribution
CN112613608A (en) Reinforced learning method and related device
CN115629607A (en) Reinforced learning path planning method integrating historical information
CN113110101B (en) Production line mobile robot gathering type recovery and warehousing simulation method and system
CN113377131B (en) Method for acquiring unmanned aerial vehicle collected data track by using reinforcement learning
CN112595326A (en) Improved Q-learning path planning algorithm with fusion of priori knowledge
CN116700258B (en) Intelligent vehicle path planning method based on artificial potential field method and reinforcement learning
CN116796844A (en) M2 GPI-based unmanned aerial vehicle one-to-one chase game method
CN114610024B (en) Multi-agent collaborative searching energy-saving method for mountain land
CN116400726A (en) Rotor unmanned aerial vehicle escape method and system based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240408

Address after: 530022, Room 501, Room 502, Room 503, Room 505, 5th Floor, Building 2, Nanning International Tourism Center, No. 111 Fengling North Road, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region

Applicant after: Wanjitai Technology Group Digital City Technology Co.,Ltd.

Country or region after: China

Address before: No. 11-3 Chongqing International Student Entrepreneurship Park, Erlang Kecheng Road, Jiulongpo District, Chongqing, 400000

Applicant before: CHONGQING RONGGUAN TECHNOLOGY Co.,Ltd.

Country or region before: China

GR01 Patent grant
GR01 Patent grant