CN114442630A - Intelligent vehicle planning control method based on reinforcement learning and model prediction - Google Patents

Intelligent vehicle planning control method based on reinforcement learning and model prediction

Info

Publication number
CN114442630A
CN114442630A (application CN202210088325.4A; granted publication CN114442630B)
Authority
CN
China
Prior art keywords
vehicle
intelligent vehicle
intelligent
potential field
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210088325.4A
Other languages
Chinese (zh)
Other versions
CN114442630B (en
Inventor
陈剑
戚子恒
王通
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210088325.4A priority Critical patent/CN114442630B/en
Publication of CN114442630A publication Critical patent/CN114442630A/en
Application granted granted Critical
Publication of CN114442630B publication Critical patent/CN114442630B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0214 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G05D1/0223 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving speed control of the vehicle
    • G05D1/0231 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0238 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors
    • G05D1/024 Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors in combination with a laser
    • G05D1/0257 Control of position or course in two dimensions specially adapted to land vehicles using a radar
    • G05D1/0276 Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle
    • G05D1/0278 Control of position or course in two dimensions specially adapted to land vehicles using signals provided by a source external to the vehicle using satellite positioning signals, e.g. GPS
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention discloses an intelligent vehicle planning control method based on reinforcement learning and model prediction. The method comprises the following steps: acquiring and calculating road boundary information and obstacle information in the vehicle body coordinate system through a vehicle-mounted laser radar sensor; acquiring and calculating global reference waypoints in the vehicle body coordinate system with a vehicle-mounted GPS sensor; building a virtual scene in which the intelligent vehicle is located; in the virtual scene of the intelligent vehicle, planning a path for the intelligent vehicle with a path generation module based on the road boundary information, obstacle information and global reference waypoints in the vehicle body coordinate system, thereby obtaining the planned path of the intelligent vehicle; and tracking the planned path of the intelligent vehicle with the tracking control module, thereby realizing planning control of the intelligent vehicle. The invention accelerates the network training of the planning part, guarantees the path planning performance of the intelligent vehicle when its positioning is inaccurate, and improves the stability and comfort of the vehicle body motion.

Description

Intelligent vehicle planning control method based on reinforcement learning and model prediction
Technical Field
The invention belongs to the field of automatic driving of intelligent vehicles, and particularly relates to an intelligent vehicle planning control method based on reinforcement learning and model prediction in a weak-GPS environment.
Background
With the development of the economy and the improvement of the technical level of the automobile industry in recent years, the number of vehicles in use has continued to grow, and problems such as traffic accidents, traffic congestion, exhaust emissions and driver fatigue have worsened. Unmanned vehicles have the advantages of energy saving, environmental protection, comfort and high efficiency; they are an important trend in future automobile development and are highly valued by countries around the world.
Path planning and tracking control are key technologies for autonomous driving. For the path planning module, the planning effect depends heavily on a high-precision map and high-precision positioning equipment. Compared with a traditional electronic map with meter-level precision, a centimeter-level high-precision map can represent details such as the number, shape and width of lanes more faithfully, and can help the intelligent vehicle plan and make decisions more accurately. However, the information collection, quality inspection, operation and maintenance involved in producing a high-precision map make drawing and maintaining it expensive. Meanwhile, because GPS signals are easily degraded or lost due to weather, tall buildings, tunnels and the like, high-precision positioning equipment often has to be supplemented with expensive IMU equipment for auxiliary positioning, which greatly hinders the popularization of intelligent vehicles. The difficulty for the tracking control module is how to handle the nonlinear behavior of the vehicle system and the constraints on the state variables and manipulated variables while tracking the path. Meanwhile, errors are easily introduced when the sensors measure the motion state of the vehicle body, so the robustness of the controller under such disturbances must be ensured.
In recent years, reinforcement learning has achieved great success in fields such as image recognition, speech recognition and robotics. Q-learning is one branch of reinforcement learning. In Q-learning there is an agent with states and corresponding actions. At any time the agent is in some feasible state; in the next time step it transitions to a new state by performing an action, and each action is accompanied by a reward or penalty. The objective of the agent is to maximize its cumulative reward. Through constant trial and error in an initially unknown environment, the algorithm interacts with the environment, continually directs the vehicle to take actions that maximize its return, and eventually finds a collision-free path that avoids the obstacles.
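The Q-learning update described above can be illustrated with a minimal sketch. The environment, state and action sizes, learning rate and reward values below are assumptions chosen only for demonstration and are not taken from the invention:

```python
import numpy as np

# Minimal tabular Q-learning sketch on an abstract environment.
# n_states, n_actions, alpha, gamma and epsilon are illustrative assumptions.
n_states, n_actions = 100, 4
alpha, gamma, epsilon = 0.1, 0.95, 0.1
Q = np.zeros((n_states, n_actions))

def step(state, action):
    """Placeholder environment: returns (next_state, reward, done).
    A real implementation would query the grid map / simulator."""
    next_state = (state + action + 1) % n_states
    reward = -1.0            # small penalty per step encourages short paths
    done = next_state == n_states - 1
    if done:
        reward = 10.0        # reward for reaching the goal state
    return next_state, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy exploration: mostly exploit, sometimes explore
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
```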
The DDPG (Deep Deterministic Policy Gradient) algorithm adopts the Actor-Critic network structure and borrows the experience replay pool from the DQN (Deep Q-Network) algorithm: a database called the experience pool stores the data generated by the interaction between the agent and the environment. During training, the agent randomly samples training data from the experience pool to train the neural networks, which breaks the temporal correlation of the training data and effectively improves training efficiency and sample utilization.
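The experience pool mechanism can be sketched as follows; the capacity and batch size are assumed values, and the actor and critic networks themselves are omitted:

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool in the DQN/DDPG sense: stores (s, a, r, s', done)
    transitions and returns random mini-batches, breaking the temporal
    correlation of consecutive samples."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)

# usage: the agent pushes each interaction, then trains on random batches
pool = ReplayBuffer()
for i in range(1000):
    pool.push(i, 0.0, -1.0, i + 1, False)
states, actions, rewards, next_states, dones = pool.sample(batch_size=64)
```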
Model Predictive Control (MPC) is an effective method for conveniently handling multivariable constrained control problems and has been widely used in industrial systems. In recent years, MPC has been extended to the tracking control of moving bodies, achieving a predetermined goal in a suboptimal manner while satisfying the system constraints. In this control scheme, the control sequence is recalculated at every sample time by minimizing a cost function under input and state constraints. After the first control input of the sequence is applied to the system, the online optimization problem is solved again at the next time step based on the latest system state.
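The receding-horizon scheme described above can be sketched as follows, with the constrained optimization and the plant model replaced by placeholders (the step size, horizon and reference values are illustrative assumptions):

```python
import numpy as np

def solve_finite_horizon(x, ref, horizon):
    """Placeholder for the constrained optimization solved at each sample time:
    a real controller minimizes the cost function under input/state constraints
    and returns the optimal control sequence; a zero sequence stands in here."""
    return np.zeros(horizon)

def receding_horizon_control(x0, reference, n_steps, horizon=10):
    x, applied = x0, []
    for t in range(n_steps):
        u_seq = solve_finite_horizon(x, reference[t:t + horizon], horizon)
        u = u_seq[0]          # apply only the first control input of the sequence
        x = x + 0.1 * u       # placeholder plant update; a real system uses the vehicle model
        applied.append(u)
        # at the next step the optimization is repeated from the newest measured state
    return applied

reference = [0.0] * 40        # illustrative reference signal
controls = receding_horizon_control(x0=0.0, reference=reference, n_steps=20)
```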
Disclosure of Invention
In order to solve the problem of inaccurate positioning of the intelligent vehicle described in the background art, the invention provides an intelligent vehicle planning control method based on reinforcement learning and model prediction, improving existing planning and control algorithms so as to improve the stability and comfort of the intelligent vehicle when its positioning is inaccurate.
The technical scheme adopted by the invention is as follows:
the invention comprises the following steps:
step 1: obtaining an obstacle grid map through a vehicle-mounted laser radar sensor, determining road boundary information and obstacle information around a vehicle body in a laser radar sensor coordinate system based on the obstacle grid map, and then obtaining the road boundary information and the obstacle information in the vehicle body coordinate system after coordinate conversion;
step 2: acquiring global reference waypoints under a coordinate system of the vehicle-mounted GPS sensor by using the vehicle-mounted GPS sensor, acquiring vehicle body positioning and motion states by using the vehicle-mounted GPS sensor, and finally performing coordinate conversion on the global reference waypoints based on the vehicle body positioning and the motion states to acquire the global reference waypoints under a vehicle body coordinate system;
Step 3: building a virtual scene in which the intelligent vehicle is located from the obstacle grid map and the global reference waypoints;
Step 4: in the virtual scene of the intelligent vehicle, based on the road boundary information, obstacle information and global reference waypoints in the vehicle body coordinate system, planning a path for the intelligent vehicle with a path generation module to obtain the planned path of the intelligent vehicle;
Step 5: tracking the planned path of the intelligent vehicle with the tracking control module, thereby realizing planning control of the intelligent vehicle.
The path generation module in step 4 is obtained by training through the following steps:
S1: the training of the DDPG-based reinforcement learning agent is divided into an initial stage, an intermediate stage and a final stage in sequence; the first state space, input in the initial stage, consists of the distances from the intelligent vehicle to the left and right road boundaries and the position of the accurate global reference waypoint in the vehicle body coordinate system; the second state space, input in the intermediate stage, adds the position of the nearest obstacle in front of the intelligent vehicle in the vehicle body coordinate system; the third state space, input in the final stage, replaces the accurate global reference waypoint with the inaccurate global reference waypoint;
s2: constructing an action space which is the corner delta of the front wheel of the intelligent vehiclef
S3: forming training sets from the action space and the different state spaces to train the DDPG-based reinforcement learning agent, setting reward and punishment values to supervise the training process, and obtaining the trained reinforcement learning agent.
The reward and punishment values include a reward value R_arrive for reaching the endpoint, a punishment value R_collision for intelligent vehicle collision, and an intermediate-state reward and punishment value R_temp.
The reward and punishment value R_temp of the intermediate state is obtained by the following steps:
A1: using a potential field method, assigning corresponding potential field functions to the road boundary, the obstacle and the global reference waypoint in each training stage;
A2: according to the three potential field functions, respectively calculating the corresponding road boundary potential field P_R, obstacle potential field P_O, accurate global reference waypoint potential field P_W and inaccurate global reference waypoint potential field P_W′; after superposing the potential fields corresponding to the current training stage, the total potential field P_U of the current training stage is obtained and used as the reward and punishment value R_temp of the intermediate state;
A3: during training, according to the three-dimensional gradient map of the total potential field P_U, the potential field parameters of all the potential field functions of each training stage in A1 are tuned with a path planning method based on the potential field method; the total potential field of each training stage is updated according to the tuned potential field functions, and the updated total potential field is used as the reward and punishment value R_temp of the intermediate state of that training stage.
In the tracking control module in step 5, firstly a vehicle dynamics model is established for the intelligent vehicle, and then a prediction equation of the vehicle state is established based on the vehicle dynamics model;
then, according to a prediction equation of the vehicle state, a target optimization function and a constraint condition are established by using a model prediction control algorithm, and a path tracking controller is further established;
and finally, tracking the planned path of the intelligent vehicle by using the path tracking controller, thereby realizing the planning control of the intelligent vehicle.
The target optimization function is as follows:
min_{U(t)} J = Σ_{i=1}^{N_p−1} ‖y(t+i|t) − r(t+i|t)‖²_Q + Σ_{i=0}^{N_p−1} ‖Δu(t+i|t)‖²_R + ‖y(t+N_p|t) − r(t+N_p|t)‖²_P

The constraint conditions of the objective optimization function are as follows:

Δu_min ≤ Δu(k|t) ≤ Δu_max
u_min ≤ u(k|t) ≤ u_max
y_min ≤ y(k|t) ≤ y_max
β_min ≤ β(k|t) ≤ β_max
k = t, …, t+N_p−1
y(t+N_p|t) − r(t+N_p|t) ∈ Ω

wherein min_{U(t)} J denotes taking the set of front wheel steering angle control quantities that minimizes the target optimization value of the intelligent vehicle over the prediction horizon corresponding to time t; J is the target optimization value of the intelligent vehicle, and U(t) is the set of front wheel steering angle control quantities over the prediction horizon corresponding to time t; ‖·‖²_Q, ‖·‖²_R and ‖·‖²_P denote the squared norms weighted by the first weight matrix Q, the second weight matrix R and the third weight matrix P, respectively; y(t+i|t) is the predicted value of the i-th vehicle-state yaw angle and lateral position at time t, r(t+i|t) is the expected value of the i-th vehicle-state yaw angle and lateral position at time t, u(t+i|t) is the i-th control quantity at time t, y(t+N_p|t) is the predicted value of the N_p-th vehicle-state yaw angle and lateral position at time t, and r(t+N_p|t) is the expected value of the N_p-th vehicle-state yaw angle and lateral position at time t; N_p is the prediction horizon, and Q, R and P are the first, second and third weight matrices; Δu_max and Δu_min are the right and left limit increments of the vehicle front wheel steering angle; Δu(k|t) is the control increment of the front wheel steering angle at time k given the current time t, and u(k|t) is the control quantity of the front wheel steering angle at time k given the current time t; u_max and u_min are the right and left extreme positions of the front wheel steering angle; y(k|t) is the vehicle-state yaw angle and lateral position at time k given the current time t, and y_min and y_max are the minimum and maximum of the vehicle-state yaw angle and lateral position; β(k|t) is the vehicle centroid slip angle at time k given the current time t, and β_min and β_max are the minimum and maximum of the vehicle centroid slip angle; Ω denotes the terminal constraint domain.
And the terminal constraint domain in the objective optimization function is subjected to linearization preprocessing.
The invention has the beneficial effects that:
the invention provides a planning control method aiming at the scene of inaccurate intelligent vehicle positioning, which comprises a path planning method based on DDPG reinforcement learning and a path tracking method based on model prediction control, namely a path generating module and a tracking control module.
In the path planning method, the path generation of the intelligent vehicle under the inaccurate positioning scene is realized based on the DDPG algorithm, and the safety and the smoothness of the path are ensured. The reward and punishment value of the DDPG is improved by a potential field method, and the training stage is divided into an initial stage, an intermediate stage and a final stage, so that the convergence speed and the training efficiency of the algorithm are improved.
In the tracking control method, the path tracking controller is realized based on a model prediction control algorithm, and the terminal cost and the terminal constraint are added into the target optimization function, so that the stability and the control precision of the control system are improved. And the terminal constraint domain is linearized, so that the real-time performance of the intelligent vehicle control system is ensured.
The planning control algorithm combining the path planning method and the tracking control method can smoothly complete obstacle avoidance in a scene where the intelligent vehicle is inaccurately positioned, complete a navigation task safely according to a designed path, and can ensure the smoothness and stability of a track.
Drawings
Fig. 1 is a schematic diagram of the offset of an acquired reference point.
FIG. 2 is a schematic diagram of the misalignment of the vehicle body causing the reference waypoints to shift.
Fig. 3 is a schematic diagram of a DDPG network structure.
Fig. 4 is a virtual environment path generation flow block diagram.
FIG. 5 is a smart vehicle kinematics model.
FIG. 6 is a schematic diagram of path generation in a virtual environment.
FIG. 7 is a vehicle dynamics model.
FIG. 8 is a graph of reward functions for reinforcement learning training.
Fig. 9 is a planning control implementation flow of the present invention.
FIG. 10 is a diagram of a smart vehicle motion profile when positioning is inaccurate.
Fig. 11 shows the centroid slip angle variation of the three methods when positioning is inaccurate.
Fig. 12 shows the lateral acceleration variation of the three methods when positioning is inaccurate.
Detailed Description
The invention will be further illustrated and described with reference to specific embodiments. The technical features of the embodiments of the present invention can be combined correspondingly without mutual conflict.
As shown in fig. 9, the present invention includes the steps of:
step 1: the intelligent vehicle is provided with a laser radar sensor and a GPS sensor. Obtaining an obstacle grid map through a vehicle-mounted laser radar sensor, determining road boundary information and obstacle information around a vehicle body in a laser radar sensor coordinate system based on the obstacle grid map, and then obtaining the road boundary information and the obstacle information in the vehicle body coordinate system after coordinate conversion; the obstacle information is specifically the position of the nearest obstacle in front of the intelligent vehicle.
Step 2: acquiring global reference waypoints in the coordinate system of the vehicle-mounted GPS sensor, acquiring the vehicle body positioning and motion state with the vehicle-mounted GPS sensor, and finally performing coordinate conversion on the global reference waypoints based on the vehicle body positioning and motion state to obtain the global reference waypoints in the vehicle body coordinate system. The signals of the vehicle-mounted GPS sensor may be disturbed by the environment and shifted, which causes the acquired global reference waypoints to shift, as shown in fig. 1. When the GPS signal is disturbed and the vehicle body is positioned inaccurately, the global reference waypoints in the vehicle body coordinate system are offset, as shown in fig. 2.
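The conversion of global reference waypoints into the vehicle body coordinate system, and the way a positioning error shifts those waypoints (as in figs. 1 and 2), can be sketched as follows; the poses and waypoint values are illustrative only:

```python
import numpy as np

def global_to_body(waypoints_xy, vehicle_xy, vehicle_yaw):
    """Transform global reference waypoints into the vehicle body frame.

    waypoints_xy : (N, 2) array of waypoints in the global frame
    vehicle_xy   : (2,) vehicle position in the same global frame
    vehicle_yaw  : heading angle of the vehicle (rad) in the global frame
    """
    c, s = np.cos(vehicle_yaw), np.sin(vehicle_yaw)
    # rotation from the global frame to the body frame (inverse of body-to-global)
    R = np.array([[c, s],
                  [-s, c]])
    return (np.asarray(waypoints_xy) - np.asarray(vehicle_xy)) @ R.T

# a GPS offset of the vehicle position shifts every waypoint in the body frame
waypoints = np.array([[10.0, 2.0], [20.0, 2.5]])
true_pose = np.array([0.0, 0.0])
noisy_pose = true_pose + np.array([0.8, -0.5])   # simulated positioning error
print(global_to_body(waypoints, true_pose, 0.05))
print(global_to_body(waypoints, noisy_pose, 0.05))
```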
Step 3: establishing a virtual scene in which the intelligent vehicle is located from the obstacle grid map and the global reference waypoints;
Step 4: as shown in fig. 4, in the virtual scene of the intelligent vehicle, based on the road boundary information, obstacle information and global reference waypoints in the vehicle body coordinate system, a path generation module is used to plan a path for the intelligent vehicle, obtaining the planned path of the intelligent vehicle. The kinematic model of the intelligent vehicle is shown in fig. 5, and the planned path generated in the virtual environment is shown in fig. 6.
The path generation module in the step 4 is obtained by training the following steps:
S1: the network structure of the DDPG-based reinforcement learning agent is shown in fig. 3, and the training of the agent is divided, from simple to difficult according to the training scene, into an initial stage, an intermediate stage and a final stage. The first state space, input in the initial stage, consists of the distances d_l and d_r from the intelligent vehicle to the left and right road boundaries and the position d_wx, d_wy of the accurate global reference waypoint in the vehicle body coordinate system. The second state space, input in the intermediate stage, consists of the first state space plus the position d_ox, d_oy of the nearest obstacle in front of the intelligent vehicle in the vehicle body coordinate system. The third state space, input in the final stage, consists of the distances from the intelligent vehicle to the left and right road boundaries, the position of the nearest obstacle in front of the intelligent vehicle in the vehicle body coordinate system, and the position d_wx′, d_wy′ of the inaccurate reference waypoint in the vehicle body coordinate system; i.e. the third state space is s = {d_l, d_r, d_ox, d_oy, d_wx′, d_wy′}.
S2: constructing an action space, which is the front wheel steering angle δ_f of the intelligent vehicle;
S3: forming training sets from the action space and the different state spaces to train the DDPG-based reinforcement learning agent, setting reward and punishment values to supervise the training process, and obtaining the trained reinforcement learning agent;
The reward and punishment values include a reward value R_arrive for reaching the endpoint, a punishment value R_collision for intelligent vehicle collision, and an intermediate-state reward and punishment value R_temp.
The reward and punishment value R_temp of the intermediate state is obtained by the following steps:
A1: using a potential field method, assigning corresponding potential field functions to the road boundary, the obstacle and the global reference waypoint in each training stage;
A2: according to the three potential field functions, respectively calculating the corresponding road boundary potential field P_R, obstacle potential field P_O, accurate global reference waypoint potential field P_W and inaccurate global reference waypoint potential field P_W′; after superposing the potential fields corresponding to the current training stage, the total potential field P_U of the current training stage is obtained and used as the reward and punishment value R_temp of the intermediate state; i.e. the reward and punishment value of the intermediate state of the final stage is R_temp = P_R + P_O + P_W′.
The potential field function of a road boundary is:
[Formula provided as an image in the original publication]
wherein P_R(d_l, d_r) is the road boundary potential field, a_R is the intensity parameter of the potential field, and d_s is the safe distance from the intelligent vehicle to the road boundary.
The potential field function of an obstacle is:
[Formula provided as an image in the original publication]
wherein P_O(d_ox, d_oy) is the obstacle potential field, and a_o and b_o are the intensity parameter and shape parameter of the obstacle potential function, respectively. X_s and Y_s denote the safe distances between the vehicle and the obstacle in the longitudinal and lateral directions, where the longitudinal direction is the driving direction of the intelligent vehicle and the lateral direction is perpendicular to it, both in the horizontal plane. They are defined as:
X_s = X_0 − vT_0
Y_s = Y_0 + (v·sinθ_e − v_o·sinθ_e)T_0
wherein X_0 and Y_0 denote the minimum safe distances in the longitudinal and lateral directions respectively, T_0 is the safe time interval, v is the speed of the intelligent vehicle, v_o is the speed of the obstacle, and θ_e is the heading angle deviation between the intelligent vehicle and the obstacle.
The potential field functions of the accurate and inaccurate global reference waypoints are the same, wherein the potential field function of the global reference waypoint is:
[Formula provided as an image in the original publication]
wherein P_W(d_wy) is the accurate global reference waypoint potential field, d_a is the error range of the lateral position of the global reference waypoint, and a_w is the potential field strength of the global reference waypoint.
A3: during training, according to the three-dimensional gradient map of the total potential field P_U, the potential field parameters of all the potential field functions of each training stage in A1 are tuned with a path planning method based on the potential field method; the total potential field of each training stage is updated according to the tuned potential field functions, and the updated total potential field is used as the reward and punishment value R_temp of the intermediate state of that training stage.
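The superposition of the stage-specific potential fields into the intermediate reward R_temp can be sketched as follows. The potential field functions are treated as generic callables here; the lambdas are simple placeholders and are not the potential field formulas of the invention:

```python
def total_potential(potentials, state):
    """Superpose the stage-specific potential fields to obtain P_U, which is
    used directly as the intermediate reward/punishment R_temp.  `potentials`
    is a list of callables whose actual forms follow the patent's formulas."""
    return sum(p(state) for p in potentials)

# Final-stage example: R_temp = P_R + P_O + P_W' (inaccurate waypoint potential).
# The lambdas below are placeholders, NOT the patent's actual potential functions.
P_R = lambda s: -1.0 if min(s["d_l"], s["d_r"]) < s["d_s"] else 0.0
P_O = lambda s: -2.0 if abs(s["d_ox"]) < s["X_s"] and abs(s["d_oy"]) < s["Y_s"] else 0.0
P_Wp = lambda s: -0.1 * abs(s["d_wy_prime"])

state = {"d_l": 1.8, "d_r": 1.6, "d_s": 0.5,
         "d_ox": 12.0, "d_oy": 0.4, "X_s": 8.0, "Y_s": 1.2,
         "d_wy_prime": 0.3}
R_temp = total_potential([P_R, P_O, P_Wp], state)
```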
Step 5: tracking the planned path of the intelligent vehicle with the tracking control module, thereby realizing planning control of the intelligent vehicle.
In the tracking control module in step 5, firstly a vehicle dynamics model is established for the intelligent vehicle, and then a prediction equation of the vehicle state is established based on the vehicle dynamics model; the vehicle dynamics model is shown in fig. 7.
Then, according to a prediction equation of the vehicle state, a target optimization function with terminal constraint and terminal cost and constraint conditions are established by using a model prediction control algorithm, and a path tracking controller is further established;
and finally, tracking the planned path of the intelligent vehicle by controlling the corner of the front wheel of the vehicle by using a path tracking controller, thereby realizing the planning control of the intelligent vehicle.
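The prediction of future vehicle states from a prediction equation can be sketched with a discrete linear model; the matrices A and B below are placeholders with suitable shapes only, not the linearized vehicle dynamics of fig. 7:

```python
import numpy as np

def predict_states(A, B, x0, u_sequence):
    """Roll a discrete linear model x_{k+1} = A x_k + B u_k over the prediction
    horizon and return the predicted state trajectory.  A and B would come from
    linearizing and discretizing the vehicle dynamics model; here they are
    placeholders with the right shapes only."""
    xs = [np.asarray(x0, dtype=float)]
    for u in u_sequence:
        xs.append(A @ xs[-1] + B.flatten() * u)
    return np.array(xs[1:])

# assumed state = [lateral position, yaw angle], control = front wheel steering angle
A = np.array([[1.0, 0.5],
              [0.0, 1.0]])
B = np.array([[0.05],
              [0.10]])
horizon_controls = [0.02, 0.02, 0.0, -0.01]
print(predict_states(A, B, [0.0, 0.0], horizon_controls))
```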
The objective optimization function with terminal constraints and terminal costs is:
min_{U(t)} J = Σ_{i=1}^{N_p−1} ‖y(t+i|t) − r(t+i|t)‖²_Q + Σ_{i=0}^{N_p−1} ‖Δu(t+i|t)‖²_R + ‖y(t+N_p|t) − r(t+N_p|t)‖²_P

The constraint conditions of the objective optimization function are as follows:

Δu_min ≤ Δu(k|t) ≤ Δu_max
u_min ≤ u(k|t) ≤ u_max
y_min ≤ y(k|t) ≤ y_max
β_min ≤ β(k|t) ≤ β_max
k = t, …, t+N_p−1
y(t+N_p|t) − r(t+N_p|t) ∈ Ω

wherein the term ‖y(t+N_p|t) − r(t+N_p|t)‖²_P is the added terminal cost and y(t+N_p|t) − r(t+N_p|t) ∈ Ω is the added terminal constraint. min_{U(t)} J denotes taking the set of front wheel steering angle control quantities that minimizes the target optimization value of the intelligent vehicle over the prediction horizon corresponding to time t. J is the target optimization value of the intelligent vehicle; it reflects the requirements on the path tracking error and on smooth changes of the control quantity over a future time horizon. U(t) is the set of front wheel steering angle control quantities over the prediction horizon corresponding to time t. ‖·‖²_Q, ‖·‖²_R and ‖·‖²_P denote the squared norms weighted by the first weight matrix Q, the second weight matrix R and the third weight matrix P: the Q-weighted term weighs the tracking error of the intelligent vehicle at the i-th instant of time t and reflects the requirement on the path tracking error, the R-weighted term weighs the control smoothness of the intelligent vehicle at the i-th instant of time t and reflects the requirement on smooth changes of the control quantity, and the P-weighted term weighs the tracking error of the intelligent vehicle at the N_p-th instant of time t. y(t+i|t) is the predicted value of the i-th vehicle-state yaw angle and lateral position at time t, and r(t+i|t) is the expected value of the i-th vehicle-state yaw angle and lateral position at time t; the expected values of the yaw angle and lateral position are obtained from the planned path of the intelligent vehicle. u(t+i|t) is the i-th control quantity at time t, y(t+N_p|t) is the predicted value of the N_p-th vehicle-state yaw angle and lateral position at time t, and r(t+N_p|t) is the expected value of the N_p-th vehicle-state yaw angle and lateral position at time t. N_p is the prediction horizon, and Q, R and P are the first, second and third weight matrices. Δu_max and Δu_min are the right and left limit increments of the vehicle front wheel steering angle; Δu(k|t) is the control increment of the front wheel steering angle at time k given the current time t, and u(k|t) is the control quantity of the front wheel steering angle at time k given the current time t; u_max and u_min are the right and left extreme positions of the front wheel steering angle. y(k|t) is the vehicle-state yaw angle and lateral position at time k given the current time t, and y_min and y_max are the minimum and maximum of the vehicle-state yaw angle and lateral position. β(k|t) is the vehicle centroid slip angle at time k given the current time t, and β_min and β_max are the minimum and maximum of the vehicle centroid slip angle. Ω denotes the terminal constraint domain.
The terminal constraint domain in the objective optimization function is subjected to linearization preprocessing, which ensures the real-time performance of the control system.
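Evaluation of the objective optimization value J for a candidate sequence of control increments, including the terminal cost, can be sketched as follows; the weights and trajectories are illustrative assumptions, and the constrained optimization and terminal-set check are omitted:

```python
import numpy as np

def mpc_cost(y_pred, y_ref, du_seq, Q, R, P):
    """J = sum ||y - r||_Q^2 + sum ||du||_R^2 + terminal cost ||y_Np - r_Np||_P^2.
    y_pred, y_ref : (Np, 2) arrays of predicted / expected [yaw angle, lateral position]
    du_seq        : (Np,) control increments of the front wheel steering angle
    Q, R, P       : weights (R is a scalar here because the input is scalar)"""
    stage = sum(e @ Q @ e for e in (y_pred[:-1] - y_ref[:-1]))
    effort = sum(R * du * du for du in du_seq)
    terminal = (y_pred[-1] - y_ref[-1]) @ P @ (y_pred[-1] - y_ref[-1])
    return stage + effort + terminal

Np = 5
Q = np.diag([1.0, 10.0])      # illustrative weights, not the patent's values
P = np.diag([5.0, 50.0])
R = 0.5
y_ref = np.zeros((Np, 2))
y_pred = 0.01 * np.ones((Np, 2))
du = 0.005 * np.ones(Np)
print(mpc_cost(y_pred, y_ref, du, Q, R, P))
```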
In this embodiment, the training environment is a joint simulation of MATLAB/Simulink and Carsim. The network structure, state space, action space and reward function of the reinforcement learning algorithm are designed in MATLAB/Simulink, and a high-precision, high-fidelity vehicle model is obtained from Carsim.
After the potential field design is finished, the potential field parameters are set using a potential-field-based path planning method. If the planned path does not meet the safety requirements, the potential field parameters are adjusted.
When setting the reinforcement learning training scenes, the training is divided into three stages from simple to difficult: the initial stage contains only the road boundaries and accurate reference waypoints; the intermediate stage adds obstacles to the initial stage; and the final stage adds inaccurate reference waypoints to the intermediate stage.
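The simple-to-difficult arrangement of the training scenes can be sketched as a curriculum schedule; the episode counts and stage flags below are assumptions for illustration only:

```python
# Illustrative three-stage curriculum for the DDPG training scenes.
stages = [
    {"name": "initial", "obstacles": False, "waypoint_noise": False, "episodes": 300},
    {"name": "intermediate", "obstacles": True, "waypoint_noise": False, "episodes": 300},
    {"name": "final", "obstacles": True, "waypoint_noise": True, "episodes": 400},
]

def train_agent_on(stage):
    """Placeholder for one stage of DDPG training in the simulated scene."""
    print(f"training stage '{stage['name']}' for {stage['episodes']} episodes")

for stage in stages:
    train_agent_on(stage)
```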
The result of the reinforcement learning training is shown in fig. 8, and both the network training effect and the convergence rate of the method are improved compared with the traditional DDPG network.
The controller provided by the invention is tested under a double-lane-change condition, with noise added to the yaw rate and lateral velocity, and its tracking performance is compared with that of the traditional model predictive control method. The mean absolute error (MAE) of the tracking performance is given in the following table:
table 1: mean absolute error of tracking effect (MAE)
[Table 1 provided as an image in the original publication]
As can be seen from Table 1, the tracking accuracy of the proposed tracking control method is improved over the traditional model predictive control method in the error-free case, with yaw-rate noise, and with lateral-velocity noise.
The path planning method and the tracking control method provided by the invention are combined to deal with the scenario of inaccurate vehicle body positioning; the implementation flow is shown in fig. 9. Fig. 10 compares the planning control effects in a designed scenario where the reference waypoints and the vehicle body positioning are inaccurate: frame A is the planning control method proposed by the present invention, frame B is traditional DDPG planning with pure pursuit tracking control, and PF + MPC is potential-field planning with model predictive tracking control. In fig. 11, (a), (b) and (c) show the centroid slip angle changes of the three methods in sequence, and in fig. 12, (a), (b) and (c) show the lateral acceleration changes of the three methods in sequence, reflecting the stability and comfort of the trajectory. Table 2 gives a statistical analysis of the experimental data.
Table 2: table for analyzing experimental results of the present invention and other methods
[Table 2 provided as an image in the original publication]
As can be seen from fig. 9, fig. 10, fig. 11, fig. 12 and table 2, the planning control method designed by the present invention can make the intelligent vehicle have a more comfortable and stable motion state when the positioning is not accurate.

Claims (7)

1. An intelligent vehicle planning control method based on reinforcement learning and model prediction is characterized by comprising the following steps:
step 1: obtaining an obstacle grid map through a vehicle-mounted laser radar sensor, determining road boundary information and obstacle information around a vehicle body in a laser radar sensor coordinate system based on the obstacle grid map, and then obtaining the road boundary information and the obstacle information in the vehicle body coordinate system after coordinate conversion;
step 2: acquiring global reference waypoints under a coordinate system of the vehicle-mounted GPS sensor by using the vehicle-mounted GPS sensor, acquiring vehicle body positioning and motion states by using the vehicle-mounted GPS sensor, and finally performing coordinate conversion on the global reference waypoints based on the vehicle body positioning and the motion states to acquire the global reference waypoints under a vehicle body coordinate system;
Step 3: building a virtual scene in which the intelligent vehicle is located from the obstacle grid map and the global reference waypoints;
Step 4: in the virtual scene of the intelligent vehicle, based on the road boundary information, obstacle information and global reference waypoints in the vehicle body coordinate system, planning a path for the intelligent vehicle with a path generation module to obtain the planned path of the intelligent vehicle;
Step 5: tracking the planned path of the intelligent vehicle with the tracking control module, thereby realizing planning control of the intelligent vehicle.
2. The intelligent vehicle planning control method based on reinforcement learning and model prediction as claimed in claim 1, wherein the path generation module in step 4 is obtained by training through the following steps:
S1: the training of the DDPG-based reinforcement learning agent is divided into an initial stage, an intermediate stage and a final stage in sequence; the first state space, input in the initial stage, consists of the distances from the intelligent vehicle to the left and right road boundaries and the position of the accurate global reference waypoint in the vehicle body coordinate system; the second state space, input in the intermediate stage, adds the position of the nearest obstacle in front of the intelligent vehicle in the vehicle body coordinate system; the third state space, input in the final stage, replaces the accurate global reference waypoint with the inaccurate global reference waypoint;
s2: constructing an action space which is the corner delta of the front wheel of the intelligent vehiclef
S3: forming training sets from the action space and the different state spaces to train the DDPG-based reinforcement learning agent, setting reward and punishment values to supervise the training process, and obtaining the trained reinforcement learning agent.
3. The intelligent vehicle planning control method based on reinforcement learning and model prediction as claimed in claim 2, wherein the reward and punishment values include a reward value R_arrive for reaching the endpoint, a punishment value R_collision for intelligent vehicle collision, and an intermediate-state reward and punishment value R_temp.
4. The intelligent vehicle planning control method based on reinforcement learning and model prediction as claimed in claim 3, wherein the reward and punishment value R_temp of the intermediate state is obtained by the following steps:
A1: using a potential field method, assigning corresponding potential field functions to the road boundary, the obstacle and the global reference waypoint in each training stage;
A2: according to the three potential field functions, respectively calculating the corresponding road boundary potential field P_R, obstacle potential field P_O, accurate global reference waypoint potential field P_W and inaccurate global reference waypoint potential field P_W′; after superposing the potential fields corresponding to the current training stage, the total potential field P_U of the current training stage is obtained and used as the reward and punishment value R_temp of the intermediate state;
A3: during training, according to the three-dimensional gradient map of the total potential field P_U, the potential field parameters of all the potential field functions of each training stage in A1 are tuned with a path planning method based on the potential field method; the total potential field of each training stage is updated according to the tuned potential field functions, and the updated total potential field is used as the reward and punishment value R_temp of the intermediate state of that training stage.
5. The intelligent vehicle planning control method based on reinforcement learning and model prediction as claimed in claim 1, wherein the tracking control module of step 5 firstly establishes a vehicle dynamics model according to the intelligent vehicle, and then establishes a prediction equation of the vehicle state based on the vehicle dynamics model;
then, according to a prediction equation of the vehicle state, a target optimization function and a constraint condition are established by using a model prediction control algorithm, and a path tracking controller is further established;
and finally, tracking the planned path of the intelligent vehicle by using the path tracking controller, thereby realizing the planning control of the intelligent vehicle.
6. The intelligent vehicle planning control method based on reinforcement learning and model prediction as claimed in claim 5, wherein the objective optimization function is:
min_{U(t)} J = Σ_{i=1}^{N_p−1} ‖y(t+i|t) − r(t+i|t)‖²_Q + Σ_{i=0}^{N_p−1} ‖Δu(t+i|t)‖²_R + ‖y(t+N_p|t) − r(t+N_p|t)‖²_P

The constraint conditions of the objective optimization function are as follows:

Δu_min ≤ Δu(k|t) ≤ Δu_max
u_min ≤ u(k|t) ≤ u_max
y_min ≤ y(k|t) ≤ y_max
β_min ≤ β(k|t) ≤ β_max
k = t, …, t+N_p−1
y(t+N_p|t) − r(t+N_p|t) ∈ Ω

wherein min_{U(t)} J denotes taking the set of front wheel steering angle control quantities that minimizes the target optimization value of the intelligent vehicle over the prediction horizon corresponding to time t; J is the target optimization value of the intelligent vehicle, and U(t) is the set of front wheel steering angle control quantities over the prediction horizon corresponding to time t; ‖·‖²_Q, ‖·‖²_R and ‖·‖²_P denote the squared norms weighted by the first weight matrix Q, the second weight matrix R and the third weight matrix P, respectively; y(t+i|t) is the predicted value of the i-th vehicle-state yaw angle and lateral position at time t, r(t+i|t) is the expected value of the i-th vehicle-state yaw angle and lateral position at time t, u(t+i|t) is the i-th control quantity at time t, y(t+N_p|t) is the predicted value of the N_p-th vehicle-state yaw angle and lateral position at time t, and r(t+N_p|t) is the expected value of the N_p-th vehicle-state yaw angle and lateral position at time t; N_p is the prediction horizon, and Q, R and P are the first, second and third weight matrices; Δu_max and Δu_min are the right and left limit increments of the vehicle front wheel steering angle; Δu(k|t) is the control increment of the front wheel steering angle at time k given the current time t, and u(k|t) is the control quantity of the front wheel steering angle at time k given the current time t; u_max and u_min are the right and left extreme positions of the front wheel steering angle; y(k|t) is the vehicle-state yaw angle and lateral position at time k given the current time t, and y_min and y_max are the minimum and maximum of the vehicle-state yaw angle and lateral position; β(k|t) is the vehicle centroid slip angle at time k given the current time t, and β_min and β_max are the minimum and maximum of the vehicle centroid slip angle; Ω denotes the terminal constraint domain.
7. The intelligent vehicle planning control method based on reinforcement learning and model prediction as claimed in claim 6, wherein the terminal constraint domain in the objective optimization function is subjected to linearization preprocessing.
CN202210088325.4A 2022-01-25 2022-01-25 Intelligent vehicle planning control method based on reinforcement learning and model prediction Active CN114442630B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210088325.4A CN114442630B (en) 2022-01-25 2022-01-25 Intelligent vehicle planning control method based on reinforcement learning and model prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210088325.4A CN114442630B (en) 2022-01-25 2022-01-25 Intelligent vehicle planning control method based on reinforcement learning and model prediction

Publications (2)

Publication Number Publication Date
CN114442630A true CN114442630A (en) 2022-05-06
CN114442630B CN114442630B (en) 2023-12-05

Family

ID=81368785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210088325.4A Active CN114442630B (en) 2022-01-25 2022-01-25 Intelligent vehicle planning control method based on reinforcement learning and model prediction

Country Status (1)

Country Link
CN (1) CN114442630B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112799386A (en) * 2019-10-25 2021-05-14 中国科学院沈阳自动化研究所 Robot path planning method based on artificial potential field and reinforcement learning
CN110794842A (en) * 2019-11-15 2020-02-14 北京邮电大学 Reinforced learning path planning algorithm based on potential field
CN112666939A (en) * 2020-12-09 2021-04-16 深圳先进技术研究院 Robot path planning algorithm based on deep reinforcement learning
CN112650237A (en) * 2020-12-21 2021-04-13 武汉理工大学 Ship path planning method and device based on clustering processing and artificial potential field

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JUNQIANG LIN: "APF-DPPO: an automatic driving policy learning method based on the artificial potential field method to optimize the reward function", MACHINES
刘和祥; 边信黔; 秦政; 王宏健: "Research on AUV collision avoidance planning based on forward-looking sonar information", Journal of System Simulation, no. 24
王通: "Low-cost navigation of intelligent vehicles based on reinforcement learning", China Master's Theses Full-text Database, Engineering Science and Technology II, pages 035-484
韩光信: "Nonlinear predictive control for trajectory tracking of constrained nonholonomic mobile robots", Journal of Jilin University (Engineering and Technology Edition), pages 177-181

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114578834A (en) * 2022-05-09 2022-06-03 北京大学 Target layered double-perception domain-based reinforcement learning unmanned vehicle path planning method
CN115540896A (en) * 2022-12-06 2022-12-30 广汽埃安新能源汽车股份有限公司 Path planning method, path planning device, electronic equipment and computer readable medium
CN115540896B (en) * 2022-12-06 2023-03-07 广汽埃安新能源汽车股份有限公司 Path planning method and device, electronic equipment and computer readable medium

Also Published As

Publication number Publication date
CN114442630B (en) 2023-12-05

Similar Documents

Publication Publication Date Title
CN110187639B (en) Trajectory planning control method based on parameter decision framework
Li et al. Shared control driver assistance system based on driving intention and situation assessment
Li et al. Real-time trajectory planning for autonomous urban driving: Framework, algorithms, and verifications
Weiskircher et al. Predictive guidance and control framework for (semi-) autonomous vehicles in public traffic
CN113276848B (en) Intelligent driving lane changing and obstacle avoiding track planning and tracking control method and system
CN111289978A (en) Method and system for making decision on unmanned driving behavior of vehicle
CN114442630B (en) Intelligent vehicle planning control method based on reinforcement learning and model prediction
CN110568841A (en) Automatic driving decision method and system
CN112249008B (en) Unmanned automobile early warning method aiming at complex dynamic environment
CN112965476B (en) High-speed unmanned vehicle trajectory planning system and method based on multi-window model
CN112577506B (en) Automatic driving local path planning method and system
CN113255998B (en) Expressway unmanned vehicle formation method based on multi-agent reinforcement learning
CN113433947B (en) Intersection trajectory planning and control method based on obstacle vehicle estimation and prediction
CN115257745A (en) Automatic driving lane change decision control method based on rule fusion reinforcement learning
Wei et al. Game theoretic merging behavior control for autonomous vehicle at highway on-ramp
CN114942642A (en) Unmanned automobile track planning method
Zhang et al. Structured road-oriented motion planning and tracking framework for active collision avoidance of autonomous vehicles
CN115257746A (en) Uncertainty-considered decision control method for lane change of automatic driving automobile
CN113200054B (en) Path planning method and system for automatic driving take-over
CN115447615A (en) Trajectory optimization method based on vehicle kinematics model predictive control
Li et al. Distributed MPC for multi-vehicle cooperative control considering the surrounding vehicle personality
CN115140048A (en) Automatic driving behavior decision and trajectory planning model and method
CN114291112A (en) Decision planning cooperative enhancement method applied to automatic driving automobile
Smit et al. Informed sampling-based trajectory planner for automated driving in dynamic urban environments
Wang Control system design for autonomous vehicle path following and collision avoidance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant