CN117911414A - Automatic driving automobile motion control method based on reinforcement learning - Google Patents

Automatic driving automobile motion control method based on reinforcement learning

Publication number: CN117911414A
Application number: CN202410315976.1A
Legal status: Pending
Applicant and current assignee: Anhui University
Inventors: He Shuping (何舒平), Cheng Weidi (程纬地), Ren Chengcheng (任乘乘), Wang Guangyu (王广宇)
Original language: Chinese (zh)
Classification landscape: Feedback Control In General
Abstract

The invention relates to the technical field of tracking control, and in particular to a reinforcement-learning-based motion control method for autonomous vehicles. In the first stage, a robust steering controller based on reinforcement learning is designed with backstepping control, building on a reference path model, a vehicle dynamics model and a kinematic model, so that the lateral path tracking error is suppressed, unknown external disturbances are rejected, and the yaw stability of the autonomous vehicle is guaranteed. In the second stage, an adaptive control mechanism based on Lyapunov stability theory is combined with a radial basis function neural network, which can approximate any continuous nonlinear function, to compensate for the uncertainty of the tire cornering stiffness and guarantee the global asymptotic stability of the closed-loop system.

Description

Automatic driving automobile motion control method based on reinforcement learning
Technical Field
The invention relates to the technical field of tracking control, in particular to an automatic driving automobile motion control method based on reinforcement learning.
Background
One of the most important considerations in designing a motion control scheme for an autonomous vehicle is eliminating the lateral path tracking error while ensuring the stability of the vehicle during driving. In general, motion control of an autonomous vehicle can be decomposed into longitudinal control and lateral control based on the current vehicle state and road information; longitudinal control aims to maintain a desired cruising speed and to keep a safe distance between the preceding vehicle and the controlled vehicle to avoid collision.
However, when autonomous vehicles leave the research laboratory, they must be able to react to emergencies, some of which demand aggressive maneuvers, such as emergency collision avoidance performed in a short time window with large actuator inputs and high yaw rates. The tires then become highly saturated and begin to slip. In this regime the tire force characteristic becomes highly nonlinear: the cornering force no longer increases linearly with the slip angle, but hardly changes, or even decreases, as the slip angle grows. Lateral motion control of autonomous vehicles therefore still faces two major challenges that arise in actual vehicle systems: parametric modeling uncertainty and unknown external disturbances. If the tire lateral forces in the nonlinear region are treated as linear forces, or the driving environment changes suddenly, the behavior of the vehicle may become uncontrollable, causing the autonomous vehicle to lose its path tracking capability and stability.
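The tire-force saturation described above can be made concrete with a simplified "magic formula" tire model; the coefficients below are illustrative textbook values, not parameters from the patent, and serve only to show how the linear tire assumption breaks down at large slip angles.

```python
import math

def lateral_force(alpha, mu=0.9, fz=4000.0, b=10.0, c=1.9, e=0.97):
    """Normalized Pacejka 'magic formula' tire model (illustrative
    coefficients, not from the patent): lateral force [N] for slip
    angle alpha [rad]; the peak is limited by the adhesion mu*fz."""
    d = mu * fz
    x = b * alpha
    return d * math.sin(c * math.atan(x - e * (x - math.atan(x))))

def linear_force(alpha, c_alpha=68400.0):
    """Linear tire model; c_alpha matches the small-slip stiffness
    (b * c * mu * fz) of the nonlinear model above."""
    return c_alpha * alpha

# At small slip angles the two models agree; near saturation the
# linear model vastly over-predicts the available lateral force.
for deg in (1, 4, 10):
    a = math.radians(deg)
    print(f"{deg:>2} deg  nonlinear {lateral_force(a):7.0f} N"
          f"  linear {linear_force(a):7.0f} N")
```

At 1 degree the two models nearly agree; at 10 degrees the linear model predicts roughly three times the force the saturating tire can actually deliver, which is exactly the modeling error the patent's robust design is meant to absorb.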
Disclosure of Invention
Therefore, the invention aims to provide a reinforcement-learning-based motion control method for autonomous vehicles, so as to solve the instability caused by parameter uncertainty in existing autonomous driving algorithms.
Based on the above purpose, the invention provides an automatic driving automobile motion control method based on reinforcement learning, which comprises the following steps:
S1, establishing an autonomous control automobile system dynamic model;
S2, establishing a steering system model of the automobile;
S3, combining an autonomous control automobile system dynamic model and a steering system model to obtain a man-machine control mapping model of torque input;
S4, selecting state variables of the man-machine control mapping model, designing an adaptive optimizing controller based on the backstepping technique and reinforcement learning, and performing autonomous vehicle motion control with the resulting optimized control strategy, which is:
where x is the system state, n is the system order, the design gains are positive constants, z_i is the tracking error, α_i* represents the optimized virtual control law, u* represents the optimized actual control law, S_a(Z) and S_c(Z) are basis function vectors, ε_a and ε_c are bounded approximation errors, Z represents the system variables, W_a is the actor neural network weight, W_c is the critic neural network weight, and W_d is the identifier neural network weight; the adaptive update law of the critic-actor mechanism controller is:
where γ_a and γ_c are positive constants;
The identifier update law for the disturbance and nonlinear term approximation is:
where Γ_d is a positive constant matrix and σ_d is a constant.
Preferably, establishing the autonomous control automobile system dynamic model includes:
S11, establishing a two-degree-of-freedom vehicle dynamics model:
where β is the sideslip angle of the vehicle body, r is the yaw rate of the vehicle body, m is the total mass of the vehicle, v_x is the longitudinal velocity, I_z is the yaw moment of inertia, l_f and l_r are the distances from the center of gravity to the front and rear axles, respectively, and F_yf and F_yr are the tire lateral forces of the front and rear axles, respectively;
S12, converting the vehicle dynamics state parameters into state parameters related to the reference trajectory, using the prediction error e_p, which incorporates the lateral error e_y and the yaw error e_ψ; the vehicle kinematic model is obtained as:
where v_y is the lateral velocity, ψ is the heading of the vehicle, ψ_r is the heading of the reference path, d_s is the constant projection distance, and s represents the distance along the reference path;
S13, differentiating e_p, and differentiating e_y and e_ψ of formula (1.5), yields:
where κ represents the curvature of the path;
S14, to eliminate the prediction error e_p and the yaw-rate oscillation, the following relationship is obtained:
S15, substituting formula (1.4) into formula (1.7) yields:
S16, in the presence of unknown external disturbances, the tire lateral forces of the front and rear axles are computed as:
where f_f and f_r are the tire cornering forces of the front and rear axles, respectively, Δf_f and Δf_r are the unknown lateral disturbance forces on the front and rear axles, respectively, α_f and α_r are the tire slip angles of the front and rear axles, respectively, C_f is the front-axle tire cornering stiffness, C_r is the rear-axle tire cornering stiffness, and μ is the adhesion coefficient between the tire and the road surface;
The slip angles of the front and rear tires satisfy:
where δ_f is the front wheel steering angle and v is the vehicle speed;
And the cornering stiffness, which has a nonlinear characteristic, satisfies:
where C_f0 and C_r0 are the nominal cornering stiffnesses of the front and rear axles, and ΔC_f and ΔC_r are the cornering stiffness uncertainties of the front and rear tires, respectively;
S17, combining formulas (1.8)-(1.11) yields the nonlinear vehicle-road system model:
where S_1, S_2 and S_3 are system variables, and the smooth function d(S) represents the equivalent stochastic disturbance of the vehicle.
Preferably, building a steering system model of the automobile includes:
The steering system model is initially built as follows:
restated as:
where J_eq and B_eq are the equivalent moment of inertia and equivalent damping of the steering system, respectively, i_m is the reduction ratio of the motor reduction mechanism, i_s is the reduction ratio of the steering system, δ_f is the front wheel angle, T_d is the driver input torque, and T_l is the steering load torque;
The fitting equation is:
Regarding δ_f and its derivative as measurable in real time and J_eq as a known value, the corresponding terms in formula (1.14) are treated as a known term plus a smooth function, where T_l is obtained from fitting formula (1.15), T_d is measured by a sensor, ε_T is the torque fitting error, and ε_m is the measurement error; the simplified model of the steering system is thus obtained:
where S_4 and S_5 are the system variables of the steering subsystem.
Preferably, combining the autonomous control vehicle system dynamic model and the steering system modeling, the man-machine control mapping model for deriving the torque input comprises:
Combining formulas (1.12) and (1.16) yields the man-machine control mapping model of the torque input:
where ω_f denotes the front wheel angular velocity.
Preferably, the process of deriving an optimized control strategy comprises:
Selecting the front wheel steering angular velocity, the front wheel steering angle δ_f, the projection error e_p and the derivative of the projection error as the state variables of the strict-feedback system, i.e., of formula (1.17), formula (1.17) is converted into:
where the functions on the right-hand side are defined from the terms of formula (1.17);
Adopting a coordinate transformation and introducing a first-order filter, the tracking error equation is obtained:
The first-order filter is designed as τ ς̇ + ς = α_i*, where y_r is the reference signal, ς is the filter output signal, α_i* is the filter input signal, i.e., the optimized virtual control law, τ is a design constant, and ς̇ is the first-order derivative of the filter output signal;
Introducing finite-time convergence as a constraint in the controller design, i.e., making the system achieve the control objective within a finite time T, where T satisfies:
where the bounding parameters are constants and V(0) is the initial Lyapunov function value;
For the fourth-order system of formula (1.18), take α_i as the virtual control law of the i-th step, where i = 1, 2, 3; the optimal performance index function is obtained as:
where the integrand is the cost function of the i-th step;
Let α_i* be the optimal virtual controller; then:
where Ω is a predefined compact set;
Regarding α_i* as the optimal virtual control signal, the HJB equation corresponding to formula (1.21) is obtained as:
where α_i* is obtained by solving the stationarity condition of the HJB equation, i.e., setting its derivative with respect to α_i to zero;
Decompose α_i* into:
where the two components are unknown continuous functions;
Approximating the unknown continuous functions with neural networks, i.e., on the compact set we have:
where W_1* and W_2* are the ideal neural network weights, S_1(Z) and S_2(Z) are basis function vectors, and ε_1 and ε_2 are bounded approximation errors;
Substituting formula (1.24) into formula (1.23) gives:
where the lumped unknown term collects the ideal weights and approximation errors;
The corresponding optimized controller is obtained as:
Reinforcement learning with an identifier-critic-actor structure is then introduced; the identifier used to approximate the unknown dynamics is designed as:
where y_d is the identifier output and W_d is the identifier neural network weight;
The identifier update law is constructed as:
where Γ_d is a positive constant matrix and σ_d is a constant;
Based on the critic-actor architecture and formula (1.25), the critic that evaluates the control performance is constructed as:
where J_i is the estimate of J_i* and W_c is the critic neural network weight;
According to formula (1.28), the actor that executes the control action is designed as:
where α_i* and u* are the optimized virtual control law and the optimized actual control law, respectively, and W_a is the actor neural network weight;
The critic and actor neural network weight update laws are:
where γ_c and γ_a are positive constants.
The invention has the beneficial effects that:
(1) The invention provides a controller design based on a critic-actor reinforcement learning mechanism, realizing stable motion control of an autonomous vehicle.
(2) To counter the adverse effects of vehicle parameter modeling uncertainty and unknown external disturbances on the lateral motion control of an autonomous vehicle under extreme driving conditions, the invention provides an identifier-critic-actor mechanism that effectively improves the control performance of the system under large uncertainty.
(3) For complex scenarios, addressing the problems of suppressing the lateral path tracking error, rejecting unknown external disturbances and guaranteeing the yaw stability of the autonomous vehicle, the invention provides an adaptive controller design based on reinforcement learning and the backstepping method; reinforcement learning can dynamically adjust the adaptive parameters to improve the robustness of motion control.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the drawings used in the description of the embodiments or of the prior art are briefly introduced below; it is obvious that the drawings described below are only some embodiments of the invention, and that a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a modeling block diagram of a steering system module according to an embodiment of the present invention;
FIG. 2 is a diagram of a kinematic model of a vehicle according to an embodiment of the present invention;
FIG. 3 is an architecture diagram of an autonomous vehicle lateral motion controller according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a motion control method applied to an adaptive lateral motion system according to an embodiment of the present invention.
Detailed Description
The present invention will be further described in detail with reference to specific embodiments in order to make the objects, technical solutions and advantages of the present invention more apparent.
It is to be noted that, unless otherwise defined, technical or scientific terms used herein should be taken in their ordinary sense as understood by one of ordinary skill in the art to which the present invention belongs. The terms "first," "second," and the like, as used herein, do not denote any order, quantity, or importance, but are used to distinguish one element from another. The word "comprising," "comprises," or the like means that the element or item preceding the word encompasses the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper," "lower," "left," "right," etc. are used merely to indicate relative positional relationships, which may change when the absolute position of the described object changes.
The embodiment of the specification provides an automatic driving automobile motion control method based on reinforcement learning, which is characterized by comprising the following steps:
S1, establishing an autonomous control automobile system dynamic model;
The method specifically comprises the following steps: S11, establishing a two-degree-of-freedom vehicle dynamics model:
where β is the sideslip angle of the vehicle body, r is the yaw rate of the vehicle body, m is the total mass of the vehicle, v_x is the longitudinal velocity, I_z is the yaw moment of inertia, l_f and l_r are the distances from the center of gravity to the front and rear axles, respectively, and F_yf and F_yr are the tire lateral forces of the front and rear axles, respectively;
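The model equation itself appears only as an image in the original text; a standard two-degree-of-freedom (bicycle) formulation consistent with the symbols just defined, offered here as a reconstruction rather than the patent's exact formula (1.4), reads:

```latex
\dot{\beta} = \frac{F_{yf} + F_{yr}}{m v_x} - r,
\qquad
\dot{r} = \frac{l_f F_{yf} - l_r F_{yr}}{I_z}
```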
S12, converting the vehicle dynamics state parameters into state parameters related to the reference trajectory, so as to focus on trajectory tracking. The conversion uses the prediction error e_p, which incorporates the lateral error e_y and the yaw error e_ψ; the vehicle kinematic model, shown in fig. 2, is:
where v_y is the lateral velocity, ψ is the heading of the vehicle, ψ_r is the heading of the reference path, d_s is the constant projection distance, and s represents the distance along the reference path;
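The kinematic model is likewise an image in the original; a common form of path-tracking error kinematics matching these definitions, again a reconstruction rather than the exact formula (1.5), is (with κ the path curvature):

```latex
\dot{e}_y = v_x \sin e_\psi + v_y \cos e_\psi,
\qquad
\dot{e}_\psi = \dot{\psi} - \kappa\,\dot{s},
\qquad
e_p = e_y + d_s \sin e_\psi
```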
S13, differentiating e_p, and differentiating e_y and e_ψ of formula (1.5), yields:
where κ represents the curvature of the path;
S14, under high lateral acceleration or a low adhesion coefficient the closed-loop steering response is underdamped, causing significant yaw-rate oscillation; to eliminate the prediction error e_p and the yaw-rate oscillation, the following relationship is obtained from formula (1.6):
S15, substituting formula (1.4) into formula (1.7) yields:
S16, in the presence of unknown external disturbances, the tire lateral forces of the front and rear axles are computed as:
where f_f and f_r are the tire cornering forces of the front and rear axles, respectively, Δf_f and Δf_r are the unknown lateral disturbance forces on the front and rear axles, respectively, α_f and α_r are the tire slip angles of the front and rear axles, respectively, C_f is the front-axle tire cornering stiffness, C_r is the rear-axle tire cornering stiffness, and μ is the adhesion coefficient between the tire and the road surface;
The slip angles of the front and rear tires satisfy:
where δ_f is the front wheel steering angle and v is the vehicle speed;
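For reference, the standard small-angle slip-angle relations and a disturbed cornering-force model matching these symbol definitions, reconstructed here since formulas (1.9) and (1.10) are images in the original, take the form:

```latex
\alpha_f = \delta_f - \frac{v_y + l_f r}{v_x},
\quad
\alpha_r = \frac{l_r r - v_y}{v_x},
\quad
F_{yf} = \mu C_f \alpha_f + \Delta f_f,
\quad
F_{yr} = \mu C_r \alpha_r + \Delta f_r
```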
And the cornering stiffness, which has a nonlinear characteristic, satisfies:
where C_f0 and C_r0 are the nominal cornering stiffnesses of the front and rear axles, and ΔC_f and ΔC_r are the cornering stiffness uncertainties of the front and rear tires, respectively;
S17, combining formulas (1.8)-(1.11) yields the nonlinear vehicle-road system model:
where S_1, S_2 and S_3 are system variables, and the smooth function d(S) represents the equivalent stochastic disturbance of the vehicle.
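A minimal numerical sketch of the linear part of this vehicle model can illustrate the dynamics; all parameters are illustrative, not taken from the patent, and the simulation simply shows the lateral dynamics settling to a steady-state yaw rate under constant steering.

```python
import math

# Illustrative parameters (assumptions, not from the patent)
m, iz = 1500.0, 2500.0          # mass [kg], yaw inertia [kg m^2]
lf, lr = 1.2, 1.4               # CG-to-axle distances [m]
cf, cr = 80000.0, 80000.0       # cornering stiffnesses [N/rad]
vx = 20.0                       # longitudinal speed [m/s]

def step(beta, r, delta_f, dt=0.001):
    """One Euler step of the linear 2-DOF (bicycle) model."""
    alpha_f = delta_f - beta - lf * r / vx   # front slip angle
    alpha_r = -beta + lr * r / vx            # rear slip angle
    fyf, fyr = cf * alpha_f, cr * alpha_r    # linear tire forces
    dbeta = (fyf + fyr) / (m * vx) - r       # sideslip dynamics
    dr = (lf * fyf - lr * fyr) / iz          # yaw dynamics
    return beta + dt * dbeta, r + dt * dr

beta, r = 0.0, 0.0
for _ in range(5000):                        # 5 s at 2 deg steering
    beta, r = step(beta, r, math.radians(2))
print(f"steady-state yaw rate ~ {r:.3f} rad/s")
```

With these (assumed) parameters the closed-form steady-state yaw rate is about 0.22 rad/s, which the simulation reproduces.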
S2, building a steering system model of an automobile, wherein the steering system structure of the automobile is shown in fig. 1, and specifically comprises the following steps:
The steering system model is initially built as follows:
restated as:
where J_eq and B_eq are the equivalent moment of inertia and equivalent damping of the steering system, respectively, i_m is the reduction ratio of the motor reduction mechanism, i_s is the reduction ratio of the steering system, δ_f is the front wheel angle, T_d is the driver input torque, and T_l is the steering load torque; both torques can be obtained from sensors, but under extreme conditions the sensor data become inaccurate, so the fitting equation is used:
Regarding δ_f and its derivative as measurable in real time and J_eq as a known value, the corresponding terms in formula (1.14) are treated as a known term plus a smooth function, where T_l is obtained from fitting formula (1.15), T_d is measured by a sensor, ε_T is the torque fitting error, and ε_m is the measurement error; the simplified model of the steering system is thus obtained:
where S_4 and S_5 are the system variables of the steering subsystem.
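Formulas (1.13), (1.14) and (1.16) are images in the original; a common steer-by-wire form consistent with the symbol list, offered as a reconstruction rather than the patent's exact equations, is:

```latex
J_{eq}\,\ddot{\delta}_f + B_{eq}\,\dot{\delta}_f = i_m i_s T_d - T_l,
\qquad
\dot{S}_4 = S_5,
\quad
\dot{S}_5 = \frac{i_m i_s T_d - T_l - B_{eq} S_5}{J_{eq}}
```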
S3, combining an autonomous control automobile system dynamic model and a steering system model to obtain a man-machine control mapping model of torque input, wherein the man-machine control mapping model specifically comprises the following steps:
Combining formulas (1.12) and (1.16) yields the man-machine control mapping model of the torque input:
where ω_f denotes the front wheel angular velocity.
S4, selecting state variables of the man-machine control mapping model, designing an adaptive optimizing controller based on the backstepping technique and reinforcement learning, and performing autonomous vehicle motion control with the resulting optimized control strategy, which is:
where x is the system state, n is the system order, the design gains are positive constants, z_i is the tracking error, α_i* represents the optimized virtual control law, u* represents the optimized actual control law, S_a(Z) and S_c(Z) are basis function vectors, ε_a and ε_c are bounded approximation errors, Z represents the system variables, W_a is the actor neural network weight, W_c is the critic neural network weight, and W_d is the identifier neural network weight; the adaptive update law of the critic-actor mechanism controller is:
where γ_a and γ_c are positive constants;
The identifier update law for the disturbance and nonlinear term approximation is:
where Γ_d is a positive constant matrix and σ_d is a constant.
Specifically, the process of deriving an optimized control strategy includes:
Selecting the front wheel steering angular velocity, the front wheel steering angle δ_f, the projection error e_p and the derivative of the projection error as the state variables of the strict-feedback system, i.e., of formula (1.17), formula (1.17) is converted into:
where the functions on the right-hand side are defined from the terms of formula (1.17);
Adopting a coordinate transformation and, to suppress chattering of the control signal, introducing a first-order filter via the dynamic surface technique, the tracking error equation is obtained:
where y_r is the reference signal, ς is the filter output signal, and α_i* is the optimized virtual control law; the filter is designed as τ ς̇ + ς = α_i*, where τ is a design constant and ς̇ is the first-order derivative of the filter output signal;
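The dynamic surface idea is easy to demonstrate: passing the virtual control through a first-order filter yields both a smoothed signal and its derivative without analytic differentiation. The sketch below uses assumed values for the design constant and step size, not values from the patent.

```python
def first_order_filter(alpha_samples, tau=0.05, dt=0.001, s0=0.0):
    """Discrete first-order filter  tau * s' + s = alpha  used by the
    dynamic surface technique: smooths a virtual control signal and
    returns (output, output derivative) at each step."""
    s, out = s0, []
    for a in alpha_samples:
        s_dot = (a - s) / tau
        s += dt * s_dot
        out.append((s, s_dot))
    return out

# A step in the virtual control: the filter output converges smoothly.
samples = [0.0] * 100 + [1.0] * 900
hist = first_order_filter(samples)
print(f"final output {hist[-1][0]:.3f}")
```

The filter output converges to the stepped virtual control with time constant τ, and its derivative, which the backstepping recursion needs, comes out of the filter for free.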
To guarantee fast motion control, a finite-time convergence constraint is introduced into the controller design, i.e., the system achieves the control objective within a finite time T, where T satisfies:
where the bounding parameters are constants and V(0) is the initial Lyapunov function value.
For the fourth-order system of formula (1.18), take α_i as the virtual control law of the i-th step, where i = 1, 2, 3; the optimal performance index function is obtained as:
where the integrand is the cost function of the i-th step;
Let α_i* be the optimal virtual controller; then:
where Ω is a predefined compact set;
Regarding α_i* as the optimal virtual control signal, the HJB equation corresponding to formula (1.21) is obtained as:
where α_i* is obtained by solving the stationarity condition of the HJB equation, i.e., setting its derivative with respect to α_i to zero;
Decompose α_i* into:
where the two components are unknown continuous functions;
Approximating the unknown continuous functions with neural networks, i.e., on the compact set we have:
where W_1* and W_2* are the ideal neural network weights, S_1(Z) and S_2(Z) are basis function vectors, and ε_1 and ε_2 are bounded approximation errors;
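The universal-approximation property used here can be illustrated with a least-squares fit standing in for the ideal weights W_1*, W_2*; the target function, centers and widths below are assumptions for illustration, not the patent's network.

```python
import numpy as np

def gaussian_basis(z, centers, width=0.5):
    """Vector S(z) of Gaussian radial basis functions."""
    return np.exp(-((z - centers) ** 2) / (2 * width ** 2))

# Fit ideal weights W* by least squares so that W*^T S(z) approximates
# an "unknown" nonlinear function on the compact set [-1, 1].
centers = np.linspace(-1.0, 1.0, 11)
zs = np.linspace(-1.0, 1.0, 201)
f = np.sin(2 * zs)                                        # target function
phi = np.stack([gaussian_basis(z, centers) for z in zs])  # (201, 11)
w, *_ = np.linalg.lstsq(phi, f, rcond=None)

err = np.max(np.abs(phi @ w - f))     # bounded approximation error
print(f"max approximation error on [-1, 1]: {err:.4f}")
```

On the compact set [-1, 1] the residual plays the role of the bounded approximation error ε.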
Substituting formula (1.24) into formula (1.23) gives:
where the lumped unknown term collects the ideal weights and approximation errors;
The corresponding optimized controller is obtained as follows:
; introduction of reinforcement learning with recognizer-criticizer-actor structure will be used to approximate/> Is designed as follows:
Wherein the method comprises the steps of Output for identifier,/>Neural network weights for the identified person;
The identifier update law is constructed as follows:
Wherein the method comprises the steps of Representing a positive constant matrix,/>Is a constant;
Based on the critic-actor architecture and formula (1.25), the critic that evaluates the control performance is constructed as:
where J_i is the estimate of J_i* and W_c is the critic neural network weight;
According to formula (1.28), the actor that executes the control action is designed as:
where α_i* and u* are the optimized virtual control law and the optimized actual control law, respectively, and W_a is the actor neural network weight;
The critic and actor neural network weight update laws are:
where γ_c and γ_a are positive constants.
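The structure of these update laws can be sketched on the simplest possible plant. For the scalar system x' = u with cost ∫(x² + u²)dt, the HJB equation gives the optimal control u* = -x, so the ideal critic and actor weights are both 1; the gradient rules below are an illustrative stand-in for the patent's update laws, not the laws themselves.

```python
# Scalar plant  x' = u,  cost  integral(x^2 + u^2) dt.
# With critic V = wc * x^2 and actor u = -wa * x, the ideal weights
# solving the HJB equation are wc = wa = 1.

def train(wc=0.5, wa=0.5, lr_c=0.05, lr_a=0.05, iters=3000):
    for _ in range(iters):
        # HJB residual at a persistently exciting state (normalized x = 1):
        # delta = x^2 + u^2 + (dV/dx) * x'
        delta = 1.0 + wa ** 2 - 2.0 * wc * wa
        wc += lr_c * delta * 2.0 * wa      # critic: descend Bellman error
        wa += lr_a * (wc - wa)             # actor: track u = -0.5 dV/dx
    return wc, wa

wc, wa = train()
print(f"critic weight {wc:.3f}, actor weight {wa:.3f} (ideal: 1, 1)")
```

Both weights converge to the analytically optimal value 1, mirroring how the critic descends the Bellman (HJB) residual while the actor tracks the control implied by the critic's value estimate.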
The optimized robust steering controller is designed with a reinforcement learning algorithm based on the critic-actor mechanism together with an identifier approximator based on a radial basis function neural network. In the first stage, a robust steering controller based on reinforcement learning is designed with backstepping control, building on a reference path model, a vehicle dynamics model and a kinematic model, so that the lateral path tracking error is suppressed, unknown external disturbances are rejected, and the yaw stability of the autonomous vehicle is guaranteed. In the second stage, an adaptive control mechanism based on Lyapunov stability theory is combined with a radial basis function neural network, which can approximate any continuous nonlinear function, to compensate for the uncertainty of the tire cornering stiffness and guarantee the global asymptotic stability of the closed-loop system; the system architecture is shown in fig. 3.
As one implementation, the control method is applicable to the following adaptive lateral motion system, shown in fig. 4, which mainly comprises a steering wheel assembly and a steering execution assembly. The steering wheel assembly is provided with a steering column 10; a steering wheel 1 is fixed at the upper end of the steering column 10, a hand wheel angle sensor 2 is sleeved on the steering column 10, and a road-feel feedback motor 3 is connected at the bottom end of the steering column 10. The steering motor 4 is arranged at the top of the steering transmission shaft 14, the gear angle sensor 5 is sleeved in the middle of the steering transmission shaft 14, one end of the rack-and-pinion steering gear 7 is connected with the steering transmission shaft 14, the other end of the rack-and-pinion steering gear 7 is connected with the tie rod 8, the two ends of the tie rod 8 are each connected with a steering knuckle arm 13 through a ball head 12, and the other end of each steering knuckle arm 13 is connected to a front wheel 6. The hand wheel angle sensor 2 measures the angle through which the driver turns the steering wheel 1 and transmits the signal to the main controller 9; the main controller 9 transmits road condition information to the road-feel feedback motor 3 in the form of an electric signal, the road-feel feedback motor 3 drives the steering column 10 to rotate, the steering wheel 1 rotates with it, and the driver feels the condition of the road surface.
According to the adaptive motion control method provided by the embodiments of this specification, an optimized motion control strategy is obtained through adaptive learning and the control signal is input to the main controller 9. The steering motor 4 receives the steering signal sent by the main controller 9 and acts accordingly: the steering motor 4 drives the steering transmission shaft 14 to rotate, the steering transmission shaft 14 drives the rack-and-pinion steering gear 7, the rack-and-pinion steering gear 7 drives the tie rod 8 to move left and right, the tie rod 8 drives the steering knuckle arm 13 through the ball head 12, and the steering knuckle arm 13 steers the front wheels 6. The gear steering angle measured by the gear angle sensor 5 is transmitted back to the main controller 9, thereby closing the loop.
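The closed loop just described (command in, gear-angle sensor back to the main controller) can be sketched as a simple regulation loop; the gain, rate limit and actuator model are assumptions for illustration only, not the patent's controller.

```python
def simulate(delta_cmd=0.1, kp=8.0, dt=0.001, steps=2000):
    """Closed loop: gear angle sensor -> main controller -> steering
    motor (modeled as a rate-limited integrator) -> front wheel angle."""
    delta = 0.0
    for _ in range(steps):
        u = kp * (delta_cmd - delta)      # main controller command
        rate = max(-1.0, min(1.0, u))     # actuator rate limit [rad/s]
        delta += dt * rate
    return delta

final = simulate()
print(f"front wheel angle after 2 s: {final:.4f} rad (command 0.1)")
```

The front-wheel angle converges to the commanded 0.1 rad, mirroring the sensor-to-controller feedback loop that fig. 4 describes.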
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the invention (including the claims) is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined within the idea of the invention, the steps may be implemented in any order and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
The present invention is intended to embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omission, modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.

Claims (5)

1. An automatic driving automobile motion control method based on reinforcement learning, which is characterized by comprising the following steps:
S1, establishing an autonomous control automobile system dynamic model;
S2, establishing a steering system model of the automobile;
S3, combining an autonomous control automobile system dynamic model and a steering system model to obtain a man-machine control mapping model of torque input;
S4, selecting state variables of the man-machine control mapping model, designing an adaptive optimizing controller based on the backstepping technique and reinforcement learning, and performing autonomous vehicle motion control with the resulting optimized control strategy, which is:
where x is the system state, n is the system order, the design gains are positive constants, z_i is the tracking error, α_i* represents the optimized virtual control law, u* represents the optimized actual control law, S_a(Z) and S_c(Z) are basis function vectors, ε_a and ε_c are bounded approximation errors, Z represents the system variables, W_a is the actor neural network weight, W_c is the critic neural network weight, and W_d is the identifier neural network weight; the adaptive update law of the critic-actor mechanism controller is:
where γ_a and γ_c are positive constants;
The identifier update law for the disturbance and nonlinear term approximation is:
where Γ_d is a positive constant matrix and σ_d is a constant.
2. The reinforcement learning based automatic driving car motion control method of claim 1, wherein the establishing an autonomous control car system dynamic model comprises:
s11, establishing a two-degree-of-freedom vehicle dynamics model:
wherein, Is the roll angle of the vehicle body,/>For the yaw rate of the vehicle body,/>For the total mass of the vehicle,/>For longitudinal speed,/>Is yaw moment of inertia,/>And/>The distance between the center of gravity and the front and rear axes,/>, respectivelyAnd/>Tire side forces of the front and rear axles, respectively;
s12, converting the vehicle dynamics state parameters into state parameters related to the reference track, and using the prediction error Incorporating lateral error/>And yaw error/>The obtained vehicle kinematic model is as follows:
wherein, Is the transverse velocity,/>Is the heading of the carrier,/>Is the heading of the reference path,/>Distance of constant projection,/>Representing the distance along the reference path;
s13, according to the pair And to the/>, of formula (1.5)And/>And (3) deriving to obtain:
wherein, Representing the curvature of the path;
S14, eliminating prediction error And oscillation of yaw rate, the following relationship is obtained:
S15, substituting formula (1.4) into formula (1.7) yields:
S16, under unknown external force disturbance, the tire lateral forces of the front and rear axles are calculated as:
wherein F_cf and F_cr are the tire cornering forces of the front and rear axles, respectively; ΔF_f and ΔF_r are the unknown lateral external force disturbances on the front and rear axles, respectively; α_f and α_r are the tire slip angles of the front and rear axles, respectively; C_f is the front-axle tire cornering stiffness; C_r is the rear-axle tire cornering stiffness; and μ is the adhesion coefficient between the tire and the road surface;
The slip angles of the front and rear tires satisfy:
wherein δ_f is the front wheel steering angle and v is the vehicle speed;
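The slip-angle formula is an image in the source; under the usual small-angle bicycle-model geometry (symbols as assumed in the surrounding text, with v_y the lateral velocity and r the yaw rate) it commonly takes the form:

```latex
\alpha_f = \delta_f - \frac{v_y + l_f\, r}{v_x} ,\qquad
\alpha_r = -\,\frac{v_y - l_r\, r}{v_x}
```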
The cornering stiffness has a nonlinear characteristic and satisfies:
wherein C_f0 and C_r0 are the nominal cornering stiffnesses of the front and rear axles, and ΔC_f and ΔC_r are the cornering stiffness uncertainties of the front and rear tires, respectively;
S17, combining formulas (1.8)-(1.11) to obtain the nonlinear vehicle-road system model:
wherein S_1, S_2 and S_3 are system variables, and a smooth function represents the equivalent random disturbance of the vehicle.
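As an illustration of the linear part of such a vehicle-road model, the sketch below assembles the standard 2-DOF state matrix from nominal cornering stiffnesses and integrates one explicit-Euler step; all parameter names and values (m, Iz, lf, lr, Cf, Cr, vx) are illustrative assumptions, not values from the patent.

```python
import numpy as np

def bicycle_A(m, Iz, lf, lr, Cf, Cr, vx):
    """State matrix of the standard linear 2-DOF (sideslip, yaw-rate)
    bicycle model obtained by linearizing the tire forces."""
    a11 = -(Cf + Cr) / (m * vx)
    a12 = -1.0 - (Cf * lf - Cr * lr) / (m * vx ** 2)
    a21 = -(Cf * lf - Cr * lr) / Iz
    a22 = -(Cf * lf ** 2 + Cr * lr ** 2) / (Iz * vx)
    return np.array([[a11, a12], [a21, a22]])

# Illustrative passenger-car parameters (assumed).
A = bicycle_A(m=1500.0, Iz=2500.0, lf=1.2, lr=1.4,
              Cf=80000.0, Cr=80000.0, vx=20.0)
x = np.array([0.02, 0.1])         # state: [sideslip beta, yaw rate r]
x_next = x + 0.01 * (A @ x)       # one explicit-Euler step, dt = 0.01 s
```

With these values the open-loop lateral dynamics are stable, so a small sideslip perturbation decays even before control is applied.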
3. The reinforcement learning-based automatic driving car motion control method according to claim 2, wherein building the steering system model of the car comprises:
The steering system model is initially built as follows:
restated as:
wherein J_eq and B_eq are the equivalent moment of inertia and damping of the steering system, respectively; G_m is the reduction ratio of the motor reduction mechanism; G_s is the reduction ratio of the steering system; δ_f is the front wheel angle; T_d is the driver input torque; and T_l is the steering load torque;
The fitting equation is:
Regarding the front wheel angle and its derivative as measurable in real time and T_d as a known value, the terms in formula (1.14) are regarded as a known term and a smooth function, wherein the steering load torque is obtained by the fit of formula (1.15), T_d is measured by a sensor, and the torque fitting error and the measurement error are lumped together; the simplified model of the steering system is then obtained as:
wherein the system variables are defined accordingly.
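As a numeric illustration of a simplified steering model of this kind (a second-order inertia-damping system driven by geared driver torque against a load torque), the sketch below integrates it by explicit Euler; all names and values (Jeq, Beq, gain, Td, Tl) are illustrative assumptions, not patent values.

```python
def simulate_steering(Jeq, Beq, gain, Td, Tl, dt=0.001, steps=1000):
    """Explicit-Euler integration of
    Jeq * dd_delta + Beq * d_delta = gain * Td - Tl,
    returning the front wheel angle and its angular velocity."""
    delta, d_delta = 0.0, 0.0
    for _ in range(steps):
        dd_delta = (gain * Td - Tl - Beq * d_delta) / Jeq
        d_delta += dt * dd_delta
        delta += dt * d_delta
    return delta, d_delta

delta, d_delta = simulate_steering(Jeq=0.05, Beq=0.5, gain=16.0,
                                   Td=2.0, Tl=10.0)
```

For constant inputs the angular velocity settles toward (gain·Td − Tl)/Beq, which makes the steady-state behavior easy to check against the model.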
4. The reinforcement learning-based automatic driving vehicle motion control method according to claim 3, wherein combining the autonomous control vehicle system dynamic model and the steering system model to obtain the man-machine control mapping model of the torque input comprises:
Combining formulas (1.12) and (1.16) yields the man-machine control mapping model of the torque input:
wherein ω_f represents the front wheel angular velocity.
5. The reinforcement learning based autopilot motion control method of claim 4 wherein the process of deriving an optimized control strategy includes:
Selecting the front wheel steering angular velocity, the front wheel steering angle, the projection error and the derivative of the projection error as the state variables of formula (1.17), i.e., as a strict-feedback system, formula (1.17) is converted into:
wherein,
Coordinate transformation is adopted, and a first-order filter is introduced to obtain a tracking error equation:
The first-order filter is designed with the optimal control law as the filter input signal, wherein α is the reference signal, α_f is the filter output signal, τ is a design constant, and the first-order derivative of the filter output signal appears in the filter dynamics;
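The filter equation itself is an image in the source; a discrete sketch of the common dynamic-surface form τ·dα_f/dt + α_f = α is below. The function name, τ and dt values are illustrative assumptions.

```python
def first_order_filter(alpha_seq, tau, dt, alpha_f0=0.0):
    """Explicit-Euler realization of tau * d(alpha_f)/dt + alpha_f = alpha,
    i.e. the low-pass filter used to avoid differentiating the virtual
    control law in dynamic-surface / backstepping designs."""
    alpha_f = alpha_f0
    out = []
    for alpha in alpha_seq:
        alpha_f += dt * (alpha - alpha_f) / tau
        out.append(alpha_f)
    return out

# Step reference: the output rises smoothly toward the input value.
filtered = first_order_filter([1.0] * 2000, tau=0.05, dt=0.001)
```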
Finite-time convergence is introduced into the controller design as a constraint, i.e., the control objective is achieved within a finite time T, wherein T satisfies:
wherein the parameters are all constants and V(0) is the initial value of the Lyapunov function;
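The settling-time condition is an image in the source; a standard finite-time bound of the kind referred to (with λ and b assumed constants and V(0) the initial Lyapunov value) is:

```latex
\dot V(t) \le -\lambda\, V(t)^{\,b},\;\; 0<b<1
\quad\Longrightarrow\quad
T \le \frac{V(0)^{\,1-b}}{\lambda\,(1-b)}
```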
For the fourth-order system of formula (1.18), take α_i as the virtual control law of the i-th step; the optimal performance index function is obtained as:
wherein,
Letting α_i* be the optimal virtual controller, one obtains:
wherein Ω is a predefined compact set;
Regarding α_i* as the optimal virtual control signal, the HJB equation corresponding to formula (1.21) is obtained as:
wherein the optimal virtual control is obtained by solving the stationarity condition of the HJB equation, i.e., by setting the partial derivative of the Hamiltonian with respect to the control to zero;
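The HJB equation and its stationarity condition are images in the source; for a scalar error subsystem ż = f(z) + α with cost integrand z² + α² (a simplified illustration, not the exact claimed form), they read:

```latex
H = z^{2} + \alpha^{*2} + \frac{\partial J^{*}}{\partial z}\bigl(f(z)+\alpha^{*}\bigr)=0 ,
\qquad
\frac{\partial H}{\partial \alpha^{*}} = 2\alpha^{*} + \frac{\partial J^{*}}{\partial z} = 0
\;\Rightarrow\;
\alpha^{*} = -\tfrac{1}{2}\,\frac{\partial J^{*}}{\partial z}
```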
The gradient of the optimal performance index function is decomposed as:
wherein the remaining terms are unknown continuous functions;
Approximating the unknown continuous functions with neural networks, one has:
wherein W_a* and W_c* are the ideal neural network weights, S_a and S_c are basis function vectors, and ε_a and ε_c are bounded approximation errors;
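Basis function vectors such as S_a and S_c are typically Gaussian radial basis functions in RBF-neural-network approximators; the sketch below builds such a vector, with the centers and width chosen purely for illustration.

```python
import numpy as np

def rbf_basis(x, centers, width):
    """Gaussian radial-basis-function vector S(x): one Gaussian bump
    per center, evaluated at the (possibly vector-valued) input x."""
    x = np.atleast_1d(x)
    return np.exp(-np.sum((x - centers) ** 2, axis=1) / (2.0 * width ** 2))

centers = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)  # assumed node layout
S = rbf_basis(0.0, centers, width=1.0)
approx = S @ np.ones(5)   # NN output W^T S(x) with unit weights
```

The approximation W^T S(x) is then linear in the weights W, which is what makes adaptive update laws for W tractable.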
Substituting formula (1.24) into formula (1.23) gives:
wherein
The corresponding optimized controller is then obtained; reinforcement learning with an identifier-critic-actor structure is introduced, and the identifier used for approximation is designed as follows:
wherein Ŝ is the identifier output and W_d is the identifier neural network weight;
The identifier update law is constructed as follows:
wherein Γ_d represents a positive constant matrix and σ_d is a constant;
Based on the critic-actor architecture and formula (1.25), the critic that evaluates control performance is constructed as:
wherein Ĵ is the estimate of the optimal performance index function and W_c is the critic neural network weight;
According to formula (1.28), the actor that executes the control action is designed as:
wherein α̂ and û are the optimized virtual control law and the optimized actual control law, respectively, and W_a is the actor neural network weight;
The update laws of the critic and actor neural network weights are:
wherein γ_c and γ_a are positive constants.
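The update-law formulas above are images in the source; the sketch below discretizes the commonly used optimized-backstepping form (an assumed standard form, not necessarily the exact claimed law) and shows that the projections of both weight estimates onto the basis vector converge. The gains, basis vector and seed are illustrative assumptions.

```python
import numpy as np

# Assumed continuous-time laws being discretized:
#   dWc/dt = -gamma_c * S S^T Wc
#   dWa/dt = -S S^T [ gamma_a (Wa - Wc) + gamma_c Wc ]
# S is held fixed here purely to make the convergence visible.
rng = np.random.default_rng(0)
S = np.array([0.2, 0.5, 0.3])          # frozen basis-function vector
Wc = rng.normal(size=3)                # critic weight estimate
Wa = rng.normal(size=3)                # actor weight estimate
gamma_c, gamma_a, dt = 2.0, 5.0, 0.01

for _ in range(5000):
    dWc = -gamma_c * S * (S @ Wc)      # S S^T Wc = S * (S @ Wc)
    dWa = -S * (gamma_a * (S @ (Wa - Wc)) + gamma_c * (S @ Wc))
    Wc = Wc + dt * dWc
    Wa = Wa + dt * dWa
```

Only the components of Wc and Wa along S are adapted, so S @ Wc and S @ Wa decay toward zero while the orthogonal components remain untouched.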
CN202410315976.1A 2024-03-20 2024-03-20 Automatic driving automobile motion control method based on reinforcement learning Pending CN117911414A (en)


Publications (1)

Publication Number Publication Date
CN117911414A true CN117911414A (en) 2024-04-19

Family

ID=90686199


Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019031268A (en) * 2017-05-12 2019-02-28 トヨタ モーター エンジニアリング アンド マニュファクチャリング ノース アメリカ,インコーポレイティド Control policy learning and vehicle control method based on reinforcement learning without active exploration
US20190266489A1 (en) * 2017-10-12 2019-08-29 Honda Motor Co., Ltd. Interaction-aware decision making
US20190291726A1 (en) * 2018-03-20 2019-09-26 Mobileye Vision Technologies Ltd. Systems and methods for navigating a vehicle
US20200241542A1 (en) * 2019-01-25 2020-07-30 Bayerische Motoren Werke Aktiengesellschaft Vehicle Equipped with Accelerated Actor-Critic Reinforcement Learning and Method for Accelerating Actor-Critic Reinforcement Learning
CN112026763A (en) * 2020-07-23 2020-12-04 南京航空航天大学 Automobile track tracking control method
CN113320542A (en) * 2021-06-24 2021-08-31 厦门大学 Tracking control method for automatic driving vehicle
WO2021248641A1 (en) * 2020-06-10 2021-12-16 北京理工大学 Multi-sensor information fusion-based model adaptive lateral velocity estimation method
CN114379583A (en) * 2021-12-10 2022-04-22 江苏大学 Automatic driving vehicle trajectory tracking system and method based on neural network dynamics model
US20220219691A1 (en) * 2018-03-04 2022-07-14 Traxen Inc. Automated cruise control system
CN115097736A (en) * 2022-08-10 2022-09-23 东南大学 Active disturbance rejection controller parameter optimization method based on deep reinforcement learning
CN115202341A (en) * 2022-06-16 2022-10-18 同济大学 Transverse motion control method and system for automatic driving vehicle
US20220363279A1 (en) * 2021-04-21 2022-11-17 Foundation Of Soongsil University-Industry Cooperation Method for combating stop-and-go wave problem using deep reinforcement learning based autonomous vehicles, recording medium and device for performing the method
WO2023102962A1 (en) * 2021-12-06 2023-06-15 深圳先进技术研究院 Method for training end-to-end autonomous driving strategy
CN117008995A (en) * 2023-07-13 2023-11-07 广东工业大学 Industrial software component service function chain assembly integration method
CN117246299A (en) * 2023-10-09 2023-12-19 清华大学 Auxiliary emergency braking method and system after braking failure
CN117580063A (en) * 2023-08-03 2024-02-20 北京邮电大学 Multi-dimensional resource collaborative management method in vehicle-to-vehicle network


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XU CAN等: "An actor-critic based learning method for decision-making and planning of autonomous vehicles", 《SCIENCE CHINA TECHNOLOGICAL SCIENCES》, vol. 64, no. 5, 31 May 2021 (2021-05-31), pages 984 - 994, XP037445340, DOI: 10.1007/s11431-020-1729-2 *
ZHU YADONG: "Research on Resource Allocation Methods for a Cloud-Edge Collaborative Smart Grid Fault Monitoring System Based on Deep Reinforcement Learning", 《China Masters' Theses Full-text Database, Engineering Science and Technology II》, no. 02, 15 February 2022 (2022-02-15), pages 042 - 928 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination