CN110450771B - Intelligent automobile stability control method based on deep reinforcement learning - Google Patents
Intelligent automobile stability control method based on deep reinforcement learning
- Publication number
- CN110450771B CN110450771B CN201910809910.7A CN201910809910A CN110450771B CN 110450771 B CN110450771 B CN 110450771B CN 201910809910 A CN201910809910 A CN 201910809910A CN 110450771 B CN110450771 B CN 110450771B
- Authority
- CN
- China
- Prior art keywords
- network model
- ith
- formula
- vehicle
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING › B60—VEHICLES IN GENERAL › B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W10/18—Conjoint control of vehicle sub-units of different type or different function including control of braking systems
- B60W10/20—Conjoint control of vehicle sub-units of different type or different function including control of steering systems
- B60W30/02—Control of vehicle driving stability
- B60W40/068—Road friction coefficient
- B60W40/10—Estimation or calculation of non-directly measurable driving parameters related to vehicle motion
- B60W40/105—Speed
- B60W40/12—Estimation or calculation of non-directly measurable driving parameters related to parameters of the vehicle itself, e.g. tyre models
- B60W40/13—Load or weight
- B60W50/0098—Details of control systems ensuring comfort, safety or stability not otherwise provided for
- B60W2050/0002—Automatic control, details of type of controller or control system architecture
- B60W2050/0043—Signal treatments, identification of variables or parameters, parameter estimation or state estimation
- B60W2520/10—Longitudinal speed
- B60W2530/10—Weight
- B60W2710/182—Brake pressure, e.g. of fluid or between pad and disc
- B60W2710/20—Steering systems
Landscapes
- Engineering & Computer Science (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Automation & Control Theory (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Chemical & Material Sciences (AREA)
- Combustion & Propulsion (AREA)
- Human Computer Interaction (AREA)
- Steering Control In Accordance With Driving Conditions (AREA)
Abstract
The invention discloses an intelligent automobile stability control method based on deep reinforcement learning, which comprises the following steps: 1, acquiring the decision output of the vehicle lateral controller together with the vehicle structural parameters and driving parameters; 2, defining the state parameters, action parameters and reward function of the deep reinforcement learning method; 3, constructing and training the network model of the deep reinforcement learning method to obtain an optimal action network model; 4, obtaining the current vehicle state parameter s_t and using the optimal action network model to output the current additional yaw moment ΔM_t and correction steering angle Δδ_t; 5, judging the stability state of the automobile; 6, determining the direction of the current correction angle Δδ_t and the action wheel of the current additional yaw moment ΔM_t according to the steering characteristic of the automobile and the direction of the steering-wheel angle. The invention can realize the optimal coordination control law between direct yaw moment control and steering control under both stable and extreme working conditions, thereby realizing vehicle stability control and ensuring the safety and comfort of drivers and passengers.
Description
Technical Field
The invention relates to the field of automobile dynamics control, in particular to an intelligent automobile stability control method based on deep reinforcement learning.
Background
When an automobile turns, the tire slip angle increases and the lateral force grows, so the vehicle can follow the driver's intention; however, under low-adhesion and sharp-turn working conditions, the lateral force easily reaches the adhesion limit and the vehicle may enter dangerous states such as sideslip, spin and rollover. Currently, the main ways to intervene in such dangerous conditions are active steering control and direct yaw moment control. Active steering control changes the yaw moment of the vehicle by adding a correction steering angle to the steering-wheel input; direct yaw moment control mainly corrects understeer or oversteer by adjusting the wheel braking forces to form a braking force difference, thereby generating an additional yaw moment.
Active steering and direct yaw moment control each influence vehicle performance with their own advantages and disadvantages. Stand-alone active steering control has little influence on vehicle speed and preserves the comfort of drivers and passengers, but performs poorly under extreme working conditions, where it cannot stabilize the vehicle or meet safety requirements. A stand-alone direct yaw moment control system can ensure safety under extreme working conditions, but strongly affects the longitudinal acceleration of the vehicle and cannot meet comfort requirements. The vehicle is a complex nonlinear system with many coupling effects among its subsystems; for each vehicle state there exists a relatively optimal control output for stability control, and these optimal outputs do not follow a simple linear law, so a linear coordination controller cannot adequately guarantee the safety and comfort of drivers and passengers.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an intelligent automobile stability control method based on deep reinforcement learning, so as to realize the optimal coordination control law between direct yaw moment control and steering control under both stable and extreme working conditions, thereby achieving vehicle stability control and ensuring the safety and comfort of drivers and passengers.
The invention adopts the following technical scheme for solving the technical problems:
the invention relates to an intelligent automobile stability control method based on deep reinforcement learning, which is characterized by comprising the following steps of:
Step 1: obtain the front wheel steering angle δ_f decided by the vehicle lateral controller, and the vehicle structural parameters, including: the wheel base L, the distances L_f and L_r from the center of mass to the front and rear axles, the front and rear tire cornering stiffnesses C_1 and C_2, and the vehicle mass m;
acquire the vehicle running parameters, including: the steering wheel angle sw, the vehicle speed v and the road surface friction coefficient μ;
Step 2: calculate the ideal yaw rate w_d using formula (1):
In formula (1), g is the gravitational acceleration and w is the yaw rate, where:
Step 3: calculate the ideal centroid slip angle β_d using formula (3):
β_d = -min{|β|, |β_max|}·sign(δ_f)   (3)
In formula (3), β is the vehicle centroid slip angle and β_max is the maximum centroid slip angle of the vehicle, where:
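For reference, the following minimal Python sketch shows how the reference targets of steps 2 and 3 could be computed. Since the bodies of formulas (1), (2), (4) and (5) are not reproduced in this text, the stability factor K, the 0.85·μ·g/v adhesion bound and the β_max expression below are conventional 2-DOF bicycle-model choices assumed for illustration, not the patented formulas.

```python
import math

def reference_targets(delta_f, v, mu, L, Lf, Lr, C1, C2, m, g=9.81):
    """Ideal yaw rate w_d and ideal centroid slip angle beta_d.

    A sketch under assumed standard 2-DOF bicycle-model formulas; the
    patent's equations (1)-(5) are image placeholders in this text.
    """
    # Stability factor of the linear 2-DOF model (assumed form of eq. (2)).
    K = (m / L**2) * (Lf / C2 - Lr / C1)
    # Steady-state yaw-rate gain, limited by an adhesion bound (assumed eq. (1)).
    w_ss = v * delta_f / (L * (1.0 + K * v**2))
    w_bound = 0.85 * mu * g / max(v, 1e-3)
    w_d = math.copysign(min(abs(w_ss), w_bound), w_ss)
    # Ideal centroid slip angle per eq. (3); beta and beta_max are assumed forms.
    beta = Lr * delta_f / L               # placeholder kinematic estimate
    beta_max = math.atan(0.02 * mu * g)   # a common empirical bound (assumed)
    beta_d = -min(abs(beta), abs(beta_max)) * math.copysign(1.0, delta_f)
    return w_d, beta_d
```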
Step 4: define the vehicle state parameter s of the deep reinforcement learning method by formula (6):
s = {w, β, sw, w_d, β_d}   (6)
Step 5: define the action parameter a of the deep reinforcement learning method by formula (7):
In formula (7), Δδ is the steering-wheel correction angle and ΔM is the additional yaw moment;
Step 6: establish the reward function r of the deep reinforcement learning method by formula (8):
r = r_e + r_ps + r_v + r_m + r_sw + r_st   (8)
In formula (8), r_e is the error reward function, where:
In formula (9), the yaw-rate error and the centroid-slip-angle error appear, defined as:
In formula (8), r_ps is the fixed reward value function, where:
In formula (8), r_v is the speed-difference reward function, where:
In formula (8), r_m is the additional-yaw-moment reward function, where:
In formula (8), r_sw is the correction-angle reward function, where:
In formula (8), r_st is the stable-domain reward function, where:
Step 7: construct the network model of the deep reinforcement learning method:
Step 7.1: construct the action network model, comprising: an input layer comprising one neuron, m_1 hidden layers each comprising n_1 neurons, and an output layer comprising 2 neurons; initialize the action network parameters as θ^μ;
Step 7.2: construct the evaluation network model, comprising: two input layers each comprising 1 neuron, m_2 hidden layers each comprising n_2 neurons, of which the m_2-th hidden layer is a fully connected layer, and an output layer comprising 1 neuron; initialize the evaluation network parameters as θ^Q;
Step 7.3: construct a target action network model with the same structure as the action network model and set the target action network parameters θ^μ′ = θ^μ; construct a target evaluation network model with the same structure as the evaluation network model and set the target evaluation network parameters θ^Q′ = θ^Q;
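A minimal PyTorch sketch of the action/evaluation networks of step 7 follows. The layer counts n_1, m_1, n_2, m_2 are left unspecified in the text, so modest defaults are assumed; the state dimension 5 follows the state s = {w, β, sw, w_d, β_d} of formula (6), and the Tanh outputs would still need scaling to the physical action ranges.

```python
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Action network (step 7.1): state in, [correction angle, extra yaw moment] out."""
    def __init__(self, state_dim=5, hidden=64, n_hidden_layers=2):
        super().__init__()
        layers, d = [], state_dim
        for _ in range(n_hidden_layers):        # m_1 hidden layers of n_1 neurons (assumed sizes)
            layers += [nn.Linear(d, hidden), nn.ReLU()]
            d = hidden
        layers += [nn.Linear(d, 2), nn.Tanh()]  # 2 outputs per step 7.1
        self.net = nn.Sequential(*layers)
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Evaluation network (step 7.2): (state, action) in, scalar Q out."""
    def __init__(self, state_dim=5, action_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))               # final fully connected layer, 1 neuron out
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
# Step 7.3: target networks start as exact copies (theta_mu' = theta_mu, theta_Q' = theta_Q).
actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)
```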
And 8: n samples were formed from the ith sample:
initializing the ith vehicle state parameter siAnd with the ith vehicle state parameter siAs input to the motion network model, outputting μ(s) by the motion network modeli|θμ);
Obtaining the ith vehicle motion parameter a by using the formula (17)i:
ai=μ(si|θμ)+Ni (17)
In the formula (17), NiRepresenting the ith random noise;
obtaining the ith vehicle reward value r according to the formula (8)iAnd obtaining an updated ith vehicle state parameter s'i(ii) a Thus obtaining the ith sample, denoted as(s)i,ai,ri,s′i) Further obtaining N samples;
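A sketch of the sample-formation loop of step 8 is given below, assuming a hypothetical vehicle simulator `env` exposing reset()/step() methods (the patent names no specific simulation environment) and Gaussian exploration noise for N_i:

```python
import numpy as np
import torch

def collect_samples(env, actor, n_samples, noise_std=0.1):
    """Step 8 sketch: roll out the action network with exploration noise
    a_i = mu(s_i | theta_mu) + N_i and store (s_i, a_i, r_i, s'_i)."""
    buffer = []
    s = env.reset()
    for _ in range(n_samples):
        with torch.no_grad():
            a = actor(torch.as_tensor(s, dtype=torch.float32)).numpy()
        a = a + np.random.normal(0.0, noise_std, size=a.shape)  # N_i of eq. (17)
        s_next, r = env.step(a)   # reward r_i computed from eq. (8) inside the simulator
        buffer.append((s, a, r, s_next))
        s = s_next
    return buffer
```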
Step 9: train the network model of the deep reinforcement learning method with the N samples, so as to obtain the optimal action network model and the optimal evaluation network model;
Step 10: judge whether expression (18) and expression (19) both hold; if so, the automobile is in a stable state; otherwise the automobile is in an unstable state; then execute step 11:
In formula (18), k_1 is the first boundary coefficient of the stable domain, k_2 is the second boundary coefficient of the stable domain, and β̇ is the centroid slip angular velocity;
In formula (19), ε is an adjustable parameter;
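The stability judgment of step 10 might be sketched as below. The bodies of expressions (18) and (19) are not reproduced in this text; a common β-β̇ phase-plane boundary with coefficients k_1, k_2 and a yaw-rate-error band of width ε are assumed in their place.

```python
def is_stable(beta, beta_dot, w, w_d, k1, k2, eps):
    """Step 10 sketch with assumed forms of expressions (18) and (19)."""
    in_phase_plane = abs(k1 * beta_dot + beta) <= k2  # assumed form of eq. (18)
    yaw_error_ok = abs(w - w_d) <= eps                # assumed form of eq. (19)
    return in_phase_plane and yaw_error_ok
```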
Step 11: obtain the current vehicle state parameter s_t and take it as the input of the optimal action network model, which outputs the current additional yaw moment ΔM_t and the current correction steering angle Δδ_t;
Step 12: judging whether the formula (20) is established, if so, indicating that the steering property of the automobile is understeer, making the action wheels as inner rear wheels, and executing the step 13, otherwise, indicating that the steering property of the automobile is oversteer, making the action wheels as outer front wheels, and executing the step 14;
wd×(w-wd)>0 (20)
step 13: if deltafIf greater than 0, the angle is correctedIs directed leftward, if deltafIf < 0, then let the correction cornerTo the right;
step 14: if deltafIf greater than 0, the angle is correctedIs directed to the right, if deltafIf < 0, then let the correction cornerTo the left.
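Steps 12-14 amount to a small decision rule; the following sketch restates them directly in Python:

```python
def allocate_action(w, w_d, delta_f):
    """Steps 12-14 sketch: pick the braked wheel from the steering
    characteristic (eq. 20) and the correction-angle direction from delta_f."""
    if w_d * (w - w_d) > 0:      # eq. (20) holds: understeer
        wheel = "inner rear"
        direction = "left" if delta_f > 0 else "right"   # step 13
    else:                        # eq. (20) fails: oversteer
        wheel = "outer front"
        direction = "right" if delta_f > 0 else "left"   # step 14
    return wheel, direction
```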
The intelligent automobile stability control method is also characterized in that step 9 proceeds as follows:
Step 9.1: initialize the learning rate parameter α and the return rate parameter γ; initialize i = 1;
Step 9.2: take the i-th vehicle state parameter s_i as the input of the current i-th action network model, which outputs the i-th output value μ(s_i|θ^μ);
take the i-th vehicle state parameter s_i, the i-th vehicle action parameter a_i and the i-th action network output μ(s_i|θ^μ) as inputs of the current i-th evaluation network model; from s_i and a_i, the current i-th evaluation network model outputs Q_i(a_i); from μ(s_i|θ^μ), it outputs Q_i(μ(s_i|θ^μ));
take the updated i-th vehicle state parameter s′_i as the input of the current i-th target action network model, which outputs the i-th output value μ(s′_i|θ^μ′);
take s′_i and the target action network output μ(s′_i|θ^μ′) as inputs of the current i-th target evaluation network model, which outputs the i-th output value Q′_i(a′_i);
according to the output Q_i(μ(s_i|θ^μ)) of the current i-th evaluation network model, update the current i-th action network model by the policy gradient method, obtaining the i-th updated action network model as the (i+1)-th action network model;
according to the output Q_i(a_i) of the current i-th evaluation network model and the output Q′_i(a′_i) of the current i-th target evaluation network model, update the current i-th evaluation network model by minimizing the loss function, obtaining the i-th updated evaluation network model as the (i+1)-th evaluation network model;
Step 9.3: assign i+1 to i and judge whether i is greater than N; if so, the optimal action network model and the optimal evaluation network model have been obtained; otherwise return to step 9.2.
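The update of step 9.2 follows the pattern of the standard DDPG actor-critic update, which the claimed procedure resembles. A condensed PyTorch sketch is given below; it processes one transition per step as in the claim, and omits refinements such as minibatching and soft target-network updates, which the text does not specify.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_t, critic_t,
                actor_opt, critic_opt, gamma=0.99):
    """Step 9.2 sketch: one critic step (minimized loss) and one actor step
    (policy gradient) per sample (s_i, a_i, r_i, s'_i)."""
    s, a, r, s_next = batch  # tensors with a batch dimension, built from one sample
    # Critic: minimize the loss between Q_i(a_i) and the target r_i + gamma * Q'_i(a'_i).
    with torch.no_grad():
        a_next = actor_t(s_next)                  # mu(s'_i | theta_mu')
        y = r + gamma * critic_t(s_next, a_next)  # uses Q'_i(a'_i)
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: policy-gradient step that increases Q_i(mu(s_i | theta_mu)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```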
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses the model-free learning and generalization ability of a deep reinforcement learning algorithm: it determines the input states and output actions relevant to vehicle stability control, designs a reward function suited to coordination control, and constructs and trains an optimal action network model. This model can then decide an optimal coordinated stability control strategy under both stable and extreme working conditions, thereby realizing vehicle stability control and ensuring the safety and comfort of drivers and passengers;
2. The deep reinforcement learning algorithm of the invention requires no algorithm model built on a vehicle model, and the deep neural network it employs has strong nonlinear expressive power: it can capture the nonlinear relation between the vehicle state and the active steering and differential braking controls, which matches reality better than a linear controller designed from a simplified vehicle model;
3. Compared with no control, active steering control alone, direct yaw moment control alone, and linearly distributed coordination control, the control method of the invention achieves better control performance under different working conditions, better robustness, and better comfort under extreme working conditions.
Drawings
FIG. 1 is an intelligent vehicle stability control system based on deep reinforcement learning according to the present invention;
FIG. 2 is a diagram of a training process of the deep reinforcement learning method of the present invention.
Detailed Description
In this embodiment, the intelligent automobile stability control method based on deep reinforcement learning decides the current correction steering angle and additional yaw moment from the current state parameters of the automobile, thereby realizing coordinated vehicle stability control. Specifically, as shown in FIG. 1, the method comprises the following steps:
Step 1: obtain the front wheel steering angle δ_f decided by the vehicle lateral controller, and the vehicle structural parameters, including: the wheel base L, the distances L_f and L_r from the center of mass to the front and rear axles, the front and rear tire cornering stiffnesses C_1 and C_2, and the vehicle mass m;
acquire the vehicle running parameters, including: the steering wheel angle sw, the vehicle speed v and the road surface friction coefficient μ;
Step 2: calculate the ideal yaw rate w_d using formula (1):
In formula (1), g is the gravitational acceleration and w is the yaw rate, where:
Step 3: calculate the ideal centroid slip angle β_d using formula (3):
β_d = -min{|β|, |β_max|}·sign(δ_f)   (3)
In formula (3), β is the vehicle centroid slip angle and β_max is the maximum centroid slip angle of the vehicle, where:
Step 4: define the vehicle state parameter s of the deep reinforcement learning method by formula (6):
s = {w, β, sw, w_d, β_d}   (6)
Step 5: define the action parameter a of the deep reinforcement learning method by formula (7):
In formula (7), Δδ is the steering-wheel correction angle, with value range (0, 20) in degrees, and ΔM is the additional yaw moment, with value range (0, 20) in N·m;
Step 6: establish the reward function r of the deep reinforcement learning method by formula (8):
r = r_e + r_ps + r_v + r_m + r_sw + r_st   (8)
The reward function is the core of the whole deep reinforcement learning algorithm and guides the adjustment direction of the deep neural network parameters. Design principles are given first, and the specific reward functions are then designed according to these principles.
In this embodiment, the reward function is organized into 4 priority levels; the higher the level, the more important the principle. The design principles are:
Level 1: the aim of the invention is vehicle stability control, so guaranteeing vehicle stability is the primary task;
Level 2: steering control is advantageous over braking control, so steering control is prioritized over braking control;
Level 3: stabilize the automobile with as small an active steering angle or braking pressure as possible;
Level 4: when the automobile is in the stable domain, the action output should be 0 as far as possible;
In formula (8), r_e is the error reward function, corresponding to the level-1 design principle: the smaller the error, the larger the reward. To highlight the importance of the level-1 principle, the error reward should have the largest rate of change, so a quadratic function is designed as the level-1 reward:
In formula (9), the yaw-rate error and the centroid-slip-angle error appear, defined as:
In formula (8), r_ps is the fixed reward value function, corresponding to the level-2 design principle: preferentially using steering control earns a higher reward value:
In formula (8), r_v is the speed-difference reward function, corresponding to the level-2 principle: steering influences speed less than braking and therefore earns a larger reward value:
In formula (8), r_m is the additional-yaw-moment reward function, corresponding to the level-3 design principle:
In formula (8), r_sw is the correction-angle reward function, corresponding to the level-3 design principle:
In formula (8), r_st is the stable-domain reward function, corresponding to the level-4 design principle: within the stable domain, the smaller the action, the larger the reward:
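A sketch of the composite reward of formula (8) following the four design principles is shown below. The exact piecewise bodies of formulas (9)-(16) are not reproduced in this text, so each term is an assumed form consistent with the stated principle, and all weights are illustrative.

```python
def reward(w, w_d, beta, beta_d, dM, dsw, dv, stable,
           q1=1.0, q2=1.0, c_ps=0.5, c_v=0.1, c_m=1e-3, c_sw=0.1, c_st=0.1):
    """Sketch of r = r_e + r_ps + r_v + r_m + r_sw + r_st (eq. 8); every
    term below is an assumed form, not the patented eqs. (9)-(16)."""
    r_e = -(q1 * (w - w_d)**2 + q2 * (beta - beta_d)**2)   # level 1: quadratic error term
    r_ps = c_ps if abs(dM) < 1e-6 else 0.0                 # level 2: prefer steering over braking
    r_v = -c_v * abs(dv)                                   # level 2: penalize speed loss
    r_m = -c_m * abs(dM)                                   # level 3: small additional yaw moment
    r_sw = -c_sw * abs(dsw)                                # level 3: small correction angle
    r_st = c_st if (stable and abs(dM) < 1e-6 and abs(dsw) < 1e-6) else 0.0  # level 4
    return r_e + r_ps + r_v + r_m + r_sw + r_st
```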
Step 7: construct the network model of the deep reinforcement learning method:
Step 7.1: construct the action network model, comprising: an input layer comprising one neuron, m_1 hidden layers each comprising n_1 neurons, and an output layer comprising 2 neurons; initialize the action network parameters as θ^μ;
Step 7.2: construct the evaluation network model, comprising: two input layers each comprising 1 neuron, m_2 hidden layers each comprising n_2 neurons, of which the m_2-th hidden layer is a fully connected layer, and an output layer comprising 1 neuron; initialize the evaluation network parameters as θ^Q;
Step 7.3: construct a target action network model with the same structure as the action network model and set the target action network parameters θ^μ′ = θ^μ; construct a target evaluation network model with the same structure as the evaluation network model and set the target evaluation network parameters θ^Q′ = θ^Q;
And 8: n samples were formed from the ith sample:
initializing the ith vehicle state parameter siAnd with the ith vehicle state parameter siAs input to the motion network model, μ(s) is output by the motion network modeli|θμ);
Obtaining the ith vehicle motion parameter a by using the formula (17)i:
ai=μ(si|θμ)+Ni (17)
In the formula (17), NiRepresenting the ith random noise;
obtaining the ith vehicle reward value r according to the formula (8)iAnd obtaining an updated ith vehicle state parameter s'i(ii) a Thus obtaining the ith sample, denoted as(s)i,ai,ri,s′i) Further obtaining N samples;
Step 9: as shown in FIG. 2, train the network model of the deep reinforcement learning method with the N samples:
Step 9.1: initialize the learning rate parameter α and the return rate parameter γ; initialize i = 1;
Step 9.2: take the i-th vehicle state parameter s_i as the input of the current i-th action network model, which outputs the i-th output value μ(s_i|θ^μ);
take the i-th vehicle state parameter s_i, the i-th vehicle action parameter a_i and the i-th action network output μ(s_i|θ^μ) as inputs of the current i-th evaluation network model; from s_i and a_i, the current i-th evaluation network model outputs Q_i(a_i); from μ(s_i|θ^μ), it outputs Q_i(μ(s_i|θ^μ));
take the updated i-th vehicle state parameter s′_i as the input of the current i-th target action network model, which outputs μ(s′_i|θ^μ′);
take s′_i and the target action network output μ(s′_i|θ^μ′) as inputs of the current i-th target evaluation network model, which outputs Q′_i(a′_i);
according to the output Q_i(μ(s_i|θ^μ)) of the current i-th evaluation network model, update the current i-th action network model by the policy gradient method, obtaining the i-th updated action network model as the (i+1)-th action network model;
according to the output Q_i(a_i) of the current i-th evaluation network model and the output Q′_i(a′_i) of the current i-th target evaluation network model, update the current i-th evaluation network model by minimizing the loss function, obtaining the i-th updated evaluation network model as the (i+1)-th evaluation network model;
Step 9.3: assign i+1 to i and judge whether i is greater than N; if so, the optimal action network model and the optimal evaluation network model have been obtained; otherwise return to step 9.2;
Step 10: judge whether expression (18) and expression (19) both hold; if so, the automobile is in a stable state; otherwise the automobile is in an unstable state; then execute step 11:
In formula (18), k_1 is the first boundary coefficient of the stable domain, k_2 is the second boundary coefficient of the stable domain, and β̇ is the centroid slip angular velocity;
In formula (19), ε is an adjustable parameter;
Step 11: obtain the current vehicle state parameter s_t and take it as the input of the optimal action network model, which outputs the current additional yaw moment ΔM_t and the current correction steering angle Δδ_t;
Step 12: judge whether formula (20) holds; if so, the steering characteristic of the automobile is understeer, the action wheel is taken as the inner rear wheel, and step 13 is executed; otherwise the steering characteristic of the automobile is oversteer, the action wheel is taken as the outer front wheel, and step 14 is executed;
w_d × (w - w_d) > 0   (20)
Step 13: if δ_f > 0, direct the correction angle Δδ_t to the left; if δ_f < 0, direct the correction angle Δδ_t to the right;
Step 14: if δ_f > 0, direct the correction angle Δδ_t to the right; if δ_f < 0, direct the correction angle Δδ_t to the left.
Claims (2)
1. An intelligent automobile stability control method based on deep reinforcement learning, characterized by comprising the following steps:
Step 1: obtain the front wheel steering angle δ_f decided by the vehicle lateral controller, and the vehicle structural parameters, including: the wheel base L, the distances L_f and L_r from the center of mass to the front and rear axles, the front and rear tire cornering stiffnesses C_1 and C_2, and the vehicle mass m;
acquire the vehicle running parameters, including: the steering wheel angle sw, the vehicle speed v and the road surface friction coefficient μ;
Step 2: calculate the ideal yaw rate w_d using formula (1):
In formula (1), g is the gravitational acceleration and w is the yaw rate, where:
Step 3: calculate the ideal centroid slip angle β_d using formula (3):
β_d = -min{|β|, |β_max|}·sign(δ_f)   (3)
In formula (3), β is the vehicle centroid slip angle and β_max is the maximum centroid slip angle of the vehicle, where:
Step 4: define the vehicle state parameter s of the deep reinforcement learning method by formula (6):
s = {w, β, sw, w_d, β_d}   (6)
Step 5: define the action parameter a of the deep reinforcement learning method by formula (7):
In formula (7), Δδ is the steering-wheel correction angle and ΔM is the additional yaw moment;
Step 6: establish the reward function r of the deep reinforcement learning method by formula (8):
r = r_e + r_ps + r_v + r_m + r_sw + r_st   (8)
In formula (8), r_e is the error reward function, where:
In formula (9), the yaw-rate error and the centroid-slip-angle error appear, defined as:
In formula (8), r_ps is the fixed reward value function, where:
In formula (8), r_v is the speed-difference reward function, where:
In formula (8), r_m is the additional-yaw-moment reward function, where:
In formula (8), r_sw is the correction-angle reward function, where:
In formula (8), r_st is the stable-domain reward function, where:
Step 7: construct the network model of the deep reinforcement learning method:
Step 7.1: construct the action network model, comprising: an input layer comprising one neuron, m_1 hidden layers each comprising n_1 neurons, and an output layer comprising 2 neurons; initialize the action network parameters as θ^μ;
Step 7.2: construct the evaluation network model, comprising: two input layers each comprising 1 neuron, m_2 hidden layers each comprising n_2 neurons, of which the m_2-th hidden layer is a fully connected layer, and an output layer comprising 1 neuron; initialize the evaluation network parameters as θ^Q;
Step 7.3: construct a target action network model with the same structure as the action network model and set the target action network parameters θ^μ′ = θ^μ; construct a target evaluation network model with the same structure as the evaluation network model and set the target evaluation network parameters θ^Q′ = θ^Q;
Step 8: form N samples, where the i-th sample is obtained as follows:
initialize the i-th vehicle state parameter s_i and take s_i as the input of the action network model, which outputs μ(s_i|θ^μ);
obtain the i-th vehicle action parameter a_i using formula (17):
a_i = μ(s_i|θ^μ) + N_i   (17)
In formula (17), N_i denotes the i-th random noise;
obtain the i-th vehicle reward value r_i according to formula (8) and obtain the updated i-th vehicle state parameter s′_i; the i-th sample is thus denoted (s_i, a_i, r_i, s′_i), and N samples are obtained in this way;
Step 9: train the network model of the deep reinforcement learning method with the N samples, so as to obtain the optimal action network model and the optimal evaluation network model;
Step 10: judge whether expression (18) and expression (19) both hold; if so, the automobile is in a stable state; otherwise the automobile is in an unstable state; then execute step 11:
In formula (18), k_1 is the first boundary coefficient of the stable domain, k_2 is the second boundary coefficient of the stable domain, and β̇ is the centroid slip angular velocity;
In formula (19), ε is an adjustable parameter;
Step 11: obtain the current vehicle state parameter s_t and take it as the input of the optimal action network model, which outputs the current additional yaw moment ΔM_t and the current correction steering angle Δδ_t;
Step 12: judge whether formula (20) holds; if so, the steering characteristic of the automobile is understeer, the action wheel is taken as the inner rear wheel, and step 13 is executed; otherwise the steering characteristic of the automobile is oversteer, the action wheel is taken as the outer front wheel, and step 14 is executed;
w_d × (w - w_d) > 0   (20)
Step 13: if δ_f > 0, direct the correction angle Δδ_t to the left; if δ_f < 0, direct the correction angle Δδ_t to the right;
Step 14: if δ_f > 0, direct the correction angle Δδ_t to the right; if δ_f < 0, direct the correction angle Δδ_t to the left.
2. The intelligent vehicle stability control method according to claim 1, wherein step 9 is performed as follows:
Step 9.1: initialize the learning rate parameter α and the return rate parameter γ; initialize i = 1;
Step 9.2: take the i-th vehicle state parameter s_i as the input of the current i-th action network model, which outputs the i-th output value μ(s_i|θ^μ);
take the i-th vehicle state parameter s_i, the i-th vehicle action parameter a_i and the i-th action network output μ(s_i|θ^μ) as inputs of the current i-th evaluation network model; from s_i and a_i, the current i-th evaluation network model outputs Q_i(a_i); from μ(s_i|θ^μ), it outputs Q_i(μ(s_i|θ^μ));
take the updated i-th vehicle state parameter s′_i as the input of the current i-th target action network model, which outputs the i-th output value μ(s′_i|θ^μ′);
take s′_i and the target action network output μ(s′_i|θ^μ′) as inputs of the current i-th target evaluation network model, which outputs the i-th output value Q′_i(a′_i);
according to the output Q_i(μ(s_i|θ^μ)) of the current i-th evaluation network model, update the current i-th action network model by the policy gradient method, obtaining the i-th updated action network model as the (i+1)-th action network model;
according to the output Q_i(a_i) of the current i-th evaluation network model and the output Q′_i(a′_i) of the current i-th target evaluation network model, update the current i-th evaluation network model by minimizing the loss function, obtaining the i-th updated evaluation network model as the (i+1)-th evaluation network model;
Step 9.3: assign i+1 to i and judge whether i is greater than N; if so, the optimal action network model and the optimal evaluation network model have been obtained; otherwise return to step 9.2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910809910.7A CN110450771B (en) | 2019-08-29 | 2019-08-29 | Intelligent automobile stability control method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910809910.7A CN110450771B (en) | 2019-08-29 | 2019-08-29 | Intelligent automobile stability control method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110450771A CN110450771A (en) | 2019-11-15 |
CN110450771B true CN110450771B (en) | 2021-03-09 |
Family
ID=68489893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910809910.7A Active CN110450771B (en) | 2019-08-29 | 2019-08-29 | Intelligent automobile stability control method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110450771B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4183667A1 (en) * | 2021-11-18 | 2023-05-24 | Volvo Truck Corporation | Method for closed loop control of a position of a fifth wheel of a vehicle |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110843746B (en) * | 2019-11-28 | 2022-06-14 | 的卢技术有限公司 | Anti-lock brake control method and system based on reinforcement learning |
CN111605542B (en) * | 2020-05-06 | 2021-11-23 | 南京航空航天大学 | Vehicle stability system based on safety boundary and control method |
CN111746633B (en) * | 2020-07-02 | 2022-06-17 | 南京航空航天大学 | Vehicle distributed steering driving system control method based on reinforcement learning |
CN111873991B (en) * | 2020-07-22 | 2022-04-08 | 中国第一汽车股份有限公司 | Vehicle steering control method, device, terminal and storage medium |
CN112861269B (en) * | 2021-03-11 | 2022-08-30 | 合肥工业大学 | Automobile longitudinal multi-state control method based on deep reinforcement learning preferential extraction |
CN113386790B (en) * | 2021-06-09 | 2022-07-12 | 扬州大学 | Automatic driving decision-making method for cross-sea bridge road condition |
CN115123159A (en) * | 2022-06-27 | 2022-09-30 | 重庆邮电大学 | AEB control method and system based on DDPG deep reinforcement learning |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4021185B2 (en) * | 2001-12-07 | 2007-12-12 | 本田技研工業株式会社 | Yaw moment feedback control method |
KR100684033B1 (en) * | 2002-02-23 | 2007-02-16 | 주식회사 만도 | Method for controlling the stability of vehicles |
US7143864B2 (en) * | 2002-09-27 | 2006-12-05 | Ford Global Technologies, Llc. | Yaw control for an automotive vehicle using steering actuators |
DE102011010491A1 (en) * | 2011-02-07 | 2012-08-09 | Audi Ag | Method for activating electronics stability control device of e.g. all-wheel driven motor car, involves determining function such that function value increases with increasing lateral force of front and rear wheels |
CN105253141B (en) * | 2015-09-09 | 2017-10-27 | 北京理工大学 | A kind of vehicle handling stability control method adjusted based on wheel longitudinal force |
CN106828464A (en) * | 2017-01-06 | 2017-06-13 | 合肥工业大学 | A kind of vehicle body stable control method and system based on coefficient of road adhesion estimation |
- 2019-08-29: application CN201910809910.7A filed (CN); granted as CN110450771B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN110450771A (en) | 2019-11-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110450771B (en) | Intelligent automobile stability control method based on deep reinforcement learning | |
CN101054092B (en) | Driver workload-based vehicle stability enhancement control | |
JP4918148B2 (en) | Vehicle motion control device | |
CN103057436B (en) | Yawing moment control method of individual driven electromobile based on multi-agent | |
Goodarzi et al. | Automatic path control based on integrated steering and external yaw-moment control | |
CN109733400B (en) | Method, device and apparatus for distributing driving torque in a vehicle | |
CN106515716A (en) | Coordination control device and method for chassis integrated control system of wheel driving electric vehicle | |
CN108694283A (en) | A kind of forecast Control Algorithm and system for improving electric vehicle lateral stability | |
Jalali | Stability control of electric vehicles with in-wheel motors | |
JP2014144681A (en) | Vehicular driving force control unit | |
Kim et al. | Active yaw control for handling performance improvement by using traction force | |
CN102729999A (en) | Vehicle vibration control device and vehicle vibration control method | |
CN102971201A (en) | Method for determining a toothed rack force for a steering device in a vehicle | |
US8442736B2 (en) | System for enhancing cornering performance of a vehicle controlled by a safety system | |
CN106672072A (en) | Control method for steer-by-wire automobile active front-wheel steering control system | |
JP5559833B2 (en) | Vehicle motion control apparatus and method using jerk information | |
Kanchwala et al. | Vehicle handling control of an electric vehicle using active torque distribution and rear wheel steering | |
CN114212074B (en) | Vehicle active steering rollover prevention control method based on road adhesion coefficient estimation | |
JP2010149740A (en) | Vehicle motion control apparatus | |
JP2011218953A (en) | Device for control of drive force | |
Rengaraj | Integration of active chassis control systems for improved vehicle handling performance | |
Chen et al. | A compensated yaw-moment-based vehicle stability controller | |
Hou et al. | Integrated chassis control using ANFIS | |
Hakima et al. | Improvement of vehicle handling by an integrated control system of four wheel steering and ESP with fuzzy logic approach | |
JP2004203084A (en) | Motion control device of vehicle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |