CN110450771B - Intelligent automobile stability control method based on deep reinforcement learning - Google Patents

Intelligent automobile stability control method based on deep reinforcement learning

Info

Publication number
CN110450771B
CN110450771B (application CN201910809910.7A)
Authority
CN
China
Prior art keywords
network model
ith
formula
vehicle
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910809910.7A
Other languages
Chinese (zh)
Other versions
CN110450771A (en)
Inventor
黄鹤
郭伟锋
张炳力
张润
王博文
吴润晨
程进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN201910809910.7A
Publication of CN110450771A
Application granted
Publication of CN110450771B
Legal status: Active

Classifications

    • B60W10/18 Conjoint control of vehicle sub-units of different type or different function including control of braking systems
    • B60W10/20 Conjoint control of vehicle sub-units of different type or different function including control of steering systems
    • B60W30/02 Control of vehicle driving stability
    • B60W40/068 Road friction coefficient
    • B60W40/105 Speed
    • B60W40/13 Load or weight
    • B60W50/0098 Details of control systems ensuring comfort, safety or stability not otherwise provided for
    • B60W2050/0002 Automatic control, details of type of controller or control system architecture
    • B60W2050/0043 Signal treatments, identification of variables or parameters, parameter estimation or state estimation
    • B60W2520/10 Longitudinal speed
    • B60W2530/00 Input parameters relating to vehicle conditions or values, not covered by groups B60W2510/00 or B60W2520/00
    • B60W2530/10 Weight
    • B60W2710/182 Brake pressure, e.g. of fluid or between pad and disc
    • B60W2710/20 Steering systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Human Computer Interaction (AREA)
  • Steering Control In Accordance With Driving Conditions (AREA)

Abstract

The invention discloses an intelligent automobile stability control method based on deep reinforcement learning, which comprises the following steps: 1, acquiring the decision output of the vehicle's lateral controller, the vehicle structure parameters, and the driving parameters; 2, defining the state parameters, the action parameters, and the reward function of the deep reinforcement learning method; 3, constructing and training the network model of the deep reinforcement learning method to obtain an optimal action network model; 4, obtaining the current vehicle state parameter s_t and using the optimal action network model to output the current additional yaw moment ΔM_t and the current correction steering angle Δδ_t; 5, judging the stability state of the vehicle; 6, determining the direction of the current correction steering angle Δδ_t and the wheel on which the current additional yaw moment ΔM_t acts, according to the steering characteristic of the vehicle and the direction of the steering-wheel angle. The invention can realize an optimal coordinated control law between direct yaw moment control and steering control under both stable and extreme working conditions, thereby achieving vehicle stability control and ensuring the safety and comfort of drivers and passengers.

Description

Intelligent automobile stability control method based on deep reinforcement learning
Technical Field
The invention relates to the field of automobile dynamics control, in particular to an intelligent automobile stability control method based on deep reinforcement learning.
Background
When an automobile turns, the tire slip angle increases and with it the lateral force, so that the vehicle follows the driver's intention; under low-adhesion or sharp-turn conditions, however, the lateral force easily reaches the adhesion limit, and the vehicle may run into dangerous conditions such as sideslip, spin, or rollover. At present, the main ways to intervene in these dangerous conditions are active steering control and direct yaw moment control. Active steering control changes the yaw moment of the vehicle by adding a correction angle to the steering input; direct yaw moment control mainly corrects understeer or oversteer by adjusting individual wheel braking forces to create a braking-force difference, thereby generating an additional yaw moment.
Active steering and direct yaw moment control each affect vehicle performance with their own advantages and disadvantages. Stand-alone active steering control has little influence on vehicle speed and preserves occupant comfort, but it performs poorly under extreme conditions: it cannot stabilize the vehicle and so cannot meet the occupants' safety requirements. A stand-alone direct yaw moment control system can keep occupants safe under extreme conditions, but it strongly affects longitudinal acceleration and so cannot meet their comfort requirements. The vehicle is a complex nonlinear system with many couplings among its subsystems; for each vehicle state there exists a relatively optimal control output for stability control, and these optima do not follow a simple linear relation, so a linear coordination controller cannot adequately guarantee both the safety and the comfort of drivers and passengers.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an intelligent automobile stability control method based on deep reinforcement learning, so as to realize an optimal coordinated control law between direct yaw moment control and steering control under both stable and extreme working conditions, thereby achieving vehicle stability control and ensuring the safety and comfort of drivers and passengers.
The invention adopts the following technical scheme for solving the technical problems:
the invention relates to an intelligent automobile stability control method based on deep reinforcement learning, which is characterized by comprising the following steps of:
step 1: obtaining front wheel turning angle delta of decision output of vehicle transverse controllerfAnd vehicle structural parameters including: vehicle wheel base L, distance L between center of mass and front and rear axlesfAnd LrFront and rear wheel side yaw stiffness C1And C2The mass m of the automobile;
acquiring vehicle running parameters, comprising: steering wheel angle sw, vehicle speed v and road surface friction coefficient mu;
Step 2: calculate the ideal yaw rate w_d using equation (1):
[Equation (1); rendered as an image in the original]
In equation (1), g is the gravitational acceleration and w is the yaw rate, with:
[Equation (2); rendered as an image in the original]
Step 3: calculate the ideal centroid sideslip angle β_d using equation (3):
β_d = −min{|β|, |β_max|}·sign(δ_f)   (3)
In equation (3), β is the vehicle centroid sideslip angle and β_max is the maximum centroid sideslip angle of the vehicle, with:
[Equation (4); rendered as an image in the original]
[Equation (5); rendered as an image in the original]
Step 4: define the vehicle state parameter s of the deep reinforcement learning method by equation (6):
s = {w, β, sw, w_d, β_d}   (6)
Step 5: define the action parameter a of the deep reinforcement learning method by equation (7):
a = {Δδ, ΔM}   (7)
In equation (7), Δδ is the steering-wheel correction angle and ΔM is the additional yaw moment;
Step 6: establish the reward function r of the deep reinforcement learning method using equation (8):
r = r_e + r_ps + r_v + r_m + r_sw + r_st   (8)
In equation (8), r_e is the error reward function:
[Equation (9); rendered as an image in the original]
In equation (9), the yaw-rate error and the centroid-sideslip-angle error appear, with:
[Equation (10); rendered as an image in the original]
[Equation (11); rendered as an image in the original]
In equation (8), r_ps is a fixed reward value function:
[Equation (12); rendered as an image in the original]
In equation (8), r_v is the speed-difference reward function:
[Equation (13); rendered as an image in the original]
In equation (8), r_m is the additional-yaw-moment reward function:
[Equation (14); rendered as an image in the original]
In equation (8), r_sw is the correction-angle reward function:
[Equation (15); rendered as an image in the original]
In equation (8), r_st is the stable-domain reward function:
[Equation (16); rendered as an image in the original]
Step 7: construct the network models of the deep reinforcement learning method:
Step 7.1: construct the action network model, comprising: an input layer of one neuron, m_1 hidden layers of n_1 neurons each, and an output layer of 2 neurons; initialize the action network parameters to θ_μ;
Step 7.2: construct the evaluation network model, comprising: two input layers of 1 neuron each and m_2 hidden layers of n_2 neurons each, in which the m_2-th hidden layer is a fully connected layer, plus an output layer of 1 neuron; initialize the evaluation network parameters to θ_Q;
Step 7.3: construct a target action network model with the same structure as the action network model and set the target action network parameters θ_μ′ = θ_μ; construct a target evaluation network model with the same structure as the evaluation network model and set the target evaluation network parameters θ_Q′ = θ_Q;
And 8: n samples were formed from the ith sample:
initializing the ith vehicle state parameter siAnd with the ith vehicle state parameter siAs input to the motion network model, outputting μ(s) by the motion network modeliμ);
Obtaining the ith vehicle motion parameter a by using the formula (17)i
ai=μ(siμ)+Ni (17)
In the formula (17), NiRepresenting the ith random noise;
obtaining the ith vehicle reward value r according to the formula (8)iAnd obtaining an updated ith vehicle state parameter s'i(ii) a Thus obtaining the ith sample, denoted as(s)i,ai,ri,s′i) Further obtaining N samples;
and step 9: training the network model of the deep reinforcement learning method by using the N samples so as to obtain an optimal action network model and an optimal evaluation network model;
Step 10: judge whether equations (18) and (19) both hold; if so, the vehicle is in a stable state; otherwise the vehicle is in an unstable state, in which case execute step 11:
[Equation (18); rendered as an image in the original]
[Equation (19); rendered as an image in the original]
In equation (18), k_1 is the first boundary coefficient of the stable domain, k_2 is the second boundary coefficient of the stable domain, and β̇ is the centroid sideslip angular velocity; in equation (19), ε is an adjustable parameter;
Step 11: obtain the current vehicle state parameter s_t as input to the optimal action network model, which then outputs the current additional yaw moment ΔM_t and the current correction steering angle Δδ_t;
Step 12: judging whether the formula (20) is established, if so, indicating that the steering property of the automobile is understeer, making the action wheels as inner rear wheels, and executing the step 13, otherwise, indicating that the steering property of the automobile is oversteer, making the action wheels as outer front wheels, and executing the step 14;
wd×(w-wd)>0 (20)
step 13: if deltafIf greater than 0, the angle is corrected
Figure GDA0002743759200000046
Is directed leftward, if deltafIf < 0, then let the correction corner
Figure GDA0002743759200000047
To the right;
step 14: if deltafIf greater than 0, the angle is corrected
Figure GDA0002743759200000048
Is directed to the right, if deltafIf < 0, then let the correction corner
Figure GDA0002743759200000049
To the left.
The intelligent automobile stability control method is further characterized in that step 9 proceeds as follows:
Step 9.1: initialize the learning rate parameter α and the return rate (discount) parameter γ; initialize i = 1;
Step 9.2: with the i-th vehicle state parameter s_i as input to the current i-th action network model, output the i-th output value μ(s_i|θ_μ) from the current i-th action network model;
with the i-th vehicle state parameter s_i, the i-th vehicle action parameter a_i, and the action network output μ(s_i|θ_μ) all as inputs to the current i-th evaluation network model, output Q_i(a_i) from (s_i, a_i) and output Q_i(μ(s_i|θ_μ)) from (s_i, μ(s_i|θ_μ));
with the updated i-th vehicle state parameter s′_i as input to the current i-th target action network model, output μ(s′_i|θ_μ′);
with s′_i and the target action network output μ(s′_i|θ_μ′) as inputs to the current i-th target evaluation network model, output Q′_i(a′_i);
update the current i-th action network model by the policy gradient method according to the evaluation network output Q_i(μ(s_i|θ_μ)), taking the updated model as the (i+1)-th action network model;
update the current i-th evaluation network model by minimizing a loss function built from the evaluation network output Q_i(a_i) and the target evaluation network output Q′_i(a′_i), taking the updated model as the (i+1)-th evaluation network model;
Step 9.3: assign i+1 to i and judge whether i > N; if so, the optimal action network model and the optimal evaluation network model have been obtained; otherwise return to step 9.2.
Compared with the prior art, the invention has the beneficial effects that:
1. The invention exploits the model-free and generalizing nature of deep reinforcement learning: it determines the input states and output actions relevant to vehicle stability control, designs a reward function suited to coordinated control, and constructs and trains an optimal action network model. This model can then decide an optimal coordinated stability-control strategy under both stable and extreme working conditions, realizing vehicle stability control while ensuring the safety and comfort of drivers and passengers;
2. The deep reinforcement learning algorithm of the invention requires no control law derived from a vehicle model; the deep neural network it employs has strong nonlinear expressive power and can capture the nonlinear relation between the vehicle state and the active-steering and differential-braking controls, which matches reality more closely than a linear controller designed from a simplified vehicle model;
3. Compared with no control, active steering control alone, direct yaw moment control alone, and linearly allocated coordination control, the method of the invention achieves a better control effect under different working conditions, better robustness, and better comfort under extreme working conditions.
Drawings
FIG. 1 shows the intelligent vehicle stability control system based on deep reinforcement learning according to the present invention;
FIG. 2 is a diagram of a training process of the deep reinforcement learning method of the present invention.
Detailed Description
In this embodiment, the intelligent automobile stability control method based on deep reinforcement learning decides a current correction steering angle and an additional yaw moment from the current vehicle state parameters, thereby realizing coordinated vehicle stability control. Specifically, as shown in fig. 1, the method comprises the following steps:
Step 1: obtain the front-wheel steering angle δ_f that is the decision output of the vehicle's lateral controller, and the vehicle structure parameters, including: wheelbase L, distances L_f and L_r from the center of mass to the front and rear axles, front and rear cornering stiffnesses C_1 and C_2, and vehicle mass m;
acquire the vehicle driving parameters, including: steering-wheel angle sw, vehicle speed v, and road adhesion coefficient μ;
Step 2: calculate the ideal yaw rate w_d using equation (1):
[Equation (1); rendered as an image in the original]
In equation (1), g is the gravitational acceleration and w is the yaw rate, with:
[Equation (2); rendered as an image in the original]
Step 3: calculate the ideal centroid sideslip angle β_d using equation (3):
β_d = −min{|β|, |β_max|}·sign(δ_f)   (3)
In equation (3), β is the vehicle centroid sideslip angle and β_max is the maximum centroid sideslip angle of the vehicle, with:
[Equation (4); rendered as an image in the original]
[Equation (5); rendered as an image in the original]
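For concreteness, the reference-value computation of steps 2 and 3 can be sketched in Python. The closed forms of equations (1), (2), (4) and (5) are rendered only as images in the original document, so the sketch below substitutes the standard two-degree-of-freedom bicycle-model expressions (stability factor, adhesion-limited yaw rate, adhesion-limited sideslip angle) commonly used with this class of controller; every such formula is an assumption, not the patented one.

import numpy as np

def reference_values(delta_f, v, mu, L, l_f, l_r, C1, C2, m, beta, g=9.81):
    """Ideal yaw rate w_d and ideal sideslip angle beta_d (hedged sketch).

    The patent's equations (1)-(2) and (4)-(5) are images in the original,
    so the standard 2-DOF bicycle-model forms are assumed here.
    """
    # Stability factor of the linear bicycle model (assumed form).
    K = m * (l_f / C2 - l_r / C1) / L**2
    # Steady-state yaw-rate response, capped by the road-adhesion limit.
    w_ss = v * delta_f / (L * (1.0 + K * v**2))
    w_lim = 0.85 * mu * g / max(v, 0.1)          # guard against v == 0
    w_d = np.sign(delta_f) * min(abs(w_ss), w_lim)
    # Maximum sideslip angle bounded by adhesion (assumed form), then
    # equation (3) exactly as given in the text.
    beta_max = np.arctan(0.02 * mu * g)
    beta_d = -min(abs(beta), abs(beta_max)) * np.sign(delta_f)
    return w_d, beta_d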
Step 4: define the vehicle state parameter s of the deep reinforcement learning method by equation (6):
s = {w, β, sw, w_d, β_d}   (6)
Step 5: define the action parameter a of the deep reinforcement learning method by equation (7):
a = {Δδ, ΔM}   (7)
In equation (7), Δδ is the steering-wheel correction angle, with value range (0, 20) in degrees, and ΔM is the additional yaw moment, with value range (0, 20) in N·m;
Step 6: establish the reward function r of the deep reinforcement learning method using equation (8):
r = r_e + r_ps + r_v + r_m + r_sw + r_st   (8)
The reward function is the core of the whole deep reinforcement learning algorithm and guides the adjustment direction of the deep neural network parameters. The design principles are given first, and the specific reward functions are then designed according to them.
In this example, the reward function is organized into 4 priority levels; the higher the level, the more important the principle:
Level 1: the aim of the invention is vehicle stability control, so guaranteeing vehicle stability is the primary task;
Level 2: steering control is preferable to braking control, so steering control must take priority over braking control;
Level 3: stabilize the vehicle with as small an active steering angle or braking pressure as possible;
Level 4: when the vehicle is in the stable region, the action output should be as close to 0 as possible.
In equation (8), r_e is the error reward function, corresponding to the level-1 design principle: the smaller the error, the larger the reward. To highlight the importance of the level-1 principle, the rate of change of the error reward should be the largest, so a quadratic function is designed as the level-1 reward:
[Equation (9); rendered as an image in the original]
In equation (9), the yaw-rate error and the centroid-sideslip-angle error appear, with:
[Equation (10); rendered as an image in the original]
[Equation (11); rendered as an image in the original]
In equation (8), r_ps is a fixed reward value function corresponding to the level-2 design principle: using steering control preferentially earns a higher reward:
[Equation (12); rendered as an image in the original]
In equation (8), r_v is the speed-difference reward function, also corresponding to the level-2 principle: steering affects speed less than braking does and therefore earns a larger reward:
[Equation (13); rendered as an image in the original]
In equation (8), r_m is the additional-yaw-moment reward function, corresponding to the level-3 design principle:
[Equation (14); rendered as an image in the original]
In equation (8), r_sw is the correction-angle reward function, corresponding to the level-3 design principle:
[Equation (15); rendered as an image in the original]
In equation (8), r_st is the stable-domain reward function, corresponding to the level-4 design principle: within the stable domain, the smaller the action, the larger the reward:
[Equation (16); rendered as an image in the original]
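Because equations (9)-(16) survive only as images, the composite reward can be illustrated but not reproduced. The following Python sketch shows one plausible shaping that respects the four priority levels described above; every coefficient and functional form in it is an assumption, not the patented formula.

def reward(w, w_d, beta, beta_d, dv, d_delta, d_M,
           in_stable_domain, used_steering_only):
    """Composite reward r = r_e + r_ps + r_v + r_m + r_sw + r_st (sketch).

    All weights and functional forms below are assumptions chosen to respect
    the four priority levels; equations (9)-(16) are images in the original.
    """
    e_w, e_b = w - w_d, beta - beta_d
    r_e = -(10.0 * e_w ** 2 + 10.0 * e_b ** 2)  # level 1: quadratic, steepest
    r_ps = 1.0 if used_steering_only else 0.0   # level 2: prefer steering
    r_v = -1.0 * abs(dv)                        # level 2: penalize speed loss
    r_m = -0.1 * abs(d_M)                       # level 3: small yaw moment
    r_sw = -0.1 * abs(d_delta)                  # level 3: small correction angle
    r_st = (-0.5 * (abs(d_delta) + abs(d_M))    # level 4: act little when stable
            if in_stable_domain else 0.0)
    return r_e + r_ps + r_v + r_m + r_sw + r_st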
Step 7: construct the network models of the deep reinforcement learning method:
Step 7.1: construct the action network model, comprising: an input layer of one neuron, m_1 hidden layers of n_1 neurons each, and an output layer of 2 neurons; initialize the action network parameters to θ_μ;
Step 7.2: construct the evaluation network model, comprising: two input layers of 1 neuron each and m_2 hidden layers of n_2 neurons each, in which the m_2-th hidden layer is a fully connected layer, plus an output layer of 1 neuron; initialize the evaluation network parameters to θ_Q;
Step 7.3: construct a target action network model with the same structure as the action network model and set the target action network parameters θ_μ′ = θ_μ; construct a target evaluation network model with the same structure as the evaluation network model and set the target evaluation network parameters θ_Q′ = θ_Q;
And 8: n samples were formed from the ith sample:
initializing the ith vehicle state parameter siAnd with the ith vehicle state parameter siAs input to the motion network model, μ(s) is output by the motion network modeliμ);
Obtaining the ith vehicle motion parameter a by using the formula (17)i
ai=μ(siμ)+Ni (17)
In the formula (17), NiRepresenting the ith random noise;
obtaining the ith vehicle reward value r according to the formula (8)iAnd obtaining an updated ith vehicle state parameter s'i(ii) a Thus obtaining the ith sample, denoted as(s)i,ai,ri,s′i) Further obtaining N samples;
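Step 8 is ordinary experience collection with exploration noise, as in DDPG. A sketch follows; the Gym-style reset/step interface of vehicle_env is an assumption, since the patent does not name the vehicle-dynamics simulation it trains against.

import torch

def collect_samples(actor, vehicle_env, n_samples, noise_std=0.1):
    """Form N samples (s_i, a_i, r_i, s'_i) per equation (17): a = mu(s) + N_i."""
    buffer, s = [], vehicle_env.reset()
    for _ in range(n_samples):
        with torch.no_grad():
            a = actor(torch.as_tensor(s, dtype=torch.float32))
        a = a + noise_std * torch.randn_like(a)     # exploration noise N_i
        s_next, r, done, _ = vehicle_env.step(a.numpy())
        buffer.append((s, a.numpy(), r, s_next))    # the i-th sample
        s = vehicle_env.reset() if done else s_next
    return buffer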
Step 9: as shown in fig. 2, train the network model of the deep reinforcement learning method with the N samples:
Step 9.1: initialize the learning rate parameter α and the return rate (discount) parameter γ; initialize i = 1;
Step 9.2: with the i-th vehicle state parameter s_i as input to the current i-th action network model, output the i-th output value μ(s_i|θ_μ) from the current i-th action network model;
with the i-th vehicle state parameter s_i, the i-th vehicle action parameter a_i, and the action network output μ(s_i|θ_μ) all as inputs to the current i-th evaluation network model, output Q_i(a_i) from (s_i, a_i) and output Q_i(μ(s_i|θ_μ)) from (s_i, μ(s_i|θ_μ));
with the updated i-th vehicle state parameter s′_i as input to the current i-th target action network model, output μ(s′_i|θ_μ′);
with s′_i and the target action network output μ(s′_i|θ_μ′) as inputs to the current i-th target evaluation network model, output Q′_i(a′_i);
update the current i-th action network model by the policy gradient method according to the evaluation network output Q_i(μ(s_i|θ_μ)), taking the updated model as the (i+1)-th action network model;
update the current i-th evaluation network model by minimizing a loss function built from the evaluation network output Q_i(a_i) and the target evaluation network output Q′_i(a′_i), taking the updated model as the (i+1)-th evaluation network model;
Step 9.3: assign i+1 to i and judge whether i > N; if so, the optimal action network model and the optimal evaluation network model have been obtained; otherwise return to step 9.2;
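The two updates of step 9.2 are the standard critic (temporal-difference) and actor (deterministic policy gradient) steps of DDPG. A sketch of one such update is given below; the optimizers, the mean-squared-error loss and the batch shapes are conventional choices, not details stated in the patent.

import torch
import torch.nn.functional as F

def train_step(batch, actor, critic, target_actor, target_critic,
               actor_opt, critic_opt, gamma=0.99):
    """One step-9.2 update. batch = (s, a, r, s'), each a float tensor with
    a leading batch dimension; r is shaped (batch, 1)."""
    s, a, r, s_next = batch
    # Evaluation network: minimize the loss between Q(s, a) and the target
    # y = r + gamma * Q'(s', mu'(s')) built from the two target networks.
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    # Action network: policy gradient, ascend Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()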
Step 10: judge whether equations (18) and (19) both hold; if so, the vehicle is in a stable state; otherwise the vehicle is in an unstable state, in which case execute step 11:
[Equation (18); rendered as an image in the original]
[Equation (19); rendered as an image in the original]
In equation (18), k_1 is the first boundary coefficient of the stable domain, k_2 is the second boundary coefficient of the stable domain, and β̇ is the centroid sideslip angular velocity; in equation (19), ε is an adjustable parameter;
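Equations (18) and (19) survive only as images. From the surrounding text they bound the vehicle in the β-β̇ phase plane through the coefficients k_1 and k_2, with ε as an additional tolerance; the concrete forms in the sketch below (a phase-plane band plus a yaw-rate-error tolerance) are assumptions consistent with that description.

def is_stable(beta, beta_dot, w, w_d, k1, k2, eps):
    """Step 10 stability test (hedged sketch; equations (18)-(19) are images).

    Assumed forms:
      (18)  |beta_dot + k1 * beta| <= k2   (beta - beta_dot stable domain)
      (19)  |w - w_d| <= eps               (yaw-rate tracking tolerance)
    """
    return abs(beta_dot + k1 * beta) <= k2 and abs(w - w_d) <= eps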
Step 11: obtain the current vehicle state parameter s_t as input to the optimal action network model, which then outputs the current additional yaw moment ΔM_t and the current correction steering angle Δδ_t;
Step 12: judging whether the formula (20) is established, if so, indicating that the steering property of the automobile is understeer, making the action wheels as inner rear wheels, and executing the step 13, otherwise, indicating that the steering property of the automobile is oversteer, making the action wheels as outer front wheels, and executing the step 14;
wd×(w-wd)>0 (20)
step 13: if deltafIf greater than 0, the angle is corrected
Figure GDA0002743759200000096
Is directed leftward, if deltafIf < 0, then let the correction corner
Figure GDA0002743759200000097
To the right;
step 14: if deltafIf greater than 0, the angle is corrected
Figure GDA0002743759200000098
Is directed to the right, if deltafIf < 0, then let the correction corner
Figure GDA0002743759200000099
To the left.
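Steps 12-14 reduce to a small decision table. The sketch below follows the text's wheel naming and left/right convention; taking δ_f > 0 to mean a left turn is an assumption, since the patent does not state its sign convention.

def allocate_action(w, w_d, delta_f):
    """Steps 12-14: choose the braked wheel and the correction-angle direction."""
    understeer = w_d * (w - w_d) > 0              # equation (20)
    if understeer:                                # step 13
        wheel = "inner rear wheel"
        direction = "left" if delta_f > 0 else "right"
    else:                                         # step 14
        wheel = "outer front wheel"
        direction = "right" if delta_f > 0 else "left"
    return wheel, direction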

Claims (2)

1. An intelligent automobile stability control method based on deep reinforcement learning, characterized by comprising the following steps:
Step 1: obtain the front-wheel steering angle δ_f that is the decision output of the vehicle's lateral controller, and the vehicle structure parameters, including: wheelbase L, distances L_f and L_r from the center of mass to the front and rear axles, front and rear cornering stiffnesses C_1 and C_2, and vehicle mass m;
acquire the vehicle driving parameters, including: steering-wheel angle sw, vehicle speed v, and road adhesion coefficient μ;
Step 2: calculate the ideal yaw rate w_d using equation (1):
[Equation (1); rendered as an image in the original]
In equation (1), g is the gravitational acceleration and w is the yaw rate, with:
[Equation (2); rendered as an image in the original]
Step 3: calculate the ideal centroid sideslip angle β_d using equation (3):
β_d = −min{|β|, |β_max|}·sign(δ_f)   (3)
In equation (3), β is the vehicle centroid sideslip angle and β_max is the maximum centroid sideslip angle of the vehicle, with:
[Equation (4); rendered as an image in the original]
[Equation (5); rendered as an image in the original]
Step 4: define the vehicle state parameter s of the deep reinforcement learning method by equation (6):
s = {w, β, sw, w_d, β_d}   (6)
Step 5: define the action parameter a of the deep reinforcement learning method by equation (7):
a = {Δδ, ΔM}   (7)
In equation (7), Δδ is the steering-wheel correction angle and ΔM is the additional yaw moment;
Step 6: establish the reward function r of the deep reinforcement learning method using equation (8):
r = r_e + r_ps + r_v + r_m + r_sw + r_st   (8)
In equation (8), r_e is the error reward function:
[Equation (9); rendered as an image in the original]
In equation (9), the yaw-rate error and the centroid-sideslip-angle error appear, with:
[Equation (10); rendered as an image in the original]
[Equation (11); rendered as an image in the original]
In equation (8), r_ps is a fixed reward value function:
[Equation (12); rendered as an image in the original]
In equation (8), r_v is the speed-difference reward function:
[Equation (13); rendered as an image in the original]
In equation (8), r_m is the additional-yaw-moment reward function:
[Equation (14); rendered as an image in the original]
In equation (8), r_sw is the correction-angle reward function:
[Equation (15); rendered as an image in the original]
In equation (8), r_st is the stable-domain reward function:
[Equation (16); rendered as an image in the original]
Step 7: construct the network models of the deep reinforcement learning method:
Step 7.1: construct the action network model, comprising: an input layer of one neuron, m_1 hidden layers of n_1 neurons each, and an output layer of 2 neurons; initialize the action network parameters to θ_μ;
Step 7.2: construct the evaluation network model, comprising: two input layers of 1 neuron each and m_2 hidden layers of n_2 neurons each, in which the m_2-th hidden layer is a fully connected layer, plus an output layer of 1 neuron; initialize the evaluation network parameters to θ_Q;
Step 7.3: construct a target action network model with the same structure as the action network model and set the target action network parameters θ_μ′ = θ_μ; construct a target evaluation network model with the same structure as the evaluation network model and set the target evaluation network parameters θ_Q′ = θ_Q;
And 8: n samples were formed from the ith sample:
initializing the ith vehicle state parameter siAnd with the ith vehicle state parameter siAs input to the motion network model, outputting μ(s) by the motion network modeliμ);
Obtaining the ith vehicle motion parameter a by using the formula (17)i
ai=μ(siμ)+Ni (17)
In the formula (17), NiRepresenting the ith random noise;
obtaining the ith vehicle reward value r according to the formula (8)iAnd obtaining the updated ith vehicle state parameter si'; thus obtaining the ith sample, denoted as(s)i,ai,ri,s′i) Further obtaining N samples;
Step 9: train the network model of the deep reinforcement learning method with the N samples to obtain the optimal action network model and the optimal evaluation network model;
Step 10: judge whether equations (18) and (19) both hold; if so, the vehicle is in a stable state; otherwise the vehicle is in an unstable state, in which case execute step 11:
[Equation (18); rendered as an image in the original]
[Equation (19); rendered as an image in the original]
In equation (18), k_1 is the first boundary coefficient of the stable domain, k_2 is the second boundary coefficient of the stable domain, and β̇ is the centroid sideslip angular velocity; in equation (19), ε is an adjustable parameter;
Step 11: obtain the current vehicle state parameter s_t as input to the optimal action network model, which then outputs the current additional yaw moment ΔM_t and the current correction steering angle Δδ_t;
Step 12: judging whether the formula (20) is established, if so, indicating that the steering property of the automobile is understeer, making the action wheels as inner rear wheels, and executing the step 13, otherwise, indicating that the steering property of the automobile is oversteer, making the action wheels as outer front wheels, and executing the step 14;
wd×(w-wd)>0 (20)
step 13: if deltafIf greater than 0, the angle is corrected
Figure FDA0002743759190000036
Is directed leftward, if deltafIf < 0, then let the correction corner
Figure FDA0002743759190000037
To the right;
step 14: if deltafIf greater than 0, the angle is corrected
Figure FDA0002743759190000038
Is directed to the right, if deltafIf < 0, then let the correction corner
Figure FDA0002743759190000039
To the left.
2. The intelligent vehicle stability control method according to claim 1, wherein step 9 is performed as follows:
Step 9.1: initialize the learning rate parameter α and the return rate (discount) parameter γ; initialize i = 1;
Step 9.2: with the i-th vehicle state parameter s_i as input to the current i-th action network model, output the i-th output value μ(s_i|θ_μ) from the current i-th action network model;
with the i-th vehicle state parameter s_i, the i-th vehicle action parameter a_i, and the action network output μ(s_i|θ_μ) all as inputs to the current i-th evaluation network model, output Q_i(a_i) from (s_i, a_i) and output Q_i(μ(s_i|θ_μ)) from (s_i, μ(s_i|θ_μ));
with the updated i-th vehicle state parameter s′_i as input to the current i-th target action network model, output μ(s′_i|θ_μ′);
with s′_i and the target action network output μ(s′_i|θ_μ′) as inputs to the current i-th target evaluation network model, output Q′_i(a′_i);
update the current i-th action network model by the policy gradient method according to the evaluation network output Q_i(μ(s_i|θ_μ)), taking the updated model as the (i+1)-th action network model;
update the current i-th evaluation network model by minimizing a loss function built from the evaluation network output Q_i(a_i) and the target evaluation network output Q′_i(a′_i), taking the updated model as the (i+1)-th evaluation network model;
Step 9.3: assign i+1 to i and judge whether i > N; if so, the optimal action network model and the optimal evaluation network model have been obtained; otherwise return to step 9.2.
CN201910809910.7A 2019-08-29 2019-08-29 Intelligent automobile stability control method based on deep reinforcement learning Active CN110450771B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910809910.7A CN110450771B (en) 2019-08-29 2019-08-29 Intelligent automobile stability control method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910809910.7A CN110450771B (en) 2019-08-29 2019-08-29 Intelligent automobile stability control method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN110450771A CN110450771A (en) 2019-11-15
CN110450771B (en) 2021-03-09

Family

Family ID: 68489893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910809910.7A Active CN110450771B (en) 2019-08-29 2019-08-29 Intelligent automobile stability control method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN110450771B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4183667A1 (en) * 2021-11-18 2023-05-24 Volvo Truck Corporation Method for closed loop control of a position of a fifth wheel of a vehicle

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110843746B (en) * 2019-11-28 2022-06-14 的卢技术有限公司 Anti-lock brake control method and system based on reinforcement learning
CN111605542B (en) * 2020-05-06 2021-11-23 南京航空航天大学 Vehicle stability system based on safety boundary and control method
CN111746633B (en) * 2020-07-02 2022-06-17 南京航空航天大学 Vehicle distributed steering driving system control method based on reinforcement learning
CN111873991B (en) * 2020-07-22 2022-04-08 中国第一汽车股份有限公司 Vehicle steering control method, device, terminal and storage medium
CN112861269B (en) * 2021-03-11 2022-08-30 合肥工业大学 Automobile longitudinal multi-state control method based on deep reinforcement learning preferential extraction
CN113386790B (en) * 2021-06-09 2022-07-12 扬州大学 Automatic driving decision-making method for cross-sea bridge road condition
CN115123159A (en) * 2022-06-27 2022-09-30 重庆邮电大学 AEB control method and system based on DDPG deep reinforcement learning

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4021185B2 (en) * 2001-12-07 2007-12-12 本田技研工業株式会社 Yaw moment feedback control method
KR100684033B1 (en) * 2002-02-23 2007-02-16 주식회사 만도 Method for controlling the stability of vehicles
US7143864B2 (en) * 2002-09-27 2006-12-05 Ford Global Technologies, Llc. Yaw control for an automotive vehicle using steering actuators
DE102011010491A1 (en) * 2011-02-07 2012-08-09 Audi Ag Method for activating electronics stability control device of e.g. all-wheel driven motor car, involves determining function such that function value increases with increasing lateral force of front and rear wheels
CN105253141B (en) * 2015-09-09 2017-10-27 北京理工大学 A kind of vehicle handling stability control method adjusted based on wheel longitudinal force
CN106828464A (en) * 2017-01-06 2017-06-13 合肥工业大学 A kind of vehicle body stable control method and system based on coefficient of road adhesion estimation


Also Published As

Publication number Publication date
CN110450771A (en) 2019-11-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant