CN110450771B - Intelligent automobile stability control method based on deep reinforcement learning - Google Patents
Intelligent automobile stability control method based on deep reinforcement learning
- Publication number
- CN110450771B CN110450771B CN201910809910.7A CN201910809910A CN110450771B CN 110450771 B CN110450771 B CN 110450771B CN 201910809910 A CN201910809910 A CN 201910809910A CN 110450771 B CN110450771 B CN 110450771B
- Authority
- CN
- China
- Prior art keywords
- network model
- ith
- formula
- vehicle
- current
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING › B60—VEHICLES IN GENERAL › B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W10/18—Conjoint control of vehicle sub-units of different type or different function including control of braking systems
- B60W10/20—Conjoint control of vehicle sub-units of different type or different function including control of steering systems
- B60W30/02—Control of vehicle driving stability
- B60W40/068—Road friction coefficient
- B60W40/10—Estimation or calculation of non-directly measurable driving parameters related to vehicle motion
- B60W40/105—Speed
- B60W40/12—Estimation or calculation of non-directly measurable driving parameters related to parameters of the vehicle itself, e.g. tyre models
- B60W40/13—Load or weight
- B60W50/0098—Details of control systems ensuring comfort, safety or stability not otherwise provided for
- B60W2050/0002—Automatic control, details of type of controller or control system architecture
- B60W2050/0043—Signal treatments, identification of variables or parameters, parameter estimation or state estimation
- B60W2520/10—Longitudinal speed
- B60W2530/10—Weight
- B60W2710/182—Brake pressure, e.g. of fluid or between pad and disc
- B60W2710/20—Steering systems
Landscapes
- Engineering & Computer Science (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Automation & Control Theory (AREA)
- Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Chemical & Material Sciences (AREA)
- Combustion & Propulsion (AREA)
- Human Computer Interaction (AREA)
- Steering Control In Accordance With Driving Conditions (AREA)
Abstract
The invention discloses an intelligent automobile stability control method based on deep reinforcement learning, which comprises the following steps: 1, acquiring the decision output of the vehicle lateral controller together with the vehicle structural parameters and driving parameters; 2, defining the state parameters, action parameters and reward function of the deep reinforcement learning method; 3, constructing and training the network model of the deep reinforcement learning method to obtain an optimal action network model; 4, obtaining the current vehicle state parameter s_t and using the optimal action network model to output the current additional yaw moment ΔM_t and correction steering angle Δδ_t; 5, judging the stability state of the automobile; 6, determining the direction of the current correction angle Δδ_t and the action wheel of the current additional yaw moment ΔM_t according to the steering characteristic of the automobile and the direction of the steering-wheel angle. The invention can realize the optimal coordination control law between direct yaw moment control and steering control under both stable and extreme working conditions, thereby realizing vehicle stability control and ensuring the safety and comfort of drivers and passengers.
Description
Technical Field
The invention relates to the field of automobile dynamics control, in particular to an intelligent automobile stability control method based on deep reinforcement learning.
Background
When an automobile turns, the tire slip angle increases and the lateral force grows, so the vehicle can follow the driver's intention; however, under low-adhesion and sharp-turn working conditions, the lateral force easily reaches the adhesion limit and the vehicle may enter dangerous states such as sideslip, spin and rollover. Currently, the main ways to intervene in such dangerous conditions are active steering control and direct yaw moment control. Active steering control changes the yaw moment of the vehicle by adding a correction steering angle to the steering-wheel input; direct yaw moment control mainly corrects understeer or oversteer by adjusting the wheel braking forces to form a braking force difference, thereby generating an additional yaw moment.
Active steering and direct yaw moment control each influence vehicle performance with their own advantages and disadvantages. Stand-alone active steering control has little influence on vehicle speed and preserves the comfort of drivers and passengers, but performs poorly under extreme working conditions, where it cannot stabilize the vehicle or meet safety requirements. A stand-alone direct yaw moment control system can ensure safety under extreme working conditions, but strongly affects the longitudinal acceleration of the vehicle and cannot meet comfort requirements. The vehicle is a complex nonlinear system with many coupling effects among its subsystems; for each vehicle state there exists a relatively optimal control output for stability control, and these optimal outputs do not follow a simple linear law, so a linear coordination controller cannot adequately guarantee the safety and comfort of drivers and passengers.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an intelligent automobile stability control method based on deep reinforcement learning, so as to realize the optimal coordination control law between direct yaw moment control and steering control under both stable and extreme working conditions, thereby achieving vehicle stability control and ensuring the safety and comfort of drivers and passengers.
The invention adopts the following technical scheme for solving the technical problems:
the invention relates to an intelligent automobile stability control method based on deep reinforcement learning, which is characterized by comprising the following steps of:
Step 1: obtain the front wheel steering angle δ_f decided by the vehicle lateral controller, and the vehicle structural parameters, including: the wheel base L, the distances L_f and L_r from the center of mass to the front and rear axles, the front and rear tire cornering stiffnesses C_1 and C_2, and the vehicle mass m;
acquire the vehicle running parameters, including: the steering wheel angle sw, the vehicle speed v and the road surface friction coefficient μ;
Step 2: calculate the ideal yaw rate w_d using formula (1):
In formula (1), g is the gravitational acceleration and w is the yaw rate, where:
Step 3: calculate the ideal centroid slip angle β_d using formula (3):
β_d = -min{|β|, |β_max|}·sign(δ_f)   (3)
In formula (3), β is the vehicle centroid slip angle and β_max is the maximum centroid slip angle of the vehicle, where:
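For reference, the following minimal Python sketch shows how the reference targets of steps 2 and 3 could be computed. Since the bodies of formulas (1), (2), (4) and (5) are not reproduced in this text, the stability factor K, the 0.85·μ·g/v adhesion bound and the β_max expression below are conventional 2-DOF bicycle-model choices assumed for illustration, not the patented formulas.

```python
import math

def reference_targets(delta_f, v, mu, L, Lf, Lr, C1, C2, m, g=9.81):
    """Ideal yaw rate w_d and ideal centroid slip angle beta_d.

    A sketch under assumed standard 2-DOF bicycle-model formulas; the
    patent's equations (1)-(5) are image placeholders in this text.
    """
    # Stability factor of the linear 2-DOF model (assumed form of eq. (2)).
    K = (m / L**2) * (Lf / C2 - Lr / C1)
    # Steady-state yaw-rate gain, limited by an adhesion bound (assumed eq. (1)).
    w_ss = v * delta_f / (L * (1.0 + K * v**2))
    w_bound = 0.85 * mu * g / max(v, 1e-3)
    w_d = math.copysign(min(abs(w_ss), w_bound), w_ss)
    # Ideal centroid slip angle per eq. (3); beta and beta_max are assumed forms.
    beta = Lr * delta_f / L               # placeholder kinematic estimate
    beta_max = math.atan(0.02 * mu * g)   # a common empirical bound (assumed)
    beta_d = -min(abs(beta), abs(beta_max)) * math.copysign(1.0, delta_f)
    return w_d, beta_d
```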
Step 4: define the vehicle state parameter s of the deep reinforcement learning method by formula (6):
s = {w, β, sw, w_d, β_d}   (6)
Step 5: define the action parameter a of the deep reinforcement learning method by formula (7):
In formula (7), Δδ is the steering-wheel correction angle and ΔM is the additional yaw moment;
Step 6: establish the reward function r of the deep reinforcement learning method by formula (8):
r = r_e + r_ps + r_v + r_m + r_sw + r_st   (8)
In formula (8), r_e is the error reward function, where:
In formula (9), the yaw-rate error and the centroid-slip-angle error appear, defined as:
In formula (8), r_ps is the fixed reward value function, where:
In formula (8), r_v is the speed-difference reward function, where:
In formula (8), r_m is the additional-yaw-moment reward function, where:
In formula (8), r_sw is the correction-angle reward function, where:
In formula (8), r_st is the stable-domain reward function, where:
Step 7: construct the network model of the deep reinforcement learning method:
Step 7.1: construct the action network model, comprising: an input layer comprising one neuron, m_1 hidden layers each comprising n_1 neurons, and an output layer comprising 2 neurons; initialize the action network parameters as θ^μ;
Step 7.2: construct the evaluation network model, comprising: two input layers each comprising 1 neuron, m_2 hidden layers each comprising n_2 neurons, of which the m_2-th hidden layer is a fully connected layer, and an output layer comprising 1 neuron; initialize the evaluation network parameters as θ^Q;
Step 7.3: construct a target action network model with the same structure as the action network model and set the target action network parameters θ^μ′ = θ^μ; construct a target evaluation network model with the same structure as the evaluation network model and set the target evaluation network parameters θ^Q′ = θ^Q;
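A minimal PyTorch sketch of the action/evaluation networks of step 7 follows. The layer counts n_1, m_1, n_2, m_2 are left unspecified in the text, so modest defaults are assumed; the state dimension 5 follows the state s = {w, β, sw, w_d, β_d} of formula (6), and the Tanh outputs would still need scaling to the physical action ranges.

```python
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Action network (step 7.1): state in, [correction angle, extra yaw moment] out."""
    def __init__(self, state_dim=5, hidden=64, n_hidden_layers=2):
        super().__init__()
        layers, d = [], state_dim
        for _ in range(n_hidden_layers):        # m_1 hidden layers of n_1 neurons (assumed sizes)
            layers += [nn.Linear(d, hidden), nn.ReLU()]
            d = hidden
        layers += [nn.Linear(d, 2), nn.Tanh()]  # 2 outputs per step 7.1
        self.net = nn.Sequential(*layers)
    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Evaluation network (step 7.2): (state, action) in, scalar Q out."""
    def __init__(self, state_dim=5, action_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))               # final fully connected layer, 1 neuron out
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

actor, critic = Actor(), Critic()
# Step 7.3: target networks start as exact copies (theta_mu' = theta_mu, theta_Q' = theta_Q).
actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)
```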
And 8: n samples were formed from the ith sample:
initializing the ith vehicle state parameter siAnd with the ith vehicle state parameter siAs input to the motion network model, outputting μ(s) by the motion network modeli|θμ);
Obtaining the ith vehicle motion parameter a by using the formula (17)i:
ai=μ(si|θμ)+Ni (17)
In the formula (17), NiRepresenting the ith random noise;
obtaining the ith vehicle reward value r according to the formula (8)iAnd obtaining an updated ith vehicle state parameter s'i(ii) a Thus obtaining the ith sample, denoted as(s)i,ai,ri,s′i) Further obtaining N samples;
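A sketch of the sample-formation loop of step 8 is given below, assuming a hypothetical vehicle simulator `env` exposing reset()/step() methods (the patent names no specific simulation environment) and Gaussian exploration noise for N_i:

```python
import numpy as np
import torch

def collect_samples(env, actor, n_samples, noise_std=0.1):
    """Step 8 sketch: roll out the action network with exploration noise
    a_i = mu(s_i | theta_mu) + N_i and store (s_i, a_i, r_i, s'_i)."""
    buffer = []
    s = env.reset()
    for _ in range(n_samples):
        with torch.no_grad():
            a = actor(torch.as_tensor(s, dtype=torch.float32)).numpy()
        a = a + np.random.normal(0.0, noise_std, size=a.shape)  # N_i of eq. (17)
        s_next, r = env.step(a)   # reward r_i computed from eq. (8) inside the simulator
        buffer.append((s, a, r, s_next))
        s = s_next
    return buffer
```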
Step 9: train the network model of the deep reinforcement learning method with the N samples, so as to obtain the optimal action network model and the optimal evaluation network model;
Step 10: judge whether expression (18) and expression (19) both hold; if so, the automobile is in a stable state; otherwise the automobile is in an unstable state; then execute step 11:
In formula (18), k_1 is the first boundary coefficient of the stable domain, k_2 is the second boundary coefficient of the stable domain, and β̇ is the centroid slip angular velocity;
In formula (19), ε is an adjustable parameter;
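The stability judgment of step 10 might be sketched as below. The bodies of expressions (18) and (19) are not reproduced in this text; a common β-β̇ phase-plane boundary with coefficients k_1, k_2 and a yaw-rate-error band of width ε are assumed in their place.

```python
def is_stable(beta, beta_dot, w, w_d, k1, k2, eps):
    """Step 10 sketch with assumed forms of expressions (18) and (19)."""
    in_phase_plane = abs(k1 * beta_dot + beta) <= k2  # assumed form of eq. (18)
    yaw_error_ok = abs(w - w_d) <= eps                # assumed form of eq. (19)
    return in_phase_plane and yaw_error_ok
```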
Step 11: obtain the current vehicle state parameter s_t and take it as the input of the optimal action network model, which outputs the current additional yaw moment ΔM_t and the current correction steering angle Δδ_t;
Step 12: judging whether the formula (20) is established, if so, indicating that the steering property of the automobile is understeer, making the action wheels as inner rear wheels, and executing the step 13, otherwise, indicating that the steering property of the automobile is oversteer, making the action wheels as outer front wheels, and executing the step 14;
wd×(w-wd)>0 (20)
step 13: if deltafIf greater than 0, the angle is correctedIs directed leftward, if deltafIf < 0, then let the correction cornerTo the right;
step 14: if deltafIf greater than 0, the angle is correctedIs directed to the right, if deltafIf < 0, then let the correction cornerTo the left.
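Steps 12-14 amount to a small decision rule; the following sketch restates them directly in Python:

```python
def allocate_action(w, w_d, delta_f):
    """Steps 12-14 sketch: pick the braked wheel from the steering
    characteristic (eq. 20) and the correction-angle direction from delta_f."""
    if w_d * (w - w_d) > 0:      # eq. (20) holds: understeer
        wheel = "inner rear"
        direction = "left" if delta_f > 0 else "right"   # step 13
    else:                        # eq. (20) fails: oversteer
        wheel = "outer front"
        direction = "right" if delta_f > 0 else "left"   # step 14
    return wheel, direction
```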
The intelligent automobile stability control method is also characterized in that step 9 proceeds as follows:
Step 9.1: initialize the learning rate parameter α and the return rate parameter γ; initialize i = 1;
Step 9.2: take the i-th vehicle state parameter s_i as the input of the current i-th action network model, which outputs the i-th output value μ(s_i|θ^μ);
take the i-th vehicle state parameter s_i, the i-th vehicle action parameter a_i and the i-th action network output μ(s_i|θ^μ) as inputs of the current i-th evaluation network model; from s_i and a_i, the current i-th evaluation network model outputs Q_i(a_i); from μ(s_i|θ^μ), it outputs Q_i(μ(s_i|θ^μ));
take the updated i-th vehicle state parameter s′_i as the input of the current i-th target action network model, which outputs the i-th output value μ(s′_i|θ^μ′);
take s′_i and the target action network output μ(s′_i|θ^μ′) as inputs of the current i-th target evaluation network model, which outputs the i-th output value Q′_i(a′_i);
according to the output Q_i(μ(s_i|θ^μ)) of the current i-th evaluation network model, update the current i-th action network model by the policy gradient method, obtaining the i-th updated action network model as the (i+1)-th action network model;
according to the output Q_i(a_i) of the current i-th evaluation network model and the output Q′_i(a′_i) of the current i-th target evaluation network model, update the current i-th evaluation network model by minimizing the loss function, obtaining the i-th updated evaluation network model as the (i+1)-th evaluation network model;
Step 9.3: assign i+1 to i and judge whether i is greater than N; if so, the optimal action network model and the optimal evaluation network model have been obtained; otherwise return to step 9.2.
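The update of step 9.2 follows the pattern of the standard DDPG actor-critic update, which the claimed procedure resembles. A condensed PyTorch sketch is given below; it processes one transition per step as in the claim, and omits refinements such as minibatching and soft target-network updates, which the text does not specify.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_t, critic_t,
                actor_opt, critic_opt, gamma=0.99):
    """Step 9.2 sketch: one critic step (minimized loss) and one actor step
    (policy gradient) per sample (s_i, a_i, r_i, s'_i)."""
    s, a, r, s_next = batch  # tensors with a batch dimension, built from one sample
    # Critic: minimize the loss between Q_i(a_i) and the target r_i + gamma * Q'_i(a'_i).
    with torch.no_grad():
        a_next = actor_t(s_next)                  # mu(s'_i | theta_mu')
        y = r + gamma * critic_t(s_next, a_next)  # uses Q'_i(a'_i)
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: policy-gradient step that increases Q_i(mu(s_i | theta_mu)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```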
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses the model-free learning and generalization ability of a deep reinforcement learning algorithm: it determines the input states and output actions relevant to vehicle stability control, designs a reward function suited to coordination control, and constructs and trains an optimal action network model. This model can then decide an optimal coordinated stability control strategy under both stable and extreme working conditions, thereby realizing vehicle stability control and ensuring the safety and comfort of drivers and passengers;
2. The deep reinforcement learning algorithm of the invention requires no algorithm model built on a vehicle model, and the deep neural network it employs has strong nonlinear expressive power: it can capture the nonlinear relation between the vehicle state and the active steering and differential braking controls, which matches reality better than a linear controller designed from a simplified vehicle model;
3. Compared with no control, active steering control alone, direct yaw moment control alone, and linearly distributed coordination control, the control method of the invention achieves better control performance under different working conditions, better robustness, and better comfort under extreme working conditions.
Drawings
FIG. 1 is an intelligent vehicle stability control system based on deep reinforcement learning according to the present invention;
FIG. 2 is a diagram of a training process of the deep reinforcement learning method of the present invention.
Detailed Description
In this embodiment, the intelligent automobile stability control method based on deep reinforcement learning decides the current correction steering angle and additional yaw moment from the current state parameters of the automobile, thereby realizing coordinated vehicle stability control. Specifically, as shown in FIG. 1, the method comprises the following steps:
Step 1: obtain the front wheel steering angle δ_f decided by the vehicle lateral controller, and the vehicle structural parameters, including: the wheel base L, the distances L_f and L_r from the center of mass to the front and rear axles, the front and rear tire cornering stiffnesses C_1 and C_2, and the vehicle mass m;
acquire the vehicle running parameters, including: the steering wheel angle sw, the vehicle speed v and the road surface friction coefficient μ;
Step 2: calculate the ideal yaw rate w_d using formula (1):
In formula (1), g is the gravitational acceleration and w is the yaw rate, where:
Step 3: calculate the ideal centroid slip angle β_d using formula (3):
β_d = -min{|β|, |β_max|}·sign(δ_f)   (3)
In formula (3), β is the vehicle centroid slip angle and β_max is the maximum centroid slip angle of the vehicle, where:
Step 4: define the vehicle state parameter s of the deep reinforcement learning method by formula (6):
s = {w, β, sw, w_d, β_d}   (6)
Step 5: define the action parameter a of the deep reinforcement learning method by formula (7):
In formula (7), Δδ is the steering-wheel correction angle, with value range (0, 20) in degrees, and ΔM is the additional yaw moment, with value range (0, 20) in N·m;
Step 6: establish the reward function r of the deep reinforcement learning method by formula (8):
r = r_e + r_ps + r_v + r_m + r_sw + r_st   (8)
The reward function is the core of the whole deep reinforcement learning algorithm and guides the adjustment direction of the deep neural network parameters. Design principles are given first, and the specific reward functions are then designed according to these principles.
In this embodiment, the reward function is organized into 4 priority levels; the higher the level, the more important the principle. The design principles are:
Level 1: the aim of the invention is vehicle stability control, so guaranteeing vehicle stability is the primary task;
Level 2: steering control is advantageous over braking control, so steering control is prioritized over braking control;
Level 3: stabilize the automobile with as small an active steering angle or braking pressure as possible;
Level 4: when the automobile is in the stable domain, the action output should be 0 as far as possible;
In formula (8), r_e is the error reward function, corresponding to the level-1 design principle: the smaller the error, the larger the reward. To highlight the importance of the level-1 principle, the error reward should have the largest rate of change, so a quadratic function is designed as the level-1 reward:
In formula (9), the yaw-rate error and the centroid-slip-angle error appear, defined as:
In formula (8), r_ps is the fixed reward value function, corresponding to the level-2 design principle: preferentially using steering control earns a higher reward value:
In formula (8), r_v is the speed-difference reward function, corresponding to the level-2 principle: steering influences speed less than braking and therefore earns a larger reward value:
In formula (8), r_m is the additional-yaw-moment reward function, corresponding to the level-3 design principle:
In formula (8), r_sw is the correction-angle reward function, corresponding to the level-3 design principle:
In formula (8), r_st is the stable-domain reward function, corresponding to the level-4 design principle: within the stable domain, the smaller the action, the larger the reward:
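A sketch of the composite reward of formula (8) following the four design principles is shown below. The exact piecewise bodies of formulas (9)-(16) are not reproduced in this text, so each term is an assumed form consistent with the stated principle, and all weights are illustrative.

```python
def reward(w, w_d, beta, beta_d, dM, dsw, dv, stable,
           q1=1.0, q2=1.0, c_ps=0.5, c_v=0.1, c_m=1e-3, c_sw=0.1, c_st=0.1):
    """Sketch of r = r_e + r_ps + r_v + r_m + r_sw + r_st (eq. 8); every
    term below is an assumed form, not the patented eqs. (9)-(16)."""
    r_e = -(q1 * (w - w_d)**2 + q2 * (beta - beta_d)**2)   # level 1: quadratic error term
    r_ps = c_ps if abs(dM) < 1e-6 else 0.0                 # level 2: prefer steering over braking
    r_v = -c_v * abs(dv)                                   # level 2: penalize speed loss
    r_m = -c_m * abs(dM)                                   # level 3: small additional yaw moment
    r_sw = -c_sw * abs(dsw)                                # level 3: small correction angle
    r_st = c_st if (stable and abs(dM) < 1e-6 and abs(dsw) < 1e-6) else 0.0  # level 4
    return r_e + r_ps + r_v + r_m + r_sw + r_st
```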
Step 7: construct the network model of the deep reinforcement learning method:
Step 7.1: construct the action network model, comprising: an input layer comprising one neuron, m_1 hidden layers each comprising n_1 neurons, and an output layer comprising 2 neurons; initialize the action network parameters as θ^μ;
Step 7.2: construct the evaluation network model, comprising: two input layers each comprising 1 neuron, m_2 hidden layers each comprising n_2 neurons, of which the m_2-th hidden layer is a fully connected layer, and an output layer comprising 1 neuron; initialize the evaluation network parameters as θ^Q;
Step 7.3: construct a target action network model with the same structure as the action network model and set the target action network parameters θ^μ′ = θ^μ; construct a target evaluation network model with the same structure as the evaluation network model and set the target evaluation network parameters θ^Q′ = θ^Q;
And 8: n samples were formed from the ith sample:
initializing the ith vehicle state parameter siAnd with the ith vehicle state parameter siAs input to the motion network model, μ(s) is output by the motion network modeli|θμ);
Obtaining the ith vehicle motion parameter a by using the formula (17)i:
ai=μ(si|θμ)+Ni (17)
In the formula (17), NiRepresenting the ith random noise;
obtaining the ith vehicle reward value r according to the formula (8)iAnd obtaining an updated ith vehicle state parameter s'i(ii) a Thus obtaining the ith sample, denoted as(s)i,ai,ri,s′i) Further obtaining N samples;
Step 9: as shown in FIG. 2, train the network model of the deep reinforcement learning method with the N samples:
Step 9.1: initialize the learning rate parameter α and the return rate parameter γ; initialize i = 1;
Step 9.2: take the i-th vehicle state parameter s_i as the input of the current i-th action network model, which outputs the i-th output value μ(s_i|θ^μ);
take the i-th vehicle state parameter s_i, the i-th vehicle action parameter a_i and the i-th action network output μ(s_i|θ^μ) as inputs of the current i-th evaluation network model; from s_i and a_i, the current i-th evaluation network model outputs Q_i(a_i); from μ(s_i|θ^μ), it outputs Q_i(μ(s_i|θ^μ));
take the updated i-th vehicle state parameter s′_i as the input of the current i-th target action network model, which outputs μ(s′_i|θ^μ′);
take s′_i and the target action network output μ(s′_i|θ^μ′) as inputs of the current i-th target evaluation network model, which outputs Q′_i(a′_i);
according to the output Q_i(μ(s_i|θ^μ)) of the current i-th evaluation network model, update the current i-th action network model by the policy gradient method, obtaining the i-th updated action network model as the (i+1)-th action network model;
according to the output Q_i(a_i) of the current i-th evaluation network model and the output Q′_i(a′_i) of the current i-th target evaluation network model, update the current i-th evaluation network model by minimizing the loss function, obtaining the i-th updated evaluation network model as the (i+1)-th evaluation network model;
Step 9.3: assign i+1 to i and judge whether i is greater than N; if so, the optimal action network model and the optimal evaluation network model have been obtained; otherwise return to step 9.2;
Step 10: judge whether expression (18) and expression (19) both hold; if so, the automobile is in a stable state; otherwise the automobile is in an unstable state; then execute step 11:
In formula (18), k_1 is the first boundary coefficient of the stable domain, k_2 is the second boundary coefficient of the stable domain, and β̇ is the centroid slip angular velocity;
In formula (19), ε is an adjustable parameter;
Step 11: obtain the current vehicle state parameter s_t and take it as the input of the optimal action network model, which outputs the current additional yaw moment ΔM_t and the current correction steering angle Δδ_t;
Step 12: judge whether formula (20) holds; if so, the steering characteristic of the automobile is understeer, the action wheel is taken as the inner rear wheel, and step 13 is executed; otherwise the steering characteristic of the automobile is oversteer, the action wheel is taken as the outer front wheel, and step 14 is executed;
w_d × (w - w_d) > 0   (20)
Step 13: if δ_f > 0, direct the correction angle Δδ_t to the left; if δ_f < 0, direct the correction angle Δδ_t to the right;
Step 14: if δ_f > 0, direct the correction angle Δδ_t to the right; if δ_f < 0, direct the correction angle Δδ_t to the left.
Claims (2)
1. An intelligent automobile stability control method based on deep reinforcement learning, characterized by comprising the following steps:
Step 1: obtain the front wheel steering angle δ_f decided by the vehicle lateral controller, and the vehicle structural parameters, including: the wheel base L, the distances L_f and L_r from the center of mass to the front and rear axles, the front and rear tire cornering stiffnesses C_1 and C_2, and the vehicle mass m;
acquire the vehicle running parameters, including: the steering wheel angle sw, the vehicle speed v and the road surface friction coefficient μ;
Step 2: calculate the ideal yaw rate w_d using formula (1):
In formula (1), g is the gravitational acceleration and w is the yaw rate, where:
Step 3: calculate the ideal centroid slip angle β_d using formula (3):
β_d = -min{|β|, |β_max|}·sign(δ_f)   (3)
In formula (3), β is the vehicle centroid slip angle and β_max is the maximum centroid slip angle of the vehicle, where:
Step 4: define the vehicle state parameter s of the deep reinforcement learning method by formula (6):
s = {w, β, sw, w_d, β_d}   (6)
Step 5: define the action parameter a of the deep reinforcement learning method by formula (7):
In formula (7), Δδ is the steering-wheel correction angle and ΔM is the additional yaw moment;
Step 6: establish the reward function r of the deep reinforcement learning method by formula (8):
r = r_e + r_ps + r_v + r_m + r_sw + r_st   (8)
In formula (8), r_e is the error reward function, where:
In formula (9), the yaw-rate error and the centroid-slip-angle error appear, defined as:
In formula (8), r_ps is the fixed reward value function, where:
In formula (8), r_v is the speed-difference reward function, where:
In formula (8), r_m is the additional-yaw-moment reward function, where:
In formula (8), r_sw is the correction-angle reward function, where:
In formula (8), r_st is the stable-domain reward function, where:
Step 7: construct the network model of the deep reinforcement learning method:
Step 7.1: construct the action network model, comprising: an input layer comprising one neuron, m_1 hidden layers each comprising n_1 neurons, and an output layer comprising 2 neurons; initialize the action network parameters as θ^μ;
Step 7.2: construct the evaluation network model, comprising: two input layers each comprising 1 neuron, m_2 hidden layers each comprising n_2 neurons, of which the m_2-th hidden layer is a fully connected layer, and an output layer comprising 1 neuron; initialize the evaluation network parameters as θ^Q;
Step 7.3: construct a target action network model with the same structure as the action network model and set the target action network parameters θ^μ′ = θ^μ; construct a target evaluation network model with the same structure as the evaluation network model and set the target evaluation network parameters θ^Q′ = θ^Q;
Step 8: form N samples, where the i-th sample is obtained as follows:
initialize the i-th vehicle state parameter s_i and take s_i as the input of the action network model, which outputs μ(s_i|θ^μ);
obtain the i-th vehicle action parameter a_i using formula (17):
a_i = μ(s_i|θ^μ) + N_i   (17)
In formula (17), N_i denotes the i-th random noise;
obtain the i-th vehicle reward value r_i according to formula (8) and obtain the updated i-th vehicle state parameter s′_i; the i-th sample is thus denoted (s_i, a_i, r_i, s′_i), and N samples are obtained in this way;
Step 9: train the network model of the deep reinforcement learning method with the N samples, so as to obtain the optimal action network model and the optimal evaluation network model;
Step 10: judge whether expression (18) and expression (19) both hold; if so, the automobile is in a stable state; otherwise the automobile is in an unstable state; then execute step 11:
In formula (18), k_1 is the first boundary coefficient of the stable domain, k_2 is the second boundary coefficient of the stable domain, and β̇ is the centroid slip angular velocity;
In formula (19), ε is an adjustable parameter;
Step 11: obtain the current vehicle state parameter s_t and take it as the input of the optimal action network model, which outputs the current additional yaw moment ΔM_t and the current correction steering angle Δδ_t;
Step 12: judge whether formula (20) holds; if so, the steering characteristic of the automobile is understeer, the action wheel is taken as the inner rear wheel, and step 13 is executed; otherwise the steering characteristic of the automobile is oversteer, the action wheel is taken as the outer front wheel, and step 14 is executed;
w_d × (w - w_d) > 0   (20)
Step 13: if δ_f > 0, direct the correction angle Δδ_t to the left; if δ_f < 0, direct the correction angle Δδ_t to the right;
Step 14: if δ_f > 0, direct the correction angle Δδ_t to the right; if δ_f < 0, direct the correction angle Δδ_t to the left.
2. The intelligent vehicle stability control method according to claim 1, wherein step 9 is performed as follows:
Step 9.1: initialize the learning rate parameter α and the return rate parameter γ; initialize i = 1;
Step 9.2: take the i-th vehicle state parameter s_i as the input of the current i-th action network model, which outputs the i-th output value μ(s_i|θ^μ);
take the i-th vehicle state parameter s_i, the i-th vehicle action parameter a_i and the i-th action network output μ(s_i|θ^μ) as inputs of the current i-th evaluation network model; from s_i and a_i, the current i-th evaluation network model outputs Q_i(a_i); from μ(s_i|θ^μ), it outputs Q_i(μ(s_i|θ^μ));
take the updated i-th vehicle state parameter s′_i as the input of the current i-th target action network model, which outputs the i-th output value μ(s′_i|θ^μ′);
take s′_i and the target action network output μ(s′_i|θ^μ′) as inputs of the current i-th target evaluation network model, which outputs the i-th output value Q′_i(a′_i);
according to the output Q_i(μ(s_i|θ^μ)) of the current i-th evaluation network model, update the current i-th action network model by the policy gradient method, obtaining the i-th updated action network model as the (i+1)-th action network model;
according to the output Q_i(a_i) of the current i-th evaluation network model and the output Q′_i(a′_i) of the current i-th target evaluation network model, update the current i-th evaluation network model by minimizing the loss function, obtaining the i-th updated evaluation network model as the (i+1)-th evaluation network model;
Step 9.3: assign i+1 to i and judge whether i is greater than N; if so, the optimal action network model and the optimal evaluation network model have been obtained; otherwise return to step 9.2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910809910.7A CN110450771B (en) | 2019-08-29 | 2019-08-29 | Intelligent automobile stability control method based on deep reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910809910.7A CN110450771B (en) | 2019-08-29 | 2019-08-29 | Intelligent automobile stability control method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110450771A CN110450771A (en) | 2019-11-15 |
CN110450771B true CN110450771B (en) | 2021-03-09 |
Family
ID=68489893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910809910.7A Active CN110450771B (en) | 2019-08-29 | 2019-08-29 | Intelligent automobile stability control method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110450771B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP4183667A1 (en) * | 2021-11-18 | 2023-05-24 | Volvo Truck Corporation | Method for closed loop control of a position of a fifth wheel of a vehicle |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110843746B (en) * | 2019-11-28 | 2022-06-14 | 的卢技术有限公司 | Anti-lock brake control method and system based on reinforcement learning |
CN111605542B (en) * | 2020-05-06 | 2021-11-23 | 南京航空航天大学 | Vehicle stability system based on safety boundary and control method |
CN111746633B (en) * | 2020-07-02 | 2022-06-17 | 南京航空航天大学 | Vehicle distributed steering driving system control method based on reinforcement learning |
CN111873991B (en) * | 2020-07-22 | 2022-04-08 | 中国第一汽车股份有限公司 | Vehicle steering control method, device, terminal and storage medium |
CN112861269B (en) * | 2021-03-11 | 2022-08-30 | 合肥工业大学 | Automobile longitudinal multi-state control method based on deep reinforcement learning preferential extraction |
CN113386790B (en) * | 2021-06-09 | 2022-07-12 | 扬州大学 | Automatic driving decision-making method for cross-sea bridge road condition |
CN115123159A (en) * | 2022-06-27 | 2022-09-30 | 重庆邮电大学 | AEB control method and system based on DDPG deep reinforcement learning |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4021185B2 (en) * | 2001-12-07 | 2007-12-12 | 本田技研工業株式会社 | Yaw moment feedback control method |
KR100684033B1 (en) * | 2002-02-23 | 2007-02-16 | 주식회사 만도 | Method for controlling the stability of vehicles |
US7143864B2 (en) * | 2002-09-27 | 2006-12-05 | Ford Global Technologies, Llc. | Yaw control for an automotive vehicle using steering actuators |
DE102011010491A1 (en) * | 2011-02-07 | 2012-08-09 | Audi Ag | Method for activating electronics stability control device of e.g. all-wheel driven motor car, involves determining function such that function value increases with increasing lateral force of front and rear wheels |
CN105253141B (en) * | 2015-09-09 | 2017-10-27 | 北京理工大学 | A kind of vehicle handling stability control method adjusted based on wheel longitudinal force |
CN106828464A (en) * | 2017-01-06 | 2017-06-13 | 合肥工业大学 | A kind of vehicle body stable control method and system based on coefficient of road adhesion estimation |
- 2019-08-29: application CN201910809910.7A filed (CN); granted as CN110450771B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN110450771A (en) | 2019-11-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110450771B (en) | Intelligent automobile stability control method based on deep reinforcement learning | |
CN101054092B (en) | Driver workload-based vehicle stability enhancement control | |
JP4918148B2 (en) | Vehicle motion control device | |
CN103057436B (en) | Yawing moment control method of individual driven electromobile based on multi-agent | |
Goodarzi et al. | Automatic path control based on integrated steering and external yaw-moment control | |
CN109733400B (en) | Method, device and apparatus for distributing driving torque in a vehicle | |
CN106515716A (en) | Coordination control device and method for chassis integrated control system of wheel driving electric vehicle | |
CN108694283A (en) | A kind of forecast Control Algorithm and system for improving electric vehicle lateral stability | |
Jalali | Stability control of electric vehicles with in-wheel motors | |
JP2014144681A (en) | Vehicular driving force control unit | |
Kim et al. | Active yaw control for handling performance improvement by using traction force | |
CN102729999A (en) | Vehicle vibration control device and vehicle vibration control method | |
CN102971201A (en) | Method for determining a toothed rack force for a steering device in a vehicle | |
US8442736B2 (en) | System for enhancing cornering performance of a vehicle controlled by a safety system | |
CN106672072A (en) | Control method for steer-by-wire automobile active front-wheel steering control system | |
JP5559833B2 (en) | Vehicle motion control apparatus and method using jerk information | |
Kanchwala et al. | Vehicle handling control of an electric vehicle using active torque distribution and rear wheel steering | |
CN114212074B (en) | Vehicle active steering rollover prevention control method based on road adhesion coefficient estimation | |
JP2010149740A (en) | Vehicle motion control apparatus | |
JP2011218953A (en) | Device for control of drive force | |
Rengaraj | Integration of active chassis control systems for improved vehicle handling performance | |
Chen et al. | A compensated yaw-moment-based vehicle stability controller | |
Hou et al. | Integrated chassis control using ANFIS | |
Hakima et al. | Improvement of vehicle handling by an integrated control system of four wheel steering and ESP with fuzzy logic approach | |
JP2004203084A (en) | Motion control device of vehicle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |