CN112590774A

CN112590774A - Intelligent electric automobile drifting and warehousing control method based on deep reinforcement learning

Info

Publication number: CN112590774A
Application number: CN202011530836.4A
Authority: CN
Inventors: 冷搏; 刘铭; 熊璐; 余卓平
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2021-04-02
Anticipated expiration: 2040-12-22
Also published as: CN112590774B

Abstract

The invention relates to a drifting and warehousing (drift parking) control method for an intelligent electric vehicle based on deep reinforcement learning, comprising the following steps: 1) constructing a vehicle dynamics model for deep reinforcement learning and a tire model for the tire-force saturation condition; 2) using a TD3 algorithm designed for drifting and warehousing control to drift the intelligent electric vehicle into the storage location. Compared with the prior art, the invention offers high control precision and good robustness: the vehicle accurately completes the drifting and warehousing maneuver, reaches the storage location accurately by continuously adjusting the steering wheel angle while drifting, and, if the center position of the storage location is changed during the drift, drifts toward the updated storage location.

Description

Intelligent electric automobile drifting and warehousing control method based on deep reinforcement learning

Technical Field

The invention relates to the field of automobile warehousing control, in particular to an intelligent electric automobile drifting warehousing control method based on deep reinforcement learning.

Background

A vehicle that keeps moving while its rear tires are saturated and its rear axle is sliding is said to be drifting. There are two different drift states:

(1) The rear axle is driven and the rear wheels spin while sliding. By controlling the rear-axle driving force and the front-wheel steering angle, the centroid slip angle and the vehicle speed can be held constant, so that the vehicle is in a steady state. Because most vehicles on the market are front-axle driven, drifting in this state has relatively little research value.

(2) The drifting action is reproduced according to an open-loop control law. It can be disturbed by the external environment and by the vehicle's own state, so that the vehicle fails to drift to a stop at the storage location. For example, a lateral displacement error and a heading angle error arise while approaching the storage location, so the vehicle does not fully match the preset drift-trigger pose when the drift is triggered; because the open-loop controller simply plays out the drifting action, this deviation persists until the drift ends. In addition, because the response of the low-level actuators is limited, open-loop control cannot guarantee the same actuator response every time, and the vehicle leaves the preset drift trajectory whenever the response deviates. Non-uniformity of the road surface also causes sudden changes in tire force during the drift, which alter the drift path.

Disclosure of Invention

The invention aims to overcome the defects in the prior art and provide an intelligent electric vehicle drifting and warehousing control method based on deep reinforcement learning.

The purpose of the invention can be realized by the following technical scheme:

An intelligent electric vehicle drifting and warehousing control method based on deep reinforcement learning, comprising the following steps:

1) constructing a vehicle dynamics model for deep reinforcement learning and a tire model for the tire-force saturation condition;

2) using a TD3 algorithm designed for drifting and warehousing control to drift the intelligent electric vehicle into the storage location.

In step 1), the vehicle dynamics model is a four-wheel three-degree-of-freedom vehicle dynamics model that accounts for front-rear and left-right load transfer; the three degrees of freedom are the speed v_m at the vehicle's center of mass, the centroid slip angle β and the yaw rate ω.

In the four-wheel three-degree-of-freedom vehicle dynamics model, the expression of four-wheel vertical force considering longitudinal and lateral acceleration is as follows:

where h_m is the height of the center of mass; b_f and b_r are the front and rear track widths; a_x and a_y are the longitudinal and lateral accelerations at the center of mass, neglecting the effect of body rotation; F_zFL, F_zFR, F_zRL and F_zRR are the vertical forces of the left front, right front, left rear and right rear wheels; m is the mass of the electric vehicle; g is the gravitational acceleration; l is the wheelbase; l_f and l_r are the distances from the front and rear axles to the center of mass; F_xFL, F_xFR, F_xRL and F_xRR are the longitudinal forces of the left front, right front, left rear and right rear wheels; F_yFL, F_yFR, F_yRL and F_yRR are the corresponding lateral forces; and δ is the front wheel steering angle.
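By way of illustration only, the load-transfer calculation described above can be sketched as follows. The expression used is a common quasi-static form and is an assumption; it is not necessarily identical to the formula of the invention, which is not reproduced in this text. The sign convention (a_x positive when accelerating forward, a_y positive toward the left) and all numerical values are likewise hypothetical.

```python
def vertical_loads(m, l_f, l_r, h_m, b_f, b_r, a_x, a_y, g=9.81):
    """Quasi-static vertical wheel loads with front-rear and left-right load transfer.

    Assumed convention: a_x > 0 accelerating forward, a_y > 0 toward the left,
    so positive a_y shifts load onto the right-hand wheels.
    """
    l = l_f + l_r  # wheelbase
    # Per-wheel axle share: static load plus longitudinal load transfer.
    front = m * (g * l_r - a_x * h_m) / (2.0 * l)
    rear = m * (g * l_f + a_x * h_m) / (2.0 * l)
    # Lateral load transfer on each axle, scaled by the axle's track width.
    d_front = m * a_y * h_m * l_r / (l * b_f)
    d_rear = m * a_y * h_m * l_f / (l * b_r)
    return {
        "FL": front - d_front,
        "FR": front + d_front,
        "RL": rear - d_rear,
        "RR": rear + d_rear,
    }


# Hypothetical vehicle parameters; braking (a_x < 0) while turning left (a_y > 0).
loads = vertical_loads(m=1500.0, l_f=1.2, l_r=1.4, h_m=0.5,
                       b_f=1.6, b_r=1.6, a_x=-3.0, a_y=4.0)
static = vertical_loads(m=1500.0, l_f=1.2, l_r=1.4, h_m=0.5,
                        b_f=1.6, b_r=1.6, a_x=0.0, a_y=0.0)
```

In this form the four loads always sum to m·g, so a negative value for one wheel signals that the wheel has lifted off and the load must be redistributed.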

During drifting, excessive load transfer can lift a wheel off the ground; the vertical load of that wheel then drops to 0 and the load transfer reaches its upper limit. When the vehicle drifts with the steering wheel turned to the left, load transfers to the right; if the left rear wheel lifts off the ground, its vertical force is 0, and the excess transferred load is redistributed to the left front wheel and the right rear wheel according to the longitudinal and lateral accelerations, the wheelbase and the track width, so that:

ΔF_trans = |F_zRL|

F′_zRL = 0

where ΔF_trans is the excess transferred load, F′_zRL is the vertical force of the left rear wheel after redistribution, F′_zRR is the vertical force of the right rear wheel after redistribution, and F′_zFL is the vertical force of the left front wheel after redistribution.

Force analysis of the four-wheel three-degree-of-freedom vehicle dynamics model with front-rear and left-right load transfer yields the following vehicle dynamics balance equations:

φ = β + ψ

The longitudinal vehicle speed v_mx and the lateral vehicle speed v_my are then calculated from the above as:

v_mx = v_m·cos β

v_my = v_m·sin β

where $\dot{v}_m$ is the rate of change of the speed at the vehicle's center of mass, $\dot{\beta}$ is the centroid slip angular velocity, φ is the global azimuth angle of the velocity at the center of mass, $\dot{\phi}$ is the rate of change of that azimuth angle, $\dot{\omega}$ is the rate of change of the yaw rate, ψ is the global heading angle of the vehicle, I_z is the yaw moment of inertia, v_x is the longitudinal vehicle speed, and v_y is the lateral vehicle speed.
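By way of illustration only, the kinematic relations φ = β + ψ, v_mx = v_m·cos β and v_my = v_m·sin β can be sketched as follows. The forward-Euler update, the step size `dt`, and the convention that global angles are measured counterclockwise from the global X axis are assumptions made for this sketch; the dynamic balance equations that supply v_m, β and ω are not reproduced here.

```python
import math


def body_velocities(v_m, beta):
    """Longitudinal and lateral speed from center-of-mass speed and slip angle."""
    return v_m * math.cos(beta), v_m * math.sin(beta)


def step_pose(x, y, psi, v_m, beta, omega, dt):
    """Advance the global pose by one forward-Euler step (assumed integrator).

    phi = beta + psi is the global direction of the center-of-mass velocity;
    v_m, beta and omega are held constant over the step.
    """
    phi = beta + psi
    x_next = x + v_m * math.cos(phi) * dt
    y_next = y + v_m * math.sin(phi) * dt
    psi_next = psi + omega * dt
    return x_next, y_next, psi_next


# Hypothetical values: 10 m/s, sliding fully sideways (beta = 90 degrees).
v_mx, v_my = body_velocities(10.0, math.pi / 2)
pose = step_pose(0.0, 0.0, 0.0, 10.0, math.pi / 2, 0.0, 0.1)
```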

In the step 1), the tire model for deep reinforcement learning training comprises a front wheel tire force model and a rear wheel tire force model.

For the rear wheel tire force model: during drifting the rear wheels are locked by braking and slide in pure friction on the road surface, so the rear tire force points opposite to the instantaneous velocity of the wheel center. Force analysis of the rear wheels gives the following longitudinal and lateral tire force components:

for the left rear wheel:

for the right rear wheel:

F_{r_sat} = μ₁F_z

where v_xRL and v_yRL are the longitudinal and lateral speeds at the wheel center of the left rear wheel; v_xRR and v_yRR are the longitudinal and lateral speeds at the wheel center of the right rear wheel; λ_L and λ_R are the wheel-center slip angles of the left and right rear wheels; F_xRL and F_yRL are the longitudinal and lateral forces of the left rear wheel; F_xRR and F_yRR are the longitudinal and lateral forces of the right rear wheel; F_{rRL_sat} and F_{rRR_sat} are the horizontal saturated tire forces of the left and right rear wheels; F_{r_sat} is the horizontal saturated tire force of the corresponding wheel; μ₁ is the utilized road adhesion coefficient when the wheel is locked; and F_z is the vertical force of the corresponding wheel.
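By way of illustration only, the locked rear wheel model can be sketched as follows: the force magnitude is F_{r_sat} = μ₁F_z and its direction is opposite to the wheel-center velocity. The component expressions are an assumed reading of that description (equivalent to F_x = −F_sat·cos λ and F_y = −F_sat·sin λ with λ = atan2(v_y, v_x)) and are not the formulas of the invention verbatim. The function name and the zero-speed guard are hypothetical.

```python
import math


def locked_wheel_force(v_x, v_y, f_z, mu_max=1.0, utilization=0.9):
    """Longitudinal and lateral force of a locked, purely sliding wheel.

    mu_1 = utilization * mu_max follows the description (0.9 times a peak
    adhesion coefficient of 1); the force opposes the wheel-center velocity.
    """
    mu_1 = utilization * mu_max
    f_sat = mu_1 * f_z                # horizontal saturated tire force
    speed = math.hypot(v_x, v_y)      # wheel-center speed
    if speed == 0.0:
        return 0.0, 0.0               # assumption: no sliding, no friction force
    return -f_sat * v_x / speed, -f_sat * v_y / speed


# Hypothetical wheel-center velocity (3, 4) m/s and vertical load 1000 N.
f_x, f_y = locked_wheel_force(3.0, 4.0, 1000.0)
```

Because the magnitude depends only on F_z, the slip angle affects the split between the two components but not the resultant.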

For the front wheel tire force model: during drifting the front tire force is not saturated, and an improved Burckhardt tire model is fitted to the tire force to express the relation between lateral force and slip angle:

where θ₁ to θ₅ are fitting parameters and α is the front wheel slip angle;

the left wheel slip angle α_L and the right wheel slip angle α_R are obtained from:

The front wheels roll freely with no braking or driving force applied, so F_xFL = 0 and F_xFR = 0. Only the lateral force is considered when determining the front tire force direction, which is therefore perpendicular to the tire plane and determined by the front wheel steering angle.

Step 2) specifically comprises the following steps:

21) designing a TD3 algorithm for drifting and warehousing control and constructing an Actor network and a Critic network, specifically:

the Critic network and the Actor network are both BP neural networks composed of fully connected layers. The input of the Critic network is the vehicle state and the action, and its output is the Q value; the input of the Actor network is the vehicle state, and its output is the action. The vehicle state consists of parameters that characterize the vehicle during drifting: the storage location coordinates (e_x, e_y) and the storage location orientation in a relative coordinate system whose origin is the vehicle's center of mass and whose positive y-axis is the vehicle heading, the speed v_m at the center of mass, the centroid slip angle β and the yaw rate ω. The action is the steering wheel angle;

22) constructing a reward function r(k) as follows:

where w_x and w_y are the weights of e_x and e_y, a third weight applies to the storage location orientation, and k is the time instant;

23) training the Actor network and the Critic network to complete the drifting and warehousing of the intelligent electric vehicle.

In step 23), before the Actor and Critic networks are trained, the boundary of the drifting and warehousing controller is determined, and the target storage location of each vehicle drift is drawn at random within that boundary. In iterative training, the vehicle state is calculated from the randomly selected target storage location and orientation, and the Critic and Actor networks are trained on it. Randomly updating the target storage location during training expands the training data set and improves the generalization capability of the networks.
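By way of illustration only, the target value used to train the Critic in a standard TD3 algorithm can be sketched as follows: a clipped noise term smooths the target action, and the smaller of two target-critic estimates is taken. This is the generic TD3 rule and is an assumption here; the source does not state its hyperparameters, so `gamma`, `noise_std`, `noise_clip` and the normalized steering range `action_limit` are hypothetical.

```python
import random


def clip(value, low, high):
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def td3_target(reward, next_state, done, actor_target, critic1_target,
               critic2_target, gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_limit=1.0, rng=random):
    """Generic TD3 target: r + gamma * min(Q1', Q2') at a smoothed target action."""
    # Target policy smoothing: clipped Gaussian noise on the target action.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(actor_target(next_state) + noise,
                       -action_limit, action_limit)
    # Clipped double-Q: use the more pessimistic of the two target critics.
    q_next = min(critic1_target(next_state, next_action),
                 critic2_target(next_state, next_action))
    return reward + gamma * (0.0 if done else 1.0) * q_next


# Hypothetical stand-ins for the target networks (steering normalized to [-1, 1]).
actor = lambda s: 0.5
critic_a = lambda s, a: -1.0 - a
critic_b = lambda s, a: -2.0 - a
y = td3_target(-0.1, [0.0] * 6, False, actor, critic_a, critic_b,
               gamma=0.9, noise_std=0.0)
y_terminal = td3_target(-0.1, [0.0] * 6, True, actor, critic_a, critic_b,
                        gamma=0.9, noise_std=0.0)
```

In TD3 the Actor and the target networks are additionally updated less often than the Critics; that scheduling is omitted from this sketch.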

Compared with the prior art, the invention has the following advantages:

First, the drifting and warehousing control method is designed on the deep reinforcement learning TD3 algorithm. It improves control precision, overcomes the drifting and warehousing errors caused by non-uniform road surfaces, allows the center point of the storage location to be changed so that the vehicle moves to the updated storage location, and improves the robustness of the control system.

Second, during drifting and warehousing the vehicle adjusts its pose by continuously adjusting the steering wheel angle, so that it drifts into the storage location accurately.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

FIG. 2 illustrates the definition of some of the state parameters of the drift process.

FIG. 3 is a flow chart of the drift control algorithm based on deep reinforcement learning.

Detailed Description

The invention is described in detail below with reference to the figures and specific embodiments.

As shown in FIG. 1, the invention provides an intelligent electric vehicle drifting and warehousing control method based on deep reinforcement learning, which comprises the following steps:

1) Building a vehicle dynamics model and a tire model for deep reinforcement learning training, specifically:

11) Building the vehicle dynamics model for deep reinforcement learning

A four-wheel three-degree-of-freedom vehicle dynamics model that accounts for front-rear and left-right load transfer is used. The three degrees of freedom are the speed v_m at the vehicle's center of mass, the centroid slip angle β and the yaw rate ω.

Because the longitudinal and lateral accelerations of the vehicle are large during drifting, the effect of front-rear and left-right load transfer on the vertical tire forces must be considered. The four-wheel vertical forces, taking the longitudinal and lateral accelerations into account, are given by formula (1):

where h_m is the height of the center of mass, b_f and b_r are the front and rear track widths, and a_x and a_y are the longitudinal and lateral accelerations at the center of mass, neglecting the effect of body rotation, obtained from formula (2):

During drifting, the case in which excessive load transfer lifts a wheel off the ground, so that the vertical load of that wheel drops to 0 and the load transfer reaches its upper limit, must be considered. Because the tail-flick maneuver involves braking, load transfers toward the front axle, so only rear-wheel lift-off is considered. If the vehicle drifts with the steering wheel turned to the left, load transfers to the right and the left rear wheel may lift off. When the calculated F_zRL is less than 0, the vertical force of that wheel is set to 0, and the excess transferred load is redistributed to the left front wheel and the right rear wheel according to the longitudinal and lateral accelerations, the wheelbase and the track width, as follows:

Force analysis of the vehicle model yields the following vehicle dynamics balance equations:

where δ is the front wheel steering angle; φ is the global azimuth angle of the velocity at the center of mass, and $\dot{\phi}$ is the rate of change of the velocity direction; ψ is the global azimuth angle of the vehicle heading, and $\dot{\psi}$ is the vehicle yaw rate, i.e., the rate of change of the heading direction. From φ = β + ψ and integration of the above equations, the center-of-mass speed v_m, the centroid slip angle β and the yaw rate ω are obtained at each instant, and the longitudinal and lateral vehicle speeds then follow from formula (8):

12) Building the tire model for deep reinforcement learning under the tire-force saturation condition

Unlike conventional driving conditions, during drifting the rear tire forces are saturated, the lateral speed and centroid slip angle of the vehicle body are large, and the longitudinal and lateral vehicle speeds change rapidly. The vehicle is therefore a strongly nonlinear, time-varying system with high longitudinal-lateral coupling, and the real-time saturated tire force is given by formula (9):

F_{r_sat} = μ₁F_z (9)

where F_{r_sat} is the resultant horizontal tire force and μ₁ is the utilized road adhesion coefficient when the wheel is locked, with μ₁ = 0.9μ_max, i.e., 0.9 times the peak adhesion coefficient μ_max, which is taken as 1.

121) Rear wheel tire force model

While the rear axle is braked and locked, the tire force is saturated: however the slip angle changes, the magnitude of the resultant of the longitudinal and lateral forces stays the same. The change in slip angle can therefore be ignored when solving for the horizontal rear tire force during drifting, and the rear-axle tire force in the drifting state can be solved for directly.

Because the rear wheels are locked by braking and slide in pure friction on the road, the tire force direction is determined by the direction of the wheel-center velocity, i.e., it is opposite to the instantaneous velocity of the wheel center. Force analysis of the rear wheels during drifting gives the longitudinal and lateral components of the rear tire force:

Left rear wheel:

Right rear wheel:

where v_xRL and v_yRL are the longitudinal and lateral speeds at the wheel center of the left rear wheel; v_xRR and v_yRR are the longitudinal and lateral speeds at the wheel center of the right rear wheel; λ_L and λ_R are the wheel-center slip angles of the left and right rear wheels; F_xRL and F_yRL are the longitudinal and lateral forces of the left rear wheel; F_xRR and F_yRR are the longitudinal and lateral forces of the right rear wheel; and F_{rRL_sat} and F_{rRR_sat} are the horizontal saturated tire forces of the left and right rear wheels.

122) Front wheel tyre force model

The front tire force is not saturated and its longitudinal and lateral components are decoupled, so the tire lateral force is obtained from a tire model suited to quasi-static conditions. An improved Burckhardt tire model is fitted to the tire force to express the relation between lateral force and slip angle:

where θ₁ to θ₅ are fitting parameters and α is the front wheel slip angle; the left and right wheel slip angles are obtained from formulas (13) and (14).

Because no braking or driving force is applied, the front wheels are considered to roll freely and their longitudinal force is approximately 0, i.e., F_xFL = 0 and F_xFR = 0. Only the lateral force is considered when determining the front tire force direction, so that direction is perpendicular to the tire plane and determined by the front wheel steering angle.
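By way of illustration only, the front wheel slip angles and the direction of the front lateral force can be sketched as follows. Formulas (13) and (14) are not reproduced in this text, so the expressions below are a common textbook convention and are an assumption: body x-axis forward, y-axis to the left, ω positive counterclockwise, and α = δ − atan2(v_y, v_x) at each wheel center. The fitted improved Burckhardt model itself is not sketched because its form is not given.

```python
import math


def front_slip_angles(v_mx, v_my, omega, delta, l_f, b_f):
    """Left and right front wheel slip angles under the assumed convention."""
    v_y = v_my + omega * l_f                  # lateral speed at the front axle
    v_x_left = v_mx - omega * b_f / 2.0       # left wheel center, y = +b_f/2
    v_x_right = v_mx + omega * b_f / 2.0      # right wheel center, y = -b_f/2
    alpha_l = delta - math.atan2(v_y, v_x_left)
    alpha_r = delta - math.atan2(v_y, v_x_right)
    return alpha_l, alpha_r


def lateral_force_in_body_frame(f_y_tire, delta):
    """Resolve a force perpendicular to the tire plane into body-frame x and y."""
    return -f_y_tire * math.sin(delta), f_y_tire * math.cos(delta)


# Hypothetical values: 10 m/s forward, 0.1 rad steering, with and without yaw.
straight = front_slip_angles(10.0, 0.0, 0.0, 0.1, 1.2, 1.6)
yawing = front_slip_angles(10.0, 0.0, 0.5, 0.1, 1.2, 1.6)
f_bx, f_by = lateral_force_in_body_frame(1000.0, 0.0)
```

With zero longitudinal force, the steering angle alone fixes the direction of the front tire force in the body frame, as the text states.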

2) Design of the TD3 algorithm for drifting and warehousing control

During drifting, a deep reinforcement learning algorithm is used: building on the established drifting vehicle dynamics model, an end-to-end drift controller achieves accurate drifting and warehousing of the vehicle, specifically as follows:

In the TD3 algorithm, the input of the Critic network is the vehicle state and the action, and its output is the Q value; the input of the Actor network is the vehicle state, and its output is the action, namely the steering wheel angle.

Parameters that characterize the vehicle state during drifting are selected as inputs to the Critic and Actor networks; they uniquely describe the vehicle state at any instant of the drift and are dynamically correlated with the steering wheel angle input. The six state parameters are: the storage location coordinates e_x and e_y and the storage location orientation in a relative coordinate system whose origin is the vehicle's center of mass and whose positive y-axis is the vehicle heading; the resultant v_m of the longitudinal and lateral vehicle speeds; the centroid slip angle β; and the yaw rate ω. As shown in FIG. 2, e_x, e_y and the storage location orientation reflect the difference between the vehicle's current position and heading angle and the desired state during the drift, while v_m, β and ω characterize the rates of change of the first three.
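By way of illustration only, the six state parameters can be assembled as follows. The relative frame has its origin at the center of mass and its positive y-axis along the vehicle heading, as described above. The remaining conventions are assumptions: ψ is measured counterclockwise from the global X axis, the relative x-axis points to the vehicle's right, and the name `e_psi` for the orientation term is hypothetical because the source's symbol is not reproduced in this text.

```python
import math


def wrap_angle(angle):
    """Wrap an angle to the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def drift_state(x, y, psi, x_aim, y_aim, psi_aim, v_m, beta, omega):
    """Six-element state: target pose in the vehicle frame plus motion states."""
    dx, dy = x_aim - x, y_aim - y
    # Project the offset onto the vehicle's right (x) and heading (y) axes.
    e_x = dx * math.sin(psi) - dy * math.cos(psi)
    e_y = dx * math.cos(psi) + dy * math.sin(psi)
    e_psi = wrap_angle(psi_aim - psi)   # hypothetical name for the orientation term
    return [e_x, e_y, e_psi, v_m, beta, omega]


# Hypothetical pose: vehicle at the origin heading along global +Y,
# target storage location 1 m to its right and 2 m ahead.
state = drift_state(0.0, 0.0, math.pi / 2, 1.0, 2.0, math.pi, 5.0, 0.1, 0.2)
```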

Once the deep neural networks to be trained by the reinforcement learning algorithm have been determined, the reward function is designed so that reward values can be calculated for the different states of the vehicle during drifting. The reward function is designed as follows:

where w_x and w_y are the weights of e_x and e_y, and a third weight applies to the storage location orientation. What matters is the displacement error from the center of the storage location and the heading angle error relative to the storage location orientation once the vehicle has come to rest. The third power of the vehicle speed is therefore placed in the denominator, so that the lower the vehicle speed and the closer the vehicle is to stopping, the larger the absolute value of the reward calculated from the longitudinal and lateral displacement errors and the heading angle error. By the principle of the algorithm, when the vehicle finally stops far from the storage location, a very small (strongly negative) reward is calculated; when it stops near the center of the storage location, the calculated reward is close to 0 and the target Q value of the preceding states and actions is larger. When computing the steering wheel angle from the vehicle state, the Actor network maximizes the Q value as far as possible, so that the vehicle finally parks in the storage location. The control input is included as a term when the reward function is designed; however, because the steering wheel angle is adjusted continuously throughout drifting and warehousing, the effect of a single steering action on the final result cannot be defined, and its weight coefficient is therefore set to 0.
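By way of illustration only, a reward with the properties described above can be sketched as follows: it is never positive, it approaches 0 when the errors vanish, and the third power of the vehicle speed in the denominator makes errors count most as the vehicle comes to rest. The exact expression of the invention is not reproduced in this text, so the absolute-value error terms, the names `e_psi` and `w_psi`, and the small constant `eps` that avoids division by zero at standstill are all assumptions.

```python
def drift_reward(e_x, e_y, e_psi, v_m, w_x=1.0, w_y=1.0, w_psi=1.0, eps=1e-3):
    """Non-positive reward: weighted pose error divided by speed cubed.

    The steering term is omitted because its weight is set to 0 in the text.
    """
    error = w_x * abs(e_x) + w_y * abs(e_y) + w_psi * abs(e_psi)
    return -error / (abs(v_m) ** 3 + eps)


# Hypothetical comparison: the same pose error at low and at high speed.
slow = drift_reward(1.0, 1.0, 0.1, v_m=0.5)
fast = drift_reward(1.0, 1.0, 0.1, v_m=5.0)
parked = drift_reward(0.0, 0.0, 0.0, v_m=0.0)
```

Under these assumptions the same pose error is penalized far more heavily at 0.5 m/s than at 5 m/s, which steers learning toward the final resting pose rather than the path taken to reach it.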

Before network training, the "boundary" of the vehicle drifting and warehousing controller is first determined: it is assumed that, whatever steering wheel angle is applied, the vehicle's final position and final heading angle will not exceed this boundary.

The target storage location of each vehicle drift is drawn at random within the controller boundary. Whenever a complete drift process ends, a new random target storage location (X_aim, Y_aim) and orientation ψ_aim are set, subject to the controller boundary constraint above.

In iterative training, the vehicle state, i.e., e_x, e_y and the storage location orientation, is calculated from the target storage location and orientation, and the Critic and Actor networks are trained accordingly. Randomly updating the target storage location during training expands the training data set and improves the generalization capability of the networks.
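By way of illustration only, drawing a new random target at the end of each drift episode can be sketched as follows. The source does not give the shape or size of the controller boundary, so the axis-aligned box over (X_aim, Y_aim, ψ_aim) and every numerical limit below are hypothetical.

```python
import random


def sample_target(bounds, rng):
    """Draw (X_aim, Y_aim, psi_aim) uniformly inside an assumed box boundary."""
    return tuple(rng.uniform(low, high)
                 for low, high in (bounds["x"], bounds["y"], bounds["psi"]))


# Hypothetical controller boundary, in meters and radians.
bounds = {"x": (8.0, 12.0), "y": (-2.0, 2.0), "psi": (1.2, 1.9)}
rng = random.Random(0)  # seeded only so the sketch is repeatable

# One new target per episode: set whenever a complete drift process ends.
targets = [sample_target(bounds, rng) for _ in range(5)]
```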

Examples

In this embodiment, the drifting and warehousing control implemented according to the above method is specifically as follows:

Step one: build the four-wheel three-degree-of-freedom vehicle dynamics model for drifting and warehousing based on deep reinforcement learning, and the tire model for the tire-force saturation condition. The vehicle dynamics model accounts for front-rear and left-right load transfer. The three degrees of freedom are the speed v_m at the vehicle's center of mass, the centroid slip angle β and the yaw rate ω.

Step two: on the basis of the vehicle dynamics model for deep reinforcement learning, design the Critic network, the Actor network and the reward function. The input of the Critic network is the vehicle state and the action, and its output is the Q value; the input of the Actor network is the vehicle state, and its output is the action. Because the inputs and outputs are few and their mapping is simple, the Critic and Actor networks are built as BP neural networks composed of fully connected layers. The flow of the drift control algorithm based on deep reinforcement learning is shown in FIG. 3.
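By way of illustration only, the forward pass of fully connected Actor and Critic networks with the stated inputs and outputs can be sketched as follows. The six-element state and the single steering action follow the description; the hidden layer sizes, the ReLU activation, the tanh output that bounds a normalized steering command, and the weight initialization are assumptions. A practical implementation would use a deep learning framework and backpropagation; plain Python is used here only to keep the sketch dependency-free.

```python
import math
import random


def init_mlp(sizes, rng):
    """Create fully connected layers as (weights, biases) pairs."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, x, out_activation=None):
    """Forward pass with ReLU on hidden layers and an optional output activation."""
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    if out_activation is not None:
        x = [out_activation(v) for v in x]
    return x


STATE_DIM, ACTION_DIM = 6, 1
rng = random.Random(0)
actor = init_mlp([STATE_DIM, 64, 64, ACTION_DIM], rng)       # state -> action
critic = init_mlp([STATE_DIM + ACTION_DIM, 64, 64, 1], rng)  # state, action -> Q

state = [1.0, 2.0, 0.3, 8.0, 0.4, 0.5]        # hypothetical state values
action = forward(actor, state, math.tanh)     # normalized steering in [-1, 1]
q_value = forward(critic, state + action)     # scalar Q estimate
```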

Claims

2. The intelligent electric vehicle drifting and warehousing control method based on deep reinforcement learning of claim 1, wherein in step 1) the vehicle dynamics model is a four-wheel three-degree-of-freedom vehicle dynamics model that accounts for front-rear and left-right load transfer, and the three degrees of freedom are the speed v_m at the vehicle's center of mass, the centroid slip angle β and the yaw rate ω.

3. The intelligent electric vehicle drifting and warehousing control method based on deep reinforcement learning of claim 2, wherein, in the four-wheel three-degree-of-freedom vehicle dynamics model, the four-wheel vertical forces taking the longitudinal and lateral accelerations into account are expressed as follows:

where h_m is the height of the center of mass; b_f and b_r are the front and rear track widths; a_x and a_y are the longitudinal and lateral accelerations at the center of mass, neglecting the effect of body rotation; F_zFL, F_zFR, F_zRL and F_zRR are the vertical forces of the left front, right front, left rear and right rear wheels; m is the mass of the electric vehicle; g is the gravitational acceleration; l is the wheelbase; l_f and l_r are the distances from the front and rear axles to the center of mass; F_xFL, F_xFR, F_xRL and F_xRR are the longitudinal forces of the left front, right front, left rear and right rear wheels; F_yFL, F_yFR, F_yRL and F_yRR are the corresponding lateral forces; and δ is the front wheel steering angle.

4. The intelligent electric vehicle drifting and warehousing control method based on deep reinforcement learning of claim 3, wherein, during drifting, the case in which excessive load transfer lifts a wheel off the ground, so that the vertical load of that wheel drops to 0 and the load transfer reaches its upper limit, is considered; when the vehicle drifts with the steering wheel turned to the left, load transfers to the right, and if the left rear wheel lifts off the ground its vertical force is 0; the excess transferred load is then redistributed to the left front wheel and the right rear wheel according to the longitudinal and lateral accelerations, the wheelbase and the track width, so that:

ΔF_trans = |F_zRL|

F′_zRL = 0

5. The intelligent electric vehicle drifting and warehousing control method based on deep reinforcement learning of claim 4, wherein force analysis of the four-wheel three-degree-of-freedom vehicle dynamics model with front-rear and left-right load transfer yields the vehicle dynamics balance equations:

φ = β + ψ

v_mx = v_m·cos β

v_my = v_m·sin β

where $\dot{v}_m$ is the rate of change of the speed at the vehicle's center of mass and $\dot{\phi}$ is the rate of change of the global azimuth angle of the velocity at the center of mass.

6. The intelligent electric vehicle drifting and warehousing control method based on deep reinforcement learning of claim 5, wherein in step 1) the tire model used for deep reinforcement learning training comprises a front wheel tire force model and a rear wheel tire force model.

7. The intelligent electric vehicle drifting and warehousing control method based on deep reinforcement learning of claim 6, wherein, for the rear wheel tire force model, the rear wheels are locked by braking during drifting and slide in pure friction on the road surface, the rear tire force points opposite to the instantaneous velocity of the wheel center, and force analysis of the rear wheels gives the following longitudinal and lateral tire force components:

for the left rear wheel:

for the right rear wheel:

F_{r_sat} = μ₁F_z

8. The intelligent electric vehicle drifting and warehousing control method based on deep reinforcement learning of claim 7, wherein, for the front wheel tire force model, the front tire force is not saturated during drifting, and an improved Burckhardt tire model is fitted to the tire force to express the relation between lateral force and slip angle:

9. The intelligent electric vehicle drifting and warehousing control method based on deep reinforcement learning of claim 8, wherein step 2) specifically comprises the following steps:

the Critic network and the Actor network are both BP neural networks composed of fully connected layers; the input of the Critic network is the vehicle state and the action, and its output is the Q value; the input of the Actor network is the vehicle state, and its output is the action; the vehicle state consists of parameters that characterize the vehicle during drifting and comprises the storage location coordinates (e_x, e_y) and the storage location orientation in a relative coordinate system whose origin is the vehicle's center of mass and whose positive y-axis is the vehicle heading;

22) constructing a reward function r(k) as follows:

where w_x and w_y are the weights of e_x and e_y, a third weight applies to the storage location orientation, and k is the time instant;

10. The intelligent electric vehicle drifting and warehousing control method based on deep reinforcement learning of claim 9, wherein in step 23), before the Actor network and the Critic network are trained, the boundary of the drifting and warehousing controller is determined, and the target storage location of each vehicle drift is drawn at random within that boundary; in iterative training, the vehicle state is calculated from the randomly selected target storage location and orientation, and the Critic network and the Actor network are trained accordingly; randomly updating the target storage location during training expands the training data set and improves the generalization capability of the networks.