CN115826402A

CN115826402A - Active suspension control strategy generation method based on deep reinforcement learning algorithm

Info

Publication number: CN115826402A
Application number: CN202211445241.8A
Authority: CN
Inventors: 张步云; 赵妍; 王勇; 张云顺; 刘志强; 徐旗钊; 胡正林
Original assignee: Jiangsu University
Current assignee: Jiangsu University
Priority date: 2022-11-18
Filing date: 2022-11-18
Publication date: 2023-03-21

Abstract

The invention discloses an active suspension control strategy generation method based on a deep reinforcement learning algorithm, which relates to the technical field of intelligent control and artificial intelligence and comprises the following steps: the method comprises the following steps: establishing a control problem model of the active suspension semi-vehicle model based on the active suspension semi-vehicle model; step two: a strategy neural network is built to represent the control strategy of the active suspension; step three: updating the strategy neural network through a reward function; step four: and (5) performing iterative training on the strategy neural network to generate a converged active suspension control strategy. Based on the SAC reinforcement learning algorithm, the optimal active suspension control strategy is sought through training the constructed suspension control strategy network, and after the generated control strategy is verified, the dynamic self-adaptive vibration reduction control of the active suspension can be realized through the control strategy.

Description

Active suspension control strategy generation method based on deep reinforcement learning algorithm

Technical Field

The invention relates to the technical field of intelligent control and artificial intelligence, in particular to an active suspension control strategy generation method based on a deep reinforcement learning algorithm.

Background

The vehicle suspension system plays an important role in ensuring the operation stability, driving safety and riding comfort of a vehicle, however, the dynamic characteristics of the traditional passive suspension are not easy to change due to the fixed parameters of the system, the expected performance of the suspension is greatly limited, and the defects of the passive suspension system can be overcome by adopting the active suspension system with adjustable dynamic parameters. In practical applications, while semi-active suspensions may to some extent break through the performance limitations of passive suspensions, their control performance is not particularly desirable due to the inconvenient adjustment of their shock absorbers. The active suspension changes the structure of a suspension system, greatly improves the control effect of the suspension system, greatly improves various performances of an automobile system, and currently, most of various commercial vehicles adopt the active suspension system to improve the riding comfort and stability.

Traditional suspension control methods such as Skyhook (Skyhook) control, model Predictive Control (MPC) and the like need to depend on a specific model of a suspension system, however, an active suspension system has a highly nonlinear characteristic and is not easy to model, and if the nonlinear factors are not considered, the performance involved in control is seriously reduced. In recent years, with the continuous development of a deep reinforcement learning algorithm, control methods such as a deep Q neural network (DQN), a deep deterministic strategy gradient (DDPG), a near-end strategy optimization (PPO), a SAC (SAC) and the like are proposed in succession, and particularly, the SAC introduces a maximum entropy model, so that the environment can be explored while a higher reward value is obtained, and a better action can be learned more quickly to accelerate the convergence of the algorithm. And the control method based on the neural network and the reinforcement learning has great advantages in processing the nonlinear problem with sufficient prior information.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides an active suspension control strategy generation method based on a deep reinforcement learning algorithm.

The present invention achieves the above-described object by the following technical means.

An active suspension control strategy generation method based on a deep reinforcement learning algorithm comprises the following steps:

the method comprises the following steps: establishing a control problem model of the active suspension semi-vehicle model based on the active suspension semi-vehicle model;

step two: a strategy neural network is built to represent the control strategy of the active suspension;

step three: constructing a reward function in an SAC reinforcement learning algorithm;

step four: and (4) performing iterative training on the strategy neural network in the step two based on the reward function in the step three to generate a converged active suspension control strategy.

In the scheme, in the first step, data of the vehicle body on the random road surface are obtained, the data are visualized through Matlab/Python, abnormal data are removed, the obtained data are analyzed, and parameters which have large influence on active suspension control are obtained through screening and are used as state observation quantities.

In the scheme, the state observed quantity of the active suspension system obtained in the step one is used as the input of a strategy neural network, the active control force action of the active suspension is output, and the active control forces obtained in different states form an action observation sequence of the active suspension system; and respectively taking the state observed quantity of the active suspension and the action observed sequence of the active control force as the input and the output of the active suspension controller.

In the scheme, in the step one, the state observation quantity of the suspension system comprises vehicle body vertical displacement, vehicle body vertical acceleration, vehicle pitch angle acceleration and road surface unevenness q at the front wheels _f And road surface unevenness q at the rear wheel _r The observed quantity of the state at time t is expressed as

Wherein z is _c Representing vertical displacement of the vehicle body, theta representing the pitch angle of the vehicle body, q _f Representing the road surface irregularity at the front wheels, q _r Indicating the unevenness of the road surface at the rear wheels.

In the scheme, in the second step, the strategy neural network is a controller of the active suspension, receives the state observed quantity of the active suspension, and selects the active control force F matched with the state observed quantity _alf And F _alr The method is characterized in that the method acts on the front suspension and the rear suspension respectively, a new response is generated after the suspension system receives the active control force, and then the state observed quantity of the suspension system is updated, and the processes are circulated so as to realize the vibration damping control of the active suspension.

In the above embodiment, in the second step, the motion observation sequence at time t is represented by a _t ＝{F _alf ，F _alr Obtaining a control problem model of the active suspension semi-vehicle model in the step one:

in the scheme, updating of the parameters of the strategy neural network is realized through an SAC reinforcement learning algorithm in the fourth step, an action observation sequence made by the controller under the random state observation quantity is obtained through training, and a reward function is constructed to judge the quality of the action under the random state observation quantity.

In the scheme, the SAC reinforcement learning algorithm is a model-free algorithm based on an Actor-Critic framework and used for a continuous action space, the strategy network Actor is used for guiding the active suspension to select the magnitude of the active control force, and the value network Critic is used for judging the advantages and disadvantages of the currently selected active control force strategy, so that the update of the active suspension control strategy is realized.

In the above scheme, in step S3, a reward function is constructed to evaluate the quality of the action under the random state observed quantity, where the reward function is:

wherein, F _alf Active control force for front suspension controller, F _alr For active control of the rear suspension controller, q ₁ 、q ₂ Weight coefficient, q, representing the front and rear suspension main control forces, respectively ₃ And q is ₄ Weight coefficients q of the vertical acceleration of the vehicle body and the pitch angle acceleration of the vehicle, respectively ₅ And q is ₆ Respectively, the weighting coefficients of the road surface unevenness at the front and rear wheels.

In the scheme, in the fourth step, the effectiveness of the active suspension control strategy obtained after iterative training convergence is verified by taking the vibration data of the active suspension of the real vehicle as a data source, and the generalization and the adaptivity of the active suspension control strategy are improved by finely adjusting the active suspension control strategy.

The invention has the beneficial effects that:

(1) The invention applies the SAC reinforcement learning algorithm to the control strategy generation of the active suspension, carries out off-line training on the constructed strategy network, judges the quality of the strategy selected by the active suspension through a reward function, and generates a safe active suspension control strategy after the training is converged.

(2) Compared with other reinforcement learning algorithms: the deep Q neural network (DQN) has the problems of low training speed and even difficulty in convergence in the training generated by the active suspension semi-vehicle model strategy; the control strategy for generating the active suspension based on the SAC reinforcement learning algorithm can be trained through a random strategy, the environment is kept to be explored while a higher reward value is obtained, better learning actions can be carried out more quickly to accelerate convergence of the algorithm, and a better active suspension control strategy is generated.

Drawings

FIG. 1 is a diagram of a SAC-based active suspension control strategy generation framework;

FIG. 2 is a schematic diagram of a SAC-based active suspension reinforcement learning algorithm principle;

FIG. 3 is SAC-based active suspension reinforcement learning algorithm pseudo-code;

FIG. 4 is a hardware-in-the-loop simulation platform framework diagram.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.

In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "axial," "radial," "vertical," "horizontal," "inner," "outer," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the present invention and for simplicity in description, and are not intended to indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and are not to be considered limiting. Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.

In the present invention, unless otherwise expressly specified or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.

In the first step, data of a vehicle body on a random road surface are obtained, the data are visualized through Matlab/Python, abnormal data are removed, the obtained data are analyzed, and parameters which have large influence on active suspension control are obtained through screening and are used as state observation quantity. And obtaining the state observed quantity of the active suspension system.

Taking the state observation quantity of the active suspension system obtained in the step one as the input of a strategy neural network, outputting the action of the active control force of the active suspension, and forming an action observation sequence of the active suspension system by the active control force obtained in different states; and respectively taking the state observed quantity of the active suspension and the active control force action observed sequence as the input and the output of the active suspension controller.

In the first step, the state observation quantity of the suspension system comprises vehicle body vertical displacement, vehicle body vertical acceleration, vehicle pitch angle acceleration and road surface unevenness q at front wheels _f And road surface unevenness q at the rear wheel _r The observed quantity of the state at time t is expressed as

Wherein z is _c Representing vertical displacement of the vehicle body, theta representing the pitch angle of the vehicle body, q _f Representing the unevenness of the road surface at the front wheels, q _r Indicating the unevenness of the road surface at the rear wheels.

In the second step, the strategy neural network is a controller of the active suspension, receives the state observed quantity of the active suspension, and selects the active control force F matched with the state observed quantity _alf And F _alr The method is characterized in that the method acts on the front suspension and the rear suspension respectively, a new response is generated after the suspension system receives the active control force, and then the state observed quantity of the suspension system is updated, and the processes are circulated so as to realize the vibration damping control of the active suspension.

In step two, the motion observation sequence at time t is represented as a _t ＝{F _alf ，F _alr Obtaining a control problem model of the active suspension semi-vehicle model in the step one:

and in the fourth step, updating parameters of the strategy neural network is realized through an SAC reinforcement learning algorithm, an action observation sequence made by the controller under the random state observed quantity is obtained through training, and a reward function is constructed to judge the quality of the action under the random state observed quantity.

The SAC reinforcement learning algorithm is a model-free algorithm based on an Actor-Critic framework and used for a continuous action space, the strategy network Actor is used for guiding the active suspension to select the magnitude of the active control force, and the value network Critic is used for judging the advantages and disadvantages of the currently selected active control force strategy, so that the control strategy of the active suspension is updated.

In step S3, a reward function is constructed to judge the quality of the action under the random state observation quantity, wherein the reward function is as follows:

And in the fourth step, the vibration data of the active suspension of the real vehicle is used as a data source to verify the effectiveness of the active suspension control strategy obtained after iterative training convergence, and the active suspension control strategy is finely adjusted to improve the generalization and the adaptivity of the active suspension control strategy.

With reference to the attached figure 1, the first stage is the construction of an active suspension semi-vehicle model and a semi-vehicle active suspension system control problem model, and the screening and preprocessing of active suspension vibration data;

the active suspension semi-vehicle model is as follows:

according to Newton's second law, the dynamic differential equation of the half-vehicle four-degree-of-freedom active suspension model is obtained as follows:

wherein the content of the first and second substances,m _c representing sprung mass, I _c Representing the moment of inertia of the sprung mass about the y-axis, z _c Representing the vertical displacement of the vehicle body, theta representing the pitch angle of the vehicle body, and a and b representing the longitudinal distance from the front and rear shafts to the mass center of the vehicle respectively; k is a radical of _lf And k _lr Respectively representing front and rear suspension stiffness, c _lf And c _lr Respectively representing front and rear suspension damping, F _alf And F _alr Respectively representing the active control forces of the front and rear suspensions; m is _1lf And m _1lr Respectively representing front and rear wheel masses, k _tlf And k _tlr Respectively representing front and rear tire stiffness; z is a radical of _f 、z _uf 、q _f Respectively showing the sprung displacement of the front suspension, the unsprung displacement of the front suspension and the road surface unevenness at the front wheel; z is a radical of _r 、z _ur 、q _r Respectively showing the sprung displacement of the rear suspension, the unsprung displacement of the rear suspension and the road surface unevenness at the rear wheel.

In the above formula, (1) - (2) are used to describe the motion characteristics of the vehicle body, and (3) - (4) are used to describe the motion characteristics of the wheel.

z _f ＝z _c -asinθ≈z _c -aθ (5)

z _r ＝z _c +bsinθ≈z _c +aθ (6)

The above equation is converted into a state space expression form, namely:

Y＝CX+DU (8)

and defining the state variable as

Control input quantity is U = [ F = _alf F _alr q _f q _r ] ^T An output of

The state space equation coefficient matrix obtained by calculation is as follows:

and in the second stage, a neural strategy network is built to represent a control strategy of the suspension, state observation quantities such as vertical acceleration of a vehicle body, pitch angle acceleration of the vehicle, road surface unevenness at front and rear wheels and the like are used as input of the network, and active control force action of the suspension is output. The vertical acceleration of the related vehicle body is mainly used for evaluating the riding comfort, the pitch acceleration of the vehicle is mainly used for evaluating the running smoothness of the vehicle, and the unevenness of the road surface at the front wheel and the rear wheel is mainly used for representing the random excitation of the road surface.

The third phase is the update of the neural strategy network and the construction of the reward function. The method is used for judging the quality of actions under random state observation quantity by constructing a reward function, wherein the reward function is as follows:

And the fourth stage is iterative training of the network, a converged active suspension control strategy is generated, and the feasibility and the effectiveness of the generated control strategy are verified through an active suspension hardware-in-loop simulation experiment platform. The strategy network is updated through a SAC reinforcement learning algorithm, the SAC reinforcement learning algorithm is a model-free algorithm based on an Actor-Critic framework and can be used for a continuous action space, the strategy network Actor guides an active suspension to select the magnitude of active control force, the merit of the currently selected active control force strategy is judged through the value network Critic, and then the active suspension strategy is updated. The Actor-criticic framework selects 1 Actor network and 4Q-criticic networks.

According to the method, iterative training is carried out by taking vibration data of the active suspension of the real vehicle as a data source, the converged strategy neural network has strong generalization capability, and after the generated control strategy is verified by a hardware-in-loop simulation experiment platform, dynamic self-adaptive vibration reduction control of the active suspension on a complex and variable road surface can be realized.

With reference to fig. 2 and fig. 3, the general implementation steps of the active suspension control strategy generation method based on the deep reinforcement learning algorithm in the present embodiment are as follows:

step 1: initializing network parameters, specifically including the following: initializing Actor network (policy network) parameter phi and Critic network (evaluation network) parameter theta ₁ And theta ₂ Initializing target network weights θ ₁ ′←θ ₁ ，θ ₂ ′←θ ₂ ；

And 2, step: initializing a Replay-Buffer D, and playing back a cache pool D according to experience, wherein the Buffer is mainly used for storing experience data of the active suspension;

and step 3: training a strategy network and an evaluation network by carrying out Mini-batch sampling on Replay-Buffer D, setting M rounds, wherein each round comprises T steps, and firstly obtaining an initial observation sequence s of the suspension system before carrying out each step ₁ The specific implementation steps for each step are as follows:

step 3.1: headThe active control force of the active suspension, namely action a, is selected according to the current strategy _t ～π _φ (a _t |s _t ) The action specifically includes the main power F of the front active suspension _alf And main power F of rear active suspension _alr ；

Step 3.2: main power a _t After the selection is finished, the main power acts on the environment of the active suspension system, the reward obtained after the environment executes the main power can be calculated through the set reward function, and at the moment, the environment state is transferred to s _t+1 ；

Step 3.3: will(s) _t ，a _t ，r _t ，s _t+1 ) Put into Replay-buffer D, the r is _t Represents the reward at time t;

step 3.4: sampling N tuples from D {(s) _i ，a _i ，r _i ，s _i+1 )} _{t＝1，…，N} And updating the Actor network and the Critic network by a gradient descent method, wherein the specific flow is as follows:

step 3.4.1: the SAC algorithm encourages more exploration and obtains more stable training performance by pursuing maximum entropy, and the strategy function pi ^* The expression of (a) is as follows:

where ρ is _π Representing the distribution of the strategy pi; r(s) _t ，a _t ) Represents a state s _t And a _t Instant awards are made;

representing the entropy of the current strategy pi; α represents the relative importance of the entropy term to the reward, referred to as the temperature parameter, and can be adjusted automatically by minimizing J (α) throughout the training process, i.e.;

wherein, the first and the second end of the pipe are connected with each other,

generally A represents the dimension of the action.

Step 3.4.2: training parameter theta of soft Q-Network by minimizing soft Bellman residual omega (t) _i Namely:

where p is the distribution given the current state and action, given the next state;

a function of state values; theta _i ' stands for goal, soft Q-Network and theta _i ' update by: theta' _i ←τ·θ _i +(1-τ)·θ′ _i I =1,2; τ represents a target smooth primer (target smooth factor), and is generally 0.001;

step 3.4.3: by minimizing J _π (φ) to implement the updating of the policy network, i.e.:

wherein, y _φ (ω _t ；s _t ) For the action of the strategy, ω _t Representing a noise variable;

the strategy evaluation and improvement of each gradient step length are completed through steps 3.4.1, 3.4.2 and 3.4.3, namely the updating of the Actor network and the Critic network is completed.

Through the steps, after M rounds of training, a control strategy of the active suspension can be obtained, namely the magnitude of the active force applied by the active suspension under different state observation quantities is used for completing vibration damping control.

With reference to fig. 4, in order to verify whether the generated control strategy can be applied in a real environment, the feasibility and the effectiveness of the generated control strategy are verified by the active suspension hardware on a ring simulation experiment platform. A hardware-in-loop simulation test platform based on a dSPACE/MicroAutoBox active suspension is built, the platform comprises an upper layer and a lower layer, the upper layer is used for generating ideal active control force, and the lower layer is used for obtaining the active control force in an actual environment and feeding back the active control force to the upper layer. The upper layer mainly adopts the SAC deep reinforcement learning algorithm-based active suspension control system provided by the invention, and realizes the output of a control strategy by combining a MicroAutoBox and an upper computer; the microautoBox comprises an active suspension semi-vehicle model, a semi-vehicle active suspension system control problem model, a random road surface excitation model and an active suspension control strategy generated by a deep reinforcement learning algorithm; the upper computer can acquire the states of the vertical acceleration of a vehicle body, the pitch angle acceleration of the vehicle, the road surface unevenness at the wheel position and the like of a semi-vehicle active suspension system in the MicroAutoBox in real time through ControlDesk software; the lower layer mainly simulates front and rear active suspensions respectively by two electromagnetic linear actuators to construct an active suspension semi-vehicle test platform, expected active control force from the upper layer is converted into input of the electromagnetic linear actuators through a DSP controller and a power amplifier, and the electromagnetic linear actuators output actual active control force to a semi-vehicle active suspension system in the upper layer of the MicroAutoBox. The feasibility and the effectiveness of the generated control strategy are verified by comparing the theoretical active control force of the suspension generated based on the deep reinforcement learning algorithm with the actual active control force executed by the electromagnetic linear actuator.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.

Claims

1. An active suspension control strategy generation method based on a deep reinforcement learning algorithm is characterized by comprising the following steps:

2. The active suspension control strategy generation method based on the deep reinforcement learning algorithm is characterized in that in the first step, data of a vehicle body on a random road surface are obtained, the data are visualized through Matlab/Python, abnormal data are removed, the obtained data are analyzed, and parameters which have large influence on active suspension control are obtained through screening and serve as state observation quantities.

3. The active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 2, characterized in that the state observation quantity of the active suspension system obtained in the first step is used as an input of a strategy neural network to output the active control force action of the active suspension, and the active control forces obtained in different states form an action observation sequence of the active suspension system; and respectively taking the state observed quantity of the active suspension and the active control force action observed sequence as the input and the output of the active suspension controller.

4. The active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 2, wherein in the first step, the state observation quantity of the suspension system comprises vehicle body vertical displacement, vehicle body vertical acceleration, vehicle pitch angle acceleration, and road surface unevenness q at front wheels _f And road surface unevenness q at the rear wheel _r The observed quantity of the state at time t is expressed as

5. The active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 1, wherein in the second step, the strategy neural network is a controller of the active suspension, the strategy neural network receives the state observed quantity of the active suspension, and selects the active control force F matched with the state observed quantity _alf And F _alf The method is characterized in that the method acts on the front suspension and the rear suspension respectively, a new response is generated after the suspension system receives the active control force, and then the state observed quantity of the suspension system is updated, and the processes are circulated so as to realize the vibration damping control of the active suspension.

6. Active suspension control based on deep reinforcement learning algorithm according to claim 4The strategy generation method is characterized in that in the second step, the action observation sequence at the time t is represented as a _t ＝{F _alf ,F _alr Obtaining a control problem model of the active suspension semi-vehicle model in the step one:

7. the active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 1, wherein in the fourth step, updating of the parameters of the strategy neural network is realized through the SAC reinforcement learning algorithm, an action observation sequence made by the controller under the random state observation is obtained through training, and a reward function is constructed to judge the quality of the action under the random state observation.

8. The active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 7, wherein the SAC reinforcement learning algorithm is an Actor-Critic framework-based modeless algorithm for a continuous motion space, the strategy network Actor is used for guiding the active suspension to select the magnitude of the active control force, and the value network Critic is used for judging the advantages and disadvantages of the currently selected active control force strategy, so as to update the active suspension control strategy.

9. The active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 7, wherein in step S3, a reward function is constructed to evaluate the quality of the action under the random state observation quantity, and the reward function is:

wherein, F _alf Active control force for front suspension controller, F _alr Is a rear suspensionActive control force of the gantry controller, q ₁ 、q ₂ Weight coefficient, q, representing the front and rear suspension main control forces, respectively ₃ And q is ₄ Weight coefficients q of the vertical acceleration of the vehicle body and the pitch angle acceleration of the vehicle, respectively ₅ And q is ₆ Respectively, the weighting coefficients of the road surface unevenness at the front and rear wheels.

10. The active suspension control strategy generation method based on the deep reinforcement learning algorithm according to claim 1, characterized in that in the fourth step, the vibration data of the active suspension of the real vehicle is used as a data source to verify the effectiveness of the active suspension control strategy obtained after the iterative training convergence, and the active suspension control strategy is finely adjusted to improve the generalization and the adaptivity of the active suspension control strategy.