CN115826402A - Active suspension control strategy generation method based on deep reinforcement learning algorithm


Info

Publication number
CN115826402A
Authority
CN
China
Prior art keywords
active suspension
active
strategy
control
reinforcement learning
Prior art date
2022-11-18
Legal status
Pending
Application number
CN202211445241.8A
Other languages
Chinese (zh)
Inventor
张步云
赵妍
王勇
张云顺
刘志强
徐旗钊
胡正林
Current Assignee
Jiangsu University
Original Assignee
Jiangsu University
Priority date
2022-11-18
Filing date
2022-11-18
Publication date
2023-03-21
Application filed by Jiangsu University
Priority to CN202211445241.8A
Publication of CN115826402A
Legal status: Pending

Landscapes

  • Vehicle Body Suspensions (AREA)

Abstract

The invention discloses an active suspension control strategy generation method based on a deep reinforcement learning algorithm, relating to the technical fields of intelligent control and artificial intelligence. The method comprises the following steps: step one, establishing a control problem model of the active suspension semi-vehicle model based on the active suspension semi-vehicle model; step two, building a strategy neural network to represent the control strategy of the active suspension; step three, updating the strategy neural network through a reward function; step four, iteratively training the strategy neural network to generate a converged active suspension control strategy. Based on the SAC reinforcement learning algorithm, the optimal active suspension control strategy is sought by training the constructed suspension control strategy network, and after the generated control strategy is verified, dynamic self-adaptive vibration damping control of the active suspension can be realized through the control strategy.

Description

Active suspension control strategy generation method based on deep reinforcement learning algorithm
Technical Field
The invention relates to the technical field of intelligent control and artificial intelligence, in particular to an active suspension control strategy generation method based on a deep reinforcement learning algorithm.
Background
The vehicle suspension system plays an important role in ensuring the handling stability, driving safety and ride comfort of a vehicle. However, because the parameters of a traditional passive suspension are fixed, its dynamic characteristics are not easily changed, which greatly limits the achievable suspension performance; an active suspension system with adjustable dynamic parameters can overcome the defects of the passive suspension system. In practical applications, although a semi-active suspension can to some extent break through the performance limitations of a passive suspension, its control performance is not ideal because its shock absorber is inconvenient to adjust. An active suspension changes the structure of the suspension system, greatly improves the control effect of the suspension system, and thus greatly improves the overall performance of the automobile; at present, many commercial vehicles adopt active suspension systems to improve ride comfort and stability.
Traditional suspension control methods such as skyhook control and model predictive control (MPC) depend on a specific model of the suspension system; however, the active suspension system is highly nonlinear and not easy to model, and if these nonlinear factors are ignored, the control performance is seriously reduced. In recent years, with the continuous development of deep reinforcement learning algorithms, control methods such as the deep Q-network (DQN), deep deterministic policy gradient (DDPG), proximal policy optimization (PPO) and soft actor-critic (SAC) have been proposed in succession. In particular, SAC introduces a maximum entropy model, so that the environment can still be explored while a higher reward value is obtained, and better actions can be learned more quickly to accelerate the convergence of the algorithm. Moreover, given sufficient prior information, control methods based on neural networks and reinforcement learning have great advantages in handling nonlinear problems.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an active suspension control strategy generation method based on a deep reinforcement learning algorithm.
The present invention achieves the above-described object by the following technical means.
An active suspension control strategy generation method based on a deep reinforcement learning algorithm comprises the following steps:
step one: establishing a control problem model of the active suspension semi-vehicle model based on the active suspension semi-vehicle model;
step two: a strategy neural network is built to represent the control strategy of the active suspension;
step three: constructing a reward function in an SAC reinforcement learning algorithm;
step four: performing iterative training on the strategy neural network in step two based on the reward function in step three to generate a converged active suspension control strategy.
In this scheme, in step one, vehicle body data on a random road surface are obtained and visualized through Matlab/Python, abnormal data are removed, the obtained data are analyzed, and the parameters that have a large influence on active suspension control are screened out as the state observation.
In this scheme, the state observation of the active suspension system obtained in step one is used as the input of the strategy neural network, which outputs the active control force action of the active suspension; the active control forces obtained in different states form the action observation sequence of the active suspension system. The state observation of the active suspension and the action observation sequence of the active control forces serve respectively as the input and the output of the active suspension controller.
In this scheme, in step one, the state observation of the suspension system comprises the vehicle body vertical displacement, the vehicle body vertical acceleration, the vehicle pitch angular acceleration, the road surface unevenness q_f at the front wheels and the road surface unevenness q_r at the rear wheels; the state observation at time t is expressed as

$$s_t = \left\{ z_c,\ \ddot{z}_c,\ \ddot{\theta},\ q_f,\ q_r \right\}$$

where z_c represents the vertical displacement of the vehicle body, θ represents the pitch angle of the vehicle body, q_f represents the road surface unevenness at the front wheels, and q_r represents the road surface unevenness at the rear wheels.
In this scheme, in step two, the strategy neural network is the controller of the active suspension: it receives the state observation of the active suspension and selects the active control forces F_alf and F_alr matched with the state observation, which act on the front and rear suspensions respectively. The suspension system generates a new response after receiving the active control forces, the state observation of the suspension system is then updated, and this process is cycled to realize the vibration damping control of the active suspension.
In this scheme, in step two, the action observation at time t is expressed as a_t = {F_alf, F_alr}, giving the control problem model of the active suspension semi-vehicle model in step one:

$$\max_{\pi}\ \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}} \left[ \sum_{t} \gamma^{t}\, r(s_t, a_t) \right]$$
in the scheme, updating of the parameters of the strategy neural network is realized through an SAC reinforcement learning algorithm in the fourth step, an action observation sequence made by the controller under the random state observation quantity is obtained through training, and a reward function is constructed to judge the quality of the action under the random state observation quantity.
In this scheme, the SAC reinforcement learning algorithm is a model-free algorithm based on the Actor-Critic framework for continuous action spaces: the strategy network Actor guides the active suspension in selecting the magnitude of the active control force, and the value network Critic judges the quality of the currently selected active control force strategy, thereby realizing the update of the active suspension control strategy.
In this scheme, in step three, a reward function is constructed to judge the quality of the action under random state observations, where the reward function is:

$$r_t = -\left( q_1 F_{alf}^{2} + q_2 F_{alr}^{2} + q_3 \ddot{z}_c^{2} + q_4 \ddot{\theta}^{2} + q_5 q_f^{2} + q_6 q_r^{2} \right)$$

where F_alf is the active control force of the front suspension controller, F_alr is the active control force of the rear suspension controller, q_1 and q_2 represent the weight coefficients of the front and rear suspension active control forces respectively, q_3 and q_4 are the weight coefficients of the vehicle body vertical acceleration and the vehicle pitch angular acceleration respectively, and q_5 and q_6 are the weight coefficients of the road surface unevenness at the front and rear wheels respectively.
In this scheme, in step four, the vibration data of the active suspension of a real vehicle are used as the data source to verify the effectiveness of the active suspension control strategy obtained after iterative training converges, and the generalization and adaptivity of the active suspension control strategy are improved by fine-tuning it.
The invention has the beneficial effects that:
(1) The invention applies the SAC reinforcement learning algorithm to the generation of the active suspension control strategy, trains the constructed strategy network offline, judges the quality of the strategy selected by the active suspension through a reward function, and generates a safe active suspension control strategy after training converges.
(2) Compared with other reinforcement learning algorithms: the deep Q-network (DQN) suffers from slow training and even difficulty in converging when generating a strategy for the active suspension semi-vehicle model; generating the active suspension control strategy based on the SAC reinforcement learning algorithm allows training through a stochastic policy, keeps exploring the environment while obtaining higher reward values, learns better actions more quickly to accelerate the convergence of the algorithm, and generates a better active suspension control strategy.
Drawings
FIG. 1 is a diagram of a SAC-based active suspension control strategy generation framework;
FIG. 2 is a schematic diagram of a SAC-based active suspension reinforcement learning algorithm principle;
FIG. 3 is SAC-based active suspension reinforcement learning algorithm pseudo-code;
FIG. 4 is a hardware-in-the-loop simulation platform framework diagram.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "axial," "radial," "vertical," "horizontal," "inner," "outer," and the like indicate orientations and positional relationships based on those shown in the drawings, are used merely for convenience and simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation or be constructed and operated in a particular orientation; they are therefore not to be construed as limiting the invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, denote fixed connections, detachable connections, or integral connections; mechanical or electrical connections; direct connections or indirect connections through intervening media; or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
An active suspension control strategy generation method based on a deep reinforcement learning algorithm comprises the following steps:
step one: establishing a control problem model of the active suspension semi-vehicle model based on the active suspension semi-vehicle model;
step two: a strategy neural network is built to represent the control strategy of the active suspension;
step three: constructing a reward function in an SAC reinforcement learning algorithm;
step four: performing iterative training on the strategy neural network in step two based on the reward function in step three to generate a converged active suspension control strategy.
In step one, vehicle body data on a random road surface are obtained and visualized through Matlab/Python, abnormal data are removed, the obtained data are analyzed, and the parameters that have a large influence on active suspension control are screened out as the state observation, thereby obtaining the state observation of the active suspension system.
The state observation of the active suspension system obtained in step one is used as the input of the strategy neural network, which outputs the active control force action of the active suspension; the active control forces obtained in different states form the action observation sequence of the active suspension system. The state observation of the active suspension and the action observation sequence of the active control forces serve respectively as the input and the output of the active suspension controller.
In step one, the state observation of the suspension system comprises the vehicle body vertical displacement, the vehicle body vertical acceleration, the vehicle pitch angular acceleration, the road surface unevenness q_f at the front wheels and the road surface unevenness q_r at the rear wheels; the state observation at time t is expressed as

$$s_t = \left\{ z_c,\ \ddot{z}_c,\ \ddot{\theta},\ q_f,\ q_r \right\}$$

where z_c represents the vertical displacement of the vehicle body, θ represents the pitch angle of the vehicle body, q_f represents the road surface unevenness at the front wheels, and q_r represents the road surface unevenness at the rear wheels.
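As an illustration, such a state observation can be assembled as in the following minimal Python sketch; the function name and the use of NumPy are assumptions for illustration, not part of the patent:

```python
# Minimal sketch: stack the five observed quantities into the vector s_t.
import numpy as np

def state_observation(z_c, z_c_ddot, theta_ddot, q_f, q_r):
    """s_t = {z_c, z_c_ddot, theta_ddot, q_f, q_r} as a flat vector."""
    return np.array([z_c, z_c_ddot, theta_ddot, q_f, q_r], dtype=np.float32)
```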
In step two, the strategy neural network is the controller of the active suspension: it receives the state observation of the active suspension and selects the active control forces F_alf and F_alr matched with the state observation, which act on the front and rear suspensions respectively. The suspension system generates a new response after receiving the active control forces, the state observation of the suspension system is then updated, and this process is cycled to realize the vibration damping control of the active suspension.
In step two, the action observation at time t is expressed as a_t = {F_alf, F_alr}, giving the control problem model of the active suspension semi-vehicle model in step one:

$$\max_{\pi}\ \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}} \left[ \sum_{t} \gamma^{t}\, r(s_t, a_t) \right]$$

In step four, the parameters of the strategy neural network are updated through the SAC reinforcement learning algorithm; the action observation sequence made by the controller under random state observations is obtained through training, and a reward function is constructed to judge the quality of the actions under random state observations.
The SAC reinforcement learning algorithm is a model-free algorithm based on the Actor-Critic framework for continuous action spaces: the strategy network Actor guides the active suspension in selecting the magnitude of the active control force, and the value network Critic judges the quality of the currently selected active control force strategy, thereby realizing the update of the active suspension control strategy.
In step three, a reward function is constructed to judge the quality of the action under random state observations, where the reward function is:

$$r_t = -\left( q_1 F_{alf}^{2} + q_2 F_{alr}^{2} + q_3 \ddot{z}_c^{2} + q_4 \ddot{\theta}^{2} + q_5 q_f^{2} + q_6 q_r^{2} \right)$$

where F_alf is the active control force of the front suspension controller, F_alr is the active control force of the rear suspension controller, q_1 and q_2 represent the weight coefficients of the front and rear suspension active control forces respectively, q_3 and q_4 are the weight coefficients of the vehicle body vertical acceleration and the vehicle pitch angular acceleration respectively, and q_5 and q_6 are the weight coefficients of the road surface unevenness at the front and rear wheels respectively.
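A hedged sketch of this quadratic reward follows; the patent gives no numerical values for q_1 to q_6, so the weights below are illustrative assumptions only:

```python
# Sketch of r_t = -(q1*F_alf^2 + q2*F_alr^2 + q3*z_c_ddot^2
#                   + q4*theta_ddot^2 + q5*q_f^2 + q6*q_r^2); weights assumed.
import numpy as np

Q_WEIGHTS = np.array([1e-6, 1e-6, 1.0, 1.0, 10.0, 10.0])  # assumed q1..q6

def reward(F_alf, F_alr, z_c_ddot, theta_ddot, q_f, q_r):
    """Negative weighted sum of squared control efforts and vibration responses."""
    terms = np.array([F_alf, F_alr, z_c_ddot, theta_ddot, q_f, q_r]) ** 2
    return -float(Q_WEIGHTS @ terms)
```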
In step four, the vibration data of the active suspension of a real vehicle are used as the data source to verify the effectiveness of the active suspension control strategy obtained after iterative training converges, and the active suspension control strategy is fine-tuned to improve its generalization and adaptivity.
With reference to FIG. 1, the first stage is the construction of the active suspension semi-vehicle model and of the control problem model of the semi-vehicle active suspension system, together with the screening and preprocessing of the active suspension vibration data.
the active suspension semi-vehicle model is as follows:
according to Newton's second law, the dynamic differential equation of the half-vehicle four-degree-of-freedom active suspension model is obtained as follows:
$$m_c \ddot{z}_c = -k_{lf}\left(z_f - z_{uf}\right) - c_{lf}\left(\dot{z}_f - \dot{z}_{uf}\right) - k_{lr}\left(z_r - z_{ur}\right) - c_{lr}\left(\dot{z}_r - \dot{z}_{ur}\right) + F_{alf} + F_{alr} \quad (1)$$

$$I_c \ddot{\theta} = a\left[k_{lf}\left(z_f - z_{uf}\right) + c_{lf}\left(\dot{z}_f - \dot{z}_{uf}\right) - F_{alf}\right] - b\left[k_{lr}\left(z_r - z_{ur}\right) + c_{lr}\left(\dot{z}_r - \dot{z}_{ur}\right) - F_{alr}\right] \quad (2)$$

$$m_{1lf} \ddot{z}_{uf} = k_{lf}\left(z_f - z_{uf}\right) + c_{lf}\left(\dot{z}_f - \dot{z}_{uf}\right) - k_{tlf}\left(z_{uf} - q_f\right) - F_{alf} \quad (3)$$

$$m_{1lr} \ddot{z}_{ur} = k_{lr}\left(z_r - z_{ur}\right) + c_{lr}\left(\dot{z}_r - \dot{z}_{ur}\right) - k_{tlr}\left(z_{ur} - q_r\right) - F_{alr} \quad (4)$$
where m_c represents the sprung mass, I_c represents the moment of inertia of the sprung mass about the y-axis, z_c represents the vertical displacement of the vehicle body, θ represents the pitch angle of the vehicle body, and a and b represent the longitudinal distances from the front and rear axles to the center of mass of the vehicle respectively; k_lf and k_lr represent the front and rear suspension stiffnesses respectively, c_lf and c_lr represent the front and rear suspension dampings respectively, and F_alf and F_alr represent the active control forces of the front and rear suspensions respectively; m_1lf and m_1lr represent the front and rear wheel masses respectively, and k_tlf and k_tlr represent the front and rear tire stiffnesses respectively; z_f, z_uf and q_f represent the sprung displacement of the front suspension, the unsprung displacement of the front suspension and the road surface unevenness at the front wheel respectively; z_r, z_ur and q_r represent the sprung displacement of the rear suspension, the unsprung displacement of the rear suspension and the road surface unevenness at the rear wheel respectively.
In the above formulas, (1)-(2) describe the motion characteristics of the vehicle body, and (3)-(4) describe the motion characteristics of the wheels. The front and rear sprung displacements satisfy:
$$z_f = z_c - a\sin\theta \approx z_c - a\theta \quad (5)$$

$$z_r = z_c + b\sin\theta \approx z_c + b\theta \quad (6)$$
The above equations are converted into state space form, namely:

$$\dot{X} = AX + BU \quad (7)$$

$$Y = CX + DU \quad (8)$$

defining the state variable as

$$X = \left[\, z_c \quad \theta \quad z_{uf} \quad z_{ur} \quad \dot{z}_c \quad \dot{\theta} \quad \dot{z}_{uf} \quad \dot{z}_{ur} \,\right]^{T}$$

the control input as U = [F_alf  F_alr  q_f  q_r]^T, and the output as

$$Y = \left[\, z_c \quad \ddot{z}_c \quad \ddot{\theta} \,\right]^{T}$$
The coefficient matrices A, B, C and D of the state space equations are computed from the half-vehicle model parameters defined above (given as figures in the original document).
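For concreteness, equations (1) to (6) can be integrated directly as an ordinary differential equation. The sketch below uses SciPy; all numerical parameter values are illustrative assumptions, not values from the patent:

```python
# Hedged sketch of the half-vehicle four-degree-of-freedom dynamics (1)-(6).
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameters (assumed, not from the patent).
m_c, I_c = 690.0, 1222.0            # sprung mass [kg], pitch inertia [kg m^2]
a, b = 1.3, 1.5                     # CG-to-axle distances [m]
k_lf, k_lr = 17000.0, 22000.0       # front/rear suspension stiffness [N/m]
c_lf, c_lr = 1500.0, 1500.0         # front/rear suspension damping [N s/m]
m_1lf, m_1lr = 40.0, 45.0           # front/rear unsprung masses [kg]
k_tlf, k_tlr = 200000.0, 200000.0   # front/rear tire stiffness [N/m]

def half_car_rhs(t, X, F_alf, F_alr, q_f, q_r):
    """X = [z_c, theta, z_uf, z_ur, dz_c, dtheta, dz_uf, dz_ur]."""
    z_c, th, z_uf, z_ur, dz_c, dth, dz_uf, dz_ur = X
    z_f, z_r = z_c - a * th, z_c + b * th            # eqs. (5)-(6)
    dz_f, dz_r = dz_c - a * dth, dz_c + b * dth
    Fsf = -k_lf * (z_f - z_uf) - c_lf * (dz_f - dz_uf) + F_alf  # front force on body
    Fsr = -k_lr * (z_r - z_ur) - c_lr * (dz_r - dz_ur) + F_alr  # rear force on body
    ddz_c = (Fsf + Fsr) / m_c                                    # eq. (1)
    ddth = (-a * Fsf + b * Fsr) / I_c                            # eq. (2)
    ddz_uf = (-Fsf - k_tlf * (z_uf - q_f)) / m_1lf               # eq. (3)
    ddz_ur = (-Fsr - k_tlr * (z_ur - q_r)) / m_1lr               # eq. (4)
    return [dz_c, dth, dz_uf, dz_ur, ddz_c, ddth, ddz_uf, ddz_ur]

# Example: free response to an initial body displacement, zero inputs.
sol = solve_ivp(half_car_rhs, (0.0, 2.0), [0.05, 0, 0, 0, 0, 0, 0, 0],
                args=(0.0, 0.0, 0.0, 0.0), max_step=1e-3)
```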
In the second stage, a strategy neural network is built to represent the control strategy of the suspension; state observations such as the vehicle body vertical acceleration, the vehicle pitch angular acceleration and the road surface unevenness at the front and rear wheels are used as the input of the network, and the active control force action of the suspension is output. The vehicle body vertical acceleration is mainly used to evaluate ride comfort, the vehicle pitch angular acceleration is mainly used to evaluate the running smoothness of the vehicle, and the road surface unevenness at the front and rear wheels mainly characterizes the random excitation of the road surface.
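A hedged PyTorch sketch of such a strategy network is given below: the five-dimensional state observation goes in, and a squashed-Gaussian two-dimensional active control force action comes out. The layer sizes and the force bound F_MAX are assumptions; the patent does not specify the architecture:

```python
# Sketch of the strategy (Actor) network; sizes and F_MAX are assumed.
import torch
import torch.nn as nn

F_MAX = 2000.0  # assumed bound on the active control force [N]

class PolicyNetwork(nn.Module):
    def __init__(self, state_dim=5, action_dim=2, hidden=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mu = nn.Linear(hidden, action_dim)       # mean of the Gaussian
        self.log_std = nn.Linear(hidden, action_dim)  # log std of the Gaussian

    def forward(self, s):
        h = self.body(s)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        u = dist.rsample()                 # reparameterized sample
        a = torch.tanh(u) * F_MAX          # squash to the force limits
        # Log-probability with the usual tanh-squashing correction.
        log_prob = dist.log_prob(u) - torch.log(1 - torch.tanh(u) ** 2 + 1e-6)
        return a, log_prob.sum(-1, keepdim=True)
```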
The third stage is the update of the strategy neural network and the construction of the reward function. The quality of the action under random state observations is judged by constructing a reward function, where the reward function is:
$$r_t = -\left( q_1 F_{alf}^{2} + q_2 F_{alr}^{2} + q_3 \ddot{z}_c^{2} + q_4 \ddot{\theta}^{2} + q_5 q_f^{2} + q_6 q_r^{2} \right)$$

where F_alf is the active control force of the front suspension controller, F_alr is the active control force of the rear suspension controller, q_1 and q_2 represent the weight coefficients of the front and rear suspension active control forces respectively, q_3 and q_4 are the weight coefficients of the vehicle body vertical acceleration and the vehicle pitch angular acceleration respectively, and q_5 and q_6 are the weight coefficients of the road surface unevenness at the front and rear wheels respectively.
In the fourth stage, the network is iteratively trained to generate a converged active suspension control strategy, and the feasibility and effectiveness of the generated control strategy are verified on the active suspension hardware-in-the-loop simulation experiment platform. The strategy network is updated through the SAC reinforcement learning algorithm, a model-free algorithm based on the Actor-Critic framework that can be used for continuous action spaces: the strategy network Actor guides the active suspension in selecting the magnitude of the active control force, the value network Critic judges the quality of the currently selected active control force strategy, and the active suspension strategy is then updated. The Actor-Critic framework here selects 1 Actor network and 4 Q-Critic networks.
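One way to read '1 Actor network and 4 Q-Critic networks' is two online critics plus their two target copies, as in standard SAC; the following sketch is written under that assumption:

```python
# Two online Q-Critics plus two target copies (an interpretation of the
# "4 Q-Critic networks"; the patent does not spell the split out).
import copy
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim=5, action_dim=2, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

q1, q2 = QNetwork(), QNetwork()                              # theta_1, theta_2
q1_target, q2_target = copy.deepcopy(q1), copy.deepcopy(q2)  # theta_1', theta_2'
```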
In this method, iterative training is carried out with the vibration data of the active suspension of a real vehicle as the data source; the converged strategy neural network has strong generalization capability, and after the generated control strategy is verified on the hardware-in-the-loop simulation experiment platform, dynamic self-adaptive vibration damping control of the active suspension on complex and changeable road surfaces can be realized.
With reference to FIG. 2 and FIG. 3, the general implementation steps of the active suspension control strategy generation method based on the deep reinforcement learning algorithm in the present embodiment are as follows:
step 1: initializing network parameters, specifically including the following: initializing Actor network (policy network) parameter phi and Critic network (evaluation network) parameter theta 1 And theta 2 Initializing target network weights θ 1 ′←θ 1 ,θ 2 ′←θ 2
Step 2: initialize the experience replay buffer (Replay-Buffer) D, which is mainly used to store the experience data of the active suspension;
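A minimal sketch of such a replay buffer (an assumed implementation, not code from the patent) might look like:

```python
# Replay-Buffer D storing transitions (s_t, a_t, r_t, s_{t+1}).
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size):
        """Mini-batch sampling used in step 3.4."""
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```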
Step 3: train the strategy network and the evaluation network by mini-batch sampling from Replay-Buffer D. M rounds are set, each round comprising T steps, and before each round the initial observation s_1 of the suspension system is first obtained. The specific implementation of each step is as follows:
step 3.1: headThe active control force of the active suspension, namely action a, is selected according to the current strategy t ~π φ (a t |s t ) The action specifically includes the main power F of the front active suspension alf And main power F of rear active suspension alr
Step 3.2: after the active force a_t is selected, it acts on the environment of the active suspension system; the reward obtained after the environment executes the active force is calculated through the constructed reward function, and the environment state then transitions to s_{t+1};
Step 3.3: put (s_t, a_t, r_t, s_{t+1}) into Replay-Buffer D, where r_t represents the reward at time t;
step 3.4: sampling N tuples from D {(s) i ,a i ,r i ,s i+1 )} t=1,…,N And updating the Actor network and the Critic network by a gradient descent method, wherein the specific flow is as follows:
step 3.4.1: the SAC algorithm encourages more exploration and obtains more stable training performance by pursuing maximum entropy, and the strategy function pi * The expression of (a) is as follows:
Figure BDA0003950006110000081
where ρ is π Representing the distribution of the strategy pi; r(s) t ,a t ) Represents a state s t And a t Instant awards are made;
Figure BDA0003950006110000082
representing the entropy of the current strategy pi; α represents the relative importance of the entropy term to the reward, referred to as the temperature parameter, and can be adjusted automatically by minimizing J (α) throughout the training process, i.e.;
Figure BDA0003950006110000083
Figure BDA0003950006110000084
wherein, the first and the second end of the pipe are connected with each other,
Figure BDA0003950006110000085
generally A represents the dimension of the action.
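The automatic temperature adjustment can be implemented as a gradient step on α; the sketch below uses the common log-α parameterization, an implementation choice rather than something the patent specifies:

```python
# Sketch of the temperature update minimizing J(alpha); log-alpha is assumed.
import torch

action_dim = 2
target_entropy = -float(action_dim)        # H_bar = -dim(A)
log_alpha = torch.zeros(1, requires_grad=True)
alpha_opt = torch.optim.Adam([log_alpha], lr=3e-4)

def update_alpha(log_prob):
    """log_prob: log pi(a_t|s_t) of freshly sampled actions."""
    alpha_loss = -(log_alpha.exp() * (log_prob.detach() + target_entropy)).mean()
    alpha_opt.zero_grad()
    alpha_loss.backward()
    alpha_opt.step()
    return log_alpha.exp().item()
```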
Step 3.4.2: the parameters θ_i of the soft Q-Network are trained by minimizing the soft Bellman residual J_Q(θ_i), namely:

$$J_Q(\theta_i) = \mathbb{E}_{(s_t, a_t) \sim D} \left[ \frac{1}{2} \Big( Q_{\theta_i}(s_t, a_t) - \big( r(s_t, a_t) + \gamma\, \mathbb{E}_{s_{t+1} \sim p} \left[ V_{\theta'}(s_{t+1}) \right] \big) \Big)^{2} \right]$$

where p is the distribution of the next state given the current state and action, and

$$V_{\theta'}(s_{t+1}) = \mathbb{E}_{a_{t+1} \sim \pi_{\phi}} \left[ \min_{i=1,2} Q_{\theta_i'}(s_{t+1}, a_{t+1}) - \alpha \log \pi_{\phi}(a_{t+1} \mid s_{t+1}) \right]$$

is the state value function; θ_i′ represents the parameters of the target soft Q-Network, which are updated by θ_i′ ← τ·θ_i + (1 − τ)·θ_i′, i = 1, 2; τ represents the target smoothing factor (target smooth factor) and is generally taken as 0.001;
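A hedged sketch of this soft-Q update, reusing q1, q2 and their target copies from the earlier sketch (γ is assumed; τ = 0.001 follows the text):

```python
# Soft Bellman update of the critics plus target smoothing (step 3.4.2).
import torch
import torch.nn.functional as F

gamma, tau = 0.99, 0.001   # gamma assumed; tau = 0.001 as in the text
q_opt = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=3e-4)

def update_critics(policy, alpha, s, a, r, s_next):
    with torch.no_grad():
        a_next, logp_next = policy(s_next)              # a_{t+1} ~ pi_phi
        target_q = torch.min(q1_target(s_next, a_next),
                             q2_target(s_next, a_next))
        y = r + gamma * (target_q - alpha * logp_next)  # soft target value
    loss = F.mse_loss(q1(s, a), y) + F.mse_loss(q2(s, a), y)
    q_opt.zero_grad()
    loss.backward()
    q_opt.step()
    # Target smoothing: theta_i' <- tau*theta_i + (1 - tau)*theta_i'.
    for net, tgt in ((q1, q1_target), (q2, q2_target)):
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```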
Step 3.4.3: the strategy network is updated by minimizing J_π(φ), namely:

$$J_{\pi}(\phi) = \mathbb{E}_{s_t \sim D,\ \omega_t \sim \mathcal{N}} \left[ \alpha \log \pi_{\phi}\big( y_{\phi}(\omega_t; s_t) \mid s_t \big) - \min_{i=1,2} Q_{\theta_i}\big( s_t, y_{\phi}(\omega_t; s_t) \big) \right]$$

where y_φ(ω_t; s_t) is the reparameterized action of the strategy and ω_t represents a noise variable;
the strategy evaluation and improvement of each gradient step length are completed through steps 3.4.1, 3.4.2 and 3.4.3, namely the updating of the Actor network and the Critic network is completed.
Through the above steps, after M rounds of training, the control strategy of the active suspension is obtained, namely the magnitude of the active force that the active suspension should apply under different state observations, completing the vibration damping control.
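Putting the pieces together, a hedged end-to-end sketch of the training loop in FIG. 3 might read as follows; M, T, the batch size N and the env object (an assumed wrapper around the half-vehicle model that returns the next observation, the reward and a done flag) are illustrative assumptions:

```python
# End-to-end sketch of the M-round training loop, combining the sketches above.
import numpy as np
import torch

policy = PolicyNetwork()
policy_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
buffer = ReplayBuffer()
M, T, N = 500, 1000, 256   # rounds, steps per round, batch size (assumed)

for episode in range(M):
    s = env.reset()        # initial observation s_1 (env is an assumed wrapper)
    for t in range(T):
        with torch.no_grad():
            a, _ = policy(torch.as_tensor(s, dtype=torch.float32).unsqueeze(0))
        a = a.squeeze(0).numpy()
        s_next, r, done = env.step(a)
        buffer.push(s, a, r, s_next)   # step 3.3
        s = s_next
        if len(buffer) >= N:           # step 3.4
            batch = buffer.sample(N)
            s_b, a_b, r_b, sn_b = (torch.as_tensor(np.array(x), dtype=torch.float32)
                                   for x in zip(*batch))
            alpha = log_alpha.exp().item()
            update_critics(policy, alpha, s_b, a_b, r_b.unsqueeze(-1), sn_b)
            update_actor(policy, policy_opt, alpha, s_b)
            _, logp = policy(s_b)
            update_alpha(logp)         # step 3.4.1 temperature adjustment
        if done:
            break
```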
With reference to FIG. 4, in order to verify whether the generated control strategy can be applied in a real environment, its feasibility and effectiveness are verified on the active suspension hardware-in-the-loop simulation experiment platform. A hardware-in-the-loop simulation test platform for the active suspension based on dSPACE/MicroAutoBox is built. The platform comprises an upper layer and a lower layer: the upper layer generates the ideal active control force, and the lower layer obtains the active control force in the actual environment and feeds it back to the upper layer. The upper layer mainly adopts the active suspension control system based on the SAC deep reinforcement learning algorithm provided by the invention, and realizes the output of the control strategy by combining the MicroAutoBox with an upper computer; the MicroAutoBox contains the active suspension semi-vehicle model, the control problem model of the semi-vehicle active suspension system, the random road surface excitation model and the active suspension control strategy generated by the deep reinforcement learning algorithm; through ControlDesk software, the upper computer acquires in real time the states of the semi-vehicle active suspension system in the MicroAutoBox, such as the vehicle body vertical acceleration, the vehicle pitch angular acceleration and the road surface unevenness at the wheels. The lower layer mainly uses two electromagnetic linear actuators to simulate the front and rear active suspensions respectively, constructing an active suspension semi-vehicle test platform; the expected active control force from the upper layer is converted into the input of the electromagnetic linear actuators through a DSP controller and a power amplifier, and the electromagnetic linear actuators output the actual active control force to the semi-vehicle active suspension system in the upper-layer MicroAutoBox. The feasibility and effectiveness of the generated control strategy are verified by comparing the theoretical active control force of the suspension generated by the deep reinforcement learning algorithm with the actual active control force executed by the electromagnetic linear actuators.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made in the above embodiments by those of ordinary skill in the art without departing from the principle and spirit of the present invention.

Claims (10)

1. An active suspension control strategy generation method based on a deep reinforcement learning algorithm is characterized by comprising the following steps:
step one: establishing a control problem model of the active suspension semi-vehicle model based on the active suspension semi-vehicle model;
step two: a strategy neural network is built to represent the control strategy of the active suspension;
step three: constructing a reward function in an SAC reinforcement learning algorithm;
step four: performing iterative training on the strategy neural network in step two based on the reward function in step three to generate a converged active suspension control strategy.
2. The active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 1, characterized in that in step one, vehicle body data on a random road surface are obtained and visualized through Matlab/Python, abnormal data are removed, the obtained data are analyzed, and the parameters that have a large influence on active suspension control are screened out as the state observation.
3. The active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 2, characterized in that the state observation of the active suspension system obtained in step one is used as the input of the strategy neural network, which outputs the active control force action of the active suspension, and the active control forces obtained in different states form the action observation sequence of the active suspension system; the state observation of the active suspension and the action observation sequence of the active control forces serve respectively as the input and the output of the active suspension controller.
4. The active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 2, characterized in that in step one, the state observation of the suspension system comprises the vehicle body vertical displacement, the vehicle body vertical acceleration, the vehicle pitch angular acceleration, the road surface unevenness q_f at the front wheels and the road surface unevenness q_r at the rear wheels, and the state observation at time t is expressed as

$$s_t = \left\{ z_c,\ \ddot{z}_c,\ \ddot{\theta},\ q_f,\ q_r \right\}$$

where z_c represents the vertical displacement of the vehicle body, θ represents the pitch angle of the vehicle body, q_f represents the road surface unevenness at the front wheels, and q_r represents the road surface unevenness at the rear wheels.
5. The active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 1, characterized in that in step two, the strategy neural network is the controller of the active suspension: it receives the state observation of the active suspension and selects the active control forces F_alf and F_alr matched with the state observation, which act on the front and rear suspensions respectively; the suspension system generates a new response after receiving the active control forces, the state observation of the suspension system is then updated, and this process is cycled to realize the vibration damping control of the active suspension.
6. The active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 4, characterized in that in step two, the action observation at time t is expressed as a_t = {F_alf, F_alr}, giving the control problem model of the active suspension semi-vehicle model in step one:

$$\max_{\pi}\ \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}} \left[ \sum_{t} \gamma^{t}\, r(s_t, a_t) \right]$$
7. The active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 1, characterized in that in step four, the parameters of the strategy neural network are updated through the SAC reinforcement learning algorithm, the action observation sequence made by the controller under random state observations is obtained through training, and a reward function is constructed to judge the quality of the actions under random state observations.
8. The active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 7, characterized in that the SAC reinforcement learning algorithm is a model-free algorithm based on the Actor-Critic framework for continuous action spaces, the strategy network Actor is used to guide the active suspension in selecting the magnitude of the active control force, and the value network Critic is used to judge the quality of the currently selected active control force strategy, thereby realizing the update of the active suspension control strategy.
9. The active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 7, characterized in that in step three, a reward function is constructed to judge the quality of the action under random state observations, the reward function being:

$$r_t = -\left( q_1 F_{alf}^{2} + q_2 F_{alr}^{2} + q_3 \ddot{z}_c^{2} + q_4 \ddot{\theta}^{2} + q_5 q_f^{2} + q_6 q_r^{2} \right)$$

where F_alf is the active control force of the front suspension controller, F_alr is the active control force of the rear suspension controller, q_1 and q_2 represent the weight coefficients of the front and rear suspension active control forces respectively, q_3 and q_4 are the weight coefficients of the vehicle body vertical acceleration and the vehicle pitch angular acceleration respectively, and q_5 and q_6 are the weight coefficients of the road surface unevenness at the front and rear wheels respectively.
10. The active suspension control strategy generation method based on the deep reinforcement learning algorithm as claimed in claim 1, characterized in that in step four, the vibration data of the active suspension of a real vehicle are used as the data source to verify the effectiveness of the active suspension control strategy obtained after iterative training converges, and the active suspension control strategy is fine-tuned to improve its generalization and adaptivity.

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202211445241.8A | 2022-11-18 | 2022-11-18 | Active suspension control strategy generation method based on deep reinforcement learning algorithm


Publications (1)

Publication Number | Publication Date
CN115826402A | 2023-03-21

Family

ID=85529033



Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN117784593A * | 2024-02-23 | 2024-03-29 | 哈尔滨工程大学 | Model-free vibration active control method based on Kalman filter
CN117784593B * | 2024-02-23 | 2024-05-03 | 哈尔滨工程大学 | Model-free vibration active control method based on Kalman filter

Similar Documents

Publication Publication Date Title
JP2005538886A (en) Fuzzy controller using a reduced number of sensors
CN111487863B (en) Active suspension reinforcement learning control method based on deep Q neural network
CN109334378B (en) Vehicle ISD suspension active control method based on single neuron PID control
CN111781940B (en) Train attitude control method based on DQN reinforcement learning
CN108859648B (en) Suspension shock absorber damping control switching weighting coefficient determination method
CN106347059B (en) A kind of wheel hub driving electric vehicle active suspension double loop PID control method based on particle cluster algorithm
CN108345218A (en) Vehicle active suspension PID controller design method based on teaching optimization algorithm
CN115826402A (en) Active suspension control strategy generation method based on deep reinforcement learning algorithm
CN112158045A (en) Active suspension control method based on depth certainty strategy gradient
CN109002599A (en) The automobile ride method for optimization analysis tested based on field cause for gossip
CN111444623A (en) Collaborative optimization method and system for damping nonlinear commercial vehicle suspension dynamics
CN112506043B (en) Control method and control system for rail vehicle and vertical shock absorber
Ozcan et al. Optimisation of Nonlinear Spring and Damper Characteristics for Vehicle Ride and Handling Improvement
CN113591360B (en) Magneto-rheological damper structural parameter optimization method based on whole vehicle dynamics model
CN114590090A (en) Direct-drive semi-active suspension control system construction method based on self-adaptive LQR (Low-speed response) wheel hub
CN113761768A (en) Integrated optimization design method of magneto-rheological damper for whole vehicle vibration suppression
Wang et al. Learning-based vibration control of vehicle active suspension
CN110443003B (en) Control and optimal design method of active stabilizer bar system
CN115630442A (en) Vehicle ISD suspension topology optimization design method based on power drive damping
CN117841591B (en) ISD suspension control method based on improved fuzzy neural network PID
Liu et al. Optimal control for automotive seat suspension system based on acceleration based particle swarm optimization
CN115782496B (en) Intelligent evolution method of semi-active suspension system based on MAP control
CN113110031A (en) Fuzzy PID active suspension control system and method based on genetic algorithm optimization
Li et al. LQG control of vehicle active suspension using whale optimization algorithm
Qamar et al. Online adaptive full car active suspension control using b-spline fuzzy-neural network

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination