CN112977606A - Steering compensation control method and device of steering-by-wire system based on DDPG - Google Patents

Steering compensation control method and device of steering-by-wire system based on DDPG Download PDF

Info

Publication number
CN112977606A
CN112977606A (application CN202110357530.1A)
Authority
CN
China
Prior art keywords
network
steering
algorithm
action
strategy gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110357530.1A
Other languages
Chinese (zh)
Other versions
CN112977606B (en)
Inventor
薛仲瑾
李亮
赵锦涛
黄昌尧
钟志华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202110357530.1A priority Critical patent/CN112977606B/en
Publication of CN112977606A publication Critical patent/CN112977606A/en
Application granted granted Critical
Publication of CN112977606B publication Critical patent/CN112977606B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B62LAND VEHICLES FOR TRAVELLING OTHERWISE THAN ON RAILS
    • B62DMOTOR VEHICLES; TRAILERS
    • B62D5/00Power-assisted or power-driven steering
    • B62D5/04Power-assisted or power-driven steering electrical, e.g. using an electric servo-motor connected to, or forming part of, the steering gear
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B62LAND VEHICLES FOR TRAVELLING OTHERWISE THAN ON RAILS
    • B62DMOTOR VEHICLES; TRAILERS
    • B62D6/00Arrangements for automatically controlling steering depending on driving conditions sensed and responded to, e.g. control circuits
    • B62D6/008Control of feed-back to the steering input member, e.g. simulating road feel in steer-by-wire applications
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/10Geometric CAD
    • G06F30/15Vehicle, aircraft or watercraft design
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods


Abstract

The invention discloses a steering compensation control method and device for a steer-by-wire system based on DDPG (Deep Deterministic Policy Gradient). The method comprises: establishing an action (Actor) network and an action-value (Critic) network for the steer-by-wire system, and constructing a deep deterministic policy gradient learning framework from the two networks; designing the reward function required for training; establishing the deep deterministic policy gradient algorithm from the reward function and the learning framework; and, for steering scenarios of the steer-by-wire system under different working conditions, performing hardware-in-the-loop and real-vehicle training of the algorithm and adjusting the parameters of its Actor and Critic networks, so that the algorithm outputs a compensation value for the steering angle of the steer-by-wire system. The method does not need to know the control strategy of the underlying controller of the steer-by-wire system, is widely applicable to steer-by-wire systems of any structural form, and achieves accurate steering-angle control.

Description

Steering compensation control method and device of steering-by-wire system based on DDPG
Technical Field
The invention relates to the technical field of automobile steering systems, and in particular to a steering compensation control method and device for a DDPG-based steer-by-wire system.
Background
With the maturing of traditional automobile theory and the rapid development of electronic and control technology, the themes of the automobile industry have gradually shifted to intelligence and electrification. The steering system, one of the five major systems of an automobile, controls the driving direction and is a key link in vehicle stability control. Steer-by-wire (SBW) uses an actuator motor to make the front-wheel steering angle follow a target angle, providing a good hardware basis for automated driving.
Broadly, any steering system that can be controlled by an electric signal rather than directly by the driver can be considered an SBW system. As shown in Fig. 1, a motor can be installed at 4 possible positions in a steering system: the middle of the steering column, the end of the steering column, and the left and right ends of the steering rack; and the column and the rack can be connected in three forms: mechanical hard connection, clutch soft connection, or complete separation. The number of possible SBW configurations is therefore 3 × 2⁴ − 1 = 47. Considering further that the motor may be a brushless motor, a permanent-magnet synchronous motor, a double-winding motor, etc., and that the transmission mechanism may combine a worm gear, a pinion, a belt pulley, a ball screw, etc., the number of final SBW schemes exceeds 2000. In the actual control process, the control logic of the underlying controller of the steer-by-wire system is not exposed to the upper-layer controller: the upper-layer controller of the vehicle gives a target front-wheel steering angle to be executed by the steer-by-wire system and sends it to the underlying controller via Controller Area Network (CAN) communication, and the underlying controller realizes the target angle by driving the actuator motor; the whole process is shown in Fig. 2. Because of communication delay, steering-system delay, and similar issues in actual control, a certain error and response delay often exist between the actual output angle of the steer-by-wire system and the target value given by the upper-layer controller. In addition, because the control strategies of different steer-by-wire systems differ, the control effect often differs as well.
However, in real vehicle control even a small steering-angle error has a large influence on the control effect; in particular, under extreme working conditions such as high speed, a front-wheel steering-angle error can easily destabilize the vehicle.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one object of the invention is to provide a DDPG-based steering compensation control method for a steer-by-wire system, which can be widely applied to steer-by-wire systems of any structural form without knowledge of the control strategy of the system's underlying controller.
Another object of the present invention is to provide a steering compensation control apparatus for a steer-by-wire system based on DDPG.
In order to achieve the above object, an embodiment of the invention provides a steering compensation control method for a DDPG-based steer-by-wire system, which includes:
S1, establishing an action (Actor) network and an action-value (Critic) network for the steer-by-wire system, and constructing a deep deterministic policy gradient learning framework from the two networks;
S2, designing the reward function required for training;
S3, establishing the deep deterministic policy gradient algorithm from the reward function and the learning framework; and
S4, performing hardware-in-the-loop and real-vehicle training of the deep deterministic policy gradient algorithm and adjusting the parameters of its Actor and Critic networks, so that the algorithm outputs a target-angle compensation value.
In addition, the steering compensation control method of the DDPG-based steer-by-wire system according to the above-described embodiment of the present invention may further have the following additional technical features:
further, in an embodiment of the present invention, the S1 further includes:
s11, defining a state space S ═ { v ═ vx,wz,δ,δdesAnd the state vector st=[vx_t,wz_ttt-1des_tdes_t-1]T,stE S, wherein vxFor longitudinal speed of the vehicle, wzFor yaw rate of vehicle, delta is actual turning angle, deltadesIs a target corner, t is the current moment, and t-1 is the last moment;
s12, establishing the action Actor network a ═ μ (S | θ)μ) Where μ denotes an Actor network, the state variable s is the network input, θμIs a network parameter, a is a network output action;
s13, establishing the action value criticic network Q (S, a | theta)Q) Wherein Q represents Critic network, the state variable s and the output action a of the Actor network are input, and thetaQAre network parameters.
Further, in one embodiment of the present invention, the reward function is:
r = −w₁|δ_des − δ_a| − w₂(δ_des − δ_a)² − w₃|ΔI_output|
where δ_des is the target steering angle, δ_a is the actual steering angle of the steer-by-wire system, ΔI_output is the distance between the current output of the action network and its output at the previous moment, and w_i (i = 1, 2, 3) are the weight coefficients of the respective terms.
Further, in one embodiment of the present invention, the action Actor network and the action-value Critic network are both neural networks with hidden layers.
Further, in an embodiment of the present invention, S3 further includes:
S301, randomly initializing the online Critic network Q(s, a|θ^Q) and the online Actor network μ(s|θ^μ);
S302, initializing the target Critic network Q(s, a|θ^{Q'}) and the target Actor network μ(s|θ^{μ'}) with θ^{Q'} ← θ^Q, θ^{μ'} ← θ^μ;
S303, initializing the experience replay set R;
S304, the online Actor network obtains the action a_t = μ(s_t|θ^μ) + N_t based on the state s_t, where N_t is random process noise;
S305, executing the action a_t and receiving the training reward r_t and the new state s_{t+1};
S306, storing (s_t, a_t, r_t, s_{t+1}) in the experience replay set R;
S307, sampling N random samples (s_i, a_i, r_i, s_{i+1}) from the experience replay set R and letting y_i = r_i + γQ'(s_{i+1}, μ'(s_{i+1}|θ^{μ'})|θ^{Q'}), where γ is the discount factor, taking a value in (0, 1);
S308, updating the parameters θ^Q of the online Critic network by minimizing the loss function
L = (1/N) Σ_i (y_i − Q(s_i, a_i|θ^Q))²;
S309, updating the Actor policy according to the sampled policy gradient
∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s_i};
S310, updating the target networks
θ^{Q'} ← ξθ^Q + (1 − ξ)θ^{Q'}
θ^{μ'} ← ξθ^μ + (1 − ξ)θ^{μ'}
where ξ is the soft-update parameter with ξ ≪ 1;
S311, returning to S304 for multiple iterations.
Further, in an embodiment of the present invention, training the deep deterministic policy gradient algorithm further comprises:
performing hardware-in-the-loop training of the deep deterministic policy gradient algorithm on steering scenarios of the steer-by-wire system under different working conditions, the hardware-in-the-loop training system comprising an upper PC, a lower PXI real-time machine, the steer-by-wire system ECU, and a steer-by-wire test bench. During training, the output of the DDPG is used as the compensation value of the target steering angle, and the compensated target-angle command is sent to the underlying controller of the steer-by-wire system; in addition, the actual steering angle executed by the steer-by-wire bench is sent to the upper computer as an input to the vehicle dynamics simulation software CarSim, the current DDPG state s_t = [v_{x,t}, w_{z,t}, δ_t, δ_{t−1}, δ_{des,t}, δ_{des,t−1}]^T, s_t ∈ S, is obtained from the vehicle state output by CarSim and the initially input target steering angle, and the parameters of the Actor and Critic networks are adjusted using the learning algorithm.
Further, in an embodiment of the present invention, S4 is followed by:
S5, applying the prior network parameters obtained from hardware-in-the-loop training to the real vehicle as initial values of the algorithm's network parameters, and updating the network parameters in real time from the data collected while the vehicle is running.
Further, in an embodiment of the present invention, the target-angle compensation value output by the deep deterministic policy gradient algorithm is added to the target steering angle to obtain the compensated target-angle command, which is used as the target angle actually sent to the underlying controller of the steer-by-wire system.
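The compensation step above can be sketched as a small helper; the saturation limit is an illustrative safety bound assumed here, not specified in the patent:

```python
def compensated_target_angle(delta_des, compensation, limit=0.5):
    """Add the DDPG compensation value to the target steering angle.

    The +/-limit clamp (in radians) is a hypothetical safety bound for
    illustration only; the patent itself just adds the two quantities.
    """
    cmd = delta_des + compensation
    return max(-limit, min(limit, cmd))

# 0.10 rad target plus 0.02 rad compensation -> 0.12 rad sent to the
# underlying steer-by-wire controller.
cmd = compensated_target_angle(0.10, 0.02)
# An out-of-range sum is clipped to the assumed bound.
cmd_clipped = compensated_target_angle(0.6, 0.1)
```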
In order to achieve the above object, an embodiment of another aspect of the present invention provides a steering compensation control apparatus for a DDPG-based steer-by-wire system, including:
a construction module, configured to establish an action (Actor) network and an action-value (Critic) network for the steer-by-wire system and to construct a deep deterministic policy gradient learning framework from the two networks;
a training module, configured to design the reward function required for training;
an establishing module, configured to establish the deep deterministic policy gradient algorithm from the reward function and the learning framework; and
a compensation module, configured to perform hardware-in-the-loop and real-vehicle training of the deep deterministic policy gradient algorithm and to adjust the parameters of its Actor and Critic networks, so that the algorithm outputs a target-angle compensation value.
In addition, the steering compensation control apparatus of the DDPG-based steer-by-wire system according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the apparatus further includes:
an adjusting module, configured to apply the prior network parameters obtained from hardware-in-the-loop training to the real vehicle as initial values of the algorithm's network parameters, and to update the network parameters in real time from the data collected while the vehicle is running.
The steering compensation control method and apparatus of the DDPG-based steer-by-wire system according to the embodiments of the invention do not need to know the control strategy of the underlying controller of the steer-by-wire system, can be widely applied to steer-by-wire systems of any structural form, achieve accurate steering-angle control with a compensation method running on the vehicle's upper-layer controller, and solve the problem in the related art that a fast, accurate, and stable front-wheel steering-angle response of a steer-by-wire system is difficult to realize.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a steer-by-wire control method;
FIG. 2 is a schematic diagram of a steer-by-wire control method;
FIG. 3 is a flow chart of a DDPG based steer-by-wire system steer compensation control method according to one embodiment of the present invention;
FIG. 4 is a block diagram of an action Actor network and an action value Critic network according to one embodiment of the invention;
FIG. 5 is a flow chart of the deep deterministic policy gradient algorithm according to one embodiment of the present invention;
FIG. 6 is a schematic diagram of a steer-by-wire hardware-in-the-loop system according to one embodiment of the present invention;
FIG. 7 is a flow chart of a steer-by-wire compensation algorithm training process according to one embodiment of the present invention;
fig. 8 is a schematic structural diagram of a steering compensation control device of a DDPG-based steer-by-wire system according to an embodiment of the invention.
Reference numerals: 1 - actuator motor; 2 - road-feel motor.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes a steering compensation control method and device of a DDPG-based steer-by-wire system according to an embodiment of the present invention with reference to the accompanying drawings.
A steering compensation control method of a DDPG-based steer-by-wire system proposed according to an embodiment of the present invention will be described first with reference to the accompanying drawings.
Fig. 3 is a flowchart of a steering compensation control method of a DDPG-based steer-by-wire system according to an embodiment of the present invention.
As shown in fig. 3, the steer-by-wire system steering compensation control method based on Deep Deterministic Policy Gradient (DDPG) includes the following steps:
Step S1: establishing an action (Actor) network and an action-value (Critic) network for the steer-by-wire system, and constructing a deep deterministic policy gradient learning framework from the two networks.
Further, S1 further includes:
S11, defining the state space S = {v_x, w_z, δ, δ_des} and the state vector s_t = [v_{x,t}, w_{z,t}, δ_t, δ_{t−1}, δ_{des,t}, δ_{des,t−1}]^T, s_t ∈ S, where v_x is the longitudinal vehicle speed, w_z is the yaw rate of the vehicle, δ is the actual steering angle, δ_des is the target steering angle, t is the current moment, and t−1 is the previous moment;
S12, establishing the action Actor network a = μ(s|θ^μ), where μ denotes the Actor network, the state variable s is the network input, θ^μ are the network parameters, and a is the action output by the network;
S13, establishing the action-value Critic network Q(s, a|θ^Q), where Q denotes the Critic network, the inputs are the state variable s and the action a output by the Actor network, and θ^Q are the network parameters.
The structure and interrelation of the action Actor network and the action-value Critic network are shown in Fig. 4; both are neural networks with hidden layers.
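As a minimal sketch of this Actor/Critic pair, a pure-Python forward pass is shown below. The layer sizes, weight ranges, and tanh activations are illustrative assumptions (the patent does not specify the architecture); only the 6-dimensional state input and the one-dimensional action follow the text above.

```python
import math
import random

random.seed(0)  # deterministic illustrative weights

def _layer(n_in, n_out):
    """Random small weights and zero biases for one fully connected layer (assumed init)."""
    w = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def _forward(x, weights, biases, activation):
    """One dense layer: activation(W x + b)."""
    return [activation(sum(wij * xj for wij, xj in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def make_actor(state_dim=6, hidden=16):
    """Actor mu(s | theta_mu): maps the 6-dim state to one compensation action in (-1, 1)."""
    w1, b1 = _layer(state_dim, hidden)
    w2, b2 = _layer(hidden, 1)
    def mu(s):
        h = _forward(s, w1, b1, math.tanh)
        return _forward(h, w2, b2, math.tanh)[0]
    return mu

def make_critic(state_dim=6, hidden=16):
    """Critic Q(s, a | theta_Q): scores a state-action pair with a scalar value."""
    w1, b1 = _layer(state_dim + 1, hidden)  # state and action are concatenated
    w2, b2 = _layer(hidden, 1)
    def q(s, a):
        h = _forward(s + [a], w1, b1, math.tanh)
        return _forward(h, w2, b2, lambda v: v)[0]
    return q

# s_t = [v_x_t, w_z_t, delta_t, delta_{t-1}, delta_des_t, delta_des_{t-1}]
s_t = [20.0, 0.05, 0.10, 0.09, 0.12, 0.11]
actor = make_actor()
critic = make_critic()
a_t = actor(s_t)        # compensation action
q_t = critic(s_t, a_t)  # action value
```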
In step S2, a reward function for training is designed.
The training reward r is designed as follows:
r = −w₁|δ_des − δ_a| − w₂(δ_des − δ_a)² − w₃|ΔI_output|
where δ_des is the target steering angle, δ_a is the actual steering angle of the steer-by-wire system, ΔI_output is the distance between the current output of the action network and its output at the previous moment, and w_i (i = 1, 2, 3) are the weight coefficients of the respective terms.
The training reward consists of two parts: the first two terms in the formula above evaluate the tracking error between the actual and target steering angles, and the last term reduces jitter.
Step S3: establishing the deep deterministic policy gradient algorithm from the reward function and the learning framework.
The flow of the deep deterministic policy gradient algorithm is shown in Fig. 5.
S301, randomly initializing the online Critic network Q(s, a|θ^Q) and the online Actor network μ(s|θ^μ);
S302, initializing the target Critic network Q(s, a|θ^{Q'}) and the target Actor network μ(s|θ^{μ'}) with θ^{Q'} ← θ^Q, θ^{μ'} ← θ^μ;
S303, initializing the experience replay set R;
S304, the online Actor network obtains the action a_t = μ(s_t|θ^μ) + N_t based on the state s_t, where N_t is random process noise;
S305, executing the action a_t and receiving the training reward r_t and the new state s_{t+1};
S306, storing (s_t, a_t, r_t, s_{t+1}) in the experience replay set R;
S307, sampling N random samples (s_i, a_i, r_i, s_{i+1}) from the experience replay set R and letting y_i = r_i + γQ'(s_{i+1}, μ'(s_{i+1}|θ^{μ'})|θ^{Q'}), where γ is the discount factor, taking a value in (0, 1);
S308, updating the parameters θ^Q of the online Critic network by minimizing the loss function
L = (1/N) Σ_i (y_i − Q(s_i, a_i|θ^Q))²;
S309, updating the Actor policy according to the sampled policy gradient
∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s_i};
S310, updating the target networks
θ^{Q'} ← ξθ^Q + (1 − ξ)θ^{Q'}
θ^{μ'} ← ξθ^μ + (1 − ξ)θ^{μ'}
where ξ is the soft-update parameter with ξ ≪ 1;
S311, returning to S304 for multiple iterations.
It will be appreciated that i = 1…T, and the iteration ends when an end condition is met or the maximum number of iterations is reached.
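Two scalar pieces of the loop, the TD target of step S307 and the soft target-network update of step S310, can be sketched as follows. The numeric values (γ = 0.9, ξ = 0.1) are for demonstration only; standard DDPG practice keeps the soft-update coefficient ξ much smaller than 1.

```python
def td_target(r, q_next, gamma=0.99):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))  (step S307)."""
    return r + gamma * q_next

def soft_update(theta, theta_target, xi=0.001):
    """theta' <- xi*theta + (1 - xi)*theta', applied element-wise (step S310)."""
    return [xi * w + (1.0 - xi) * wt for w, wt in zip(theta, theta_target)]

# TD target for a sampled transition: 1.0 + 0.9 * 2.0 = 2.8
y = td_target(r=1.0, q_next=2.0, gamma=0.9)
# One soft update pulls the target parameters a small step toward the online ones.
params = soft_update([1.0, 1.0], [0.0, 0.0], xi=0.1)
```

With a small ξ the target networks change slowly, which is what stabilizes the bootstrapped Critic targets during training.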
Step S4: performing hardware-in-the-loop and real-vehicle training of the deep deterministic policy gradient algorithm, and adjusting the parameters of its Actor and Critic networks so that the algorithm outputs a target-angle compensation value.
Further, training the deep deterministic policy gradient algorithm further comprises:
performing hardware-in-the-loop training of the deep deterministic policy gradient algorithm on steering scenarios of the steer-by-wire system under different working conditions.
In order to train the steer-by-wire compensation algorithm, steering scenarios under different working conditions are set up in a simulation environment and the steer-by-wire system is pre-trained hardware-in-the-loop. The control schematic of the hardware-in-the-loop system is shown in Fig. 6; the system comprises an upper PC, a lower PXI real-time machine, the steer-by-wire system ECU, and a steer-by-wire test bench. During control, the upper computer provides the vehicle-dynamics simulation environment CarSim for the hardware-in-the-loop system and downloads the upper-layer control program to the lower PXI machine; the lower machine sends the target steering angle obtained by the upper-layer control program to the steer-by-wire ECU; the ECU drives the actuator of the steer-by-wire system to realize steering-angle control of the bench; and the bench sensors send the collected signals, such as the actual steering angle, back to the upper computer. The training process of the algorithm is shown in Fig. 7; its goal is that the steer-by-wire system realizes the target steering angle quickly and accurately. During training, the output of the DDPG is used as the compensation value of the target angle and is added to the target angle to obtain the compensated target-angle command, which is sent to the underlying controller of the steer-by-wire system; in addition, the actual steering angle executed by the steer-by-wire bench is sent to the upper computer as an input to the simulation software CarSim, the current DDPG state s_t = [v_{x,t}, w_{z,t}, δ_t, δ_{t−1}, δ_{des,t}, δ_{des,t−1}]^T, s_t ∈ S, is obtained from the vehicle state output by CarSim and the initially input target steering angle, and the parameters of the Actor and Critic networks are adjusted using the learning algorithm.
In order to guarantee the performance of the proposed compensation algorithm, the training process should cover as many scenarios as possible; the training should therefore use time-varying target steering angles and time-varying longitudinal vehicle speeds. Batch pre-training on these data then yields the prior network parameters of the algorithm.
Further, in the embodiment of the present invention, the method further includes:
and S5, applying the algorithm prior network parameters obtained by the hardware-in-loop training as initial values of the algorithm network parameters to the real vehicle, and updating the algorithm network parameters in real time according to the real-time data in the running process of the vehicle.
It can be understood that the algorithm prior network parameters obtained by hardware in-loop training are applied to the real vehicle as initial values of the algorithm network parameters, and the network parameters of the algorithm are updated in real time based on instant data in the running process of the vehicle. In order to improve the generalization migration capability of the algorithm, ensure the performance of the algorithm in a new environment, establish a lifelong learning mechanism, maintain playback cache and avoid long-term forgetting by adopting a batch training method.
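The replay cache mentioned above can be kept at a fixed capacity so that on-vehicle memory stays bounded; the capacity and eviction policy below are illustrative assumptions, not values from the patent:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay set R.

    Oldest transitions are evicted first (deque maxlen), which bounds
    memory use in the on-vehicle lifelong-learning setting; batches are
    drawn uniformly at random for batch training.
    """
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        """Store one transition (s_t, a_t, r_t, s_{t+1})."""
        self.buffer.append((s, a, r, s_next))

    def sample(self, n):
        """Uniformly sample up to n transitions."""
        return random.sample(list(self.buffer), min(n, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)

# Capacity 3: storing 5 transitions evicts the 2 oldest ones.
buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.store([float(i)], 0.0, -1.0, [float(i + 1)])
batch = buf.sample(2)
```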
According to the steering compensation control method of the DDPG-based steer-by-wire system provided by the embodiment of the invention, the control strategy of the underlying controller of the steer-by-wire system need not be known, the method is widely applicable to steer-by-wire systems of any structural form, the compensation method running on the vehicle's upper-layer controller achieves accurate steering-angle control, and the problem in the related art that a fast, accurate, and stable front-wheel steering-angle response of a steer-by-wire system is difficult to realize is solved.
Next, a steering compensation control apparatus of a DDPG-based steer-by-wire system proposed according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 8 is a schematic structural diagram of a steering compensation control device of a DDPG-based steer-by-wire system according to an embodiment of the present invention.
As shown in fig. 8, the DDPG-based steer-by-wire system steering compensation control apparatus includes: a construction module 801, a training module 802, an establishing module 803, and a compensation module 804.
The construction module 801 is configured to establish an action (Actor) network and an action-value (Critic) network for the steer-by-wire system and to construct a deep deterministic policy gradient learning framework from the two networks.
The training module 802 is configured to design the reward function required for training.
The establishing module 803 is configured to establish the deep deterministic policy gradient algorithm from the reward function and the learning framework.
The compensation module 804 is configured to perform hardware-in-the-loop and real-vehicle training of the deep deterministic policy gradient algorithm and to adjust the parameters of its Actor and Critic networks, so that the algorithm outputs a target-angle compensation value.
Further, in the embodiment of the present invention, the apparatus further includes:
an adjusting module, configured to apply the prior network parameters obtained from hardware-in-the-loop training to the real vehicle as initial values of the algorithm's network parameters and to update the network parameters in real time from data collected while the vehicle is running.
Further, in the embodiment of the present invention, the target-angle compensation value output by the deep deterministic policy gradient algorithm is added to the target steering angle to obtain the compensated target-angle command, which is used as the target angle actually sent to the underlying controller of the steer-by-wire system.
It should be noted that the foregoing explanation of the method embodiment is also applicable to the apparatus of this embodiment, and is not repeated herein.
According to the steering compensation control apparatus of the DDPG-based steer-by-wire system provided by the embodiment of the invention, the control strategy of the underlying controller of the steer-by-wire system need not be known, the apparatus is widely applicable to steer-by-wire systems of any structural form, the compensation method running on the vehicle's upper-layer controller achieves accurate steering-angle control, and the problem in the related art that a fast, accurate, and stable front-wheel steering-angle response of a steer-by-wire system is difficult to realize is solved.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic references do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and those skilled in the art may combine features of different embodiments or examples described in this specification provided they are not contradictory.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A DDPG-based steering compensation control method for a steer-by-wire system, characterized by comprising the following steps:
S1, establishing an action (Actor) network and an action-value (Critic) network for the steer-by-wire system, and constructing a deep deterministic policy gradient learning algorithm framework from the Actor network and the Critic network;
S2, designing the reward function required for training;
S3, establishing a deep deterministic policy gradient algorithm from the reward function and the deep deterministic policy gradient learning algorithm framework;
S4, performing hardware-in-the-loop and real-vehicle training on the deep deterministic policy gradient algorithm, and adjusting the parameters of its Actor and Critic networks so that the algorithm outputs a target steering-angle compensation value.
2. The method according to claim 1, wherein S1 further comprises:
S11, defining the state space S = {vx, wz, δ, δdes} and the state vector st = [vx_t, wz_t, δt, δt-1, δdes_t, δdes_t-1]^T, st ∈ S, where vx is the vehicle longitudinal speed, wz is the vehicle yaw rate, δ is the actual steering angle, δdes is the target steering angle, t denotes the current time step, and t-1 the previous time step;
S12, establishing the action (Actor) network a = μ(s|θ^μ), where μ denotes the Actor network, the state vector s is the network input, θ^μ are the network parameters, and a is the action output by the network;
S13, establishing the action-value (Critic) network Q(s, a|θ^Q), where Q denotes the Critic network, the state vector s and the action a output by the Actor network are the inputs, and θ^Q are the network parameters.
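The Actor and Critic defined in S11 to S13 are ordinary function approximators; the sketch below illustrates them as small feed-forward networks in NumPy. The layer sizes and weight scaling are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def mlp_init(sizes, rng):
    # One (weight, bias) pair per layer; sizes like [6, 32, 1].
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    # Hidden layers use tanh; the final layer is linear.
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

rng = np.random.default_rng(0)
# Actor mu(s|theta_mu): 6-dim state -> 1-dim compensation action.
actor = mlp_init([6, 32, 1], rng)
# Critic Q(s, a|theta_Q): state concatenated with action -> scalar value.
critic = mlp_init([6 + 1, 32, 1], rng)

# s = [vx, wz, delta_t, delta_{t-1}, delta_des_t, delta_des_{t-1}]
s = np.array([20.0, 0.1, 0.05, 0.04, 0.06, 0.05])
a = mlp_forward(actor, s)
q = mlp_forward(critic, np.concatenate([s, a]))
print(a.shape, q.shape)
```

The Critic takes the state and the Actor's action as a concatenated input, matching the Q(s, a|θ^Q) signature in S13.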
3. The method of claim 1, wherein the reward function is:
r = -w1·|δdes - δa| - w2·(δdes - δa)² - w3·|ΔIoutput|
where δdes is the target steering angle, δa is the actual steering angle of the steer-by-wire system, ΔIoutput is the difference between the current output of the action network and its output at the previous time step, and wi (i = 1, 2, 3) are the weight coefficients of the respective terms.
4. The method of claim 1, wherein the action (Actor) network and the action-value (Critic) network are neural networks containing hidden layers.
5. The method of claim 1, wherein S3 further comprises:
S301, randomly initializing the online Critic network Q(s, a|θ^Q) and the online Actor network μ(s|θ^μ);
S302, initializing the target Critic network Q(s, a|θ^Q′) and the target Actor network μ(s|θ^μ′), with θ^Q′ ← θ^Q and θ^μ′ ← θ^μ;
S303, initializing the experience replay buffer R;
S304, the online Actor network selecting the action at = μ(st|θ^μ) + Nt based on the state st, where Nt is random exploration noise;
S305, executing the action at, and receiving the training reward rt and the new state st+1;
S306, storing the transition (st, at, rt, st+1) in the experience replay buffer R;
S307, sampling N random transitions (si, ai, ri, si+1) from the replay buffer R, and letting yi = ri + γ·Q′(si+1, μ′(si+1|θ^μ′)|θ^Q′), where the discount factor γ takes a value in (0, 1);
S308, updating the parameters θ^Q of the online Critic network by minimizing the loss function
L = (1/N)·Σi (yi - Q(si, ai|θ^Q))²;
S309, updating the Actor policy according to the sampled policy gradient
∇θ^μ J ≈ (1/N)·Σi ∇a Q(s, a|θ^Q)|s=si, a=μ(si) · ∇θ^μ μ(s|θ^μ)|s=si;
S310, soft-updating the target networks:
θ^Q′ ← ξ·θ^Q + (1 - ξ)·θ^Q′
θ^μ′ ← ξ·θ^μ + (1 - ξ)·θ^μ′
where ξ is the update rate, with ξ ≪ 1;
S311, returning to S304 and repeating for multiple iterations.
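The numerical core of steps S307 to S310 can be sketched with scalar stand-ins for the networks. The snippet below shows the TD target, the Critic loss, and the soft target update with a small update rate; the values of γ and ξ are illustrative assumptions:

```python
import numpy as np

# S307: TD target y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))
def td_target(r, q_next, gamma=0.99):
    return r + gamma * q_next

# S308: Critic loss L = (1/N) * sum_i (y_i - Q(s_i, a_i))^2
def critic_loss(y, q):
    y, q = np.asarray(y), np.asarray(q)
    return float(np.mean((y - q) ** 2))

# S310: soft update theta' <- xi*theta + (1 - xi)*theta', xi << 1
def soft_update(theta, theta_target, xi=0.005):
    return xi * theta + (1.0 - xi) * theta_target

y = td_target(1.0, 2.0)            # 1 + 0.99 * 2
loss = critic_loss([y, 0.0], [3.0, 0.0])
theta_t = soft_update(1.0, 0.0)    # target drifts slowly toward the online value
print(y, loss, theta_t)
```

Because ξ ≪ 1, the target networks change slowly, which is what stabilizes the bootstrapped TD target in S307.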
6. The method of claim 1, wherein training the deep deterministic policy gradient algorithm further comprises:
performing hardware-in-the-loop training of the deep deterministic policy gradient algorithm over steering scenarios of the steer-by-wire system under different operating conditions, the hardware-in-the-loop training system comprising a host PC, a lower-level PXI real-time machine, the steer-by-wire ECU, and a steer-by-wire test bench; during training, the output of the DDPG is used as the compensation value for the target steering angle, and the compensated target steering-angle command is sent to the underlying controller of the steer-by-wire system; in addition, the actual steering angle executed by the steer-by-wire bench is sent to the host PC as the input to the vehicle dynamics simulation software Carsim; the current DDPG state st = [vx_t, wz_t, δt, δt-1, δdes_t, δdes_t-1]^T, st ∈ S, is obtained from the vehicle state output by Carsim and the target steering angle initially input to the system, and the parameters of the Actor and Critic networks are adjusted by the learning algorithm.
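In this loop the state vector must carry the previous time step's angles alongside the current ones. A minimal sketch of that bookkeeping (the function and field names are hypothetical, not Carsim channel names):

```python
from collections import deque

def make_state(vx, wz, delta, delta_des, history):
    """Build s_t = [vx, wz, delta_t, delta_{t-1}, ddes_t, ddes_{t-1}].

    `history` keeps the previous (delta, delta_des) pair; on the first
    call the previous values default to the current ones.
    """
    prev_delta, prev_des = history[-1] if history else (delta, delta_des)
    history.append((delta, delta_des))
    return [vx, wz, delta, prev_delta, delta_des, prev_des]

hist = deque(maxlen=1)          # only the last pair is ever needed
s1 = make_state(20.0, 0.1, 0.05, 0.06, hist)
s2 = make_state(20.0, 0.1, 0.07, 0.08, hist)
print(s1)
print(s2)
```

Keeping only a one-element history matches the state definition, which looks back exactly one time step.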
7. The method of claim 6, further comprising, after S4:
S5, applying the prior network parameters obtained from the hardware-in-the-loop training to the real vehicle as initial values of the algorithm network parameters, and updating the network parameters in real time from data collected while the vehicle is running.
8. The method according to claim 1, wherein the target steering-angle compensation value output by the deep deterministic policy gradient algorithm is added to the target steering angle to obtain a compensated target steering-angle command, which is used as the target steering angle actually sent to the underlying controller of the steer-by-wire system.
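The compensation in claim 8 is a simple additive correction on the nominal command. A minimal sketch; the saturation limit is an illustrative assumption (the patent does not specify an actuator range):

```python
def compensated_command(delta_des, ddpg_output, limit=0.6):
    """Compensated target angle sent to the underlying controller.

    The DDPG output is added to the nominal target angle; the optional
    clamp keeps the command within an assumed actuator range (radians).
    """
    cmd = delta_des + ddpg_output
    return max(-limit, min(limit, cmd))

print(compensated_command(0.10, 0.02))   # small additive correction
print(compensated_command(0.55, 0.10))   # saturates at the assumed limit
```

Because the correction is purely additive on top of the existing command, the underlying controller needs no modification, which is what makes the scheme controller-agnostic.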
9. A DDPG-based steering compensation control device for a steer-by-wire system, characterized by comprising:
a construction module, configured to establish an action (Actor) network and an action-value (Critic) network for the steer-by-wire system, and to construct a deep deterministic policy gradient learning algorithm framework from the Actor network and the Critic network;
a training module, configured to design the reward function required for training;
an establishing module, configured to establish the deep deterministic policy gradient algorithm from the reward function and the learning algorithm framework;
a compensation module, configured to perform hardware-in-the-loop and real-vehicle training on the deep deterministic policy gradient algorithm, and to adjust the parameters of its Actor and Critic networks so that the algorithm outputs a target steering-angle compensation value.
10. The DDPG-based steering compensation control device of claim 9, further comprising:
an adjusting module, configured to apply the prior network parameters obtained from the hardware-in-the-loop training to the real vehicle as initial values of the algorithm network parameters, and to update the network parameters in real time from data collected while the vehicle is running.
CN202110357530.1A 2021-04-01 2021-04-01 Steering compensation control method and device of steering-by-wire system based on DDPG Active CN112977606B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110357530.1A CN112977606B (en) 2021-04-01 2021-04-01 Steering compensation control method and device of steering-by-wire system based on DDPG


Publications (2)

Publication Number Publication Date
CN112977606A true CN112977606A (en) 2021-06-18
CN112977606B CN112977606B (en) 2022-11-11

Family

ID=76338856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110357530.1A Active CN112977606B (en) 2021-04-01 2021-04-01 Steering compensation control method and device of steering-by-wire system based on DDPG

Country Status (1)

Country Link
CN (1) CN112977606B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007230471A (en) * 2006-03-03 2007-09-13 Nissan Motor Co Ltd Steering control device for vehicle
CN109664938A (en) * 2018-12-29 2019-04-23 南京航空航天大学 Steering-by-wire dual motors system and its Yaw stability compensation policy based on driving behavior identification
KR20190044402A (en) * 2017-10-20 2019-04-30 주식회사 만도 Vehicle pulling compensation apparatus for steer-by-wire system and method thereof
CN110322017A (en) * 2019-08-13 2019-10-11 吉林大学 Automatic Pilot intelligent vehicle Trajectory Tracking Control strategy based on deeply study
CN110949499A (en) * 2019-11-26 2020-04-03 江苏大学 Unmanned driving corner compensation system of commercial vehicle and control method thereof
CN110989576A (en) * 2019-11-14 2020-04-10 北京理工大学 Target following and dynamic obstacle avoidance control method for differential slip steering vehicle
CN111985614A (en) * 2020-07-23 2020-11-24 中国科学院计算技术研究所 Method, system and medium for constructing automatic driving decision system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HE XIANGKUN: "Research on Motion Control and Decision-Making Methods for Emergency Collision Avoidance Systems of Autonomous Vehicles", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II *


Similar Documents

Publication Publication Date Title
Ji et al. Shared steering torque control for lane change assistance: A stochastic game-theoretic approach
Huang et al. Sliding mode predictive tracking control for uncertain steer-by-wire system
US20180170421A1 (en) Electric power steering apparatus
CN103381826B (en) Based on the self-adapting cruise control method of approximate Policy iteration
CN104360596B (en) Limited time friction parameter identification and adaptive sliding mode control method for electromechanical servo system
DE102011086295B4 (en) A method of controlling a motor drive torque in an electric power steering system
CN109204447B (en) Electric power steering system with unified architecture for multiple operating modes
EP3699698B1 (en) Method and device for processing control parameter, and storage medium
CN112286218B (en) Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient
CN107807657B (en) Flexible spacecraft attitude self-adaptive control method based on path planning
CN110341787B (en) EPS-based automobile driving mode switching method
CN112558468B (en) Launching platform adaptive robust output feedback control method based on double observers
CN113297783A (en) Method and device for supporting the planning of maneuvers of a vehicle or robot
CN112977606B (en) Steering compensation control method and device of steering-by-wire system based on DDPG
Liu et al. Iterative learning based neural network sliding mode control for repetitive tasks: With application to a PMLSM with uncertainties and external disturbances
Zhu et al. Composite chattering-free discrete-time sliding mode controller design for active front steering system of electric vehicles
Hodgson et al. Effect of vehicle mass changes on the accuracy of Kalman filter estimation of electric vehicle speed
JP2007514600A (en) Method and system for controlling steering angle of steerable rear wheel and corresponding vehicle
CN114063453B (en) Helicopter system control method, system, device and medium based on reinforcement learning
CN113985870B (en) Path planning method based on meta reinforcement learning
Pano et al. Shared control based on an ecological feedforward and a driver model based feedback
Wang et al. A Fractional Derivative‐Based Lateral Preview Driver Model for Autonomous Automobile Path Tracking
WO2022189021A1 (en) Apparatus and method for influencing a vehicle behaviour
DE102017217084A1 (en) Method for controlling a steering system with an electric steering assistance
Li et al. DOS-robust dynamic speed tracking controller for an integrated motor-gearbox powertrain system of a connected car

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant