CN112977606A - Steering compensation control method and device of steering-by-wire system based on DDPG - Google Patents

Steering compensation control method and device of steering-by-wire system based on DDPG Download PDF

Info

Publication number
CN112977606A
CN112977606A (application CN202110357530.1A)
Authority
CN
China
Prior art keywords
network
steering
algorithm
action
strategy gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110357530.1A
Other languages
Chinese (zh)
Other versions
CN112977606B (en)
Inventor
薛仲瑾
李亮
赵锦涛
黄昌尧
钟志华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202110357530.1A priority Critical patent/CN112977606B/en
Publication of CN112977606A publication Critical patent/CN112977606A/en
Application granted granted Critical
Publication of CN112977606B publication Critical patent/CN112977606B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B62LAND VEHICLES FOR TRAVELLING OTHERWISE THAN ON RAILS
    • B62DMOTOR VEHICLES; TRAILERS
    • B62D5/00Power-assisted or power-driven steering
    • B62D5/04Power-assisted or power-driven steering electrical, e.g. using an electric servo-motor connected to, or forming part of, the steering gear
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B62LAND VEHICLES FOR TRAVELLING OTHERWISE THAN ON RAILS
    • B62DMOTOR VEHICLES; TRAILERS
    • B62D6/00Arrangements for automatically controlling steering depending on driving conditions sensed and responded to, e.g. control circuits
    • B62D6/008Control of feed-back to the steering input member, e.g. simulating road feel in steer-by-wire applications
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/10Geometric CAD
    • G06F30/15Vehicle, aircraft or watercraft design
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods


Abstract

The invention discloses a steering compensation control method and device for a steer-by-wire system based on DDPG (Deep Deterministic Policy Gradient). The method comprises: establishing an action (Actor) network and an action-value (Critic) network for the steer-by-wire system, and constructing a deep deterministic policy gradient learning framework from the two networks; designing the reward function required for training; establishing the deep deterministic policy gradient algorithm from the reward function and the learning framework; and, for steering scenarios of the steer-by-wire system under different working conditions, performing hardware-in-the-loop and real-vehicle training of the algorithm and adjusting the parameters of its Actor and Critic networks, so that the algorithm outputs a compensation value for the steering angle of the steer-by-wire system. The method does not need to know the control strategy of the underlying controller of the steer-by-wire system, is widely applicable to steer-by-wire systems of any structural form, and achieves accurate steering-angle control.

Description

Steering compensation control method and device of steering-by-wire system based on DDPG
Technical Field
The invention relates to the technical field of automobile steering systems, and in particular to a steering compensation control method and device for a DDPG-based steer-by-wire system.
Background
With the maturing of traditional automobile theory and the rapid development of electronic and control technology, the themes of the automobile industry have gradually shifted to intelligence and electrification. The steering system, one of the five major systems of an automobile, controls the driving direction and is a key link in vehicle stability control. Steer-by-wire (SBW) uses an actuator motor to make the front-wheel steering angle follow a target angle, providing a good hardware basis for automated driving.
Broadly, any steering system that can be controlled by an electric signal rather than directly by the driver can be considered an SBW system. As shown in Fig. 1, a motor can be installed at 4 possible positions in a steering system: the middle of the steering column, the end of the steering column, and the left and right ends of the steering rack; and the column and the rack can be connected in three forms: mechanical hard connection, clutch soft connection, or complete separation. The number of possible SBW configurations is therefore 3 × 2⁴ − 1 = 47. Considering further that the motor may be a brushless motor, a permanent-magnet synchronous motor, a double-winding motor, etc., and that the transmission mechanism may combine a worm gear, a pinion, a belt pulley, a ball screw, etc., the number of final SBW schemes exceeds 2000. In the actual control process, the control logic of the underlying controller of the steer-by-wire system is not exposed to the upper-layer controller: the upper-layer controller of the vehicle gives a target front-wheel steering angle to be executed by the steer-by-wire system and sends it to the underlying controller via Controller Area Network (CAN) communication, and the underlying controller realizes the target angle by driving the actuator motor; the whole process is shown in Fig. 2. Because of communication delay, steering-system delay, and similar issues in actual control, a certain error and response delay often exist between the actual output angle of the steer-by-wire system and the target value given by the upper-layer controller. In addition, because the control strategies of different steer-by-wire systems differ, the control effect often differs as well.
However, in real vehicle control even a small steering-angle error has a large influence on the control effect; in particular, under extreme working conditions such as high speed, a front-wheel steering-angle error can easily destabilize the vehicle.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, one object of the invention is to provide a DDPG-based steering compensation control method for a steer-by-wire system, which can be widely applied to steer-by-wire systems of any structural form without knowledge of the control strategy of the system's underlying controller.
Another object of the present invention is to provide a steering compensation control apparatus for a steer-by-wire system based on DDPG.
In order to achieve the above object, an embodiment of the invention provides a steering compensation control method for a DDPG-based steer-by-wire system, which includes:
S1, establishing an action (Actor) network and an action-value (Critic) network for the steer-by-wire system, and constructing a deep deterministic policy gradient learning framework from the two networks;
S2, designing the reward function required for training;
S3, establishing the deep deterministic policy gradient algorithm from the reward function and the learning framework; and
S4, performing hardware-in-the-loop and real-vehicle training of the deep deterministic policy gradient algorithm and adjusting the parameters of its Actor and Critic networks, so that the algorithm outputs a target-angle compensation value.
In addition, the steering compensation control method of the DDPG-based steer-by-wire system according to the above-described embodiment of the present invention may further have the following additional technical features:
further, in an embodiment of the present invention, the S1 further includes:
s11, defining a state space S ═ { v ═ vx,wz,δ,δdesAnd the state vector st=[vx_t,wz_ttt-1des_tdes_t-1]T,stE S, wherein vxFor longitudinal speed of the vehicle, wzFor yaw rate of vehicle, delta is actual turning angle, deltadesIs a target corner, t is the current moment, and t-1 is the last moment;
s12, establishing the action Actor network a ═ μ (S | θ)μ) Where μ denotes an Actor network, the state variable s is the network input, θμIs a network parameter, a is a network output action;
s13, establishing the action value criticic network Q (S, a | theta)Q) Wherein Q represents Critic network, the state variable s and the output action a of the Actor network are input, and thetaQAre network parameters.
Further, in one embodiment of the present invention, the reward function is:
r = −w₁|δ_des − δ_a| − w₂(δ_des − δ_a)² − w₃|ΔI_output|
where δ_des is the target steering angle, δ_a is the actual steering angle of the steer-by-wire system, ΔI_output is the distance between the current output of the action network and its output at the previous moment, and w_i (i = 1, 2, 3) are the weight coefficients of the respective terms.
Further, in one embodiment of the present invention, the action Actor network and the action-value Critic network are both neural networks with hidden layers.
Further, in an embodiment of the present invention, S3 further includes:
S301, randomly initializing the online Critic network Q(s, a|θ^Q) and the online Actor network μ(s|θ^μ);
S302, initializing the target Critic network Q(s, a|θ^{Q'}) and the target Actor network μ(s|θ^{μ'}) with θ^{Q'} ← θ^Q, θ^{μ'} ← θ^μ;
S303, initializing the experience replay set R;
S304, the online Actor network obtains the action a_t = μ(s_t|θ^μ) + N_t based on the state s_t, where N_t is random process noise;
S305, executing the action a_t and receiving the training reward r_t and the new state s_{t+1};
S306, storing (s_t, a_t, r_t, s_{t+1}) in the experience replay set R;
S307, sampling N random samples (s_i, a_i, r_i, s_{i+1}) from the experience replay set R and letting y_i = r_i + γQ'(s_{i+1}, μ'(s_{i+1}|θ^{μ'})|θ^{Q'}), where γ is the discount factor, taking a value in (0, 1);
S308, updating the parameters θ^Q of the online Critic network by minimizing the loss function
L = (1/N) Σ_i (y_i − Q(s_i, a_i|θ^Q))²;
S309, updating the Actor policy according to the sampled policy gradient
∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s_i};
S310, updating the target networks
θ^{Q'} ← ξθ^Q + (1 − ξ)θ^{Q'}
θ^{μ'} ← ξθ^μ + (1 − ξ)θ^{μ'}
where ξ is the soft-update parameter with ξ ≪ 1;
S311, returning to S304 for multiple iterations.
Further, in an embodiment of the present invention, training the deep deterministic policy gradient algorithm further comprises:
performing hardware-in-the-loop training of the deep deterministic policy gradient algorithm on steering scenarios of the steer-by-wire system under different working conditions, the hardware-in-the-loop training system comprising an upper PC, a lower PXI real-time machine, the steer-by-wire system ECU, and a steer-by-wire test bench. During training, the output of the DDPG is used as the compensation value of the target steering angle, and the compensated target-angle command is sent to the underlying controller of the steer-by-wire system; in addition, the actual steering angle executed by the steer-by-wire bench is sent to the upper computer as an input to the vehicle dynamics simulation software CarSim, the current DDPG state s_t = [v_{x,t}, w_{z,t}, δ_t, δ_{t−1}, δ_{des,t}, δ_{des,t−1}]^T, s_t ∈ S, is obtained from the vehicle state output by CarSim and the initially input target steering angle, and the parameters of the Actor and Critic networks are adjusted using the learning algorithm.
Further, in an embodiment of the present invention, S4 is followed by:
S5, applying the prior network parameters obtained from hardware-in-the-loop training to the real vehicle as initial values of the algorithm's network parameters, and updating the network parameters in real time from the data collected while the vehicle is running.
Further, in an embodiment of the present invention, the target-angle compensation value output by the deep deterministic policy gradient algorithm is added to the target steering angle to obtain the compensated target-angle command, which is used as the target angle actually sent to the underlying controller of the steer-by-wire system.
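The compensation step above can be sketched as a small helper; the saturation limit is an illustrative safety bound assumed here, not specified in the patent:

```python
def compensated_target_angle(delta_des, compensation, limit=0.5):
    """Add the DDPG compensation value to the target steering angle.

    The +/-limit clamp (in radians) is a hypothetical safety bound for
    illustration only; the patent itself just adds the two quantities.
    """
    cmd = delta_des + compensation
    return max(-limit, min(limit, cmd))

# 0.10 rad target plus 0.02 rad compensation -> 0.12 rad sent to the
# underlying steer-by-wire controller.
cmd = compensated_target_angle(0.10, 0.02)
# An out-of-range sum is clipped to the assumed bound.
cmd_clipped = compensated_target_angle(0.6, 0.1)
```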
In order to achieve the above object, an embodiment of another aspect of the present invention provides a steering compensation control apparatus for a DDPG-based steer-by-wire system, including:
a construction module, configured to establish an action (Actor) network and an action-value (Critic) network for the steer-by-wire system and to construct a deep deterministic policy gradient learning framework from the two networks;
a training module, configured to design the reward function required for training;
an establishing module, configured to establish the deep deterministic policy gradient algorithm from the reward function and the learning framework; and
a compensation module, configured to perform hardware-in-the-loop and real-vehicle training of the deep deterministic policy gradient algorithm and to adjust the parameters of its Actor and Critic networks, so that the algorithm outputs a target-angle compensation value.
In addition, the steering compensation control apparatus of the DDPG-based steer-by-wire system according to the above embodiment of the present invention may further have the following additional technical features:
Further, in an embodiment of the present invention, the apparatus further includes:
an adjusting module, configured to apply the prior network parameters obtained from hardware-in-the-loop training to the real vehicle as initial values of the algorithm's network parameters, and to update the network parameters in real time from the data collected while the vehicle is running.
The steering compensation control method and apparatus of the DDPG-based steer-by-wire system according to the embodiments of the invention do not need to know the control strategy of the underlying controller of the steer-by-wire system, can be widely applied to steer-by-wire systems of any structural form, achieve accurate steering-angle control with a compensation method running on the vehicle's upper-layer controller, and solve the problem in the related art that a fast, accurate, and stable front-wheel steering-angle response of a steer-by-wire system is difficult to realize.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow chart of a steer-by-wire control method;
FIG. 2 is a schematic diagram of a steer-by-wire control method;
FIG. 3 is a flow chart of a DDPG based steer-by-wire system steer compensation control method according to one embodiment of the present invention;
FIG. 4 is a block diagram of an action Actor network and an action value Critic network according to one embodiment of the invention;
FIG. 5 is a flow chart of the deep deterministic policy gradient algorithm according to one embodiment of the present invention;
FIG. 6 is a schematic diagram of a steer-by-wire hardware-in-the-loop system according to one embodiment of the present invention;
FIG. 7 is a flow chart of a steer-by-wire compensation algorithm training process according to one embodiment of the present invention;
fig. 8 is a schematic structural diagram of a steering compensation control device of a DDPG-based steer-by-wire system according to an embodiment of the invention.
Reference numerals: 1 - actuator motor; 2 - road-feel motor.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The following describes a steering compensation control method and device of a DDPG-based steer-by-wire system according to an embodiment of the present invention with reference to the accompanying drawings.
A steering compensation control method of a DDPG-based steer-by-wire system proposed according to an embodiment of the present invention will be described first with reference to the accompanying drawings.
Fig. 3 is a flowchart of a steering compensation control method of a DDPG-based steer-by-wire system according to an embodiment of the present invention.
As shown in fig. 3, the steer-by-wire system steering compensation control method based on Deep Deterministic Policy Gradient (DDPG) includes the following steps:
Step S1: establishing an action (Actor) network and an action-value (Critic) network for the steer-by-wire system, and constructing a deep deterministic policy gradient learning framework from the two networks.
Further, S1 further includes:
S11, defining the state space S = {v_x, w_z, δ, δ_des} and the state vector s_t = [v_{x,t}, w_{z,t}, δ_t, δ_{t−1}, δ_{des,t}, δ_{des,t−1}]^T, s_t ∈ S, where v_x is the longitudinal vehicle speed, w_z is the yaw rate of the vehicle, δ is the actual steering angle, δ_des is the target steering angle, t is the current moment, and t−1 is the previous moment;
S12, establishing the action Actor network a = μ(s|θ^μ), where μ denotes the Actor network, the state variable s is the network input, θ^μ are the network parameters, and a is the action output by the network;
S13, establishing the action-value Critic network Q(s, a|θ^Q), where Q denotes the Critic network, the inputs are the state variable s and the action a output by the Actor network, and θ^Q are the network parameters.
The structure and interrelation of the action Actor network and the action-value Critic network are shown in Fig. 4; both are neural networks with hidden layers.
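As a minimal sketch of this Actor/Critic pair, a pure-Python forward pass is shown below. The layer sizes, weight ranges, and tanh activations are illustrative assumptions (the patent does not specify the architecture); only the 6-dimensional state input and the one-dimensional action follow the text above.

```python
import math
import random

random.seed(0)  # deterministic illustrative weights

def _layer(n_in, n_out):
    """Random small weights and zero biases for one fully connected layer (assumed init)."""
    w = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def _forward(x, weights, biases, activation):
    """One dense layer: activation(W x + b)."""
    return [activation(sum(wij * xj for wij, xj in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def make_actor(state_dim=6, hidden=16):
    """Actor mu(s | theta_mu): maps the 6-dim state to one compensation action in (-1, 1)."""
    w1, b1 = _layer(state_dim, hidden)
    w2, b2 = _layer(hidden, 1)
    def mu(s):
        h = _forward(s, w1, b1, math.tanh)
        return _forward(h, w2, b2, math.tanh)[0]
    return mu

def make_critic(state_dim=6, hidden=16):
    """Critic Q(s, a | theta_Q): scores a state-action pair with a scalar value."""
    w1, b1 = _layer(state_dim + 1, hidden)  # state and action are concatenated
    w2, b2 = _layer(hidden, 1)
    def q(s, a):
        h = _forward(s + [a], w1, b1, math.tanh)
        return _forward(h, w2, b2, lambda v: v)[0]
    return q

# s_t = [v_x_t, w_z_t, delta_t, delta_{t-1}, delta_des_t, delta_des_{t-1}]
s_t = [20.0, 0.05, 0.10, 0.09, 0.12, 0.11]
actor = make_actor()
critic = make_critic()
a_t = actor(s_t)        # compensation action
q_t = critic(s_t, a_t)  # action value
```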
In step S2, a reward function for training is designed.
The training reward r is designed as follows:
r = −w₁|δ_des − δ_a| − w₂(δ_des − δ_a)² − w₃|ΔI_output|
where δ_des is the target steering angle, δ_a is the actual steering angle of the steer-by-wire system, ΔI_output is the distance between the current output of the action network and its output at the previous moment, and w_i (i = 1, 2, 3) are the weight coefficients of the respective terms.
The training reward consists of two parts: the first two terms in the formula above evaluate the tracking error between the actual and target steering angles, and the last term reduces jitter.
Step S3: establishing the deep deterministic policy gradient algorithm from the reward function and the learning framework.
The flow of the deep deterministic policy gradient algorithm is shown in Fig. 5.
S301, randomly initializing the online Critic network Q(s, a|θ^Q) and the online Actor network μ(s|θ^μ);
S302, initializing the target Critic network Q(s, a|θ^{Q'}) and the target Actor network μ(s|θ^{μ'}) with θ^{Q'} ← θ^Q, θ^{μ'} ← θ^μ;
S303, initializing the experience replay set R;
S304, the online Actor network obtains the action a_t = μ(s_t|θ^μ) + N_t based on the state s_t, where N_t is random process noise;
S305, executing the action a_t and receiving the training reward r_t and the new state s_{t+1};
S306, storing (s_t, a_t, r_t, s_{t+1}) in the experience replay set R;
S307, sampling N random samples (s_i, a_i, r_i, s_{i+1}) from the experience replay set R and letting y_i = r_i + γQ'(s_{i+1}, μ'(s_{i+1}|θ^{μ'})|θ^{Q'}), where γ is the discount factor, taking a value in (0, 1);
S308, updating the parameters θ^Q of the online Critic network by minimizing the loss function
L = (1/N) Σ_i (y_i − Q(s_i, a_i|θ^Q))²;
S309, updating the Actor policy according to the sampled policy gradient
∇_{θ^μ} J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} ∇_{θ^μ} μ(s|θ^μ)|_{s_i};
S310, updating the target networks
θ^{Q'} ← ξθ^Q + (1 − ξ)θ^{Q'}
θ^{μ'} ← ξθ^μ + (1 − ξ)θ^{μ'}
where ξ is the soft-update parameter with ξ ≪ 1;
S311, returning to S304 for multiple iterations.
It will be appreciated that i = 1…T, and the iteration ends when an end condition is met or the maximum number of iterations is reached.
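Two scalar pieces of the loop, the TD target of step S307 and the soft target-network update of step S310, can be sketched as follows. The numeric values (γ = 0.9, ξ = 0.1) are for demonstration only; standard DDPG practice keeps the soft-update coefficient ξ much smaller than 1.

```python
def td_target(r, q_next, gamma=0.99):
    """y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))  (step S307)."""
    return r + gamma * q_next

def soft_update(theta, theta_target, xi=0.001):
    """theta' <- xi*theta + (1 - xi)*theta', applied element-wise (step S310)."""
    return [xi * w + (1.0 - xi) * wt for w, wt in zip(theta, theta_target)]

# TD target for a sampled transition: 1.0 + 0.9 * 2.0 = 2.8
y = td_target(r=1.0, q_next=2.0, gamma=0.9)
# One soft update pulls the target parameters a small step toward the online ones.
params = soft_update([1.0, 1.0], [0.0, 0.0], xi=0.1)
```

With a small ξ the target networks change slowly, which is what stabilizes the bootstrapped Critic targets during training.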
Step S4: performing hardware-in-the-loop and real-vehicle training of the deep deterministic policy gradient algorithm, and adjusting the parameters of its Actor and Critic networks so that the algorithm outputs a target-angle compensation value.
Further, training the deep deterministic policy gradient algorithm further comprises:
performing hardware-in-the-loop training of the deep deterministic policy gradient algorithm on steering scenarios of the steer-by-wire system under different working conditions.
In order to train the steer-by-wire compensation algorithm, steering scenarios under different working conditions are set up in a simulation environment and the steer-by-wire system is pre-trained hardware-in-the-loop. The control schematic of the hardware-in-the-loop system is shown in Fig. 6; the system comprises an upper PC, a lower PXI real-time machine, the steer-by-wire system ECU, and a steer-by-wire test bench. During control, the upper computer provides the vehicle-dynamics simulation environment CarSim for the hardware-in-the-loop system and downloads the upper-layer control program to the lower PXI machine; the lower machine sends the target steering angle obtained by the upper-layer control program to the steer-by-wire ECU; the ECU drives the actuator of the steer-by-wire system to realize steering-angle control of the bench; and the bench sensors send the collected signals, such as the actual steering angle, back to the upper computer. The training process of the algorithm is shown in Fig. 7; its goal is that the steer-by-wire system realizes the target steering angle quickly and accurately. During training, the output of the DDPG is used as the compensation value of the target angle and is added to the target angle to obtain the compensated target-angle command, which is sent to the underlying controller of the steer-by-wire system; in addition, the actual steering angle executed by the steer-by-wire bench is sent to the upper computer as an input to the simulation software CarSim, the current DDPG state s_t = [v_{x,t}, w_{z,t}, δ_t, δ_{t−1}, δ_{des,t}, δ_{des,t−1}]^T, s_t ∈ S, is obtained from the vehicle state output by CarSim and the initially input target steering angle, and the parameters of the Actor and Critic networks are adjusted using the learning algorithm.
In order to guarantee the performance of the proposed compensation algorithm, the training process should cover as many scenarios as possible; the training should therefore use time-varying target steering angles and time-varying longitudinal vehicle speeds. Batch pre-training on these data then yields the prior network parameters of the algorithm.
Further, in the embodiment of the present invention, the method further includes:
and S5, applying the algorithm prior network parameters obtained by the hardware-in-loop training as initial values of the algorithm network parameters to the real vehicle, and updating the algorithm network parameters in real time according to the real-time data in the running process of the vehicle.
It can be understood that the algorithm prior network parameters obtained by hardware in-loop training are applied to the real vehicle as initial values of the algorithm network parameters, and the network parameters of the algorithm are updated in real time based on instant data in the running process of the vehicle. In order to improve the generalization migration capability of the algorithm, ensure the performance of the algorithm in a new environment, establish a lifelong learning mechanism, maintain playback cache and avoid long-term forgetting by adopting a batch training method.
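The replay cache mentioned above can be kept at a fixed capacity so that on-vehicle memory stays bounded; the capacity and eviction policy below are illustrative assumptions, not values from the patent:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay set R.

    Oldest transitions are evicted first (deque maxlen), which bounds
    memory use in the on-vehicle lifelong-learning setting; batches are
    drawn uniformly at random for batch training.
    """
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next):
        """Store one transition (s_t, a_t, r_t, s_{t+1})."""
        self.buffer.append((s, a, r, s_next))

    def sample(self, n):
        """Uniformly sample up to n transitions."""
        return random.sample(list(self.buffer), min(n, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)

# Capacity 3: storing 5 transitions evicts the 2 oldest ones.
buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.store([float(i)], 0.0, -1.0, [float(i + 1)])
batch = buf.sample(2)
```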
According to the steering compensation control method of the DDPG-based steer-by-wire system provided by the embodiment of the invention, the control strategy of the underlying controller of the steer-by-wire system need not be known, the method is widely applicable to steer-by-wire systems of any structural form, the compensation method running on the vehicle's upper-layer controller achieves accurate steering-angle control, and the problem in the related art that a fast, accurate, and stable front-wheel steering-angle response of a steer-by-wire system is difficult to realize is solved.
Next, a steering compensation control apparatus of a DDPG-based steer-by-wire system proposed according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 8 is a schematic structural diagram of a steering compensation control device of a DDPG-based steer-by-wire system according to an embodiment of the present invention.
As shown in fig. 8, the DDPG-based steer-by-wire system steering compensation control apparatus includes: a construction module 801, a training module 802, an establishing module 803, and a compensation module 804.
The construction module 801 is configured to establish an action (Actor) network and an action-value (Critic) network for the steer-by-wire system and to construct a deep deterministic policy gradient learning framework from the two networks.
The training module 802 is configured to design the reward function required for training.
The establishing module 803 is configured to establish the deep deterministic policy gradient algorithm from the reward function and the learning framework.
The compensation module 804 is configured to perform hardware-in-the-loop and real-vehicle training of the deep deterministic policy gradient algorithm and to adjust the parameters of its Actor and Critic networks, so that the algorithm outputs a target-angle compensation value.
Further, in the embodiment of the present invention, the apparatus further includes:
an adjusting module, configured to apply the prior network parameters obtained from hardware-in-the-loop training to the real vehicle as initial values of the algorithm's network parameters and to update the network parameters in real time from data collected while the vehicle is running.
Further, in the embodiment of the present invention, the target-angle compensation value output by the deep deterministic policy gradient algorithm is added to the target steering angle to obtain the compensated target-angle command, which is used as the target angle actually sent to the underlying controller of the steer-by-wire system.
It should be noted that the foregoing explanation of the method embodiment is also applicable to the apparatus of this embodiment, and is not repeated herein.
According to the steering compensation control apparatus of the DDPG-based steer-by-wire system provided by the embodiment of the invention, the control strategy of the underlying controller of the steer-by-wire system need not be known, the apparatus is widely applicable to steer-by-wire systems of any structural form, the compensation method running on the vehicle's upper-layer controller achieves accurate steering-angle control, and the problem in the related art that a fast, accurate, and stable front-wheel steering-angle response of a steer-by-wire system is difficult to realize is solved.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic references do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples, and those skilled in the art may combine features of different embodiments or examples described in this specification provided they are not contradictory.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A DDPG-based steering compensation control method for a steer-by-wire system, characterized by comprising the following steps:
S1, establishing an action (Actor) network and an action-value (Critic) network for the steer-by-wire system, and constructing a deep deterministic policy gradient learning algorithm framework from the Actor network and the Critic network;
S2, designing the reward function required for training;
S3, establishing a deep deterministic policy gradient algorithm from the reward function and the deep deterministic policy gradient learning algorithm framework;
S4, performing hardware-in-the-loop and real-vehicle training on the deep deterministic policy gradient algorithm, and adjusting the parameters of its Actor and Critic networks so that the algorithm outputs a target steering-angle compensation value.
2. The method according to claim 1, wherein S1 further comprises:
S11, defining the state space S = {vx, wz, δ, δdes} and the state vector st = [vx_t, wz_t, δt, δt-1, δdes_t, δdes_t-1]^T, st ∈ S, where vx is the vehicle longitudinal speed, wz is the vehicle yaw rate, δ is the actual steering angle, δdes is the target steering angle, t denotes the current time step, and t-1 the previous time step;
S12, establishing the action (Actor) network a = μ(s|θ^μ), where μ denotes the Actor network, the state vector s is the network input, θ^μ are the network parameters, and a is the action output by the network;
S13, establishing the action-value (Critic) network Q(s, a|θ^Q), where Q denotes the Critic network, the state vector s and the action a output by the Actor network are the inputs, and θ^Q are the network parameters.
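The Actor and Critic defined in S11 to S13 are ordinary function approximators; the sketch below illustrates them as small feed-forward networks in NumPy. The layer sizes and weight scaling are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def mlp_init(sizes, rng):
    # One (weight, bias) pair per layer; sizes like [6, 32, 1].
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_forward(params, x):
    # Hidden layers use tanh; the final layer is linear.
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

rng = np.random.default_rng(0)
# Actor mu(s|theta_mu): 6-dim state -> 1-dim compensation action.
actor = mlp_init([6, 32, 1], rng)
# Critic Q(s, a|theta_Q): state concatenated with action -> scalar value.
critic = mlp_init([6 + 1, 32, 1], rng)

# s = [vx, wz, delta_t, delta_{t-1}, delta_des_t, delta_des_{t-1}]
s = np.array([20.0, 0.1, 0.05, 0.04, 0.06, 0.05])
a = mlp_forward(actor, s)
q = mlp_forward(critic, np.concatenate([s, a]))
print(a.shape, q.shape)
```

The Critic takes the state and the Actor's action as a concatenated input, matching the Q(s, a|θ^Q) signature in S13.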
3. The method of claim 1, wherein the reward function is:
r = -w1·|δdes - δa| - w2·(δdes - δa)² - w3·|ΔIoutput|
where δdes is the target steering angle, δa is the actual steering angle of the steer-by-wire system, ΔIoutput is the difference between the current output of the action network and its output at the previous time step, and wi (i = 1, 2, 3) are the weight coefficients of the respective terms.
4. The method of claim 1, wherein the action (Actor) network and the action-value (Critic) network are neural networks containing hidden layers.
5. The method of claim 1, wherein S3 further comprises:
S301, randomly initializing the online Critic network Q(s, a|θ^Q) and the online Actor network μ(s|θ^μ);
S302, initializing the target Critic network Q(s, a|θ^Q′) and the target Actor network μ(s|θ^μ′), with θ^Q′ ← θ^Q and θ^μ′ ← θ^μ;
S303, initializing the experience replay buffer R;
S304, the online Actor network selecting the action at = μ(st|θ^μ) + Nt based on the state st, where Nt is random exploration noise;
S305, executing the action at, and receiving the training reward rt and the new state st+1;
S306, storing the transition (st, at, rt, st+1) in the experience replay buffer R;
S307, sampling N random transitions (si, ai, ri, si+1) from the replay buffer R, and letting yi = ri + γ·Q′(si+1, μ′(si+1|θ^μ′)|θ^Q′), where the discount factor γ takes a value in (0, 1);
S308, updating the parameters θ^Q of the online Critic network by minimizing the loss function
L = (1/N)·Σi (yi - Q(si, ai|θ^Q))²;
S309, updating the Actor policy according to the sampled policy gradient
∇θ^μ J ≈ (1/N)·Σi ∇a Q(s, a|θ^Q)|s=si, a=μ(si) · ∇θ^μ μ(s|θ^μ)|s=si;
S310, soft-updating the target networks:
θ^Q′ ← ξ·θ^Q + (1 - ξ)·θ^Q′
θ^μ′ ← ξ·θ^μ + (1 - ξ)·θ^μ′
where ξ is the update rate, with ξ ≪ 1;
S311, returning to S304 and repeating for multiple iterations.
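The numerical core of steps S307 to S310 can be sketched with scalar stand-ins for the networks. The snippet below shows the TD target, the Critic loss, and the soft target update with a small update rate; the values of γ and ξ are illustrative assumptions:

```python
import numpy as np

# S307: TD target y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))
def td_target(r, q_next, gamma=0.99):
    return r + gamma * q_next

# S308: Critic loss L = (1/N) * sum_i (y_i - Q(s_i, a_i))^2
def critic_loss(y, q):
    y, q = np.asarray(y), np.asarray(q)
    return float(np.mean((y - q) ** 2))

# S310: soft update theta' <- xi*theta + (1 - xi)*theta', xi << 1
def soft_update(theta, theta_target, xi=0.005):
    return xi * theta + (1.0 - xi) * theta_target

y = td_target(1.0, 2.0)            # 1 + 0.99 * 2
loss = critic_loss([y, 0.0], [3.0, 0.0])
theta_t = soft_update(1.0, 0.0)    # target drifts slowly toward the online value
print(y, loss, theta_t)
```

Because ξ ≪ 1, the target networks change slowly, which is what stabilizes the bootstrapped TD target in S307.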
6. The method of claim 1, wherein training the deep deterministic policy gradient algorithm further comprises:
performing hardware-in-the-loop training of the deep deterministic policy gradient algorithm over steering scenarios of the steer-by-wire system under different operating conditions, the hardware-in-the-loop training system comprising a host PC, a lower-level PXI real-time machine, the steer-by-wire ECU, and a steer-by-wire test bench; during training, the output of the DDPG is used as the compensation value for the target steering angle, and the compensated target steering-angle command is sent to the underlying controller of the steer-by-wire system; in addition, the actual steering angle executed by the steer-by-wire bench is sent to the host PC as the input to the vehicle dynamics simulation software Carsim; the current DDPG state st = [vx_t, wz_t, δt, δt-1, δdes_t, δdes_t-1]^T, st ∈ S, is obtained from the vehicle state output by Carsim and the target steering angle initially input to the system, and the parameters of the Actor and Critic networks are adjusted by the learning algorithm.
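In this loop the state vector must carry the previous time step's angles alongside the current ones. A minimal sketch of that bookkeeping (the function and field names are hypothetical, not Carsim channel names):

```python
from collections import deque

def make_state(vx, wz, delta, delta_des, history):
    """Build s_t = [vx, wz, delta_t, delta_{t-1}, ddes_t, ddes_{t-1}].

    `history` keeps the previous (delta, delta_des) pair; on the first
    call the previous values default to the current ones.
    """
    prev_delta, prev_des = history[-1] if history else (delta, delta_des)
    history.append((delta, delta_des))
    return [vx, wz, delta, prev_delta, delta_des, prev_des]

hist = deque(maxlen=1)          # only the last pair is ever needed
s1 = make_state(20.0, 0.1, 0.05, 0.06, hist)
s2 = make_state(20.0, 0.1, 0.07, 0.08, hist)
print(s1)
print(s2)
```

Keeping only a one-element history matches the state definition, which looks back exactly one time step.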
7. The method of claim 6, further comprising, after S4:
S5, applying the prior network parameters obtained from the hardware-in-the-loop training to the real vehicle as initial values of the algorithm network parameters, and updating the network parameters in real time from data collected while the vehicle is running.
8. The method according to claim 1, wherein the target steering-angle compensation value output by the deep deterministic policy gradient algorithm is added to the target steering angle to obtain a compensated target steering-angle command, which is used as the target steering angle actually sent to the underlying controller of the steer-by-wire system.
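The compensation in claim 8 is a simple additive correction on the nominal command. A minimal sketch; the saturation limit is an illustrative assumption (the patent does not specify an actuator range):

```python
def compensated_command(delta_des, ddpg_output, limit=0.6):
    """Compensated target angle sent to the underlying controller.

    The DDPG output is added to the nominal target angle; the optional
    clamp keeps the command within an assumed actuator range (radians).
    """
    cmd = delta_des + ddpg_output
    return max(-limit, min(limit, cmd))

print(compensated_command(0.10, 0.02))   # small additive correction
print(compensated_command(0.55, 0.10))   # saturates at the assumed limit
```

Because the correction is purely additive on top of the existing command, the underlying controller needs no modification, which is what makes the scheme controller-agnostic.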
9. A DDPG-based steering compensation control device for a steer-by-wire system, characterized by comprising:
a construction module, configured to establish an action (Actor) network and an action-value (Critic) network for the steer-by-wire system, and to construct a deep deterministic policy gradient learning algorithm framework from the Actor network and the Critic network;
a training module, configured to design the reward function required for training;
an establishing module, configured to establish the deep deterministic policy gradient algorithm from the reward function and the learning algorithm framework;
a compensation module, configured to perform hardware-in-the-loop and real-vehicle training on the deep deterministic policy gradient algorithm, and to adjust the parameters of its Actor and Critic networks so that the algorithm outputs a target steering-angle compensation value.
10. The DDPG-based steering compensation control device of claim 9, further comprising:
an adjusting module, configured to apply the prior network parameters obtained from the hardware-in-the-loop training to the real vehicle as initial values of the algorithm network parameters, and to update the network parameters in real time from data collected while the vehicle is running.
CN202110357530.1A 2021-04-01 2021-04-01 Steering compensation control method and device of steering-by-wire system based on DDPG Active CN112977606B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110357530.1A CN112977606B (en) 2021-04-01 2021-04-01 Steering compensation control method and device of steering-by-wire system based on DDPG


Publications (2)

Publication Number Publication Date
CN112977606A true CN112977606A (en) 2021-06-18
CN112977606B CN112977606B (en) 2022-11-11

Family

ID=76338856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110357530.1A Active CN112977606B (en) 2021-04-01 2021-04-01 Steering compensation control method and device of steering-by-wire system based on DDPG

Country Status (1)

Country Link
CN (1) CN112977606B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007230471A (en) * 2006-03-03 2007-09-13 Nissan Motor Co Ltd Steering control device for vehicle
CN109664938A (en) * 2018-12-29 2019-04-23 南京航空航天大学 Steering-by-wire dual motors system and its Yaw stability compensation policy based on driving behavior identification
KR20190044402A (en) * 2017-10-20 2019-04-30 주식회사 만도 Vehicle pulling compensation apparatus for steer-by-wire system and method thereof
CN110322017A (en) * 2019-08-13 2019-10-11 吉林大学 Automatic Pilot intelligent vehicle Trajectory Tracking Control strategy based on deeply study
CN110949499A (en) * 2019-11-26 2020-04-03 江苏大学 Unmanned driving corner compensation system of commercial vehicle and control method thereof
CN110989576A (en) * 2019-11-14 2020-04-10 北京理工大学 Target following and dynamic obstacle avoidance control method for differential slip steering vehicle
CN111985614A (en) * 2020-07-23 2020-11-24 中国科学院计算技术研究所 Method, system and medium for constructing automatic driving decision system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HE XIANGKUN: "Research on Motion Control and Decision-Making Methods for Emergency Collision Avoidance Systems of Autonomous Vehicles", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II *


Similar Documents

Publication Publication Date Title
Ji et al. Shared steering torque control for lane change assistance: A stochastic game-theoretic approach
Huang et al. Sliding mode predictive tracking control for uncertain steer-by-wire system
US20180170421A1 (en) Electric power steering apparatus
CN103381826B (en) Based on the self-adapting cruise control method of approximate Policy iteration
CN104360596B (en) Limited time friction parameter identification and adaptive sliding mode control method for electromechanical servo system
DE102011086295B4 (en) A method of controlling a motor drive torque in an electric power steering system
CN109204447B (en) Electric power steering system with unified architecture for multiple operating modes
EP3699698B1 (en) Method and device for processing control parameter, and storage medium
CN112286218B (en) Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient
CN107807657B (en) Flexible spacecraft attitude self-adaptive control method based on path planning
CN110341787B (en) EPS-based automobile driving mode switching method
CN112558468B (en) Launching platform adaptive robust output feedback control method based on double observers
CN113297783A (en) Method and device for supporting the planning of maneuvers of a vehicle or robot
CN112977606B (en) Steering compensation control method and device of steering-by-wire system based on DDPG
Liu et al. Iterative learning based neural network sliding mode control for repetitive tasks: With application to a PMLSM with uncertainties and external disturbances
Zhu et al. Composite chattering-free discrete-time sliding mode controller design for active front steering system of electric vehicles
Hodgson et al. Effect of vehicle mass changes on the accuracy of Kalman filter estimation of electric vehicle speed
JP2007514600A (en) Method and system for controlling steering angle of steerable rear wheel and corresponding vehicle
CN114063453B (en) Helicopter system control method, system, device and medium based on reinforcement learning
CN113985870B (en) Path planning method based on meta reinforcement learning
Pano et al. Shared control based on an ecological feedforward and a driver model based feedback
Wang et al. A Fractional Derivative‐Based Lateral Preview Driver Model for Autonomous Automobile Path Tracking
WO2022189021A1 (en) Apparatus and method for influencing a vehicle behaviour
DE102017217084A1 (en) Method for controlling a steering system with an electric steering assistance
Li et al. DOS-robust dynamic speed tracking controller for an integrated motor-gearbox powertrain system of a connected car

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant