CN114228690B - Automatic driving vehicle roll control method based on DDPG and iterative control

Info

Publication number: CN114228690B
Application number: CN202111353270.7A
Authority: CN
Inventors: 唐晓峰
Original assignee: Yangzhou University
Current assignee: Yangzhou University
Priority date: 2021-11-16
Filing date: 2021-11-16
Publication date: 2023-05-23
Anticipated expiration: 2041-11-16
Also published as: CN114228690A

Abstract

The invention discloses a roll control method for an autonomous vehicle based on DDPG and iterative control. A DDPG algorithm is trained on driving maps of the autonomous vehicle in different scenarios; by interacting with the map environment of each scenario, the vehicle generates real-time vehicle states that determine its actions. During action training, the action space is initialized, the online policy network in the actor network generates state-space information and outputs an action, and an action noise is added to obtain an exploratory action space. Based on LSTM historical memory and road-planning attributes, a predicted path of the vehicle state is generated; the DDPG algorithm tracks the path trajectory under both normal and extreme driving conditions, and an iterative control method provides compensation control. The invention avoids the instability of the reinforcement learning algorithm when the vehicle drives in extreme road environments and improves the driving safety and robustness of the vehicle.

Description

A roll control method for autonomous driving vehicles based on DDPG and iterative control

Technical Field

The present invention belongs to the field of intelligent vehicle control, and in particular relates to a roll control method for an autonomous vehicle based on DDPG and iterative control.

Background Art

With the development of artificial intelligence, autonomous driving technology has made great progress. It is currently applied mainly in closed-park scenarios, such as closed campuses and logistics industrial parks, and is especially common in port road environments with structured roads and few pedestrians and vehicles. Autonomous vehicles rely on environmental perception, navigation, map positioning, decision-making, motion planning and trajectory tracking control to achieve vehicle intelligence. However, when an autonomous vehicle drives in complex weather and complex environments such as a cross-sea bridge, harsh weather affects the bridge road conditions and can cause the vehicle to veer, skid or roll over. Rain, snow and wind change the road adhesion coefficient and cause tire slip, which degrades path tracking, lane keeping and vehicle control accuracy. In addition, the bridge road may vibrate under wind, causing vehicle roll and potentially uncontrollable situations. The overall control design therefore also considers road vibration characteristics, road angle and aerodynamic characteristics. When the vehicle exhibits uncertain characteristics such as sideslip caused by slippery roads, yaw caused by bridge vibration and roll dynamics caused by high-speed driving, control is a complex task. Studying the driving safety and stability of roll control for autonomous vehicles in harsh weather is therefore an important key technology.

Reinforcement learning is a branch of artificial intelligence in which an agent explores an unknown dynamic environment, tries different actions and interacts with that environment without needing a precise vehicle model or a given description of the surroundings. Reinforcement learning can learn an unknown environment and capture complex vehicle dynamics through the actions and states of its interaction with the environment, making it a new way to adapt to dynamic road-environment cognition and complex vehicle dynamics. Using a reinforcement learning algorithm to control an autonomous vehicle in the driving environment of a cross-sea bridge therefore helps to make intelligent vehicles safe and supports the large-scale industrial development of autonomous vehicles.

Summary of the Invention

Purpose of the invention: The present invention provides a roll control method for an autonomous vehicle based on DDPG and iterative control, which enables the vehicle to drive safely in cross-sea bridge intersection environments of any complexity level and to improve its intelligence level by means of the instantaneous roll angle.

Technical solution: The present invention provides a roll control method for an autonomous vehicle based on DDPG and iterative control, which specifically includes the following steps:

(1) Install lidar, vision, millimeter-wave radar and ultrasonic radar sensors, a positioning system and an inertial navigation system on the autonomous vehicle;

(2) Use the vision sensor, positioning system and inertial navigation system to obtain the vehicle's position and map in each scenario, generating driving maps of the autonomous vehicle for the different scenarios and thereby providing the environment required for the vehicle's driving trajectory;

(3) Control the steering wheel, accelerator and pedals to drive on the cross-sea bridge, record the corresponding driving trajectories in rain and snow, in strong wind and severe weather, and on sunny days, and build a data set;

(4) Train the DDPG algorithm on the driving maps of the autonomous vehicle in the different scenarios, covering the driving states on the cross-sea bridge at different road-complexity levels in bad weather. By interacting with the map environment of each scenario, the autonomous vehicle generates real-time vehicle states that determine its actions. During action training, the action space is initialized, the online policy network in the actor network generates state-space information and outputs an action, and an action noise is added to obtain an exploratory action space;

(5) Based on LSTM historical memory and road-planning attributes, generate a predicted path of the autonomous vehicle's state, use the DDPG algorithm to track the path trajectory under normal and extreme driving conditions, and use an iterative control method to provide compensation control of the autonomous vehicle.

Furthermore, in step (1), the lidar sensor detects dynamic and static obstacles on the road, including pedestrians, motorcycles and other vehicles, as well as the drivable road area; the vision sensor detects lane lines, pedestrians and vehicles, and performs localization and simultaneous map building; the millimeter-wave radar sensor measures the distance from the vehicle to pedestrians and to moving vehicles; the ultrasonic radar measures the distance to vehicles at close range; and the vision sensor, positioning system and inertial navigation system together implement vehicle positioning.

Furthermore, the different scenarios in step (2) comprise five scenarios: cross-sea bridge road conditions in rain and snow; cross-sea bridge road conditions in strong wind and severe weather; road conditions when the bridge vibrates on a sunny day; single-vehicle driving conditions in frequently changing weather; and multi-vehicle driving conditions in frequently changing weather.

Furthermore, the data set in step (3) includes vehicle speed, driving trajectory, vehicle position, heading angle, slip angle, yaw rate and roll angle.

Furthermore, the network design of the DDPG algorithm in step (4) is as follows:

An actor network is constructed that takes the vehicle state and the environment state as input and outputs a vector of steering angle, throttle and brake signals, corresponding to the three neurons of the actor policy network's output layer. The activation function for throttle and brake is Sigmoid, and the activation function for the steering action value is Tanh. The hidden layers are structured as follows: the first layer is a convolution with a 7*7 kernel, 48 filters and a stride of 4, with 200 neurons in total; the second layer is a convolution with a 5*5 kernel, 16 filters and a stride of 2, with a ReLU activation function and 400 neurons in total; the third layer is an LSTM layer with 100 neurons; the fourth layer is a fully connected layer with 128 units; the fifth layer is a fully connected layer with 128 units. The critic network takes the state and the action space as input and passes them through two hidden layers, the first with 200 neurons and the second with 400 neurons, combined with the ReLU activation function, to produce the Q value. Define $h_i \in (S_{t-T}, S_{t-T+1}, \ldots, S_t)$, where $S_{t-T}$ and $S_t$ denote the state information at time $t-T$ and at the current time $t$, respectively. The encoded state is then $s = f(h_i; \beta)$, and the policy of the modified actor network is defined as $a = \mu(h_i/\beta, \gamma^{\pi}) + \eta$.
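The output stage described above can be illustrated with a minimal sketch. It assumes the three output neurons produce raw values that are squashed by Tanh (steering) and Sigmoid (throttle, brake), and that the noise term $\eta$ is supplied by the caller and the result is clipped back into range; the function name `actor_output_head` and the clipping step are illustrative assumptions, not part of the patent.

```python
import math


def actor_output_head(pre_activations, noise=(0.0, 0.0, 0.0)):
    """Map three raw output values to (steering, throttle, brake).

    Steering uses tanh, so it lies in [-1, 1]; throttle and brake use a
    sigmoid, so they lie in [0, 1]. The caller-supplied noise is added and
    each value is clipped back into its range (clipping is an assumption).
    """
    z_steer, z_throttle, z_brake = pre_activations

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def clip(x, lo, hi):
        return max(lo, min(hi, x))

    steering = clip(math.tanh(z_steer) + noise[0], -1.0, 1.0)
    throttle = clip(sigmoid(z_throttle) + noise[1], 0.0, 1.0)
    brake = clip(sigmoid(z_brake) + noise[2], 0.0, 1.0)
    return steering, throttle, brake
```

With all-zero inputs and no noise, the sketch returns a centered steering command and half-range throttle and brake values.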

Furthermore, the tracking control of the path trajectory of the autonomous vehicle under normal driving conditions in step (5) is implemented as follows:

Normal driving conditions are the road conditions when the bridge vibrates on a sunny day. Taking the vehicle's roll, sideslip and yaw dynamics into account, a vehicle dynamics model is established and vehicle state constraints are set to determine the lateral stability range, the maximum steering angle range and the allowable control range for preventing roll, so as to reduce the vehicle's lateral offset error:

$\omega_{z\text{-}\min} \le \omega_z \le \omega_{z\text{-}\max},\quad \omega_{x\text{-}\min} \le \omega_x \le \omega_{x\text{-}\max},\quad u_{x\text{-}\min} \le u_x \le u_{x\text{-}\max},\quad e_{r\text{-}x\text{-}\min} \le e_r \le e_{r\text{-}x\text{-}\max}$

Where $\omega_z$ is the yaw rate; $\omega_x$ is the vehicle roll angle; $u_x$ is the steering angle; $e_r$ is the lateral tracking offset error;
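As an illustration of how such box constraints might be checked at run time, the following sketch tests each quantity against a lower and an upper bound. The numeric limits, their units and the names `Bounds`, `LIMITS` and `violated_constraints` are hypothetical; the patent does not give concrete values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    lo: float
    hi: float

    def contains(self, value: float) -> bool:
        return self.lo <= value <= self.hi


# Hypothetical limits, for illustration only.
LIMITS = {
    "yaw_rate": Bounds(-0.6, 0.6),       # omega_z
    "roll": Bounds(-0.15, 0.15),         # omega_x
    "steering": Bounds(-0.5, 0.5),       # u_x
    "lateral_error": Bounds(-0.8, 0.8),  # e_r
}


def violated_constraints(state: dict) -> list:
    """Return the names of all quantities outside their allowed range."""
    return [name for name, b in LIMITS.items() if not b.contains(state[name])]
```

An empty list means the state satisfies all four inequalities; any returned name identifies a constraint that a controller would need to react to.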

Based on the road state information predicted by the LSTM, an objective function is constructed that considers the steering angle, tire adhesion coefficient, roll angle error and path tracking error. Under the dynamic constraint of the vehicle's maximum allowable error, the physical constraints of the vehicle determined by the lateral tracking offset error are established, so as to reduce the tracking control error:

Where $w_1, w_2, w_3, w_4$ are parameter variables and $\mu_r$ is the road adhesion coefficient;

An iterative control algorithm provides compensation control of vehicle roll. A reference vehicle state, reference control input and reference output values are set to ensure tracking under the vehicle's physical constraints and the road constraints, improving disturbance rejection while driving and reducing the model error rate;

According to the vehicle driving conditions, the state space, action space and reward function required by the DDPG algorithm are constructed. The action space mainly comprises the steering wheel angle, throttle and brake signals. The state space comprises the lateral tracking error and its rate of change, the roll angle error and its rate of change, and the yaw rate error and its rate of change. As for the reward function, when the cross-sea bridge vibrates, the vehicle's actual trajectory changes and the road acquires a certain inclination, so the reward function is set equal to the cumulative product of the discount factor and the change in speed.
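The six-element state described above (three errors, each paired with its rate of change) can be assembled as in the sketch below. The finite-difference estimate of the rates, the sampling interval `dt` and the function name `build_state` are assumptions made for illustration.

```python
def build_state(errors, prev_errors, dt):
    """Build [e_lat, de_lat, e_roll, de_roll, e_yaw, de_yaw].

    errors and prev_errors are (lateral, roll, yaw_rate) error triples at the
    current and previous time step; rates are backward finite differences.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    state = []
    for e, e_prev in zip(errors, prev_errors):
        state.extend([e, (e - e_prev) / dt])
    return state
```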

Furthermore, the tracking control of the path trajectory of the autonomous vehicle under extreme driving conditions in step (5) is implemented as follows:

Under extreme driving conditions, that is, in bad weather or under other factors that affect road conditions, the vehicle is prone to slipping and vibration, which changes the driving trajectory and affects the vehicle's actual running speed, so that the actual speed deviates from the planned speed. The change in actual speed is therefore set as follows:

Where $v_{ref}$ is the reference speed; $v_{act}$ is the actual speed; $\alpha$ is the influence factor;

When the error between the actual speed and the reference speed is within 2 km/h, the actual speed can be assumed equal to the reference speed; when the error lies in the interval [2, 5] km/h, the actual speed is taken as a combination of the reference speed and the difference between the two speeds, where $\alpha$ is the speed change factor; when the error exceeds 5 km/h, the vehicle must brake immediately to ensure driving safety;
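The three speed bands above can be sketched as a small piecewise rule. The exact formula for the middle band is not reproduced in this text, so the blend used here (reference speed plus $\alpha$ times the signed difference) is an assumption, as are the default value of `alpha`, the treatment of the 2 km/h boundary and the function name `plan_speed`.

```python
def plan_speed(v_ref, v_act, alpha=0.5):
    """Return (speed_kmh, emergency_brake) for speeds given in km/h."""
    error = abs(v_act - v_ref)
    if error <= 2.0:
        # Small deviation: treat the actual speed as the reference speed.
        return v_ref, False
    if error <= 5.0:
        # Assumed blend: reference speed plus alpha times the signed difference.
        return v_ref + alpha * (v_act - v_ref), False
    # Large deviation: request immediate braking.
    return v_act, True
```

For example, with a 60 km/h reference, a measured 61 km/h is treated as on target, 64 km/h is blended toward the reference, and 66 km/h triggers the braking flag.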

At time $t$, the set of vehicle actions and states is expressed as $\{v_1, \ldots, v_i, \ldots, v_n\}$, $i = 1, \ldots, n$. Bezier curves are used for path planning to generate a collision-free predicted trajectory. To assess the accuracy of the speed planning, a small sample of vehicle states and actions is taken from the experience buffer of the DDPG algorithm as reference values $\{v_{r1}, \ldots, v_{ri}, \ldots, v_{rn}\}$, $i = 1, \ldots, n$, and the error between the two is calculated as:

Where $v_i$ is the vehicle speed; $L$ is the vehicle speed error rate; $v_{ri}$ is the reference vehicle speed obtained from the experience buffer;
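The Bezier-curve path planning mentioned above can be illustrated with de Casteljau's algorithm, which evaluates a Bezier curve by repeated linear interpolation between control points. The control points and the helper names `bezier_point` and `bezier_path` are hypothetical; the sketch only generates the curve and does not itself check for collisions.

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t in [0, 1] using de Casteljau's algorithm."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    pts = [tuple(p) for p in control_points]
    if not pts:
        raise ValueError("at least one control point is required")
    while len(pts) > 1:
        # Interpolate between each pair of neighbouring points.
        pts = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]


def bezier_path(control_points, samples=20):
    """Sample the curve at evenly spaced parameter values (samples >= 2)."""
    if samples < 2:
        raise ValueError("samples must be at least 2")
    return [bezier_point(control_points, i / (samples - 1)) for i in range(samples)]
```

The curve starts at the first control point and ends at the last, so a planner can pin the vehicle's current position and a target position and shape the path with the intermediate points.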

Under extreme driving conditions, given the desired state reference trajectory $S_r$, the output error is $e_k(t) = S_r(t) - S(t)$ and the learning law is $u_{k+1}(t) = L(u_k(t), e_k(t))$, which yields the compensation control action $a_k$. Under normal driving, the DDPG algorithm controls the vehicle and its output action is $a_\pi = \mu(h_i/\beta, \gamma^{\pi}) + \eta$. The total control action is then $a = a_k + a_\pi$; the reference formula is as follows:

To assess the accuracy of path planning, the trajectory error between the vehicle's actual trajectory and the reference trajectory when the roll angle changes is calculated as follows:

Where $r_{act}$ is the actual vehicle trajectory after vibration; $r_{ref}$ is the reference trajectory; the angle term in the formula is the maximum angle difference between the actual trajectory and the reference trajectory; $\mu$ is the road adhesion coefficient; $\sigma$ is the deviation angle when the vehicle sideslips laterally; $\Phi$ is the vehicle roll angle; $d$ is the vertical vibration distance; and $\chi$ is the deviation angle when the bridge vibrates vertically, that is, the vertical inclination angle of the bridge deck;

The vertical inclination angle $\chi$ produced by bridge vibration can be set, at its maximum, as:

Based on the high-precision map built from the vision sensor, the optimal drivable road area is found. Within this area, the reference vehicle state, control input and parameter output values are designed, constraints are designed for multiple classes of state, and the iterative learning control algorithm is used to realize roll control of the vehicle;
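The learning law $u_{k+1}(t) = L(u_k(t), e_k(t))$ is stated only in general form. One common instance is the P-type update $u_{k+1}(t) = u_k(t) + \Gamma e_k(t)$, sketched below on a toy static plant. The choice of a P-type law, the gain value, the plant and all function names are assumptions for illustration, not the patent's specific design.

```python
def ilc_update(u_k, e_k, gain):
    """P-type learning law (assumed form): u_{k+1}(t) = u_k(t) + gain * e_k(t)."""
    return [u + gain * e for u, e in zip(u_k, e_k)]


def run_ilc(reference, plant, gain=0.5, trials=20):
    """Repeat the same task, refining the input profile after each trial."""
    u = [0.0] * len(reference)
    for _ in range(trials):
        output = plant(u)
        error = [r - s for r, s in zip(reference, output)]  # e_k(t) = S_r(t) - S(t)
        u = ilc_update(u, error, gain)
    final_error = [r - s for r, s in zip(reference, plant(u))]
    return u, final_error


def total_action(a_k, a_pi):
    """Combine compensation and DDPG actions element-wise: a = a_k + a_pi."""
    return [x + y for x, y in zip(a_k, a_pi)]
```

For a static plant with gain $g$, the error shrinks by a factor $|1 - \Gamma g|$ per trial, so the update converges when that factor is below one; real vehicle dynamics would need a more careful convergence analysis.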

Using the high-precision map built from the vision sensor and the state predicted by the LSTM, the limit drivable road area is found, and layered ranges of uncertain state parameters and action parameters are designed. When the vehicle drives in the limit road area, DDPG and the iterative control algorithm together realize roll control of the vehicle, with the iterative control algorithm acting as compensation. According to the constraint ranges of the vehicle state, the reward function is constructed as:

$R = v \cdot (R_1 + R_2 + \cdots + R_6)$

Where $R_1$ and $R_2$ represent the lateral distance error and its rate of change; $R_3$ and $R_4$ represent the lateral angular velocity and its rate of change; $R_5$ and $R_6$ represent the roll angle and its rate of change; $x$ is the angle value; $v$ is the vehicle speed; $e_y$ is the lateral distance; and $k_i$ and $k_j$ are reward factors.
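The composition $R = v \cdot (R_1 + \cdots + R_6)$ can be written as a short helper. The individual forms of $R_1$ to $R_6$ are not reproduced in this text, so the sketch takes the six component values as inputs rather than guessing them; the function name `total_reward` and the sample numbers in the usage note are hypothetical.

```python
def total_reward(speed, components):
    """Compute R = v * (R1 + R2 + ... + R6) from six component rewards."""
    if len(components) != 6:
        raise ValueError("exactly six reward components are expected")
    return speed * sum(components)
```

For instance, `total_reward(10.0, [1.0, 0.5, 0.5, 0.0, -1.0, 0.0])` scales a component sum of 1.0 by a speed of 10.0.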

Beneficial effects: Compared with the prior art, the present invention has the following beneficial effects: 1. The present invention designs an integrated roll control method for autonomous vehicles based on a reinforcement learning algorithm (DDPG). Reinforcement learning controls vehicle roll in complex road environments, so that under complex road conditions and extreme weather the autonomous vehicle can drive exploratively through exploration and exploitation; 2. Under extreme driving conditions, iterative learning control compensates the DDPG algorithm, achieving an integrated control effect and ensuring that the vehicle ultimately drives safely.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the DDPG network architecture;

Figure 2 is a schematic diagram of the integrated control of vehicle roll;

Figure 3 is a flow chart of the integrated roll control of the autonomous vehicle.

Detailed Description

The present invention is described in further detail below with reference to the accompanying drawings.

The present invention proposes a roll control method for an autonomous vehicle based on DDPG and iterative control, which specifically includes the following steps:

Step 1: Install lidar, vision, millimeter-wave radar and ultrasonic radar sensors, a positioning system and an inertial navigation system on the autonomous vehicle.

The present invention targets the road environment of cross-sea bridges. Its goal is to control the vehicle to drive safely at low to medium speed (5-80 km/h) and, through instantaneous roll control behavior, to achieve a high level of vehicle intelligence. To this end, several lidar, machine vision, millimeter-wave radar and ultrasonic radar sensors are installed on the autonomous vehicle, together with a positioning system, an inertial navigation system (IMU) and other equipment. The lidar sensor detects dynamic and static obstacles on the road, including pedestrians, motorcycles and other vehicles, as well as the drivable road area; the machine vision sensor detects lane lines, pedestrians and vehicles, and performs localization and simultaneous map building; the millimeter-wave radar sensor measures the distance from the vehicle to pedestrians and to moving vehicles; the ultrasonic radar measures the distance to vehicles at close range; the positioning system and the inertial navigation system (IMU) implement vehicle positioning.

Step 2: Use the vision sensor, positioning system and inertial navigation system to obtain the vehicle's position and map in each scenario, generating driving maps of the autonomous vehicle for the different scenarios and thereby providing the environment required for the vehicle's driving trajectory.

The vision sensor, positioning system and inertial navigation system (IMU) are used to obtain the vehicle's position and map on sunny days and in severe weather such as rain, snow, fog and strong wind. This produces driving maps of the autonomous vehicle for five scenarios: cross-sea bridge road conditions in rain and snow; cross-sea bridge road conditions in strong wind and severe weather; road conditions when the bridge vibrates on a sunny day; single-vehicle driving conditions in frequently changing weather; and multi-vehicle driving conditions in frequently changing weather. These maps provide the environment required for the vehicle's driving trajectory.

Step 3: Control the steering wheel, accelerator and pedals to drive on the cross-sea bridge, record the corresponding driving trajectories in rain and snow, in strong wind and severe weather, and on sunny days, and build a data set.

Several experienced drivers each drive on the cross-sea bridge in the five scenarios by controlling the steering wheel, accelerator and pedals, and the corresponding driving trajectories are recorded to build the data set. The data set includes vehicle speed, driving trajectory, vehicle position, heading angle, slip angle, yaw rate and roll angle, providing the reference data needed for training and for evaluating vehicle controllability.
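One way to organize such recordings is a per-sample record holding the listed quantities, with the driving trajectory derived from the sequence of positions. The field names, units and the `trajectory` helper below are illustrative assumptions; the patent lists the quantities but not a storage format.

```python
from dataclasses import dataclass


@dataclass
class DrivingSample:
    scenario: str          # e.g. "rain_snow", "strong_wind", "sunny_vibration"
    speed_kmh: float       # vehicle speed
    position: tuple        # (x, y) vehicle position in map coordinates
    heading_angle: float   # rad
    slip_angle: float      # rad
    yaw_rate: float        # rad/s
    roll_angle: float      # rad


def trajectory(samples):
    """Return the driving trajectory as the ordered list of positions."""
    return [s.position for s in samples]
```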

Step 4: Train the DDPG algorithm on the driving maps of the autonomous vehicle in the different scenarios, covering the driving states on the cross-sea bridge at different road-complexity levels in bad weather. By interacting with the map environment of each scenario, the autonomous vehicle generates real-time vehicle states that determine its actions. During action training, the action space is initialized, the online policy network in the actor network generates state-space information and outputs an action, and an action noise is added to obtain an exploratory action space.
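The text does not specify the type of action noise. DDPG implementations often use an Ornstein-Uhlenbeck process, which produces temporally correlated noise; the sketch below assumes that choice, and the class name `OUNoise` and the default parameters are illustrative rather than taken from the patent.

```python
import math
import random


class OUNoise:
    """Ornstein-Uhlenbeck exploration noise (an assumed choice of noise)."""

    def __init__(self, size, mu=0.0, theta=0.15, sigma=0.2, dt=1.0, seed=None):
        self.size = size
        self.mu = mu
        self.theta = theta
        self.sigma = sigma
        self.dt = dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Restart the process at its long-run mean."""
        self.state = [self.mu] * self.size

    def sample(self):
        """Advance one step: mean reversion plus a Gaussian increment."""
        scale = self.sigma * math.sqrt(self.dt)
        self.state = [
            x + self.theta * (self.mu - x) * self.dt + scale * self.rng.gauss(0.0, 1.0)
            for x in self.state
        ]
        return list(self.state)
```

A three-dimensional instance, `OUNoise(3)`, would yield one noise value each for steering, throttle and brake at every step.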

As shown in Figure 1, an actor network is constructed that takes the vehicle state and the environment state as input and outputs a vector of steering angle, throttle and brake, corresponding to the three neurons of the actor policy network's output layer. The activation function for throttle and brake is Sigmoid, and the activation function for the steering action value is Tanh. The hidden layers are structured as follows: the first layer is a convolution with a 7*7 kernel, 48 filters and a stride of 4, with 200 neurons in total; the second layer is a convolution with a 5*5 kernel, 16 filters and a stride of 2, with a ReLU activation function and 400 neurons in total; the third layer is an LSTM layer with 100 neurons; the fourth layer is a fully connected layer with 128 units; the fifth layer is a fully connected layer with 128 units. The critic network takes the state and the action space as input and passes them through two hidden layers, the first with 200 neurons and the second with 400 neurons, combined with the ReLU activation function, to produce the Q value. Define $h_i \in (S_{t-T}, S_{t-T+1}, \ldots, S_t)$, where $S_{t-T}$ and $S_t$ denote the state information at time $t-T$ and at the current time $t$, respectively. The encoded state is then $s = f(h_i; \beta)$, and the policy of the modified actor network is defined as $a = \mu(h_i/\beta, \gamma^{\pi}) + \eta$.
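The history $(S_{t-T}, \ldots, S_t)$ fed to the LSTM layer can be maintained with a fixed-length sliding window, as in the sketch below. Padding the window with the oldest available state before $T+1$ states have been seen is an assumption, as is the class name `StateHistory`.

```python
from collections import deque


class StateHistory:
    """Keep the last T + 1 states, i.e. (S_{t-T}, ..., S_t)."""

    def __init__(self, horizon):
        # horizon is T; the window holds T + 1 states.
        self.buffer = deque(maxlen=horizon + 1)

    def push(self, state):
        """Record the newest state; the oldest is dropped once the window is full."""
        self.buffer.append(tuple(state))

    def window(self):
        """Return the window, oldest first, padded with the oldest known state."""
        if not self.buffer:
            raise ValueError("no state has been recorded yet")
        missing = self.buffer.maxlen - len(self.buffer)
        return [self.buffer[0]] * missing + list(self.buffer)
```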

The action space of the vehicle is designed to include the steering wheel angle $\delta$, the brake signal and the throttle signal. Because the driving environment is complex, the vehicle drives at variable speed when the road environment is complex, and the brake signal is an action produced under extreme driving conditions to prevent roll and rollover caused by braking and a slippery road surface. In this case the action space should consist of three components: steering wheel angle, brake signal and throttle signal. Under normal driving conditions, the vehicle is assumed to drive at constant speed; to prevent roll caused by a slippery road surface, the action space is set to the steering wheel angle and the throttle signal. Constraint ranges for the three actions are set according to these two driving conditions to ensure that the vehicle remains controllable within the drivable road area.
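The switch between a two-component and a three-component action space, together with per-action constraint ranges, can be sketched as follows. The normalized bounds and the names `ACTION_BOUNDS`, `active_actions` and `constrain_action` are hypothetical; the patent does not state numeric ranges.

```python
# Hypothetical normalized ranges, for illustration only.
ACTION_BOUNDS = {
    "steering": (-1.0, 1.0),
    "throttle": (0.0, 1.0),
    "brake": (0.0, 1.0),
}


def active_actions(extreme):
    """Extreme conditions use all three actions; normal driving omits the brake."""
    if extreme:
        return ("steering", "throttle", "brake")
    return ("steering", "throttle")


def constrain_action(raw, extreme):
    """Keep only the active actions and clip each into its allowed range."""
    action = {}
    for name in active_actions(extreme):
        lo, hi = ACTION_BOUNDS[name]
        action[name] = max(lo, min(hi, raw[name]))
    return action
```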

The vehicle acquires state data through exploration and exploitation in its environment. The resulting states usually include the lateral distance and its rate of change and the vehicle roll angle and its rate of change, and they are usually stored in the experience buffer. When iterative learning is used for vehicle control, reference parameters such as the vehicle state and vehicle trajectory must be designed; these reference states and trajectories can be taken from the experience buffer, and the reference trajectory is adjusted according to the complexity of the road environment.
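An experience buffer of this kind is commonly implemented as a bounded queue of transitions with uniform random sampling. The sketch below follows that common pattern; the transition layout, the capacity handling and the class name `ExperienceBuffer` are assumptions rather than details given in the patent.

```python
import random
from collections import deque


class ExperienceBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)  # oldest entries are discarded first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a uniform random mini-batch without replacement."""
        k = min(batch_size, len(self.memory))
        return self.rng.sample(list(self.memory), k)

    def __len__(self):
        return len(self.memory)
```

Reference states or trajectories for the iterative learning step could then be drawn with `sample`, for example a small batch of recent transitions.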

In the five scenarios (cross-sea bridge road conditions in rain and snow, cross-sea bridge road conditions in strong wind and severe weather, road conditions when the bridge vibrates on a sunny day, single-vehicle driving conditions in frequently changing weather, and multi-vehicle driving conditions in frequently changing weather), the different trajectories generated serve as reference paths, and the vehicle's actually planned path is compared with the reference trajectory to obtain the error. The reference trajectory must also be modified and adjusted by adding constraints that satisfy the vehicle dynamics before it is used as the set actual path, which can be expressed as:

Where $\sigma$ is the path influence factor, $p_{ref}$ is the reference trajectory and $p_{act}$ is the actual trajectory. That is, the driving trajectories obtained when the vehicle drives in different road environments must be suitably modified and adjusted to conform to the vehicle dynamics of automatic driving; only then can they serve as driving trajectories for the autonomous vehicle.

Step 5: As shown in Figure 2, a predicted path of the autonomous driving vehicle state is generated based on LSTM historical memory and road planning attributes. The DDPG algorithm performs tracking control of the path trajectory under both normal and extreme driving road conditions, and an iterative control method provides compensation control of the autonomous driving vehicle.

Under normal driving road conditions, that is, the bridge-vibration-on-a-sunny-day scenario among the five scenarios, the roll, sideslip and yaw dynamics of the vehicle must be considered in the vehicle dynamics model to ensure safety and stability. Vehicle state constraints are set to determine the lateral stability range, the maximum steering angle range and the allowable vehicle control range for preventing roll, so as to reduce the lateral offset error of the vehicle:

where $\Psi = [v_y \ \omega_z \ \omega_x \ \varphi]^T$ is the state vector, $u_1$ is the control input, and $u_2$ is the auxiliary control input; $\omega_z$ is the yaw rate; $\omega_x$ is the vehicle roll angle; $u_x$ is the steering angle; and $e_r$ is the lateral tracking offset error.

where $w_1, w_2, w_3, w_4$ are parameter variables and $\mu_r$ is the road adhesion coefficient.
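The state constraints, which claim 1 writes as bounds of the form $\omega_{z\text{-}min} \le \omega_z \le \omega_{z\text{-}max}$ on the yaw rate, the roll quantity $\omega_x$, the steering angle $u_x$ and the lateral tracking offset error $e_r$, can be checked as in the following minimal sketch. The numeric limits, the `Bounds` class and the function names are hypothetical; the text names the bounds but gives no values.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Bounds:
    """Closed interval [low, high] for one constrained quantity."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high

    def clip(self, value: float) -> float:
        return min(max(value, self.low), self.high)


# HYPOTHETICAL limits, chosen only for illustration.
LIMITS: Dict[str, Bounds] = {
    "omega_z": Bounds(-0.5, 0.5),  # yaw rate
    "omega_x": Bounds(-0.2, 0.2),  # roll quantity the text calls omega_x
    "u_x": Bounds(-0.6, 0.6),      # steering angle
    "e_r": Bounds(-1.0, 1.0),      # lateral tracking offset error
}


def violated(state: Dict[str, float]) -> List[str]:
    """Return the names of all quantities outside their allowed interval."""
    return [name for name, b in LIMITS.items() if not b.contains(state[name])]


def clip_steering(u_x: float) -> float:
    """Saturate a steering command to the allowed steering range."""
    return LIMITS["u_x"].clip(u_x)
```

A controller could call `violated` on every step and fall back to the compensation controller whenever the returned list is non-empty; that policy is a design choice of this sketch, not something the text prescribes.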

As shown in Figure 3, an iterative control algorithm provides compensation control of vehicle roll. A reference vehicle state, a reference control input and a reference output value are set to guarantee tracking under the vehicle's physical constraints and the road constraints, which improves disturbance rejection while driving and reduces the model error rate. The DDPG algorithm is first used to train the network model through interaction between the vehicle and the dynamic cross-sea bridge road conditions. If the training task is completed, the trained actions are saved. If the result is unsatisfactory, the iterative control algorithm compensates the parameters of the output action space until the task is completed, yielding better actions and ultimately better roll control of the autonomous driving vehicle.

Extreme driving conditions cover four of the five scenarios: the cross-sea bridge in rainy and snowy weather, the cross-sea bridge in strong wind and severe weather, a single vehicle driving in frequently changing weather, and multiple vehicles driving in frequently changing weather. Severe weather makes the road environment uncertain and interferes with normal driving.

Under extreme driving road conditions the vehicle is prone to slipping and vibration, which changes the driving trajectory and affects the actual running speed, so that the actual speed deviates from the planned speed. The actual vehicle speed can therefore be set as follows:

where $v_{ref}$ is the reference vehicle speed, $v_{act}$ is the actual vehicle speed, and $\alpha$ is the influence factor.

When the error between the actual and reference vehicle speeds is within 2 km/h, the actual speed can be assumed equal to the reference speed. When the error lies in the interval [2, 5] km/h, the actual speed is set from the reference speed and the difference between the two speeds, where $\alpha$ is the speed change factor. When the error exceeds 5 km/h, the vehicle must brake immediately to ensure driving safety.
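The three-band speed rule can be sketched as follows. The speed-setting formula itself is not reproduced in the text, so the middle band uses an assumed form, $v_{ref} + \alpha (v_{act} - v_{ref})$. The function name `adjust_speed`, the return convention and the handling of the boundary values 2 and 5 km/h are likewise illustrative assumptions.

```python
from typing import Optional, Tuple

SMALL_ERROR_KMH = 2.0  # up to this error, the speeds are treated as equal
LARGE_ERROR_KMH = 5.0  # above this error, the vehicle brakes


def adjust_speed(v_ref: float, v_act: float, alpha: float) -> Tuple[Optional[float], bool]:
    """Return (target speed in km/h, brake flag).

    The target is None when braking is requested, because the text only
    states that the vehicle must brake in that case.
    """
    error = abs(v_act - v_ref)
    if error <= SMALL_ERROR_KMH:
        return v_ref, False
    if error <= LARGE_ERROR_KMH:
        # ASSUMPTION: reference speed plus alpha times the signed difference.
        return v_ref + alpha * (v_act - v_ref), False
    return None, True
```

For example, with a reference of 60 km/h, a measured 64 km/h and $\alpha = 0.5$, the assumed middle-band form gives a target of 62 km/h.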

After the path prediction of the autonomous driving vehicle has passed through the LSTM historical memory state, the road planning attributes are used to generate the predicted speed path. At time $t$, the set of vehicle actions and states is expressed as $\{v_1, \dots, v_i, \dots, v_n\}$, $i = 1, \dots, n$, and Bezier curves are used for path planning to produce a collision-free predicted trajectory. To judge the accuracy of the speed planning, a small sample of vehicle states and actions is taken from the experience buffer of the DDPG algorithm as reference values $\{v_{r1}, \dots, v_{ri}, \dots, v_{rn}\}$, $i = 1, \dots, n$, and the error between the two is calculated as:

where $v_i$ is the vehicle speed, $L$ is the vehicle speed error rate, and $v_{ri}$ is the reference vehicle speed obtained from the experience buffer.
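The Bezier path planning mentioned above can be illustrated with de Casteljau's algorithm, a standard way to evaluate a Bezier curve. This is a minimal sketch: the text does not state the curve degree or how control points are chosen, so the control points and the helper names `bezier_point` and `sample_path` are assumptions, and no collision check is included.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def bezier_point(control: Sequence[Point], t: float) -> Point:
    """Evaluate a Bezier curve at t in [0, 1] with de Casteljau's algorithm."""
    if not control:
        raise ValueError("need at least one control point")
    pts = list(control)
    # Repeatedly interpolate between neighbours until one point remains.
    while len(pts) > 1:
        pts = [
            ((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
            for (x0, y0), (x1, y1) in zip(pts, pts[1:])
        ]
    return pts[0]


def sample_path(control: Sequence[Point], n: int) -> List[Point]:
    """Return n evenly spaced (in t) samples along the curve, endpoints included."""
    if n < 2:
        raise ValueError("need at least two samples")
    return [bezier_point(control, i / (n - 1)) for i in range(n)]
```

The curve starts at the first control point and ends at the last, so fixing those two to the current and target vehicle positions leaves the interior control points free to shape the path.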

Under extreme conditions, given the desired state reference trajectory $S_r$, the output error is $e_k(t) = S_r(t) - S(t)$ and the learning law is $u_{k+1}(t) = L(u_k(t), e_k(t))$, which yields the compensation control action $a_k$. Under normal driving, the DDPG algorithm performs vehicle control and its output action is $a_\pi = \mu(h_i/\beta, \gamma^\pi) + \eta$. The total control action is then $a = a_k + a_\pi$; the reference formula is as follows:
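A minimal sketch of the iterative compensation and the combined action $a = a_k + a_\pi$ follows. The text leaves the learning operator $L$ unspecified, so a P-type law $u_{k+1}(t) = u_k(t) + \Gamma\, e_k(t)$ is assumed here, with `gain` standing for $\Gamma$. The static plant `toy_plant` is hypothetical and exists only to show the error shrinking from trial to trial.

```python
from typing import List, Sequence


def tracking_error(s_ref: Sequence[float], s: Sequence[float]) -> List[float]:
    """Output error e_k(t) = S_r(t) - S(t) at every time step of one trial."""
    return [r - x for r, x in zip(s_ref, s)]


def ilc_update(u_k: Sequence[float], e_k: Sequence[float], gain: float) -> List[float]:
    """ASSUMED P-type learning law: u_{k+1}(t) = u_k(t) + gain * e_k(t)."""
    if len(u_k) != len(e_k):
        raise ValueError("input and error must cover the same time steps")
    return [u + gain * e for u, e in zip(u_k, e_k)]


def toy_plant(u: Sequence[float]) -> List[float]:
    """HYPOTHETICAL plant used only for the demo: S(t) = 0.5 * u(t)."""
    return [0.5 * x for x in u]


def run_trials(s_ref: Sequence[float], trials: int, gain: float = 1.0) -> List[float]:
    """Repeat the trial and return the largest |e_k(t)| seen in each one."""
    u = [0.0] * len(s_ref)
    history = []
    for _ in range(trials):
        e = tracking_error(s_ref, toy_plant(u))
        history.append(max(abs(x) for x in e))
        u = ilc_update(u, e, gain)
    return history


def total_action(a_k: float, a_pi: float) -> float:
    """Total control action a = a_k + a_pi (compensation plus DDPG output)."""
    return a_k + a_pi
```

With this toy plant and a gain of 1.0 the error halves on every trial, which is the behaviour an iterative compensator is meant to provide; a real vehicle model would need its own convergence analysis.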

To judge the accuracy of path planning, the trajectory error is calculated for the case where the roll angle changes between the actual running trajectory of the vehicle and the reference trajectory. Vehicle roll motion arises mainly in two situations. In the first, bridge vibration causes the autonomous driving vehicle to deviate from the planned path, and the vehicle is then prone to roll motion and develops a roll angle. The trajectory error between the actual running trajectory and the reference trajectory under this roll angle change can be expressed as:

where $r_{act}$ is the actual vehicle trajectory after vibration; $r_{ref}$ is the reference trajectory; the angle term is the maximum angle difference between the actual trajectory and the reference trajectory; $\sigma$ is the deviation angle when the vehicle sideslips laterally; $\Phi$ is the vehicle roll angle; $d$ is the vertical vibration distance; and $\chi$ is the deviation angle during vertical bridge vibration, that is, the vertical inclination angle of the bridge deck.

In the second situation, severe weather makes the road surface slippery, which changes the road adhesion coefficient and causes the vehicle to roll, sideslip and roll over. The trajectory error between the actual running trajectory and the reference trajectory under this roll angle change can be expressed as:

where $r_{act}$ is the actual vehicle trajectory after vibration; $r_{ref}$ is the reference trajectory; the angle term is the maximum angle difference between the actual trajectory and the reference trajectory; $\mu$ is the road adhesion coefficient, with range [0, 1]; $\sigma$ is the deviation angle when the vehicle sideslips laterally; $\Phi$ is the vehicle roll angle; $d$ is the vertical vibration distance; and $\chi$ is the deviation angle during vertical bridge vibration, that is, the vertical inclination angle of the bridge deck.

Based on the high-precision map built from the vision sensor, the optimal road driving area is found. Within this area, the reference vehicle state, control input and parameter output values are designed together with constraints on multiple types of states, and the iterative learning control algorithm performs roll control of the vehicle. Here the DDPG algorithm plays a compensating role, so that the vehicle drives safely in the controllable road area.

Based on the high-precision map built from the vision sensor and the state predicted by the LSTM, the limit road driving area is found, and the ranges of the layered uncertain state parameters and action parameters are designed. When the vehicle drives in the limit road driving area, DDPG and the iterative control algorithm together perform roll control, with the iterative control algorithm playing a compensating role. Under limit conditions, additional constraints on the vehicle state are required, and the reward function is constructed as:

$$R = v \cdot (R_1 + R_2 + \dots + R_6)$$

where $R_1$ and $R_2$ represent the lateral distance error and its rate of change; $R_3$ and $R_4$ represent the lateral angular velocity and its rate of change; $R_5$ and $R_6$ represent the roll angle and its rate of change; $x$ is the angle value; $v$ is the vehicle speed; $e_y$ is the lateral distance; and $k_i$ and $k_j$ are reward factors.
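The aggregation $R = v \cdot (R_1 + R_2 + \dots + R_6)$ can be sketched as below. Only the aggregation comes from the text; the expressions for the individual terms are not reproduced there, so `penalty_terms` uses an assumed form, $-k \cdot |\text{error}|$, purely for illustration, and both function names are hypothetical.

```python
from typing import List, Sequence


def penalty_terms(errors: Sequence[float], factors: Sequence[float]) -> List[float]:
    """ASSUMED term shape: each R_i = -k_i * |error_i|."""
    if len(errors) != len(factors):
        raise ValueError("one reward factor is needed per error")
    return [-k * abs(e) for k, e in zip(factors, errors)]


def reward(v: float, terms: Sequence[float]) -> float:
    """Aggregate reward R = v * (R_1 + ... + R_6) as given in the text."""
    if len(terms) != 6:
        raise ValueError("expected the six terms R_1 .. R_6")
    return v * sum(terms)
```

The six entries are ordered as in the text: lateral distance error and its rate of change, lateral angular velocity and its rate of change, roll angle and its rate of change.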

The above is only a specific description of feasible embodiments of the present invention and is not intended to limit its scope of protection. All equivalent embodiments or modifications that do not depart from the technical spirit of the present invention shall fall within its scope of protection.

Claims

1. A roll control method for an autonomous driving vehicle based on DDPG and iterative control, characterized by comprising the following steps:

(1) Installing a lidar, a vision sensor, a millimeter-wave radar, an ultrasonic radar sensor, a positioning system and an inertial navigation system on the autonomous driving vehicle;

(2) Using the vision sensor, the positioning system and the inertial navigation system to determine the vehicle position and build the map in each of the different scenes, so as to generate driving maps of the autonomous driving vehicle in the different scenes and provide the environment required for the vehicle driving trajectories;

(3) Controlling the steering wheel, the accelerator and the pedal respectively while driving on a cross-sea bridge, acquiring the corresponding driving trajectories in rainy and snowy weather, in strong wind and severe weather, and on sunny days, and constructing a data set;

(4) Training the DDPG algorithm on the driving maps of the autonomous driving vehicle in the different scenes, for driving states on the cross-sea bridge under different grades of complex road conditions in severe weather; the autonomous driving vehicle generates real-time vehicle states through interaction with the map environments of the different scenes and determines the action behavior of the vehicle; when action training is carried out, the action space is initialized, the online policy network in the actor network generates state space information and outputs an action, and action noise is added to obtain an exploratory action space;

(5) Based on LSTM historical memory and road planning attributes, generating a predicted path of the autonomous driving vehicle state, using the DDPG algorithm to perform tracking control of the path trajectory of the autonomous driving vehicle under normal driving road conditions and extreme driving road conditions, and using an iterative control method to perform compensation control of the autonomous driving vehicle;

the network design of the DDPG algorithm in step (4) is as follows:

an actor network is constructed that takes the vehicle state and the environment state as input and outputs a vector of steering angle, accelerator and brake signals, corresponding to the 3 neurons of the output layer of the actor policy network; the activation function for the accelerator and the brake is Sigmoid, and the activation function for the steering action value is Tanh; the hidden layers are structured as follows: the first layer has convolution size 7*7, filter size 48, step size 4, and 200 neurons in total; the second layer has convolution size 5*5, filter size 16, step size 2, a ReLU activation function, and 400 neurons in total; the third layer is an LSTM layer with 100 neurons; the fourth layer is a fully connected layer with 128 units; the fifth layer is a fully connected layer with 128 units; the input of the critic network is the state and action space, which are concatenated and passed through two hidden layers with ReLU activation, with 200 neurons in the first layer and 400 neurons in the second, to finally obtain a Q value; define $h_i \in (S_{t-T}, S_{t-T+1}, \dots, S_t)$, where $S_{t-T}$ and $S_t$ represent the state information at time $t-T$ and at the current time $t$, respectively; the encoded state is $s = f(h_i; \beta)$, and the policy of the modified actor network is then defined as the action $a = \mu(h_i/\beta, \gamma^\pi) + \eta$;

The tracking control of the path trajectory of the autonomous driving vehicle under normal driving road conditions in step (5) is implemented as follows:

under normal driving road conditions, namely the road conditions when the bridge vibrates on a sunny day, a vehicle dynamics model is established that considers the roll, sideslip and yaw dynamics of the vehicle, vehicle state constraints are set, and the lateral stability range, the maximum steering angle range and the allowable vehicle control range for preventing roll are determined, so as to reduce the lateral offset error of the vehicle:

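$$\omega_{z\text{-}min} \le \omega_z \le \omega_{z\text{-}max}, \quad \omega_{x\text{-}min} \le \omega_x \le \omega_{x\text{-}max}, \quad u_{x\text{-}min} \le u_x \le u_{x\text{-}max}, \quad e_{r\text{-}x\text{-}min} \le e_r \le e_{r\text{-}x\text{-}max}$$

where $\omega_z$ is the yaw rate; $\omega_x$ is the vehicle roll angle; $u_x$ is the steering angle; $e_r$ is the lateral tracking offset error;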

according to the road state information predicted by the LSTM, an objective function is constructed that considers the steering angle, the tire adhesion coefficient, the roll angle error and the path tracking error, and, with full consideration of the dynamics constraint on the maximum allowable vehicle error, the physical constraint of the vehicle determined by the lateral tracking offset error is established, so as to reduce the tracking control error of the vehicle:

where $w_1, w_2, w_3, w_4$ are parameter variables; $\mu_r$ is the road adhesion coefficient;

an iterative control algorithm is used to provide compensation control of vehicle roll, and a reference vehicle state, a reference control input and a reference output value are set to guarantee tracking under the vehicle's physical constraints and the road constraints, which improves disturbance rejection while driving and reduces the model error rate;

according to the driving conditions of the vehicle, the state space, action space and reward function required by the DDPG algorithm are constructed; the action space mainly comprises the steering wheel angle, accelerator and brake signals, and the state space comprises the lateral tracking error of the vehicle and its rate of change, the vehicle roll angle error and its rate of change, and the yaw rate error and its rate of change; for the construction of the reward function, when the cross-sea bridge vibrates, the actual trajectory of the vehicle changes and the road is inclined at a certain angle, so the reward function equals the cumulative product of the discount factor and the change in speed.

2. The roll control method for an autonomous driving vehicle based on DDPG and iterative control according to claim 1, wherein the lidar of step (1) is used to detect dynamic and static obstacles on the road, including pedestrians, motorcycles and various vehicles, as well as the drivable road area; the vision sensor is used to perceive lane lines, pedestrians and vehicles and to perform simultaneous localization and mapping; the millimeter-wave radar is used to detect the distance between the vehicle and pedestrians and between the vehicle and moving vehicles; the ultrasonic radar is used to detect the distance to nearby vehicles; and the vision sensor, the positioning system and the inertial navigation system are used for vehicle positioning.

3. The roll control method for an autonomous driving vehicle based on DDPG and iterative control according to claim 1, wherein the different scenes in step (2) comprise five scenes: the cross-sea bridge road conditions in rainy and snowy weather, the cross-sea bridge road conditions in strong wind and severe weather, the road conditions when the bridge vibrates on a sunny day, the driving road conditions of a single vehicle in frequently changing weather, and the driving road conditions of multiple vehicles in frequently changing weather.

4. The roll control method for an autonomous driving vehicle based on DDPG and iterative control according to claim 1, wherein the data set of step (3) comprises the vehicle speed, driving trajectory, vehicle position, heading angle, slip angle, yaw rate and roll angle.

5. The roll control method for an autonomous driving vehicle based on DDPG and iterative control according to claim 1, wherein the tracking control of the path trajectory of the autonomous driving vehicle under extreme driving road conditions in step (5) is implemented as follows:

under extreme driving road conditions, namely when road conditions are affected by severe weather and other factors, the vehicle is prone to slipping and vibration, which changes the driving trajectory and affects the actual running speed, so that the actual vehicle speed deviates from the planned vehicle speed; the change of the actual vehicle speed is therefore set as follows:

where $v_{ref}$ is the reference vehicle speed; $v_{act}$ is the actual vehicle speed; $\alpha$ is the influence factor;

when the error between the actual vehicle speed and the reference vehicle speed is within 2 km/h, the actual vehicle speed is taken as equal to the reference vehicle speed; when the error lies in the interval [2, 5] km/h, the actual vehicle speed is set from the reference vehicle speed and the difference between the two speeds, where $\alpha$ is the speed change factor; when the error is greater than 5 km/h, the vehicle must brake immediately to ensure driving safety;

at time $t$, the set of vehicle actions and states is expressed as $\{v_1, \dots, v_i, \dots, v_n\}$, $i = 1, \dots, n$, and Bezier curves are used for path planning to produce a collision-free predicted trajectory; to judge the accuracy of the speed planning, a small sample of vehicle states and actions is taken from the experience buffer of the DDPG algorithm as reference values $\{v_{r1}, \dots, v_{ri}, \dots, v_{rn}\}$, $i = 1, \dots, n$, and the error between the two is calculated as:

where $v_i$ is the vehicle speed; $L$ is the vehicle speed error rate; $v_{ri}$ is the reference vehicle speed obtained from the experience buffer;

under extreme driving conditions, the desired state reference trajectory is given as $S_r$, the output error is $e_k(t) = S_r(t) - S(t)$, and the learning law is $u_{k+1}(t) = L(u_k(t), e_k(t))$, which yields the compensation control action $a_k$; under normal driving, the DDPG algorithm performs vehicle control and its output action is $a_\pi = \mu(h_i/\beta, \gamma^\pi) + \eta$; the total control action is $a = a_k + a_\pi$, and the reference formula is as follows:

to judge the accuracy of path planning, the trajectory error when the roll angle changes between the actual running trajectory of the vehicle and the reference trajectory is calculated as follows:

where $r_{act}$ is the actual vehicle trajectory after vibration; $r_{ref}$ is the reference trajectory; the angle term is the maximum angle difference between the actual trajectory and the reference trajectory; $\mu$ is the road adhesion coefficient; $\sigma$ is the deviation angle when the vehicle sideslips laterally; $\Phi$ is the vehicle roll angle; $d$ is the vertical vibration distance; $\chi$ is the deviation angle during vertical bridge vibration, that is, the vertical inclination angle of the bridge deck;

the maximum vertical inclination angle $\chi$ produced by bridge vibration is set to:

based on the high-precision map built from the vision sensor, the optimal road driving area is found, the reference vehicle state, control input value and parameter output value are designed within the optimal road driving area together with constraints on multiple types of states, and the iterative learning control algorithm performs roll control of the vehicle;

based on the high-precision map built from the vision sensor and the state predicted by the LSTM, the limit road driving area is found, and the ranges of the layered uncertain state parameters and action parameters are designed; when the vehicle drives in the limit road driving area, DDPG and the iterative control algorithm together perform roll control of the vehicle, with the iterative control algorithm playing a compensating role, and the reward function is constructed according to the constraint range of the vehicle state as follows:

$$R = v \cdot (R_1 + R_2 + \dots + R_6)$$

where $R_1$ and $R_2$ represent the lateral distance error and its rate of change; $R_3$ and $R_4$ represent the lateral angular velocity and its rate of change; $R_5$ and $R_6$ represent the roll angle and its rate of change; $x$ is the angle value; $v$ is the vehicle speed; $e_y$ is the lateral distance; and $k_i$ and $k_j$ are reward factors.