CN114802248A

CN114802248A - Automatic driving vehicle lane change decision making system and method based on deep reinforcement learning

Info

Publication number: CN114802248A
Application number: CN202210443895.0A
Authority: CN
Inventors: 熊明强; 谯杰; 江萌; 刘铮
Original assignee: Cas Intelligent Network Technology Co ltd; China Automotive Engineering Research Institute Co Ltd
Current assignee: Cas Intelligent Network Technology Co ltd; China Automotive Engineering Research Institute Co Ltd
Priority date: 2022-04-25
Filing date: 2022-04-25
Publication date: 2022-07-29

Abstract

The invention relates to the technical field of automatic driving control, and discloses a deep reinforcement learning-based decision-making system and method for changing lanes of an automatic driving vehicle. The lane strategy module uses the collected data information of the target vehicle and the collected operation data of the interfering vehicles near the target vehicle to analyze to obtain the current safe and automatic lane changing strategy of the target vehicle, so as to realize the fast and safe lane change. The invention has the beneficial effects of improving the accuracy of the automatic lane changing strategy of the automatic driving vehicle, ensuring the safety of the vehicle during the lane changing process, and reducing road congestion and the occurrence of collision accidents.

Description

Lane-changing decision-making system and method for autonomous vehicles based on deep reinforcement learning

技术领域technical field

本发明涉及自动驾驶控制技术领域，具体涉及一种基于深度强化学习的自动驾驶车辆换道决策系统及方法。The invention relates to the technical field of automatic driving control, in particular to a lane change decision system and method for an automatic driving vehicle based on deep reinforcement learning.

背景技术Background technique

近年来，世界范围内对自动驾驶格外关注，被认为是缓解交通拥堵，减少交通事故和环境污染的重要技术，目前一些自动驾驶已经进行了大规模的道路测试，比如谷歌自动驾驶和苹果自动驾驶。据研究，在当前的交通事故中，有超过30％的道路事故是不合理的换道行为引起的，因此，针对智能辅助驾驶技术中的变道辅助技术的研究显得尤为重要，而现阶段主流的基于规则算法都面临着数据量不足导致模型无法完全应对自动驾驶车辆在换道过程中的场景无限化问题，造成换道失败或者影响换道过程中的安全性。In recent years, the world has paid special attention to autonomous driving, which is considered to be an important technology to alleviate traffic congestion, reduce traffic accidents and environmental pollution. At present, some autonomous driving has undergone large-scale road tests, such as Google Autopilot and Apple Autopilot. . According to research, in the current traffic accidents, more than 30% of the road accidents are caused by unreasonable lane-changing behavior. Therefore, the research on the lane-changing assistance technology in the intelligent assisted driving technology is particularly important. All rule-based algorithms are faced with insufficient data, so that the model cannot fully cope with the problem of infinite scenarios during the lane changing process of autonomous driving vehicles, resulting in the failure of lane changing or affecting the safety during the lane changing process.

现有技术中，有一种基于分层强化学习的自动驾驶车辆换道决策控制方法，属于自动驾驶控制技术领域。解决了现有自动驾驶过程中存在安全性差/效率低的问题。本发明利用自动驾驶车辆实际驾驶场景中的速度及与周边环境内车辆的相对位置、相对速度信息建立带有3个隐含层的决策神经网络，并利用换道安全奖励函数对所述决策神经网络进行训练拟合Q估值函数，获取Q估值最大的动作；利用自动驾驶车辆的实际驾驶场景中的速度和周边环境车辆的相对位置信息与跟驰或换道动作对应的奖励函数，建立深度Q学习的加速度决策模型，获得换道或跟驰加速度信息，当换道时，采用5次多项式曲线生成一条参考换道轨迹。本发明适用于自动驾驶换道决策及控制。In the prior art, there is a lane change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning, which belongs to the technical field of automatic driving control. It solves the problem of poor safety/low efficiency in the existing automatic driving process. The invention uses the speed in the actual driving scene of the automatic driving vehicle and the relative position and relative speed information of the vehicle in the surrounding environment to establish a decision-making neural network with three hidden layers, and uses the lane-changing safety reward function to adjust the decision-making neural network. The network is trained to fit the Q evaluation function to obtain the action with the largest Q evaluation; using the speed in the actual driving scene of the autonomous vehicle and the relative position information of the surrounding environment vehicle and the reward function corresponding to the car-following or lane-changing action, establish The acceleration decision-making model of deep Q-learning obtains lane-changing or car-following acceleration information. When changing lanes, a 5th-degree polynomial curve is used to generate a reference lane-changing trajectory. The present invention is suitable for automatic driving lane change decision and control.

虽然该方案能够针对换道进行模拟训练，并且针对场景和周边环境进行奖励函数的确定，保证对自动驾驶换道的训练成果，但是其针对具体的换道场景仍然存在数据量不足、换道场景受限的问题，最终导致其换道的准确率和安全性过低。Although this scheme can perform simulated training for lane changing and determine the reward function for the scene and surrounding environment to ensure the training results of automatic driving lane changing, it still has insufficient data and lane changing scenarios for specific lane changing scenarios. The limited problem eventually leads to the low accuracy and safety of its lane changing.

发明内容SUMMARY OF THE INVENTION

本发明意在提供一种基于深度强化学习的自动驾驶车辆换道决策系统及方法，以提高自动驾驶车辆换道策略的准确率，保证换道安全。The present invention aims to provide a lane-changing decision system and method for an autonomous driving vehicle based on deep reinforcement learning, so as to improve the accuracy of the lane-changing strategy of the autonomous driving vehicle and ensure the safety of lane-changing.

为达到上述目的，本发明采用如下技术方案：基于深度强化学习的自动驾驶车辆换道决策系统，包括处理器模块，以及分别与处理器模块连接的数据采集模块、数据分析模块和换道策略模块；In order to achieve the above purpose, the present invention adopts the following technical solutions: a deep reinforcement learning-based automatic driving vehicle lane change decision system, comprising a processor module, and a data acquisition module, a data analysis module and a lane change strategy module respectively connected with the processor module ;

数据采集模块，用于采集目标车辆的数据信息，以及采集目标车辆附近的干扰车辆的运行数据，然后形成第一数据集合并将第一数据集合发送至数据分析模块；a data collection module, used for collecting data information of the target vehicle and collecting operation data of interfering vehicles near the target vehicle, and then forming a first data set and sending the first data set to the data analysis module;

数据分析模块，用于对第一数据集合进行分析处理，并得到自动驾驶车辆的换道场景以及换道数据；a data analysis module, used for analyzing and processing the first data set, and obtaining the lane-changing scene and lane-changing data of the autonomous driving vehicle;

换道策略模块，用于根据得到的换道场景和换道数据生成第一换道策略，并将第一换道策略发送至处理器模块；a lane-changing strategy module, configured to generate a first lane-changing strategy according to the obtained lane-changing scene and lane-changing data, and send the first lane-changing strategy to the processor module;

处理器模块，包括数据存储单元和换道执行单元，所述数据存储单元，用于存储第一换道策略；所述换道执行单元，用于根据所述第一换道策略得到基于规则的换道轨迹执行模型并控制自动驾驶车辆进行车道变更。The processor module includes a data storage unit and a lane change execution unit, the data storage unit is used to store a first lane change strategy; the lane change execution unit is used to obtain a rule-based algorithm according to the first lane change policy. The lane change trajectory executes the model and controls the autonomous vehicle for lane changes.

本方案的原理及优点是：实际应用时，基于规则换道模型的基础上使用深度强化学习方法对换道模型进行训练和尝试，利用Actor-Critic算法对自动驾驶车辆的换道策略进行不断优化，使自动驾驶汽车能够准确应对换道过程中的场景无限化问题，从而提高自动驾驶车辆自动换道策略的准确性，保证车辆在换道过程中的安全性，减少道路拥挤情况和碰撞事故的发生。相比于现有技术，本发明的优点在于建立的深度学习模型能够对自动驾驶技术的换道策略进行更全面准确的测试，指导自动驾驶车辆快速安全地完成车道变更，得到的换道轨迹能够适用于自动驾驶车辆换道的场景，使车辆可以在有限的数据量条件下对新的交通场景做出正确合理的反应，保障驾驶安全。The principle and advantages of this scheme are: in practical application, the deep reinforcement learning method is used to train and try the lane-changing model based on the rule-based lane-changing model, and the Actor-Critic algorithm is used to continuously optimize the lane-changing strategy of the autonomous vehicle. , so that the self-driving car can accurately deal with the problem of infinite scenarios in the lane-changing process, thereby improving the accuracy of the automatic lane-changing strategy of the self-driving vehicle, ensuring the safety of the vehicle during the lane-changing process, and reducing road congestion and collision accidents. occur. Compared with the prior art, the advantage of the present invention lies in that the established deep learning model can conduct a more comprehensive and accurate test on the lane changing strategy of the automatic driving technology, guide the automatic driving vehicle to complete the lane change quickly and safely, and the obtained lane changing trajectory can be It is suitable for scenarios where autonomous vehicles change lanes, so that vehicles can make correct and reasonable responses to new traffic scenarios under the condition of limited data volume to ensure driving safety.

优选的，作为一种改进，第一数据集合包括当前时刻周边车辆信息、当前时刻周边道路信息、下一时刻周边道路信息、下一时刻周边车辆信息和本车车辆信息。Preferably, as an improvement, the first data set includes surrounding vehicle information at the current moment, surrounding road information at the current moment, surrounding road information at the next moment, surrounding vehicle information at the next moment, and vehicle information of the vehicle.

有益效果：通过采集周边的车辆信息和道路信息，能够准确提供换道所要参考的数据，从而进行换道安全判断，避免换道过程中目标车辆与周边干扰车辆发生碰撞安全事故，同时也能够极大程度上提高本换道模型的换道策略的准确性。Beneficial effects: By collecting the surrounding vehicle information and road information, the data to be referenced for lane changing can be accurately provided, so as to make a lane-changing safety judgment, avoid collision safety accidents between the target vehicle and surrounding interfering vehicles during the lane-changing process, and can also extremely To a great extent, the accuracy of the lane-changing strategy of this lane-changing model is improved.

优选的，作为一种改进，对第一数据集合进行分析处理为，利用预设的分析算法对有限的第一数据集合进行无限场景探索分析，并在得到对应的换道场景前，对分析过程进行深度强化学习。Preferably, as an improvement, the first data set is analyzed and processed by using a preset analysis algorithm to perform infinite scene exploration and analysis on the limited first data set, and before obtaining the corresponding lane changing scene, analyze the analysis process. Perform deep reinforcement learning.

有益效果：通过此过程对采集到的数据进行分析处理，能够有效地克服当前数据量不足的缺陷，在有限的数据量上对换道模型和换道策略进行无限场景的分析，能够极大程度提高换道场景的多样性以及换道策略的准确性，为自动驾驶车辆的自动安全换道行为提供可靠保障，从而保证行车安全。Beneficial effect: analyzing and processing the collected data through this process can effectively overcome the defect of insufficient data volume at present, and can analyze the lane changing model and the lane changing strategy in infinite scenarios on the limited data volume, which can greatly Improve the diversity of lane-changing scenarios and the accuracy of lane-changing strategies, and provide reliable guarantees for the automatic and safe lane-changing behavior of autonomous vehicles, thereby ensuring driving safety.

优选的，作为一种改进，预设的分析算法为Actor-Critic算法；所述深度强化学习为，利用马尔可夫决策过程对分析过程进行描述，形成一个六次元组M＝(S，A，P，r，ρ，γ)，其中S为状态空间，所述状态空间为所有状态的集合；A为动作空间，所述动作空间为所有动作的集合；P为状态转移概率；r为状态转移过程的奖励函数；γ为状态转移过程中的折扣系数。Preferably, as an improvement, the preset analysis algorithm is the Actor-Critic algorithm; the deep reinforcement learning is to use the Markov decision process to describe the analysis process to form a six-dimensional tuple M=(S, A, P, r, ρ, γ), where S is the state space, the state space is the set of all states; A is the action space, the action space is the set of all actions; P is the state transition probability; r is the state transition The reward function of the process; γ is the discount coefficient in the state transition process.

有益效果：通过利用Actor-Critic算法强化学习的数据与环境的交互，并且学习不是片面得到单步决策的最优策略，而是追求与环境交互获得的长期累积奖励，使对自动驾驶车辆的换道策略训练结果更准确，保证换道的安全性。Beneficial effect: By using the Actor-Critic algorithm to strengthen the interaction between the learning data and the environment, and learning is not one-sided to obtain the optimal strategy for single-step decision-making, but the pursuit of long-term cumulative rewards obtained by interacting with the environment, so that the replacement of self-driving vehicles can be achieved. The lane strategy training results are more accurate and ensure the safety of lane changing.

优选的，作为一种改进，奖励函数为

式中，v为车辆实时速度，v_min为车辆训练过程中采用的最小速度，v_max为车辆训练过程中采用的最大速度，a为对于换道过程中速度奖励值，b是对车辆发生碰撞的碰撞惩罚值，collision为仿真环境对于车辆发生碰撞的反馈结果。Preferably, as an improvement, the reward function is

In the formula, v is the real-time speed of the vehicle, v _min is the minimum speed used in the vehicle training process, v _max is the maximum speed used in the vehicle training process, a is the speed reward value for the lane-changing process, and b is the collision with the vehicle. The collision penalty value, collision is the feedback result of the simulation environment for the collision of the vehicle.

有益效果：在对换道模型训练过程中，利用奖励函数对训练过程进行累积奖励，对换道模型的参数数据进行修正，从而提高换道模型的准确度。Beneficial effects: In the course of training the lane-changing model, the reward function is used to accumulate rewards in the training process, and the parameter data of the lane-changing model is corrected, thereby improving the accuracy of the lane-changing model.

优选的，作为一种改进，数据分析模块在基于规则换道模型的基础上使用深度强化学习方法对换道模型进行训练和尝试，最后对模型进行验证。Preferably, as an improvement, the data analysis module uses a deep reinforcement learning method to train and try the lane-changing model based on the rule-based lane-changing model, and finally verifies the model.

有益效果：对于多次强化学习后得到的换道模型，为保证精准度，利用深度强化学习的方法进行训练和尝试，从而完成对换道模型的修正，保证换道模型参数的正确性，为自动驾驶车辆提供更精准的换道策略服务。Beneficial effects: For the lane-changing model obtained after multiple reinforcement learning, in order to ensure the accuracy, the method of deep reinforcement learning is used to train and try, so as to complete the correction of the lane-changing model and ensure the correctness of the parameters of the lane-changing model. Self-driving vehicles provide more precise lane-changing strategy services.

优选的，作为一种改进，换道策略模块在生成换道策略时利用基于规则的轨迹规划算法来辅助计算，所述基于规则的轨迹规划算法表达式为

其中θ_i为规划步长起点的航向角，

为终点横向坐标，x_n为车辆n的纵向位置，y_n为车辆n的横向位置。Preferably, as an improvement, the lane-changing strategy module uses a rule-based trajectory planning algorithm to assist calculation when generating the lane-changing strategy, and the rule-based trajectory planning algorithm is expressed as

where θ _i is the heading angle of the starting point of the planning step,

is the lateral coordinate of the end point, x _n is the longitudinal position of the vehicle n, and y _n is the lateral position of the vehicle n.

有益效果：通过此方式规划换道轨迹，从而保证自动驾驶车辆在换道过程中的安全性，不与其他车辆发生碰撞，并且能够有效减少车辆的换道时间，提高道路车辆通行效率，有效减缓道路拥挤情况。Beneficial effect: The lane-changing trajectory is planned in this way, so as to ensure the safety of the self-driving vehicle during the lane-changing process, without colliding with other vehicles, and can effectively reduce the lane-changing time of the vehicle, improve the traffic efficiency of road vehicles, and effectively slow down road congestion.

优选的，作为一种改进，利用马尔可夫决策过程对分析过程进行描述时，状态值函数定义如下，

其中a_t，r_t，s_t+1，a_t+1，r_t+1，...～π表示轨迹来自策略π与环境的交互。Preferably, as an improvement, when using Markov decision process to describe the analysis process, the state value function is defined as follows:

where at , r _t , s _t ₊₁ , at _t+1 , r _t+1 , ... ∼ π represent the trajectory from the interaction of policy π with the environment.

有益效果：通过不断更新道路环境数据，从而提供新的换道决策数据，保证新数据的不断更新，从而提高换道模型的实时性和准确性，保障自动驾驶车辆的换道安全。Beneficial effects: By continuously updating road environment data, new lane-changing decision-making data is provided, and new data is continuously updated, thereby improving the real-time and accuracy of the lane-changing model and ensuring the lane-changing safety of autonomous vehicles.

本发明还提供了一种基于深度强化学习的自动驾驶车辆换道决策方法，包括以下步骤：The present invention also provides a lane-changing decision method for an autonomous driving vehicle based on deep reinforcement learning, comprising the following steps:

步骤S1，采集目标车辆的数据信息和目标车辆附近的干扰车辆的运行数据；Step S1, collecting data information of the target vehicle and operation data of interfering vehicles near the target vehicle;

步骤S2，利用Actor-Critic算法对采集到的数据进行分析处理，并结合马尔可夫决策过程来描述强化学习问题，得到自动驾驶车辆的换道场景以及换道数据；Step S2, using the Actor-Critic algorithm to analyze and process the collected data, and combining with the Markov decision process to describe the reinforcement learning problem, to obtain the lane-changing scene and lane-changing data of the autonomous driving vehicle;

步骤S3，利用基于规则的轨迹规划算法计算得到换道轨迹并利用奖励函数对测试过程进行校正，最终得到换道策略；Step S3, using the rule-based trajectory planning algorithm to calculate the lane-changing trajectory and using the reward function to correct the test process, and finally obtain the lane-changing strategy;

步骤S4，利用自动驾驶仿真环境highway_env和基于动力学的仿真软件CarSim对本模型的输出结果进行验证。In step S4, the output result of the model is verified by using the automatic driving simulation environment highway_env and the dynamics-based simulation software CarSim.

有益效果：利用本方法实现自动驾驶车辆的自动换道模型的测试以及换道策略的生成，保证车辆的换道安全，提高换道的效率，减少道路拥挤情况，同时也能够极大程度上提高道路交通安全性。Beneficial effects: the method is used to realize the test of the automatic lane-changing model of the self-driving vehicle and the generation of the lane-changing strategy, so as to ensure the safety of the lane-changing of the vehicle, improve the efficiency of the lane-changing, reduce the road congestion, and at the same time, it can greatly improve the Road traffic safety.

优选的，作为一种改进，对本模型的输出结果进行验证为，利用自动驾驶仿真环境使用高速公路场景对车辆换道策略进行测试，并用平均绝对误差和平均绝对相对误差这两个统计量对模型进行误差统计。Preferably, as an improvement, the output result of this model is verified as: using the automatic driving simulation environment to test the vehicle lane changing strategy using the highway scene, and using the two statistics of mean absolute error and mean absolute relative error to test the model Perform error statistics.

有益效果：通过对模型的输出结果进行验证，能够最大程度保证模型对于变道策略的决策准确性，从而避免换道车辆和周边车辆发生碰撞，提高换道的安全性和效率，另一方面也提高了道路交通的通行效率，减缓了拥堵的情况。Beneficial effects: By verifying the output results of the model, the decision-making accuracy of the model for the lane-changing strategy can be ensured to the greatest extent, thereby avoiding the collision between lane-changing vehicles and surrounding vehicles, and improving the safety and efficiency of lane-changing. Improve the efficiency of road traffic and reduce congestion.

附图说明Description of drawings

图1为本发明基于深度强化学习的自动驾驶车辆换道决策系统实施例一的系统示意图。FIG. 1 is a system schematic diagram of Embodiment 1 of a lane change decision system for an autonomous driving vehicle based on deep reinforcement learning of the present invention.

图2为本发明基于深度强化学习的自动驾驶车辆换道决策系统实施例一的LSTM神经网络示意图。FIG. 2 is a schematic diagram of an LSTM neural network according to Embodiment 1 of the deep reinforcement learning-based lane-changing decision-making system for automatic driving vehicles of the present invention.

图3为本发明基于深度强化学习的自动驾驶车辆换道决策系统实施例一的LSTM神经网络构成示意图。FIG. 3 is a schematic diagram of the structure of an LSTM neural network according to Embodiment 1 of the deep reinforcement learning-based lane-changing decision-making system for automatic driving vehicles of the present invention.

图4为本发明基于深度强化学习的自动驾驶车辆换道决策方法实施例一的流程示意图。FIG. 4 is a schematic flowchart of Embodiment 1 of a method for lane changing decision-making of an autonomous driving vehicle based on deep reinforcement learning of the present invention.

图5为本发明基于深度强化学习的自动驾驶车辆换道决策方法实施例一的收益变化示意图。FIG. 5 is a schematic diagram of a change in revenue of Embodiment 1 of the method for decision-making on lane changing of an autonomous driving vehicle based on deep reinforcement learning of the present invention.

具体实施方式Detailed ways

下面通过具体实施方式进一步详细说明：The following is further described in detail by specific embodiments:

说明书附图中的标记包括：处理器模块1、数据采集模块2、数据分析模块3、换道策略模块4、数据存储单元5、换道执行单元6。The symbols in the drawings in the description include: processor module 1 , data acquisition module 2 , data analysis module 3 , lane change strategy module 4 , data storage unit 5 , and lane change execution unit 6 .

实施例一：Example 1:

本实施例基本如附图1所示：基于深度强化学习的自动驾驶车辆换道决策系统，包括处理器模块1，以及分别与处理器模块1连接的数据采集模块2、数据分析模块3和换道策略模块4；This embodiment is basically as shown in FIG. 1 : a lane-changing decision-making system for autonomous driving vehicles based on deep reinforcement learning includes a processor module 1 , and a data acquisition module 2 , a data analysis module 3 and a change-over module respectively connected to the processor module 1 . Road strategy module 4;

数据采集模块2，用于采集目标车辆的数据信息，以及采集目标车辆附近的干扰车辆的运行数据，然后形成第一数据集合并将第一数据集合发送至数据分析模块3；The data collection module 2 is used to collect the data information of the target vehicle and the operation data of the interfering vehicles near the target vehicle, and then form a first data set and send the first data set to the data analysis module 3;

数据分析模块3，用于对第一数据集合进行分析处理，并得到自动驾驶车辆的换道场景以及换道数据；The data analysis module 3 is used for analyzing and processing the first data set, and obtaining the lane-changing scene and lane-changing data of the autonomous driving vehicle;

换道策略模块4，用于根据得到的换道场景和换道数据生成第一换道策略，并将第一换道策略发送至处理器模块1；A lane-changing strategy module 4 is used to generate a first lane-changing strategy according to the obtained lane-changing scene and lane-changing data, and send the first lane-changing strategy to the processor module 1;

处理器模块1，包括数据存储单元5和换道执行单元6，数据存储单元5，用于存储第一换道策略；换道执行单元6，用于根据所述第一换道策略得到基于规则的换道轨迹执行模型并控制自动驾驶车辆进行车道变更。The processor module 1 includes a data storage unit 5 and a lane change execution unit 6, the data storage unit 5 is used to store the first lane change strategy; The lane-changing trajectory of the model executes the model and controls the autonomous vehicle to make a lane change.

数据采集模块2采集的数据包括当前时刻周边车辆信息、当前时刻周边道路信息、下一时刻周边道路信息、下一时刻周边车辆信息和本车车辆信息；The data collected by the data collection module 2 includes the information of surrounding vehicles at the current moment, the surrounding road information at the current moment, the surrounding road information at the next moment, the surrounding vehicle information at the next moment and the vehicle information of the vehicle;

如附图2所示，利用Actor-Critic算法对第一数据集合进行处理，使用马尔可夫决策过程来描述强化学习问题，形成一个六次元组M＝(S，A，P，r，ρ，γ)，其中S为状态空间，即所有状态的集合；A为动作空间，即所有动作的集合；P为状态转移概率；r为状态转移过程的奖励函数；γ为状态转移过程中的折扣系数。马尔科夫决策过程M和策略π条件下，状态值函数定义如下，As shown in Figure 2, the Actor-Critic algorithm is used to process the first data set, and the Markov decision process is used to describe the reinforcement learning problem, forming a six-dimensional tuple M=(S, A, P, r, ρ, γ), where S is the state space, that is, the set of all states; A is the action space, that is, the set of all actions; P is the state transition probability; r is the reward function of the state transition process; γ is the discount coefficient in the state transition process . Under the condition of Markov decision process M and policy π, the state value function is defined as follows,

其中，a_t，r_t，s_t+1，a_t+1，r_t+1，...～π表示轨迹来自策略π与环境的交互，上式表示从状态st＝s出发，智能体使用策略π与环境交互得到的期望累积奖励，类似地，也可以定义状态动作值函数：Among them, at _t , r _t , s _t+1 , at _t+1 , r _t+1 , ... ~ π represent the trajectory from the interaction between the policy π and the environment, the above formula indicates that starting from the state st = s, the agent The expected cumulative reward for interacting with the environment using policy π, similarly, the state action value function can also be defined:

表示从状态st出发，执行动作at后，智能体使用策略π与交互得到的期望累积奖励，并且状态值函数和状态动作值函数可以互相转换，当策略π是概率性策略时，Indicates that starting from state st, after performing action at, the agent uses the policy π to interact with the expected cumulative reward, and the state value function and the state action value function can be converted to each other. When the policy π is a probabilistic strategy,

对于任意概率性策略π而言，有以下的贝尔曼期望方程，For any probabilistic policy π, we have the following Bellman expectation equation,

当智能体采取策略π，从状态st执行动作at转移到状态st+1并获得奖励rt后，直接进行如下的动态规划更新，When the agent adopts the strategy π, performs the action at from the state st to the state st+1 and obtains the reward rt, it directly performs the following dynamic programming update,

V_π(S_t)＝V_π(S_t)+α(r_t+γV(S_t+1)-V(S_t))V _π (S _t )=V _π (S _t )+α(r _t +γV(S _t+1 )-V(S _t ))

令r_t+γV(s_t+1)-V(s_t)＝TD-errorLet r _t +γV(s _t+1 )-V(s _t )=TD-error

得到以后神经网络会进行反向传播，Actor基于策略梯度，策略被参数化为神经网络，用θ表示，θ迭代的方向是最大化周期奖励的期望，目标函数表示为：After obtaining, the neural network will perform backpropagation. Actor is based on the policy gradient, and the policy is parameterized as a neural network, which is represented by θ. The direction of θ iteration is to maximize the expectation of the periodic reward. The objective function is expressed as:

其中，τ表示一个采样周期，π_θ(τ)表示序列出现的概率，求J(θ)的梯度可得：Among them, τ represents a sampling period, π _θ (τ) represents the probability of sequence occurrence, and the gradient of J(θ) can be obtained:

则：but:

最后更新神经网络参数：Finally update the neural network parameters:

如附图3所示，LSTM神经网络主要由输入层、隐藏层、输出层神经元构成，其中隐藏层神经元主要有三个门结构以及一个状态构成：遗忘门、输入门、输出门、细胞状态。后期对于换道策略的训练和修正，首先在新数据传入长短期记忆网络时要决定哪些旧数据需要从细胞状态中扔掉，而此部分则是由遗忘门决定的，它是一个sigmoid函数层，As shown in Figure 3, the LSTM neural network is mainly composed of input layer, hidden layer, and output layer neurons. The hidden layer neurons mainly have three gate structures and one state: forget gate, input gate, output gate, and cell state. . For the training and correction of the lane-changing strategy in the later stage, it is first necessary to decide which old data needs to be discarded from the cell state when new data is passed into the long-term and short-term memory network, and this part is determined by the forget gate, which is a sigmoid function. layer,

式中，W_f是遗忘门的权重矩阵，h_t-1是t-1时刻的细胞状态，x_t是环境输入数据，b_f是遗忘门的偏置项。where W _f is the weight matrix of the forget gate, h _t-1 is the cell state at time t-1, x _t is the environmental input data, and b _f is the bias term of the forget gate.

然后再经过一个sigmoid函数层，即输入门会决定哪些值需要被更新，然后一个tanh函数层会创建一个向量，作为加入到细胞状态的候选值：Then it goes through a sigmoid function layer, that is, the input gate will decide which values need to be updated, and then a tanh function layer will create a vector as a candidate value to add to the cell state:

式中，b_i是输入门的偏置项，

是准备用以更新的数据矩阵，W_c是准备用以更新的数据的权重矩阵。where b _i is the bias term of the input gate,

is the data matrix to be updated and W _c is the weight matrix of the data to be updated.

然后再更新上一时刻细胞状态，首先从细胞状态移除掉我们在忘记门决定的信息，再以决定对每一个状态值更新的比例来加入输入门计算出的候选值：Then update the cell state at the previous moment, first remove the information we decided on the forget gate from the cell state, and then add the candidate value calculated by the input gate at the ratio of the update of each state value:

最后决定将要输出的部分，输出是在细胞状态的基础上进行适当的处理，即通过一个sigmoid函数层来决定中有哪些部分需要被更新，然后将经过一个tanh函数处理，并将遗忘门里sigmoid层的输出相乘，从而决定输出：Finally, the part to be output is decided. The output is appropriately processed on the basis of the cell state, that is, a sigmoid function layer is used to determine which parts need to be updated, and then it will be processed by a tanh function, and the sigmoid in the forget gate will be processed. The outputs of the layers are multiplied to determine the output:

O_t＝σ(W_o[h_t-1，x_t]+b_o)O _t =σ(W _o [h _t-1 , x _t ]+b _o )

式中，W_o是输出门的权重矩阵，b_o是输出门的偏置项。In the formula, W _o is the weight matrix of the output gate, and b _o is the bias term of the output gate.

改进前细胞状态为：The cell state before improvement is:

s_t＝tanh(W_g[h_t-1，x_t]+b_g)·σ(W_i[h_t-1，x_t]+b_i)+s_t-1·σ(W_f[h_t-1，x_t]b_f))s _t =tanh(W _g [h _t-1 , x _t ]+b _g )·σ(W _i [h _t-1 , x _t ]+b _i )+s _t-1 ·σ(W _f [h _t-1 , x _t ]b _f ))

输出为：The output is:

h_t＝tanh(s_t)·σ(W_o[h_t-1，x_t]+b_o)h _t =tanh(s _t )·σ(W _o [h _t-1 , x _t ]+b _o )

则第p个输出的值和真实值之间的误差是：Then the error between the value of the pth output and the true value is:

选择形式较为简单的三次多项式作为基于规则的轨迹规划算法，其表达式如下：A cubic polynomial with a simpler form is selected as the rule-based trajectory planning algorithm, and its expression is as follows:

式中，θ_i为规划步长起点的航向角，

为终点横向坐标，xn为车辆n的纵向位置，yn为车辆n的横向位置。In the formula, θ _i is the heading angle of the starting point of the planning step,

is the lateral coordinate of the end point, xn is the longitudinal position of the vehicle n, and yn is the lateral position of the vehicle n.

如附图4所示，本发明还提供了一种应用于上述系统中的基于深度强化学习的自动驾驶车辆换道决策方法，包括以下步骤：As shown in FIG. 4 , the present invention also provides a deep reinforcement learning-based decision-making method for automatic driving vehicle lane change applied in the above system, comprising the following steps:

步骤S4，根据换道策略得到基于规则的换道轨迹执行模型，并利用自动驾驶仿真环境highway_env和基于动力学的仿真软件CarSim对本模型的输出结果进行验证。In step S4, a rule-based lane-changing trajectory execution model is obtained according to the lane-changing strategy, and the output results of the model are verified by using the automatic driving simulation environment highway_env and the dynamics-based simulation software CarSim.

在强化学习过程中，利用自动驾驶仿真环境使用高速公路场景对车辆换道策略进行测试，并用平均绝对误差和平均绝对相对误差这两个统计量对模型进行误差统计，In the reinforcement learning process, the vehicle lane changing strategy is tested using the highway scene in the autonomous driving simulation environment, and the error statistics of the model are carried out with the two statistics of mean absolute error and mean absolute relative error.

式中，N表示测试数据样本数，d_r，i表示第i辆车的名义值，d_s，i表示第i辆车的预测值。In the formula, N represents the number of test data samples, d _{r, i} represents the nominal value of the i-th vehicle, and d _{s, i} represents the predicted value of the i-th vehicle.

如附图5所示，本模型收益随着训练次数增加而变化的趋势，可知训练收益随着训练次数增加而快速上升，训练超过2000次时，收益值稳定并且趋于收敛。As shown in Fig. 5, the income of this model changes with the increase of training times. It can be seen that the training income increases rapidly with the increase of training times. When the training exceeds 2000 times, the income value is stable and tends to converge.

利用本系统，能够基于现在实际数据的基础上，完成对车辆换道策略的更新，从而为自动驾驶车辆提供自动换道策略和方法，从而保证换道过程的顺利进行，避免与周边车辆发生碰撞，提高了换道的安全性和效率，进一步保障了道路交通的通畅，减少了城市交通拥挤情况。Using this system, the vehicle lane changing strategy can be updated based on the current actual data, so as to provide automatic lane changing strategies and methods for autonomous vehicles, so as to ensure the smooth progress of the lane changing process and avoid collisions with surrounding vehicles. , improving the safety and efficiency of lane changing, further ensuring the smoothness of road traffic and reducing urban traffic congestion.

本实施例具体实施过程如下：The specific implementation process of this embodiment is as follows:

第一步，利用数据采集模块2采集目标车辆的数据信息和目标车辆附近的干扰车辆的运行数据，包括当前时刻周边车辆信息、当前时刻周边道路信息、下一时刻周边道路信息、下一时刻周边车辆信息和本车车辆信息，然后形成第一数据集合并将第一数据集合发送至数据分析模块3。The first step is to use the data collection module 2 to collect the data information of the target vehicle and the operation data of the interfering vehicles near the target vehicle, including the surrounding vehicle information at the current moment, the surrounding road information at the current moment, the surrounding road information at the next moment, and the surrounding road information at the next moment. The vehicle information and the vehicle information of the vehicle are then formed into a first data set and sent to the data analysis module 3 .

第二步，数据分析模块3接收到第一数据集合后，对第一数据集合进行分析处理，利用Actor-Critic算法对第一数据集合进行处理，使用马尔可夫决策过程来描述强化学习问题，形成一个六次元组M＝(S，A，P，r，ρ，γ)，然后得到自动驾驶车辆的换道场景和换道数据。In the second step, after the data analysis module 3 receives the first data set, it analyzes and processes the first data set, uses the Actor-Critic algorithm to process the first data set, and uses the Markov decision process to describe the reinforcement learning problem, A six-dimensional tuple M=(S, A, P, r, ρ, γ) is formed, and then the lane-changing scene and lane-changing data of the autonomous vehicle are obtained.

第三步，换道策略模块4根据得到的换道场景和换道数据生成第一换道策略，并将第一换道策略发送至处理器模块1，处理器模块1的数据存储单元5接收并存储第一换道策略，并由换道执行单元6根据第一换道策略得到基于规则的换道轨迹执行模型，并控制自动驾驶车辆进行车道变更。In the third step, the lane-changing strategy module 4 generates a first lane-changing strategy according to the obtained lane-changing scene and lane-changing data, and sends the first lane-changing strategy to the processor module 1, and the data storage unit 5 of the processor module 1 receives the The first lane-changing strategy is stored, and the lane-changing execution unit 6 obtains a rule-based lane-changing trajectory execution model according to the first lane-changing strategy, and controls the automatic driving vehicle to change lanes.

第四步，对换道轨迹执行模型进行训练和尝试，最后利用自动驾驶仿真环境highway_env和基于动力学的仿真软件CarSim对本模型的输出结果进行验证，利用自动驾驶仿真环境使用高速公路场景对车辆换道策略进行测试，并用平均绝对误差和平均绝对相对误差这两个统计量对模型进行误差统计，最终得到的换道轨迹和速度在动力学仿真中变化平稳，可以和目标轨迹保持较小误差的条件下被跟踪，车辆行驶稳定性良好。The fourth step is to train and try the lane-changing trajectory execution model, and finally use the automatic driving simulation environment highway_env and the dynamics-based simulation software CarSim to verify the output of this model, and use the automatic driving simulation environment to use the highway scene to change the vehicle. The lane strategy is tested, and the two statistics of mean absolute error and mean absolute relative error are used to perform error statistics on the model. The resulting lane-changing trajectory and speed change smoothly in the dynamic simulation, and can keep a small error with the target trajectory. It is tracked under conditions, and the vehicle has good driving stability.

本方案中，建立了深度强化学习模型，并综合考虑了目前主流换道模型的不足，从而引入基于规则的训练模型，对于有限数据下神经网络如何能适应和学习到更多驾驶技巧的问题提出了解决思路，利用采集到的目标车辆的数据信息，以及采集目标车辆附近的干扰车辆的运行数据来进行统计分析处理，利用Actor-Critic算法对第一数据集合进行处理，使用马尔可夫决策过程来描述强化学习问题，形成一个六次元组M＝(S，A，P，r，ρ，γ)，并对换道模型利用深度强化学习方法来进行训练和尝试，最后利用自动驾驶仿真环境highway_env和基于动力学的仿真软件CarSim对本模型的输出结果进行验证，整个验证过程中相较于以往，因为本模型的换道场景以及换道策略数据选择量多，且本方案提供的换道策略更加精准，因此相比于以往的模式，本方案中的换道行程时间反而更少，对比发现行程时间减少50％以上，碰撞的概率减少75％以上，进而有效地保证了本模型换道策略的真实性和准确性，指导自动驾驶车辆快速安全地完成车道变更，得到的换道轨迹能够适用于自动驾驶车辆换道的场景，使车辆可以在有限的数据量条件下对新的交通场景做出合理的反应，不仅可提供的备用换道策略数据量大，同时对换道过程中的周围环境监控数据也实时在反馈至本模型中，从而对换道过程中的换道策略进行及时干扰跟进，提高自动驾驶车辆换道的安全性，有效降低了道路交通拥挤情况和事故发生率，保障了驾乘人员生命安全。In this scheme, a deep reinforcement learning model is established, and the shortcomings of the current mainstream lane changing models are comprehensively considered, so a rule-based training model is introduced, and the problem of how the neural network can adapt to and learn more driving skills under limited data is raised. In order to solve the problem, the collected data information of the target vehicle and the operation data of the interfering vehicles near the target vehicle are used for statistical analysis and processing, the Actor-Critic algorithm is used to process the first data set, and the Markov decision process is used. To describe the reinforcement learning problem, form a six-dimensional tuple M=(S, A, P, r, ρ, γ), and use the deep reinforcement learning method to train and try the lane changing model, and finally use the autonomous driving simulation environment highway_env and the dynamics-based simulation software CarSim to verify the output results of this model. Compared with the past, this model has more choices of lane-changing scenarios and lane-changing strategy data, and the lane-changing strategy provided by this scheme is more efficient than before. Therefore, compared with the previous model, the travel time of lane changing in this scheme is less. It is found that the travel time is reduced by more than 50%, and the probability of collision is reduced by more than 75%, thus effectively ensuring the lane changing strategy of this model. Authenticity and accuracy, guide the autonomous vehicle to quickly and safely complete the lane change, and the obtained lane change trajectory can be applied to the scene of the autonomous vehicle changing lane, so that the vehicle can make decisions on new traffic scenarios under the condition of limited data volume. A reasonable response not only provides a large amount of data for the backup lane changing strategy, but also feeds back the monitoring data of the surrounding environment during the lane changing process to the model in real time, so as to conduct timely interference and tracking of the lane changing strategy during the lane changing process. It can improve the safety of lane changing of autonomous vehicles, effectively reduce road traffic congestion and accident rate, and ensure the safety of drivers and passengers.

实施例二：Embodiment 2:

本实施例基本与实施例一相同，区别在于：本系统还包括显示模块，在进行换道模型分析建立过程中，利用显示模块实时显示自动驾驶车辆的换道轨迹和车辆运行数据，能够更直观准确地得知车辆换道的具体情况。This embodiment is basically the same as the first embodiment, the difference is that the system also includes a display module. During the process of analyzing and establishing the lane-changing model, the display module is used to display the lane-changing trajectory and vehicle operation data of the autonomous driving vehicle in real time, which can be more intuitive. Accurately know the specific situation of the vehicle lane change.

本实施例的具体实施过程与实施例一相同，区别在于：The specific implementation process of this embodiment is the same as that of the first embodiment, and the difference is:

第四步，对换道轨迹执行模型进行训练和尝试，最后利用自动驾驶仿真环境highway_env和基于动力学的仿真软件CarSim对本模型的输出结果进行验证，利用自动驾驶仿真环境使用高速公路场景对车辆换道策略进行测试，并用平均绝对误差和平均绝对相对误差这两个统计量对模型进行误差统计，最后得到准确的自动驾驶换道策略，整个训练以及验证过程中，利用显示模块实时显示自动驾驶车辆的换道轨迹和车辆运行数据。The fourth step is to train and try the lane-changing trajectory execution model, and finally use the automatic driving simulation environment highway_env and the dynamics-based simulation software CarSim to verify the output of this model, and use the automatic driving simulation environment to use the highway scene to change the vehicle. Test the lane strategy, and use the two statistics of mean absolute error and mean absolute relative error to conduct error statistics on the model, and finally obtain an accurate automatic driving lane change strategy. During the entire training and verification process, the display module is used to display the automatic driving vehicle in real time. lane change trajectory and vehicle operation data.

提供显示换道轨迹和换道数据的功能，从而使自动驾驶车辆的换道测试更准确，也使操作人员能更直观地了解到整个测试过程，便于对不合格的地方进行更正，提高换道测试的测试效率。Provides the function of displaying lane-changing trajectory and lane-changing data, so that the lane-changing test of autonomous vehicles is more accurate, and the operator can understand the entire test process more intuitively, which is convenient for correcting unqualified places and improving lane-changing. Test efficiency of the test.

实施例三：Embodiment three:

本实施例基本与实施例一相同，区别在于：所述系统还包括换道轨迹修正模块，用于在换道过程中因为路况信息变化导致根据当前换道轨迹不能安全完成换道时，对自动驾驶车辆的换道轨迹进行修正，从而使自动驾驶车辆顺利完成换道，提高换道的安全保障，同时也能够极大程度上保障驾乘安全。This embodiment is basically the same as the first embodiment, the difference is that: the system further includes a lane-changing trajectory correction module, which is used for automatically changing lanes when the current lane-changing trajectory cannot be safely completed due to changes in road condition information during the lane-changing process. The lane-changing trajectory of the driving vehicle is corrected, so that the self-driving vehicle can successfully complete the lane-changing, improving the safety guarantee of lane-changing, and at the same time, it can also guarantee the driving safety to a great extent.

第三步，换道策略模块4根据得到的换道场景和换道数据生成第一换道策略，并将第一换道策略发送至处理器模块1，处理器模块1的数据存储单元5接收并存储第一换道策略，并由换道执行单元6根据第一换道策略得到基于规则的换道轨迹执行模型，并控制自动驾驶车辆进行车道变更；在换道过程中，若因一下意外情况导致路况信息变化，从而导致根据当前换道轨迹不能安全完成换道时，换道轨迹修正模块对自动驾驶车辆的换道轨迹进行实时修正，从而使自动驾驶车辆顺利完成换道。In the third step, the lane-changing strategy module 4 generates a first lane-changing strategy according to the obtained lane-changing scene and lane-changing data, and sends the first lane-changing strategy to the processor module 1, and the data storage unit 5 of the processor module 1 receives the And store the first lane-changing strategy, and the lane-changing execution unit 6 obtains a rule-based lane-changing trajectory execution model according to the first lane-changing strategy, and controls the autonomous driving vehicle to change lanes; The situation causes the road condition information to change, so that when the lane change cannot be completed safely according to the current lane change trajectory, the lane change trajectory correction module corrects the lane change trajectory of the automatic driving vehicle in real time, so that the automatic driving vehicle can successfully complete the lane change.

考虑到自动驾驶车辆周边道路环境的无规则变化以及周边车辆的紧急刹车或者突然加速等情况，在自动驾驶车辆按照当前换道策略进行车道变更过程中，若道路信息发送变化，由换道轨迹修正模块介入对换道轨迹进行实时修正，从而保证换道的安全顺利进行，不仅保障了换道的安全性，同时也能够保障驾乘人员的安全，减少交通事故的发生，改善交通情况。Considering the irregular changes of the road environment around the autonomous vehicle and the emergency braking or sudden acceleration of the surrounding vehicles, when the autonomous vehicle changes lanes according to the current lane-changing strategy, if the road information changes, the lane-changing trajectory will be used for correction. The module intervenes to correct the lane change trajectory in real time, so as to ensure the safe and smooth progress of the lane change.

以上所述的仅是本发明的实施例，方案中公知的具体技术方案和/或特性等常识在此未作过多描述。应当指出，对于本领域的技术人员来说，在不脱离本发明技术方案的前提下，还可以作出若干变形和改进，这些也应该视为本发明的保护范围，这些都不会影响本发明实施的效果和专利的实用性。本申请要求的保护范围应当以其权利要求的内容为准，说明书中的具体实施方式等记载可以用于解释权利要求的内容。The above are only examples of the present invention, and common knowledge such as well-known specific technical solutions and/or characteristics in the solutions are not described too much here. It should be pointed out that for those skilled in the art, some modifications and improvements can be made without departing from the technical solution of the present invention, which should also be regarded as the protection scope of the present invention, and these will not affect the implementation of the present invention. effect and the applicability of the patent. The scope of protection claimed in this application should be based on the content of the claims, and the descriptions of the specific implementation manners in the description can be used to interpret the content of the claims.

Claims

1. The lane-changing decision-making system for autonomous driving vehicles based on deep reinforcement learning, is characterized in that: comprising a processor module, and a data acquisition module, a data analysis module and a lane-changing strategy module connected with the processor module respectively;

The data collection module is used to collect the data information of the target vehicle and the operation data of the interfering vehicles near the target vehicle, and then form a first data set and send the first data set to the data analysis module;

The data analysis module is used to analyze and process the first data set, and obtain the lane-changing scene and lane-changing data of the autonomous driving vehicle;

The lane-changing strategy module is configured to generate a first lane-changing strategy according to the obtained lane-changing scene and lane-changing data, and send the first lane-changing strategy to the processor module;

The processor module includes a data storage unit and a lane change execution unit, the data storage unit is used to store a first lane change strategy; the lane change execution unit is used to obtain a base The regular lane-changing trajectory executes the model and controls the autonomous vehicle for lane changes.

2 . The lane-changing decision system for autonomous driving vehicles based on deep reinforcement learning according to claim 1 , wherein the first data set includes information on surrounding vehicles at the current moment, surrounding road information at the current moment, and surrounding roads at the next moment. 3 . information, surrounding vehicle information at the next moment, and own vehicle information.

3. The lane-changing decision-making system for autonomous driving vehicles based on deep reinforcement learning according to claim 1, wherein the analyzing and processing the first data set is to use a preset analysis algorithm to analyze the limited first data The collection is used for infinite scene exploration and analysis, and deep reinforcement learning is performed on the analysis process before the corresponding lane changing scene is obtained.

4. The lane-changing decision-making system for autonomous driving vehicles based on deep reinforcement learning according to claim 3, wherein: the preset analysis algorithm is an Actor-Critic algorithm; the deep reinforcement learning is, using Markov The decision-making process describes the analysis process and forms a six-dimensional tuple M=(S, A, P, r, ρ, γ), where S is the state space, which is the set of all states; A is the action space, The action space is the set of all actions; P is the state transition probability; r is the reward function of the state transition process; γ is the discount coefficient in the state transition process.

5. The lane-changing decision-making system for autonomous driving vehicles based on deep reinforcement learning according to claim 4, wherein the reward function is:

6. The lane-changing decision-making system for autonomous driving vehicles based on deep reinforcement learning according to claim 1, wherein the data analysis module uses a deep reinforcement learning method to perform a lane-changing model on the basis of the rule-based lane-changing model. Train and try, and finally validate the model.

7. The lane-changing decision-making system for autonomous driving vehicles based on deep reinforcement learning according to claim 1, wherein the lane-changing strategy module utilizes a rule-based trajectory planning algorithm to assist calculation when generating a lane-changing strategy, so that The expression of the rule-based trajectory planning algorithm is

where θ _i is the heading angle of the starting point of the planning step,

8. The lane-changing decision-making system for autonomous driving vehicles based on deep reinforcement learning according to claim 3, characterized in that: when the analysis process is described using the Markov decision-making process, the state value function is defined as follows,

where a _t , r _t , s _t+1 , at _t+1 , r _t+1 , ... ∼ π represent the trajectory from the interaction of policy π with the environment.

9. A lane-changing decision-making method for autonomous driving vehicles based on deep reinforcement learning, characterized in that it comprises the following steps:

Step S1, collecting data information of the target vehicle and operation data of interfering vehicles near the target vehicle;

Step S2, using the Actor-Critic algorithm to analyze and process the collected data, and combining with the Markov decision process to describe the reinforcement learning problem, to obtain the lane-changing scene and lane-changing data of the autonomous driving vehicle;

Step S3, using the rule-based trajectory planning algorithm to calculate the lane-changing trajectory and using the reward function to correct the test process, and finally obtain the lane-changing strategy;

In step S4, a rule-based lane-changing trajectory execution model is obtained according to the lane-changing strategy, and the output results of the model are verified by using the automatic driving simulation environment highway_env and the dynamics-based simulation software CarSim.

10. The deep reinforcement learning-based decision-making method for changing lanes of an autonomous vehicle according to claim 9, wherein the verification of the output result of the model is that the vehicle is changed lanes using a highway scene in an autonomous driving simulation environment. The strategy is tested and the model is errored using two statistics, Mean Absolute Error and Mean Absolute Relative Error.