CN115071758B - A Control Switching Method for Human-Machine Co-driving Based on Reinforcement Learning - Google Patents
Info
- Publication number
- CN115071758B (application CN202210758672.3A)
- Authority
- CN
- China
- Prior art keywords
- driving
- driver
- vehicle
- current
- road
- Prior art date
- Legal status
- Active
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W60/00—Drive control systems specially adapted for autonomous road vehicles
- B60W60/005—Handover processes
- B60W60/0059—Estimation of the risk associated with autonomous or manual driving, e.g. situation too complex, sensor failure or driver incapacity
Landscapes
- Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Human Computer Interaction (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Traffic Control Systems (AREA)
Abstract
Description
Technical Field
The present application relates to the technical field of intelligent driving, and in particular to a reinforcement-learning-based method for switching control authority in human-machine co-driving.
Background
In existing automated driving technology, control-authority switching is commonly used to correct the driver's behavior and improve driving safety.

For example, patent CN 109795486 A dynamically adjusts a co-driving coefficient (ranging from 0 to 1) according to the driver's input torque Td and the time-to-lane-crossing TLC of the left and right wheels, realizing a gradual transition from the driver to the assistance control system; the coefficient at each moment is determined by fuzzy control. Although this approach addresses the danger of lateral lane departure, it does not consider longitudinal risk during driving.

As another example, patent CN 108469806 A constructs key factors from the current driving environment, the vehicle, and the driver's state, performs situation assessment on these factors, and evaluates the driving ability of the automated driving system and the driver in parallel to decide whether driving authority can be transferred. Although this scheme considers many factors that may affect driving safety, its evaluation of driving ability during the handover is overly complicated, is subject to large subjective and random factors, requires too much data, and offers poor real-time performance and stability.

As a further example, the paper "Human-Machine Co-Driving Model Based on the Driver's Risk Response Mechanism" quantifies environmental risk and obtains a safety risk response strategy by fitting the environmental risk effect to the driver's acceleration; control authority is then switched flexibly according to the deviation from this strategy. This resolves the coupling between driver state and environmental safety, but the safety strategy is built on a large number of driving segments that cannot cover all safe operations, and it only addresses switching while following or overtaking on highways. Moreover, all of the above switching methods consider only the safety of the current moment and ignore traffic hazards that may arise in future time periods.

Therefore, the safety and stability of existing control-authority switching schemes in automated driving need to be improved.
Summary of the Invention
The purpose of the present application is to effectively address the combined longitudinal and lateral risk of the vehicle and to reduce errors in judging the driver, thereby improving the accuracy and safety of driving-authority switching.

The technical solution of the present application provides a reinforcement-learning-based method for switching control authority in human-machine co-driving. The method is applicable to a reinforcement-learning-based control-authority switching system that allocates driving weights between the driver and the driving system, and includes: calculating a driving operation action prediction index from driver information and vehicle-road prediction information; and inputting the driving operation action prediction index together with a comprehensive driving operation action index into the control-authority switching system to calculate the driving weights of the driver and the driving system.
In any of the above technical solutions, further, the driver information includes at least the driver's state, the driver's intention, the driver's style, and the driver's subconscious-driving influence deviation, and the vehicle-road prediction information includes at least the predicted vehicle-road risk and the predicted vehicle-road risk threshold.

The driving operation action prediction index is computed as a function of Z_t, the driver's state operation reaction delay; σ, the driver's subconscious-driving influence deviation; δ, the driver's intention; S, the driver's style; v_risk, the predicted vehicle-road risk; and A_risk, the predicted vehicle-road risk threshold.
In any of the above technical solutions, further, the driver's subconscious-driving influence deviation σ is built on the position parameter

R_d = |d − q_ki|

where σ is the driver's subconscious-driving influence deviation; sum is the number of collected traffic scenes; D_i is the series of subconscious-driving intensities within one traffic-scene time period; ρ′, τ, and ω are parameters to be determined; α is the subconscious emphasis weight; β is the driver's personal safety-tendency weight; d is the vehicle's current lateral position; q_ki is the vehicle's fitted lateral position under this scene (label); a is the vehicle acceleration; and R_d is the position parameter.
In any of the above technical solutions, further, the driver information includes at least the driver's state, the driver's intention, and the driver's style, and the calculation of the comprehensive driving operation action index specifically includes:

determining the current vehicle-road information according to the vehicle's current position on the road, where the current vehicle-road information includes at least the current vehicle-road risk and the current vehicle-road risk threshold; and

determining the comprehensive driving operation action index from the driver information, the current vehicle-road information, and the environmental response factor by means of a piecewise function of z_1, the driver's state; γ, the environmental response factor; H_x,y, the current vehicle-road risk; σ, the road correction parameter; a_pre, the real-time operation quantization parameter; and risk, the current vehicle-road risk threshold.
In any of the above technical solutions, further, determining the current vehicle-road information according to the vehicle's current position on the road specifically includes:

determining the vehicle's position on the road, including at least the distance to the preceding vehicle and the vehicle's lateral position;

determining the longitudinal vehicle-road danger value from the distance to the preceding vehicle;

determining the lateral vehicle-road danger value from the vehicle's lateral position;

calculating the current vehicle-road risk H_x,y from y_1, the longitudinal vehicle-road danger value, y_2, the lateral vehicle-road danger value, and the danger-distance influence factor of the road section, whose value ranges over [1, 10]; and

calculating the current vehicle-road risk thresholds of different scenarios from the current vehicle-road risk, and recording the current vehicle-road risk threshold and the current vehicle-road risk as the current vehicle-road information.
In any of the above technical solutions, further, the environmental response factor γ is computed from: m, the vehicle mass; M, the vehicle type and purpose correction parameter; k_1, the dynamics correction parameter; the vehicle's desired speed and speed direction; v_limleast(t), the minimum speed value; k_2, the traffic-scene correction parameter; the vehicle interaction force parameter; k_3, the correction parameter for the degree to which pedestrians obey traffic rules; the pedestrian interaction force parameter; k_4, the correction parameter for the complexity of the surrounding physical environment; the environment interaction force parameter; k_5, the correction parameter for the degree of influence of traffic rules; and the rule parameter.
In any of the above technical solutions, further, calculating the driving weights of the driver and the driving system specifically includes: step 9.1, standardizing the current driving operation action prediction index and comprehensive driving operation action index with the Z-score formula, and computing the mean and standard deviation of each index over all values from the start of the drive to the current moment; step 9.2, feeding the Z-score-standardized indices, together with their current means and standard deviations, into the reinforcement-learning-based control-authority switching system as input parameters and judging whether the weight-allocation conditions are met; if they are, step 9.3 is executed, and if not, the driver information and vehicle-road prediction information are re-acquired; and step 9.3, based on the Q-learning algorithm, using the input parameters to adjust the learning state and assigning the driver's driving-authority weight from the action that maximizes the value of the next state, where the driving system's driving-authority weight is 1 minus the driver's driving-authority weight.

In any of the above technical solutions, further, the weight-allocation conditions specifically include: the first parameter and the second parameter are both less than or equal to the first trigger threshold for 5 consecutive inputs; or the second parameter is less than or equal to the second trigger threshold for 3 consecutive inputs; or the first parameter is less than or equal to the second trigger threshold for 3 consecutive inputs. Here the first parameter is the number of standard deviations by which the currently input driving operation action prediction index differs from the mean of all such indices input from the start of the driving behavior to the current moment, and the second parameter is the number of standard deviations by which the currently input comprehensive driving operation action index differs from the mean of all such indices input from the start of the driving behavior to the current moment.
The beneficial effects of the present application are as follows:

The technical solution of the present application effectively addresses the combined longitudinal and lateral risk of the vehicle, weakens the influence of the uncertainty introduced by the driver, and considers the driver from multiple angles, thereby reducing errors in judging the driver. It applies to a wide variety of traffic scenarios and also accounts for traffic hazards that may arise in future time periods, further improving the accuracy and safety of driving-authority switching. Finally, all factors are condensed into two indices fed to the switching system, so the data volume is small and accurate and real-time performance is higher.

In preferred implementations of the present application, the influence of the driver's experience and subconscious on driving is taken into account, reducing the judgment burden on the switching system and improving real-time performance. Risks that other vehicles may pose to the ego vehicle can also be predicted in advance, avoiding rear-end and other collisions while driving.
Brief Description of the Drawings

The advantages of the above and/or additional aspects of the present application will become apparent and easily understood from the description of the embodiments in conjunction with the following drawings, in which:

Fig. 1 is a schematic flow chart of a reinforcement-learning-based method for switching control authority in human-machine co-driving according to an embodiment of the present application;

Fig. 2 is a diagram of relative road positions and relatively safe positions according to an embodiment of the present application;

Fig. 3 is a schematic diagram of the model-free reinforcement-learning process according to an embodiment of the present application;

Fig. 4 is a schematic diagram of the overall structure of the reinforcement-learning-based control-authority switching mechanism according to an embodiment of the present application;

Fig. 5 is a schematic diagram of the Q-table in the Q-learning algorithm according to an embodiment of the present application.
Detailed Description

In order to understand the above purposes, features, and advantages of the present application more clearly, the present application is further described in detail below in conjunction with the accompanying drawings and specific embodiments. It should be noted that, where there is no conflict, the embodiments of the present application and the features within them may be combined with one another.

Many specific details are set forth in the following description to facilitate a full understanding of the present application; however, the present application can also be implemented in ways other than those described here, so its protection scope is not limited by the specific embodiments disclosed below.
As shown in Fig. 1, this embodiment provides a reinforcement-learning-based method for switching control authority in human-machine co-driving, including:
Step 1: construct a simulator based on real vehicle-road environments, and build vehicle-road scenarios of various situations in the simulator.

Further, step 1 is implemented as follows:

Step 1.1: the simulator hardware must include a camera that captures the driver's face and an operating environment that imitates a real vehicle.

Step 1.2: construct a large number of typical traffic environments encountered in the real world, including car-following scenarios on all types of road sections, overtaking scenarios on all types of road sections, road-intersection scenarios, congested-section scenarios, and so on.

Step 1.3: insert a certain number of dangerous traffic scenarios and accident-simulation scenarios into the different typical traffic environments.
Step 2: continuously collect information on surrounding road vehicles, the current driver state, driver intention, driver style and actions, control-authority weight allocation, and the automated driving system, and compute the driver's subconscious-driving influence deviation.

Further, step 2 may include the following process:

Step 2.1: the driver completes full driving runs in the different scenarios.

Step 2.2: without intervention from the control-authority switching system, the driver first drives normally in a certain number of different scenarios. The driver's operations and the road conditions are recorded, the driver's style is obtained by statistical analysis, and the driver's subconscious-driving influence deviation is computed (the influence of the driver's subconscious operations is characterized by the acceleration/deceleration and lateral-position changes produced by the experience the driver has accumulated in different scenarios).

The subconscious-driving influence deviation is built on the position parameter

R_d = |d − q_ki|

where σ is the driver's subconscious-driving influence deviation; sum is the number of collected traffic scenes; D_i is the series of subconscious-driving intensities within one traffic-scene time period; ρ′, τ, and ω are parameters to be determined; α is the subconscious emphasis weight; β is the driver's personal safety-tendency weight; d is the vehicle's current lateral position; q_ki is the vehicle's fitted lateral position under this scene (label); a is the vehicle acceleration; and R_d is the position parameter.
Specifically, the driver's subconscious-driving influence deviation does not consider the influence of other traffic participants; it is considered only from the perspective of personal safety. Based on the maximum-entropy principle, a maximum-entropy method related to the driver's subconscious is established.

First, the entropy function is constructed:

H(x) = −C Σ_k p_k log₂ p_k

where H(x) is the entropy, a measure of the uncertainty of things; p_k is a probability distribution; and C is a constant depending on the unit in which entropy is measured, taken here as 1.

What is needed from this entropy function is the driver's subconscious-driving influence deviation, i.e., how the subconscious shaped by the current environment affects behavior. However, because the probability p_k takes values between 0 and 1, log₂ p_k is negative; this embodiment therefore introduces non-negative integers q_i in place of the probability distribution p_k in the entropy function.

The parameter q_i is defined as the relatively safe position in different road scenarios. As shown in Fig. 2, a lateral coordinate axis is established with the left edge of the road as the origin, half the width of a single lane is used as one driving position, and the road is divided into eight regions; a relatively safe position is one the vehicle occupies more than half of the time during normal driving.

Roads differ considerably between scenarios, and no exact position fully representing the road path can be obtained; therefore ln replaces the base-2 logarithm, and since q_i mainly takes integer values greater than one, the negative sign of the original entropy function is removed. This difference can be expressed by the following modified entropy:

E = C Σ_i q_i ln q_i
Next, the constraints on the modified entropy are established. First, a road-condition constraint: every driver chooses the side of the road in good condition. Second, a traffic-rule constraint: drivers prefer to drive as traffic rules prescribe. Third, a traffic-demand constraint, i.e., whether the driver's demand in the road scenario is to overtake, follow, or go straight. The constraints are as follows:

Constraint 1 (road conditions) bounds the road-capacity score between its lower and upper limits A_min and A_max, modulated by the unfamiliarity interference coefficient of the road section.

Constraint 2 (traffic rules) bounds the traffic-demand influence weight b by the maximum traffic-rule boundary B.

Constraint 3 (traffic demand): b(q_i) ∈ S

where A_min and A_max are the lower and upper limits of the road-capacity score; b is the traffic-demand influence weight; B is the maximum traffic-rule boundary; b(q_i) is the traffic-demand judgment: given a known demand such as overtaking, following, or going straight, q_i is estimated from that demand; and S is the traffic-demand set, containing the position outcomes of all normal driving behaviors.

With the three constraints set and for each road scenario, the modified entropy is computed; the value of q_i at which the modified entropy E is largest is the relatively safe position of that scenario. The relatively safe positions q_i are clustered and each class is labeled (e.g., overtaking, following, going straight); the positions are then fitted to obtain the fitted lateral position q_ki, the safe position this driver most tends to take under each label.

In summary, the fitted lateral position q_ki is the relatively safe position q_i at which the modified entropy E attains its maximum under the constraints.
D_i is the series of subconscious-driving intensities within one traffic-scene time period, built on

R_d = |d − q_ki|

where d is the vehicle's current lateral position; q_ki is the vehicle's fitted lateral position under this scene (label); a is the vehicle acceleration; α is the subconscious emphasis weight; β is the driver's personal safety-tendency weight; and ρ′, τ, ω are parameters to be determined, whose values should follow the trend of subconscious-driving intensity in different traffic scenarios:

When R_d ≥ Z (Z is a safety value, a set value that differs by road) and the subconscious-driving intensity D_i is large, the values of ρ′, τ, ω increase as R_d and |a| increase; the situation tends toward unsafe, and the intensity of subconsciously driven operation actions grows.

When R_d < Z and D_i is small, the values of ρ′, τ, ω decrease as R_d and |a| decrease; the situation tends toward safe, and the intensity of subconsciously driven operation actions shrinks.

sum is the number of collected traffic scenes; the mean σ of these intensities over the scenes is the driver's subconscious-driving influence deviation.
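As a concrete illustration, the minimal Python sketch below computes σ as this mean over scenes. The text above fixes only R_d = |d − q_ki| and the averaging over the sum collected scenes; the way `subconscious_intensity` combines R_d and |a| through α, β and the undetermined parameters ρ′, τ, ω is an assumed placeholder form, not the patent's formula.

```python
import numpy as np

def subconscious_intensity(d, q_ki, a, alpha=0.5, beta=0.5,
                           rho=1.0, tau=1.0, omega=1.0):
    """Hypothetical form of the per-scene subconscious-driving intensity D_i.

    Only R_d = |d - q_ki| and the role of |a| come from the text; how
    rho/tau/omega combine them is an assumption for illustration.
    """
    r_d = abs(d - q_ki)  # lateral deviation from the fitted safe position
    return alpha * rho * r_d ** tau + beta * omega * abs(a)

def subconscious_bias(scenes):
    """sigma: mean of D_i over all `sum` collected traffic scenes.

    `scenes` is a list of (d, q_ki, a) tuples, one per scene time period.
    """
    intensities = [subconscious_intensity(d, q, a) for d, q, a in scenes]
    return float(np.mean(intensities))  # sigma = (1/sum) * sum_i D_i
```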
Step 2.3: the driver simulates conditions that may be encountered in real driving, such as fatigue, agitation, distraction, and other dangerous states, as well as normal driving.

Step 2.4: collect data to obtain the speeds and distances of surrounding vehicles, road-surface information, the ego vehicle's brake, accelerator, and steering-wheel data, the driving-weight allocation, and the driving system's driver-intention and operation data; the driver state and driver intention information are obtained by statistical processing of these data.
Step 3: from the collected surrounding-road information and vehicle states, obtain the interaction forces with the surrounding units and the environmental response factor γ.

Further, step 3 is implemented as follows:

Step 3.1: the environmental response factor γ is the interaction force produced by the joint influence of the vehicle and the vehicle-road environment, and in particular reflects the response to different units. γ is computed from the following quantities:
v_limleast(t) is the minimum of the speed limit and the vehicle speed in the current time period's scenario;

m is the mass of the ego vehicle;

M is the vehicle type and purpose correction parameter;

the vehicle's desired speed and speed direction are derived from Newton's second law and the kinematic equations;

k_1 is the dynamics correction parameter;
k_2 is the traffic-scene correction parameter (e.g., highway sections, congested sections), weighting the interaction force with other vehicles. In the vehicle interaction force parameter, θ_1l is the angle between this vehicle's direction of travel and another vehicle's, Δv_1l/Δμ_1l is the ratio of the speed difference to the distance difference, u is the safety distance, and ρ is the distance to the other vehicle. The force is attractive beyond the safety distance and weakens as the gap approaches that distance; below the safety distance it becomes repulsive, growing the closer the other vehicle is. Vehicles in laterally parallel positions traveling in parallel directions exert no force on each other, and the force's absolute value is largest for vehicles longitudinally in the same lane.
k_3 is the correction parameter for the degree to which pedestrians obey traffic rules, weighting the interaction force with pedestrians. In the pedestrian interaction force parameter, v is the vehicle's current speed, θ_1j is the angle between the center of the vehicle's front and the pedestrian, r_1j is the distance difference, and t_1j is the estimated time difference until they meet. When the vehicle speed is 0, there is no force with the pedestrian; the closer the vehicle is to the pedestrian, the smaller the angle difference, the shorter the estimated meeting time, and the higher the vehicle speed, the larger the repulsive force.
k_4 is the correction parameter for the complexity of the surrounding physical environment, weighting the interaction force with non-moving objects such as buildings. In the environment interaction force parameter, T is the volume of the non-moving object: the larger the volume, the larger the repulsive force. When the volume is no larger than the size the vehicle can pass, the force is attractive; when it exceeds that size, the force is repulsive and grows as the collision time t_1R shrinks, as the vehicle mass grows, and as the vehicle speed grows. At speed 0 there is no interaction force.
k_5 is the correction parameter for the degree of influence of traffic rules, i.e., how much the vehicle heeds them, weighting the resistance exerted by traffic rules. In the rule parameter, v_lim is the maximum speed permitted by traffic regulations and signs: the lower the speed limit, the larger the resistance, and when regulations or signs require stopping (e.g., a red light), the resistance is infinite.
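Since the explicit formula for γ is not reproduced above, the sketch below only illustrates the structure suggested by the parameter list (correction parameters k_1 through k_5 weighting the dynamics, vehicle, pedestrian, environment, and rule force terms, together with the masses m and M); the weighted-sum form and the M/m normalization are assumptions for illustration only.

```python
def environment_response(m, M, f_dyn, f_veh, f_ped, f_env, f_rule,
                         k=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Hypothetical assembly of the environmental response factor gamma.

    The source lists the ingredients (vehicle mass m, type/purpose
    correction M, correction parameters k1..k5, and the dynamics, vehicle,
    pedestrian, environment, and rule force terms) but not its exact
    formula; the structure here is purely illustrative.
    """
    k1, k2, k3, k4, k5 = k
    total_force = (k1 * f_dyn + k2 * f_veh + k3 * f_ped
                   + k4 * f_env + k5 * f_rule)
    return (M / m) * total_force  # assumption: normalize by mass, scale by M
```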
Step 4: normalize the currently collected action information, namely brake, accelerator, and steering wheel, to obtain the real-time operation quantization parameter a_pre.

Further, step 4 is implemented as follows:

Step 4.1: extract brake force, accelerator force, and steering-wheel angle data from the sensors.

Step 4.2: normalize the three signals (brake, accelerator, steering-wheel angle) with min-max standardization,

x* = (x − min) / (max − min)

where * denotes the normalized current value, min the minimum value, and max the maximum value.
Since, by the operating rules, accelerator and brake are mutually exclusive operations, the normalized results are merged into:

a longitudinal operation interval [−1, 1];

a lateral operation interval [−1, 1].

Step 4.3: reduce the dimensionality of the longitudinal and lateral operation intervals [−1, 1] × [−1, 1] by constructing a bijection from [−1, 1] × [−1, 1] to [−1, 1]:

writing the longitudinal value as (0.a₁a₂a₃a₄…) and the lateral value as (0.b₁b₂b₃b₄…), the two decimals are segmented by splitting after every non-zero digit, and the resulting segments are cross-recombined to obtain the one-dimensional real-time operation quantization parameter a_pre.
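A minimal sketch of such a digit-interleaving bijection follows. The text leaves signs, precision, and the exact segmentation unspecified, so this version maps both [−1, 1] values into [0, 1), interleaves a fixed number of decimal digits one at a time, and returns the one-dimensional result; the affine rescaling, the 10-digit precision, and the per-digit (rather than per-non-zero-digit) split are assumptions.

```python
def interleave(longitudinal, lateral, digits=10):
    """Map two values in [-1, 1] to one value in [0, 1) by digit interleaving.

    The source only specifies segmenting the two decimals 0.a1a2a3... and
    0.b1b2b3... and cross-recombining the segments; everything else here
    is an illustrative choice.
    """
    eps = 10.0 ** -digits
    u = min((longitudinal + 1.0) / 2.0, 1.0 - eps)  # [-1, 1] -> [0, 1)
    v = min((lateral + 1.0) / 2.0, 1.0 - eps)
    a = f"{u:.{digits}f}".split(".")[1]  # digits a1 a2 ... of the longitudinal value
    b = f"{v:.{digits}f}".split(".")[1]  # digits b1 b2 ... of the lateral value
    mixed = "".join(x + y for x, y in zip(a, b))  # a1 b1 a2 b2 ...
    return float("0." + mixed)  # one-dimensional a_pre

# Example: full brake (-1.0 longitudinal) with a slight right steer (0.2 lateral)
a_pre = interleave(-1.0, 0.2)
```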
Step 5: determine the current vehicle-road information from the vehicle's current position on the road, where the current vehicle-road information includes at least the current vehicle-road risk and the current vehicle-road risk threshold.

Further, step 5 is implemented as follows:

Step 5.1: determine the vehicle's position on the road, including at least the distance to the preceding vehicle and the vehicle's lateral position, and determine the longitudinal vehicle-road danger value from the distance to the preceding vehicle.

The risk of the longitudinal position is inversely proportional to the distance to the rear of the preceding vehicle: the smaller that distance, the greater the risk. A longitudinal vehicle-road danger function is established with the rear of the preceding vehicle as the coordinate origin; the normal safety distance is ζ_1, and a shortest safety distance η_1 is set as the distance at which braking at maximum deceleration just avoids colliding with the preceding vehicle. In this function, y_1 is the longitudinal vehicle-road danger value and x_1 is the distance to the preceding vehicle.
Step 5.2: determine the lateral vehicle-road danger value from the vehicle's lateral position.

With the center of the vehicle's front as the origin, the lateral vehicle-road danger function is established:

y_2 = 0.5 cos[(π/T) x_2] − 0.5,  −T ≤ x_2 ≤ T

where y_2 is the lateral vehicle-road danger value, x_2 is the current lateral position, and T is the distance from the lane centerline to the lane edge.
Step 5.3: compute the current vehicle-road risk H_x,y from y_1, y_2, and the danger-distance influence factor of the road section. The influence factor ranges over [1, 10]: a value of 1 means the current section and driving state are a standard section and environment as prescribed by traffic regulations, while a value of 10 indicates a harsh driving environment with extremely poor road capacity and frequent rear-end accidents nearby, such as heavy fog or icy sections.

Step 5.4: compute the current vehicle-road risk thresholds of different scenarios:

risk = ω γ H_x,y

where ω is the scene influence parameter and γ is the environmental response factor.
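The sketch below assembles these quantities. The lateral danger y_2 and the threshold risk = ω·γ·H_x,y follow the formulas above, while the longitudinal danger y_1 and the combination of y_1 and y_2 into H_x,y are described only qualitatively, so the linear-in-distance form of `longitudinal_danger` and the sum in `road_danger` are assumptions.

```python
import math

def lateral_danger(x2, T):
    """y2 = 0.5*cos((pi/T)*x2) - 0.5 for -T <= x2 <= T (from the text)."""
    assert -T <= x2 <= T
    return 0.5 * math.cos((math.pi / T) * x2) - 0.5

def longitudinal_danger(x1, eta1, zeta1):
    """Hypothetical y1: danger grows as the gap to the leading car shrinks.

    Only the qualitative shape is given: no danger beyond the normal safe
    distance zeta1, maximal danger inside the shortest safe distance eta1.
    """
    if x1 >= zeta1:
        return 0.0
    if x1 <= eta1:
        return -1.0
    return -(zeta1 - x1) / (zeta1 - eta1)  # linear interpolation, an assumption

def road_danger(y1, y2, influence=1.0):
    """Hypothetical H_{x,y}: y1 and y2 scaled by the danger-distance
    influence factor (range [1, 10])."""
    return influence * (y1 + y2)

def danger_threshold(omega, gamma, h):
    """risk = omega * gamma * H_{x,y} (from the text)."""
    return omega * gamma * h
```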
Step 6: from the collected driver state, driver intention, and real-time operation quantization parameter a_pre, combined with the environmental response factor γ, the current vehicle-road risk, and the current vehicle-road risk threshold, determine the ego vehicle's comprehensive driving operation action index by means of a piecewise function of:

z_1, the driver's state (different states respond to the environment to different degrees);

γ, the environmental response factor;

δ, the driver's intention, representing how well the current operation agrees with the recognized driver intention;

H_x,y, the current vehicle-road risk;

σ, the road correction parameter;

a_pre, the real-time operation quantization parameter; and

risk, the current vehicle-road risk threshold.
Step 7: obtain the predicted vehicle-road risk and the predicted vehicle-road risk threshold from the growth rate of the interaction forces.

Specifically, the interaction force in step 3 is inversely proportional to distance, and the faster the interaction force's growth accelerates, the more likely danger becomes. This yields:

A_risk = ρ a_risk

where v_f is the growth rate of a single unit's interaction force; v_risk is the predicted vehicle-road risk; a_f is the growth acceleration of a single unit's interaction force; a_risk is the sum of the interaction-force growth accelerations of all surrounding units; A_risk is the predicted vehicle-road risk threshold; and ρ is the vehicle-road hazard influence factor, determined by the complexity of the current road and ranging over [0, 1].
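A numerical sketch of this step, assuming each surrounding unit's interaction force is sampled at a fixed time step: v_f and a_f are estimated by finite differences, a_risk sums a_f over all surrounding units, and A_risk = ρ·a_risk as above. The text does not state how v_risk is formed from the per-unit v_f, so summing the current growth rates is an assumption.

```python
import numpy as np

def predicted_risk(force_histories, rho, dt=0.1):
    """Predicted vehicle-road risk v_risk and threshold A_risk.

    force_histories: list of 1-D arrays, one per surrounding unit, holding
    that unit's interaction force sampled every dt seconds.
    rho: vehicle-road hazard influence factor in [0, 1].
    """
    v_risk = 0.0
    a_risk = 0.0
    for f in force_histories:
        v_f = np.gradient(f, dt)   # growth rate of this unit's force
        a_f = np.gradient(v_f, dt)  # growth acceleration of this unit's force
        v_risk += v_f[-1]           # assumption: v_risk sums current growth rates
        a_risk += a_f[-1]           # a_risk: sum over all surrounding units
    A_risk = rho * a_risk           # predicted vehicle-road risk threshold
    return v_risk, A_risk
```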
Step 8: compute the driving operation action prediction index from the driver information and the vehicle-road prediction information, where the driver information includes at least the driver's state, driver's intention, driver's style, and driver's subconscious-driving influence deviation, and the vehicle-road prediction information includes at least the predicted vehicle-road risk and the predicted vehicle-road risk threshold. The index is a function of:

σ, the driver's subconscious-driving influence deviation, obtained as the historical deviation of the most similar scene found by comparing traffic scenes;

S, the driver's style: a quantified rating in [0, 10] obtained from a driver-style test, where a value below 1 is an extremely unsuitable style that affects the driver's intention and the state operation reaction delay;

Z_t, the driver's state operation reaction delay, a set value (the larger the delay, the smaller the driving operation action prediction index); and

δ, the driver's intention (different intended paths strongly affect the subconscious-driving influence deviation).
Step 9: input the driving operation action prediction index and the comprehensive driving operation action index into the reinforcement-learning-based control-authority switching system, and calculate the adjusted driving-authority weights required by the driver and the driving system.

Specifically, as shown in Figs. 3 and 4, the driving operation action prediction index represents how the risk factors of the future time period affect operational safety, emphasizing the influence of other units on vehicle safety after the ego operation; the comprehensive driving operation action index represents how the risk factors of the current time affect operational safety, emphasizing whether the current position is safe and whether effective driving is possible in the current state.
Step 9.1: standardize the current driving operation action prediction index and comprehensive driving operation action index with the Z-score formula, computing the mean and standard deviation of each index over all values from the start of this drive to the current moment.

Step 9.2: feed the Z-score-standardized indices, together with their current means and standard deviations, into the reinforcement-learning-based control-authority switching system as input parameters, and judge whether the weight-allocation conditions are met; if they are, execute step 9.3, and if not, re-acquire the driver information and vehicle-road prediction information.

Specifically, the Z-score expresses how many standard deviations a sample value lies from the data mean. Taking the driving operation action prediction index as an example, the first parameter is the number of standard deviations between the currently input prediction index (the sample value) and the mean of all prediction indices input from the start of the driving behavior to the current moment (the data mean), where the standard deviation is likewise computed over all inputs from the start of the driving behavior to the current moment. The second parameter is defined analogously for the comprehensive driving operation action index.
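A minimal sketch of this running standardization, with one tracker per index and all values from the start of the drive kept in memory:

```python
import numpy as np

class RunningZScore:
    """Z-score of the latest index value against all values seen so far
    in the current drive: z = (x - mean) / std."""

    def __init__(self):
        self.history = []

    def update(self, x):
        self.history.append(x)
        mean = float(np.mean(self.history))
        std = float(np.std(self.history))
        z = 0.0 if std == 0.0 else (x - mean) / std
        return z, mean, std

# One tracker for each index: prediction index and comprehensive index.
z_pred, z_comp = RunningZScore(), RunningZScore()
```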
In this embodiment, the control-authority switching system has three weight-allocation conditions:

(1) The first and second parameters are both less than or equal to the first trigger threshold for 5 consecutive inputs.

Specifically, it is judged whether both parameters are less than or equal to the first trigger threshold, which may be set to −3, i.e., whether both input indices lie at least 3 standard deviations below their current means. This indicates that the safety condition is not met now and will not be met in the future, so the control-authority switching system is triggered. Under this condition, when five consecutive inputs of the two indices satisfy this requirement, the driving-authority weights of the driver and the driving system must be adjusted.

(2) The second parameter is less than or equal to the second trigger threshold for 3 consecutive inputs.

Specifically, the second trigger threshold may be set to −4. When the input comprehensive driving operation action index lies at least 4 standard deviations below its current mean, the current state falls far short of a safe state and the driving system must intervene urgently, so the switching system is triggered. Under this condition, when three consecutive inputs satisfy this requirement, the driving-authority weights of the driver and the driving system are adjusted.

(3) The first parameter is less than or equal to the second trigger threshold for 3 consecutive inputs.

Specifically, when the input driving operation action prediction index lies at least 4 standard deviations below its current mean, the future state falls far short of a safe state and even driver intervention cannot restore safety afterwards, so the switching system is triggered. Under this condition, when three consecutive inputs satisfy this requirement, the driving-authority weights of the driver and the driving system are adjusted.
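These three triggers can be checked with simple consecutive-hit counters, as sketched below; the thresholds −3 and −4 and the run lengths 5 and 3 follow the text, while the reset-on-miss behavior of the counters is the natural reading of "consecutive".

```python
class TriggerMonitor:
    """Fires when one of the three weight-allocation conditions holds:
    (1) both z-scores <= -3 for 5 consecutive inputs,
    (2) the comprehensive-index z-score <= -4 for 3 consecutive inputs,
    (3) the prediction-index z-score <= -4 for 3 consecutive inputs."""

    def __init__(self, first=-3.0, second=-4.0):
        self.first, self.second = first, second
        self.both = self.pred = self.comp = 0  # consecutive-hit counters

    def step(self, z_pred, z_comp):
        hit_both = z_pred <= self.first and z_comp <= self.first
        self.both = self.both + 1 if hit_both else 0
        self.pred = self.pred + 1 if z_pred <= self.second else 0
        self.comp = self.comp + 1 if z_comp <= self.second else 0
        return self.both >= 5 or self.pred >= 3 or self.comp >= 3
```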
Step 9.3: based on the Q-learning algorithm, use the input parameters to adjust the learning state, and assign the driver's driving-authority weight from the action that maximizes the value of the next state, where the driving system's driving-authority weight is 1 minus the driver's driving-authority weight.

In this embodiment, the algorithm the switching system uses to weight the driving authority of the driver and the driving system is Q-learning; the training process is as follows:
(1) The Q-learning transition rule is:

Q(state, action) = R(state, action) + Gamma * MaxQ(next state, all actions)

i.e., Q(state, action) equals the immediate reward R(state, action) plus the discounted maximum value over all actions of the next state.

Gamma is the discount factor: the larger it is, the greater the effect of MaxQ. Here R can be understood as the immediate value and MaxQ as the remembered value, i.e., the maximum value among the actions of the next state in memory.
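The transition rule in code form, assuming Q and R are stored as 2-D arrays indexed by (state, action):

```python
import numpy as np

def q_update(Q, R, state, action, next_state, gamma):
    """Q(state, action) = R(state, action) + gamma * max_a Q(next_state, a)."""
    Q[state, action] = R[state, action] + gamma * np.max(Q[next_state])
    return Q
```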
(2) A "matrix Q" is added as the brain of the reinforcement-learning agent, i.e., the driving-authority switching system: what it has learned through experience. The rows of matrix Q represent the system's current state and the columns represent the possible actions leading to the next state (the links between nodes). The system starts from zero, i.e., matrix Q is initialized to zero and can begin from a single element; when a new state is found, matrix Q is updated. This is called unsupervised learning.

(3) The driver's driving-authority weight is power(driver) and the driving system's is power(system) = 1 − power(driver). The switching system adjusts power(driver) with the reinforcement-learning Q-learning algorithm: a Q-learning action directly assigns a value to power(driver), with weight values in [0, 1] at a step of 0.05.
(4) The Q-learning states are set to 0, 1, 2, 3, 4, and 5. States 0 and 5 are each defined by a conjunction of two threshold conditions on the standardized driving operation action prediction index and the standardized comprehensive driving operation action index, while states 1 through 4 are each defined by a single such condition; state 5 corresponds to both indices satisfying the safety requirement.
(5) After the switching system is triggered, the initial state is one of 0, 1, and 2. When an action from (3) (adjusting the weight) leaves the state in 0, 1, or 2, the reward is −1, and matrix Q is updated by assigning −1 to the element for that state and action.

When the action moves the state to 3 or 4, the reward is 1, and matrix Q is updated by assigning 1 to the element for that state and action.

When the action moves the state to 5, the reward is 100, and matrix Q is updated by assigning 100 to the element for that state and action; state 5 is the target state. The resulting Q-table is shown in Fig. 5 (elements unassigned).
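The action space and reward scheme above translate directly into code; a sketch, with the [0, 1] weight grid at step 0.05 and the −1/1/100 rewards taken from the text:

```python
import numpy as np

ACTIONS = np.round(np.arange(0.0, 1.0001, 0.05), 2)  # candidate power(driver) values

def reward(state):
    """R(state): -1 for states 0-2, 1 for states 3-4, 100 for goal state 5."""
    if state in (0, 1, 2):
        return -1
    if state in (3, 4):
        return 1
    return 100  # state 5, the target state

def apply_action(action_idx):
    """Assign driving-authority weights from the chosen action."""
    power_driver = float(ACTIONS[action_idx])
    power_system = 1.0 - power_driver  # power(system) = 1 - power(driver)
    return power_driver, power_system
```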
(6) A road environment is selected and (1) through (5) are applied, producing the Q-table of the initial matrix Q used for the MaxQ(next state, all actions) term of the rule Q(state, action) = R(state, action) + Gamma * MaxQ(next state, all actions). Gamma is chosen in [0, 1] according to road similarity, and R(state, action) is the value of the state reached in the current road environment: −1 for states 0, 1, and 2; 1 for states 3 and 4; and 100 for state 5.

When computing the driving weight, the switching system first pre-computes, from the Q-table of similar road sections, the adjustable weights and the rewards of the states they reach; that reward is the MaxQ(next state, all actions) term above, so Q(state, action) is the R(state, action) value of the current road environment plus MaxQ(next state, all actions).
When Q(state, action) is largest, the action inside MaxQ(next state, all actions), i.e., the action of the next state, is the weight to be adjusted, recorded as the driver's driving-authority weight power(driver).
(7) Update the Q-table with the computed Q(state, action) values.

(8) Stop switching the weight once state 5 is reached.
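Putting (1) through (8) together, the following sketch trains the Q-table. The environment step (applying a power(driver) assignment and observing the resulting state 0-5 from the two standardized indices) is left abstract as `step_env`, and the discount factor, episode count, and episode-length cap are illustrative choices.

```python
import random
import numpy as np

def train_switching_policy(step_env, n_states=6, n_actions=21,
                           gamma=0.8, episodes=100, max_steps=200):
    """Sketch of the Q-table training loop described in (5)-(8).

    step_env(action_idx) applies the chosen power(driver) assignment and
    returns the resulting state (0-5); in the source this role is played
    by the simulator together with the two standardized indices.
    """
    Q = np.zeros((n_states, n_actions))    # "matrix Q" initialized to zero
    for _ in range(episodes):
        state = random.choice((0, 1, 2))   # triggering starts in an unsafe state
        for _ in range(max_steps):
            action = random.randrange(n_actions)  # explore weight assignments
            nxt = step_env(action)
            r = -1 if nxt <= 2 else (1 if nxt <= 4 else 100)  # rewards from (5)
            Q[state, action] = r + gamma * np.max(Q[nxt])     # rule from (1)
            state = nxt
            if state == 5:                 # stop switching at the target state
                break
    return Q
```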
The technical solution of the present application has been described in detail above with reference to the accompanying drawings. The present application proposes a reinforcement-learning-based method for switching control authority in human-machine co-driving, applicable to a reinforcement-learning-based control-authority switching system that allocates driving weights between the driver and the driving system. The method includes: calculating a driving operation action prediction index from driver information and vehicle-road prediction information; and inputting the driving operation action prediction index together with a comprehensive driving operation action index into the control-authority switching system to calculate the driving weights of the driver and the driving system. The technical solution of the present application effectively addresses the combined longitudinal and lateral risk of the vehicle, weakens the influence of the uncertainty introduced by the driver, and considers the driver from multiple angles, thereby reducing errors in judging the driver.
The steps of the present application may be reordered, combined, and pruned according to actual needs.

The units of the apparatus of the present application may be combined, divided, and pruned according to actual needs.

Although the present application is disclosed in detail with reference to the accompanying drawings, it should be understood that these descriptions are merely exemplary and are not intended to limit the application of the present application. The protection scope of the present application is defined by the appended claims and may include various variations, modifications, and equivalents of the invention without departing from its scope and spirit.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210758672.3A CN115071758B (en) | 2022-06-29 | 2022-06-29 | A Control Switching Method for Human-Machine Co-driving Based on Reinforcement Learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210758672.3A CN115071758B (en) | 2022-06-29 | 2022-06-29 | A Control Switching Method for Human-Machine Co-driving Based on Reinforcement Learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115071758A CN115071758A (en) | 2022-09-20 |
CN115071758B true CN115071758B (en) | 2023-03-21 |
Family
ID=83254772
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210758672.3A Active CN115071758B (en) | 2022-06-29 | 2022-06-29 | A Control Switching Method for Human-Machine Co-driving Based on Reinforcement Learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115071758B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115534994A (en) * | 2022-09-30 | 2022-12-30 | 北方工业大学 | Man-machine driving sharing control right self-adaptive switching method based on cooperative sensing inside and outside vehicle |
CN117681901B (en) * | 2023-10-19 | 2025-03-18 | 杭州电子科技大学 | A multi-objective human-vehicle sharing control method based on hierarchical deep reinforcement learning |
CN119551015B (en) * | 2025-01-26 | 2025-05-13 | 常熟理工学院 | Method and device for allocating human-machine co-driving control rights considering driving group characteristics |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108549367A (en) * | 2018-04-09 | 2018-09-18 | 吉林大学 | A kind of man-machine method for handover control based on prediction safety |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109070895B (en) * | 2016-04-18 | 2021-08-31 | 本田技研工业株式会社 | Vehicle control system, vehicle control method, and storage medium |
US11940790B2 (en) * | 2018-12-12 | 2024-03-26 | Allstate Insurance Company | Safe hand-off between human driver and autonomous driving system |
JP7158352B2 (en) * | 2019-08-08 | 2022-10-21 | 本田技研工業株式会社 | DRIVING ASSIST DEVICE, VEHICLE CONTROL METHOD, AND PROGRAM |
CN113341730B (en) * | 2021-06-28 | 2022-08-30 | 上海交通大学 | Vehicle steering control method under remote man-machine cooperation |
CN113335291B (en) * | 2021-07-27 | 2022-07-08 | 燕山大学 | Man-machine driving-sharing control right decision method based on man-vehicle risk state |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108549367A (en) * | 2018-04-09 | 2018-09-18 | 吉林大学 | A kind of man-machine method for handover control based on prediction safety |
Also Published As
Publication number | Publication date |
---|---|
CN115071758A (en) | 2022-09-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115071758B (en) | A Control Switching Method for Human-Machine Co-driving Based on Reinforcement Learning | |
WO2021077725A1 (en) | System and method for predicting motion state of surrounding vehicle based on driving intention | |
CN109727469B (en) | Comprehensive risk degree evaluation method for automatically driven vehicles under multiple lanes | |
CN113946943B (en) | Human-vehicle-road micro traffic system modeling and risk identification method and device | |
CN111104969A (en) | A method for predicting the possibility of collision between unmanned vehicles and surrounding vehicles | |
JP2023524574A (en) | A Unified Quantification Method for Driving Risk Comprehensively Considering Each Element of People, Vehicles, and Roads | |
CN112258841B (en) | Intelligent vehicle risk assessment method based on vehicle track prediction | |
CN111565990A (en) | Software validation for autonomous vehicles | |
CN110989568B (en) | Automatic driving vehicle safe passing method and system based on fuzzy controller | |
CN114117829B (en) | Dynamic modeling method and system for man-vehicle-road closed loop system under limit working condition | |
CN112249008B (en) | Unmanned automobile early warning method aiming at complex dynamic environment | |
CN114162145A (en) | Automatic vehicle driving method and device and electronic equipment | |
CN117125083B (en) | Vehicle following behavior risk quantification method considering driving style inclination | |
CN116432448B (en) | Variable speed limit optimization method based on intelligent network coupling and driver compliance | |
CN108711285B (en) | Hybrid traffic simulation method based on road intersection | |
CN119417671B (en) | Intelligent driving scene self-adaptive teaching method and system based on reinforcement learning | |
CN116476861A (en) | Automatic driving decision system based on multi-mode sensing and layering actions | |
CN117208021B (en) | Unmanned vehicle control method for complex road conditions | |
CN117208019B (en) | Longitudinal decision-making method and system under perceived occlusion based on value distribution reinforcement learning | |
CN102609599A (en) | Method for designing emulational underground road alignment and transverse clear distance based on multiple intelligent agents | |
CN114863708B (en) | Road confluence area roadside real-time accurate induction method for commercial vehicles | |
CN116910597A (en) | Road safety driving behavior identification method based on GM-HMM | |
CN119176148A (en) | Probabilistic driving behavior modeling system for vehicle | |
Toledo et al. | Alternative definitions of passing critical gaps | |
CN116588123A (en) | Risk Perception Early Warning Strategy Method Based on Safety Potential Field Model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |