CN115071758B - A Control Switching Method for Human-Machine Co-driving Based on Reinforcement Learning - Google Patents


Info

Publication number: CN115071758B (application CN202210758672.3A)
Authority: CN (China)
Prior art keywords: driving, driver, vehicle, current, road
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN115071758A
Inventors: 陈慧勤, 朱嘉祺
Current assignee: Hangzhou Dianzi University (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original assignee: Hangzhou Dianzi University
Application filed by Hangzhou Dianzi University

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00: Drive control systems specially adapted for autonomous road vehicles
    • B60W60/005: Handover processes
    • B60W60/0059: Estimation of the risk associated with autonomous or manual driving, e.g. situation too complex, sensor failure or driver incapacity

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Human Computer Interaction (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Traffic Control Systems (AREA)

Abstract

The application discloses a reinforcement-learning-based method for switching control rights in human-machine co-driving, suitable for a reinforcement-learning-based control-right switching system that allocates driving weights between a driver and a driving system. The method comprises the following steps: calculating a driving operation action prediction index from the driver information and the vehicle-road prediction information; and inputting the driving operation action prediction index and the comprehensive driving operation action index into the control-right switching system to calculate the driving weights between the driver and the driving system. The technical scheme of the application effectively addresses the combined longitudinal and lateral risk of the vehicle, weakens the influence of the uncertainty introduced by the driver, and considers the driver comprehensively from different angles, thereby reducing the error in judging the driver.

Description

A Control Switching Method for Human-Machine Co-driving Based on Reinforcement Learning

Technical Field

The present application relates to the technical field of intelligent driving, and in particular to a reinforcement-learning-based method for switching control rights in human-machine co-driving.

Background Art

In existing automatic driving technology, control-right switching is usually adopted to correct the driver's driving behavior and thereby improve driving safety.

For example, patent CN 109795486 A dynamically adjusts a co-driving coefficient (ranging from 0 to 1) according to the driver input torque Td and the time TLC for the left and right wheels to reach the lane boundary, realizing a gradual transition from the driver to the auxiliary control system, with the current co-driving coefficient determined by fuzzy control. Although this approach addresses the danger of lateral deviation, it does not consider the longitudinal risk of the driving process.

As another example, patent CN 108469806 A constructs key factors from the current driving environment and the states of the vehicle and the driver, performs a situational assessment of the key factors, simultaneously evaluates the driving abilities of the automatic driving system and the driver, and determines whether the driving right can be transferred. Although this scheme considers multiple factors that may affect driving safety, its evaluation of driving ability during the switching of driving rights is too complicated, involves large subjective and random factors, considers too much data, and offers poor real-time performance and stability.

As a further example, the paper 《基于驾驶人风险响应机制的人机共驾模型》 (Human-Machine Co-driving Model Based on the Driver's Risk Response Mechanism) quantifies environmental risk, obtains a safety risk response strategy by fitting the environmental risk effect to the driver's driving acceleration, and uses the strategy deviation to switch human-machine co-driving control rights flexibly. It solves the coupling problem between driver state and environmental safety, but its safety strategy is built on a large number of driving segments that cannot fully cover all safe operations, and it only solves the switching problem when following or overtaking on a highway. Moreover, all of the above control-right switching methods consider only the safety of the current moment and ignore the traffic hazards that may arise in future time periods.

Therefore, the safety and stability of existing control-right switching schemes in automatic driving need to be improved.

Summary of the Invention

The purpose of this application is to effectively address the combined longitudinal and lateral risk of the vehicle and to reduce the error in judging the driver, so as to improve the accuracy and safety of driving-right switching.

The technical solution of the present application provides a reinforcement-learning-based method for switching control rights in human-machine co-driving. The method is suitable for a reinforcement-learning-based control-right switching system that allocates driving weights between a driver and a driving system, and comprises: calculating a driving operation action prediction index from the driver information and the vehicle-road prediction information; and inputting the driving operation action prediction index and the comprehensive driving operation action index into the control-right switching system to calculate the driving weights between the driver and the driving system.
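The two-index flow above can be outlined in a few lines. This is a purely illustrative sketch, not the patented implementation: the function names (`switch_driving_weight`, `demo_policy`) and the stand-in policy are hypothetical, since the patent's index formulas are given only as images.

```python
def switch_driving_weight(prediction_index, comprehensive_index, switching_system):
    """Feed both indices into the switching system and return the
    (driver_weight, system_weight) pair; the two weights sum to 1."""
    driver_weight = switching_system(prediction_index, comprehensive_index)
    driver_weight = min(max(driver_weight, 0.0), 1.0)  # keep the weight in [0, 1]
    return driver_weight, 1.0 - driver_weight

# Trivial stand-in policy: keep full control with the driver while both
# indices look safe (non-negative here), otherwise shift most control away.
demo_policy = lambda p, c: 1.0 if p >= 0 and c >= 0 else 0.25

print(switch_driving_weight(0.5, 0.2, demo_policy))   # (1.0, 0.0)
print(switch_driving_weight(-1.0, 0.2, demo_policy))  # (0.25, 0.75)
```

The clamping step reflects the convention, also used by CN 109795486 A above, that a co-driving weight lies in [0, 1].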

In any of the above technical solutions, further, the driver information includes at least the driver's state, the driver's intention, the driver's style, and the driver's subconscious driving influence deviation, and the vehicle-road prediction information includes at least the predicted vehicle-road risk and the predicted vehicle-road risk threshold.

The driving operation action prediction index is calculated as:

[Equation image BDA0003720378570000021: calculation formula for the driving operation action prediction index; not reproduced in this extraction.]

In the formula, the symbol shown as an image is the driving operation action prediction index; Z_t is the driver's state-operation reaction delay; σ is the driver's subconscious driving influence deviation; δ is the driver's intention; S is the driver's style; v_risk is the predicted vehicle-road risk; and A_risk is the predicted vehicle-road risk threshold.

In any of the above technical solutions, further, the driver's subconscious driving influence deviation σ is calculated as:

[Equation images BDA0003720378570000023 and BDA0003720378570000024: formulas for the subconscious driving influence deviation σ and the subconscious driving strength D_i; not reproduced in this extraction.]

R_d = |d - q_ki|

In the formula, σ is the driver's subconscious driving influence deviation; sum is the number of collected traffic scenes; D_i is the series of subconscious driving strengths within one traffic-scene time period; ρ′, τ and ω are parameters to be determined; α is the subconscious emphasis weight; β is the weight of the driver's personal safety tendency; d is the vehicle's current lateral position; q_ki is the vehicle's fitted lateral position under this scene (label); a is the vehicle acceleration; and R_d is the position parameter.

In any of the above technical solutions, further, the driver information includes at least the driver's state, the driver's intention and the driver's style, and the calculation of the comprehensive driving operation action index specifically includes:

determining the current vehicle-road information according to the current position of the vehicle on the road, where the current vehicle-road information includes at least the current vehicle-road risk and the current vehicle-road risk threshold;

determining the comprehensive driving operation action index as a piecewise function of the driver information and the current vehicle-road information, combined with the environmental response factor, where the comprehensive driving operation action index is calculated as:

[Equation image BDA0003720378570000031: piecewise calculation formula for the comprehensive driving operation action index; not reproduced in this extraction.]

In the formula, the symbol shown as an image is the comprehensive driving operation action index; z_1 is the driver's state; γ is the environmental response factor; H_{x,y} is the current vehicle-road risk; σ is the road correction parameter; a_pre is the real-time operation quantization parameter; and risk is the current vehicle-road risk threshold.

In any of the above technical solutions, further, determining the current vehicle-road information according to the current position of the vehicle on the road specifically includes:

determining the current position of the vehicle on the road, including at least the distance from the current vehicle to the preceding vehicle and the lateral position of the current vehicle;

determining the longitudinal vehicle-road risk value according to the distance from the current vehicle to the preceding vehicle;

determining the lateral vehicle-road risk value according to the lateral position of the current vehicle;

calculating the current vehicle-road risk from the longitudinal and lateral vehicle-road risk values, with the corresponding formula:

[Equation image BDA0003720378570000033: formula combining the longitudinal and lateral risk values into the current vehicle-road risk H_{x,y}; not reproduced in this extraction.]

In the formula, H_{x,y} is the current vehicle-road risk; the factor shown as an image is the danger-distance influence factor for different road sections, with a value range of [1, 10]; y_1 is the longitudinal vehicle-road risk value; and y_2 is the lateral vehicle-road risk value.

calculating the current vehicle-road risk thresholds for different scenarios according to the current vehicle-road risk, and recording the current vehicle-road risk threshold and the current vehicle-road risk as the current vehicle-road information.
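Because the combination formula for H_{x,y} is reproduced only as an image, the sketch below uses a hypothetical Euclidean combination of y_1 and y_2; the only property taken from the text is that the danger-distance influence factor is confined to [1, 10].

```python
import math

def current_road_risk(y1, y2, k):
    """Hypothetical stand-in for H_{x,y}: combine the longitudinal risk
    value y1 and the lateral risk value y2, scaled by the danger-distance
    influence factor k, which the text confines to [1, 10]. The Euclidean
    combination is an illustrative choice, not the patented formula."""
    k = min(max(k, 1.0), 10.0)    # the influence factor's stated range
    return k * math.hypot(y1, y2)

print(current_road_risk(3.0, 4.0, 2.0))  # 10.0
```

Any monotone combination of the two risk values with a bounded road-section factor would serve the same illustrative purpose.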

In any of the above technical solutions, further, the environmental response factor γ is calculated as:

[Equation image BDA0003720378570000041: calculation formula for the environmental response factor γ; not reproduced in this extraction.]

In the formula, m is the vehicle mass; M is the vehicle type and purpose correction parameter; k_1 is the dynamics correction parameter; the term shown as an image represents the vehicle's desired speed and speed direction; v_limleast(t) is the minimum speed value; k_2 is the traffic-scene correction parameter applied to the vehicle interaction-force parameter; k_3 is the correction parameter for the degree to which pedestrians obey traffic rules, applied to the pedestrian interaction-force parameter; k_4 is the correction parameter for the complexity of the surrounding physical environment, applied to the environment interaction-force parameter; and k_5 is the correction parameter for the degree of influence of traffic rules, applied to the rule parameter.

In any of the above technical solutions, further, calculating the driving weights between the driver and the driving system specifically includes:

Step 9.1: using the Z-score standardization formula, standardize the driving operation action prediction index and the comprehensive driving operation action index at the current moment, and calculate the mean and standard deviation of each index over all values from the start of this drive up to the present;

Step 9.2: input the Z-score-standardized driving operation action prediction index and comprehensive driving operation action index, together with their current means and standard deviations, as input parameters into the reinforcement-learning-based human-machine co-driving control-right switching system, and judge whether the weight allocation condition is satisfied; if it is, execute step 9.3, and if not, reacquire the driver information and the vehicle-road prediction information;

Step 9.3: based on the Q-learning algorithm, use the input parameters to adjust the learning state in the Q-learning algorithm, and assign the driver's driving-right weight according to the action that attains the maximal value of the next state in the Q-learning algorithm, where the driving system's driving-right weight is 1 minus the driver's driving-right weight.

In any of the above technical solutions, further, the weight allocation condition specifically includes: the first parameter and the second parameter are both less than or equal to the first trigger threshold for 5 consecutive times; or the second parameter is less than or equal to the second trigger threshold for 3 consecutive times; or the first parameter is less than or equal to the second trigger threshold for 3 consecutive times. Here, the first parameter is the number of standard deviations by which the currently input driving operation action prediction index differs from the mean of all driving operation action prediction indices input from the start of the driving behavior up to the current moment, and the second parameter is the number of standard deviations by which the currently input comprehensive driving operation action index differs from the mean of all comprehensive driving operation action indices input from the start of the driving behavior up to the current moment.

The beneficial effects of this application are:

The technical scheme in this application effectively addresses the combined longitudinal and lateral risk of the vehicle, weakens the influence of the uncertainty introduced by the driver, and considers the driver comprehensively from different angles, thereby reducing the error in judging the driver. It applies to a variety of traffic scenarios and also takes into account the traffic hazards that may arise in future time periods, further improving the accuracy and safety of driving-right switching. Finally, all factors are integrated into two indices that are input into the switching system, so the amount of data is small and accurate, and real-time performance is higher.

In a preferred implementation of the present application, the influence of the driver's experience and subconscious on driving is taken into account, reducing the judgment burden of the switching system and yielding better real-time performance. The risks that other vehicles may pose to the ego vehicle can also be predicted in advance, avoiding rear-end and other collisions during driving.

Brief Description of the Drawings

The advantages of the above and/or additional aspects of the present application will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:

Fig. 1 is a schematic flowchart of the reinforcement-learning-based human-machine co-driving control-right switching method according to an embodiment of the present application;

Fig. 2 is a diagram of relative road positions and relative safe positions according to an embodiment of the present application;

Fig. 3 is a schematic diagram of the model-free reinforcement learning process according to an embodiment of the present application;

Fig. 4 is a schematic diagram of the overall structure of the reinforcement-learning-based human-machine co-driving control-right switching mechanism according to an embodiment of the present application;

Fig. 5 is a schematic diagram of the Q-table in the Q-learning algorithm according to an embodiment of the present application.

Detailed Description of the Embodiments

In order that the above purposes, features and advantages of the present application may be understood more clearly, the present application is further described in detail below in conjunction with the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features within them may be combined with one another.

In the following description, many specific details are set forth to facilitate a full understanding of the present application; however, the present application can also be implemented in ways other than those described here. Therefore, the protection scope of the present application is not limited by the specific embodiments disclosed below.

As shown in Fig. 1, this embodiment provides a reinforcement-learning-based method for switching control rights in human-machine co-driving, including:

Step 1: construct a simulator based on the real vehicle-road environment, and construct vehicle-road scenes for various situations in the simulator.

Further, step 1 is realized as follows:

Step 1.1: the simulator hardware needs a camera that captures images of the driver, as well as a driving operation environment that imitates a real vehicle;

Step 1.2: construct a large number of typical traffic environments encountered in the real world, including car-following scenarios on all types of road sections, overtaking scenarios on all types of road sections, road intersection scenarios, congested road-section scenarios, and so on;

Step 1.3: when constructing the different typical traffic environment scenarios, insert a certain number of dangerous traffic scenarios and accident simulation scenarios.

Step 2: continuously collect information on surrounding road vehicles, the current driver state, driver intention, driver style and action information, control-right weight allocation information, and information related to the automatic driving system, and calculate the driver's subconscious driving influence deviation.

Further, step 2 may include the following process:

Step 2.1: the driver needs to complete the full driving process in different scenarios;

Step 2.2: without the intervention of the control-right switching system, the driver first drives normally for a certain amount in different driving scenarios. The driver's driving operations and the road conditions during driving are collected and recorded, the driver's style is obtained by statistical analysis, and the driver's subconscious driving influence deviation is calculated (the influence of the driver's subconscious driving operations is characterized by the acceleration/deceleration and the changes in lateral road position caused by the experience the driver has accumulated in different driving scenarios):

The driver's subconscious driving influence deviation is calculated as:

[Equation images BDA0003720378570000071 and BDA0003720378570000072: formulas for the subconscious driving influence deviation σ and the subconscious driving strength D_i; not reproduced in this extraction.]

R_d = |d - q_ki|

In the formula, σ is the driver's subconscious driving influence deviation; sum is the number of collected traffic scenes; D_i is the series of subconscious driving strengths within one traffic-scene time period; ρ′, τ and ω are parameters to be determined; α is the subconscious emphasis weight; β is the weight of the driver's personal safety tendency; d is the vehicle's current lateral position; q_ki is the vehicle's fitted lateral position under this scene (label); a is the vehicle acceleration; and R_d is the position parameter.

Specifically, the driver's subconscious driving influence deviation does not consider the influence of other traffic participants; it is considered only from the perspective of personal safety. Based on the maximum entropy principle, a maximum entropy method related to the driver's subconscious is established.

First, construct the entropy function:

H(x) = -C Σ_k p_k log2(p_k)

where H(x) is the entropy, a measure of the uncertainty of things; p_k is the probability distribution; and C is a constant that depends on the unit in which entropy is measured, taken as 1 here.

What is needed from this entropy function is the driver's subconscious driving influence deviation, i.e., how the subconscious shaped by the current environment influences behavior. However, since the probability distribution p_k takes decimal values between 0 and 1, log2(p_k) is negative; therefore, this embodiment introduces a non-negative integer q_i to replace the probability distribution p_k in the entropy function.

The parameter q_i is defined as the relative safe position in different road scenarios. The relative positions are shown in Fig. 2: a lateral coordinate axis is established with the left side of the road as the origin, half the width of a single lane is used as one driving position, and the road is divided into eight regions. A relative safe position is a position in which the vehicle spends more than half of its time during normal driving.

Roads differ considerably between scenarios, and an accurate specific position that fully represents the road path cannot be obtained; therefore ln is used instead of log base 2, and since q_i mainly takes integer values greater than one, the negative sign of the original entropy function must be removed. This difference can be expressed by the following modified entropy:

E = Σ_i q_i ln(q_i)

Next, establish the constraints of the modified entropy. First, a road-condition constraint: every driver chooses the side of the road in good condition. Second, a traffic-rule constraint: drivers are more inclined to drive as the traffic rules prescribe. Third, a traffic-demand constraint: whether the driver's demand in the road scenario is overtaking, car-following or going straight. The corresponding formulas are:

Constraint 1: [equation image BDA0003720378570000082: road-condition constraint; not reproduced in this extraction.]

Constraint 2: [equation image BDA0003720378570000083: traffic-rule constraint; not reproduced in this extraction.]

Constraint 3: b(q_i) ∈ S

In the formula, A_min and A_max are the lower and upper limits of the road capacity score; the coefficient shown as an image is the unfamiliarity interference coefficient for different road sections; b is the influence weight of the traffic demand; B is the maximum boundary of the traffic rules; b(q_i) is the determination of the traffic demand, i.e., given a known demand such as overtaking, car-following or going straight, q_i is estimated from the demand; and S is the traffic-demand set, containing the position results of all normal driving behaviors.

By setting the three constraints and the different road scenarios and computing with the modified entropy, the relative safe position q_i of each road scenario is obtained as the value at which the modified entropy E is maximal. The relative safe positions q_i are clustered, each class is given a label (such as overtaking, car-following, going straight), and the relative safe positions q_i are then fitted to obtain the fitted lateral position q_ki, the safe position this driver most tends to take under each label.

In summary, the fitted lateral position q_ki is the relative safe position q_i at which the modified entropy E is maximal under the constraints.
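Under the simplification that constraints 1 and 2 (whose formulas appear only as images) reduce to membership in the demand set S, the search for the relative safe position can be sketched as follows; the eight candidate positions follow the half-lane discretization of Fig. 2, and the demand set used in the example is hypothetical.

```python
import math

def relative_safe_position(candidates, demand_set):
    """Pick the candidate position q that maximizes the modified-entropy
    term q * ln(q), restricted to positions allowed by the traffic-demand
    set S (constraint 3: b(q_i) in S). Constraints 1 and 2 are image
    formulas in the source and are simplified away in this sketch."""
    feasible = [q for q in candidates if q in demand_set]
    return max(feasible, key=lambda q: q * math.log(q))

# Eight half-lane positions (Fig. 2); suppose a car-following demand only
# admits positions 5 to 7 (a hypothetical demand set).
print(relative_safe_position(range(1, 9), {5, 6, 7}))  # 7
```

Clustering the resulting q_i per label and fitting them to obtain q_ki would follow as a separate step, as the text describes.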

D_i is the series of subconscious driving strengths within one traffic-scene time period, calculated as:

[Equation image BDA0003720378570000091: formula for the subconscious driving strength D_i; not reproduced in this extraction.]

R_d = |d - q_ki|

式中,d为车辆当前横向位置,qki为车辆在此场景(标签)下的拟合横向位置,a为车辆加速度,α为潜意识侧重权重,β为驾驶人个人安全倾向权重,ρ′,,为待定参数,其取值应该在不同交通场景下满足潜意识驱动强度变化趋势,其变化趋势为:In the formula, d is the current lateral position of the vehicle, q ki is the fitted lateral position of the vehicle in this scene (label), a is the acceleration of the vehicle, α is the subconscious weight, β is the weight of the driver’s personal safety tendency, ρ′, , is an undetermined parameter, and its value should meet the changing trend of subconscious drive strength in different traffic scenarios, and its changing trend is:

当Rd≥Z(Z为安全值,为设定值,不同道路取值不一样)、潜意识驱动强度Di的值较大时,待定参数ρ′,τ,ω的取值随着Rd以及|a|的增加而增大,即越来越偏向不安全,这时候潜意识驱动操作动作的强度就越大;When R d ≥ Z (Z is a safe value, which is a set value, and the value of different roads is different), and the value of the subconscious drive strength D i is relatively large, the values of the undetermined parameters ρ′, τ, ω are as R d And the increase of |a|, that is, it is more and more unsafe, and the intensity of the operation action driven by the subconscious will be greater at this time;

When R_d < Z and D_i is small, ρ′, τ, ω decrease as R_d and |a| decrease; that is, the situation becomes increasingly safe, and the subconscious drive on the operating action grows weaker.

Figure BDA0003720378570000092

where sum is the number of collected traffic scenes; the mean σ of these strengths is the driver's subconscious-drive influence deviation.
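The averaging that produces σ can be sketched as follows. The exact expression for D_i is given only in the patent's formula image, so `drive_strength` below is a hypothetical stand-in that merely follows the stated trend (growing with R_d and |a|), with assumed weights `alpha`, `beta`, `rho`; only R_d = |d − q_ki| and the final mean are taken directly from the text.

```python
# Sketch: averaging per-scene subconscious drive strengths into the deviation σ.
def lateral_deviation(d, q_ki):
    """R_d = |d - q_ki|: distance from the driver's preferred lateral position."""
    return abs(d - q_ki)

def drive_strength(d, q_ki, a, alpha=0.5, beta=0.5, rho=1.0):
    # Hypothetical form of D_i: grows with R_d and |a|, as the trend requires.
    return rho * (alpha * lateral_deviation(d, q_ki) + beta * abs(a))

def sigma(samples):
    """σ: mean drive strength over the `sum` collected traffic scenes.
    samples: list of (d, q_ki, a) triples, one per scene."""
    strengths = [drive_strength(d, q, a) for (d, q, a) in samples]
    return sum(strengths) / len(strengths)
```
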

Step 2.3: The driver simulates conditions that may arise in real driving, such as dangerous states (fatigue, agitation, distraction) as well as normal driving;

Step 2.4: Collect the data (surrounding vehicles' speed and distance, road-surface information, the ego vehicle's brake, throttle, and steering-wheel data, the driving-weight allocation, and the driving system's driver-intention and operation data), and obtain the driver state and driver intention through statistical processing.

Step 3: From the collected surrounding road information and vehicle state, obtain the interaction force with each surrounding unit and the environmental response factor γ;

Specifically, step 3 is implemented as follows:

Step 3.1: The environmental response factor γ is the interaction force arising from the coupling between the vehicle and the vehicle-road environment, in particular the response to different unit types. It is computed with the following formula:

Formula:

Figure BDA0003720378570000101

v_limleast(t) is the minimum of the speed limit and the vehicle speed in the current time period's scene;

m is the ego vehicle's mass;

M is a correction parameter for vehicle type and purpose;

Figure BDA0003720378570000102
denotes the vehicle's desired speed and heading;
Figure BDA0003720378570000103
Figure BDA0003720378570000104
follow from Newton's second law and the kinematic equations.

k_1 is a dynamics correction parameter;

k_2 is a traffic-scene correction parameter (e.g., highway segment, congested segment);
Figure BDA0003720378570000105
is the interaction force with other vehicles, where the vehicle interaction-force parameter
Figure BDA0003720378570000106
is:

Figure BDA0003720378570000107

θ_1l is the angle between this vehicle's heading and the other vehicle's heading, Δv_1l/Δμ_1l is the ratio of the speed difference to the distance difference, u is the safe distance, and ρ is the distance to the other vehicle. The expression states that beyond the safe distance the force is attractive, weakening as the safe distance is approached; below the safe distance the force becomes repulsive, growing as the other vehicle gets closer. Vehicles that are laterally abreast with parallel headings exert no force on each other, while the force between vehicles in the same lane (longitudinally) has the largest absolute value.
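The sign behavior described here can be illustrated with a hypothetical force function. The patent's actual expression lives in the formula image above; the sketch below only reproduces the stated qualitative trend (attraction beyond the safe distance u, vanishing at u, repulsion inside it), with an assumed sign convention of positive for attraction.

```python
import math

def car_force(rho, u, dv_over_dmu=0.0, theta=0.0):
    """Hypothetical car-car interaction force.
    rho: distance to the other vehicle; u: safe distance;
    dv_over_dmu: speed difference over distance difference (Δv/Δμ);
    theta: heading angle between the two vehicles."""
    base = (rho - u) / rho  # > 0 beyond u, 0 at u, < 0 (repulsive) inside u
    coupling = 1.0 + abs(dv_over_dmu) * math.cos(theta)  # speed/heading coupling
    return base * coupling
```

The base term makes the attraction shrink toward zero as ρ approaches u from above, and the repulsion grow without bound as ρ approaches 0, matching the trend in the text.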

k_3 is a correction parameter for the degree to which pedestrians obey traffic rules;
Figure BDA0003720378570000111
is the interaction force with pedestrians, where the pedestrian interaction-force parameter
Figure BDA0003720378570000112
is:

Figure BDA0003720378570000113

v is the vehicle's current speed, θ_1j is the angle between the center of the vehicle's front and the pedestrian, r_1j is the distance difference, and t_1j is the estimated time to encounter. The expression states that when the vehicle speed is 0 there is no force on the pedestrian; a shorter vehicle-pedestrian distance, a smaller angle difference, a shorter estimated encounter time, or a higher vehicle speed all increase the repulsive force.

k_4 is a correction parameter for the complexity of the surrounding physical environment;
Figure BDA0003720378570000114
is the interaction force with surrounding non-moving objects such as buildings, where the environmental interaction-force parameter
Figure BDA0003720378570000115
is:

Figure BDA0003720378570000116

T is the volume of the non-moving object: the larger the volume, the greater the repulsion. When the volume is no larger than the size the vehicle can pass through, the force is attractive; when it exceeds that size, the repulsion grows as the time-to-collision t_1R shrinks, as the vehicle's mass grows, and as the vehicle's speed grows; at speed 0 there is no interaction force.

k_5 is a correction parameter for the influence of traffic rules, reflecting how much weight the vehicle gives them;
Figure BDA0003720378570000117
is the resistance exerted by traffic rules, where the rule parameter
Figure BDA0003720378570000118
is:

Figure BDA0003720378570000119

v_lim is the maximum speed permitted by traffic regulations and signs: the lower the speed limit, the greater the resistance, and when the rules require a stop (e.g., at a red light), the resistance is infinite.

Step 4: Normalize the collected current action signals, namely brake, throttle, and steering wheel, to obtain the real-time operation quantization parameter a_pre;

Specifically, step 4 is implemented as follows:

Step 4.1: Extract the brake force, throttle force, and steering-wheel angle from the sensors;

Step 4.2: Normalize the three signals with min-max standardization:

Brake:

Figure BDA0003720378570000121

Throttle:

Figure BDA0003720378570000122

Steering-wheel angle:

Figure BDA0003720378570000123

* denotes the current value, min the minimum, and max the maximum;

Since, by operating convention, throttle and brake are mutually exclusive, the normalized results are merged into:

Longitudinal operation interval: [-1, 1];

Lateral operation interval: [-1, 1];
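A sketch of the normalization and merging, assuming brake and throttle are reported in [0, max] and the steering angle in [-angle_max, angle_max] (the concrete sensor ranges are not given in the text): brake maps to [-1, 0], throttle to [0, 1], and since the two are mutually exclusive, one longitudinal value in [-1, 1] results.

```python
def min_max(value, lo, hi):
    """Min-max standardization of `value` from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)

def longitudinal(brake, throttle, brake_max, throttle_max):
    # Brake and throttle are mutually exclusive; brake maps to [-1, 0]
    # and throttle to [0, 1], so the merged command spans [-1, 1].
    if brake > 0:
        return -min_max(brake, 0.0, brake_max)
    return min_max(throttle, 0.0, throttle_max)

def lateral(angle, angle_max):
    # Steering angle in [-angle_max, angle_max] mapped onto [-1, 1].
    return angle / angle_max
```
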

Step 4.3: Reduce the dimension of the longitudinal and lateral operation space [-1,1]×[-1,1] by constructing a bijection from [-1,1]×[-1,1] to [-1,1]:

Write the longitudinal value as (0.a_1a_2a_3a_4…) and the lateral value as (0.b_1b_2b_3b_4…). Using an interleaving ("cross") construction, each decimal is split into blocks, cutting after every non-zero digit, and the blocks of the two numbers are interleaved to yield the one-dimensional real-time operation quantization parameter a_pre.
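The cross construction can be sketched as below for fixed precision. The values from step 4.2 would first be shifted from [-1, 1] into [0, 1]; the patent's block-splitting after non-zero digits, which makes the map a true bijection on infinite decimals, is omitted here for brevity.

```python
def interleave(x, y, digits=6):
    """Fold two values in [0, 1) into one by interleaving their decimal
    digits: 0.a1a2... and 0.b1b2... become 0.a1b1a2b2...
    Fixed-precision sketch of the cross method."""
    xs = f"{x:.{digits}f}".split(".")[1]  # digits after the decimal point
    ys = f"{y:.{digits}f}".split(".")[1]
    merged = "".join(a + b for a, b in zip(xs, ys))
    return float("0." + merged)
```

For example, interleaving 0.12 and 0.34 at two digits of precision gives 0.1324, from which both inputs can be recovered by de-interleaving.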

Step 5: From the current vehicle's position on the road, determine the current vehicle-road information, which includes at least the current vehicle-road hazard degree and the current vehicle-road hazard threshold.

Specifically, step 5 is implemented as follows:

Step 5.1: Determine the current vehicle's position on the road, including at least its distance to the preceding vehicle and its lateral position, and from the distance to the preceding vehicle determine the longitudinal vehicle-road hazard value.

The longitudinal risk is inversely related to the distance to the rear of the preceding vehicle: the closer the distance, the greater the risk. A longitudinal vehicle-road hazard function is built with the coordinate origin at the preceding vehicle's rear, the normal safe distance denoted ζ_1, and a shortest safe distance η_1, defined as the distance at which maximum-deceleration braking just avoids a collision with the preceding vehicle.

Figure BDA0003720378570000131

y_1 is the longitudinal vehicle-road hazard value,

x_1 is the distance to the preceding vehicle;

Step 5.2: From the current vehicle's lateral position, determine the lateral vehicle-road hazard value:

Taking the center point of the vehicle's front as the origin, build the lateral vehicle-road hazard function:

y_2 = 0.5cos[(π/T)x_2] − 0.5,  −T ≤ x_2 ≤ T

y_2 is the lateral vehicle-road hazard value,

x_2 is the current lateral position;

T is the distance from the lane centerline to the lane edge;
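The lateral hazard function of step 5.2 translates directly into code: it is 0 at the lane centerline and falls to -1 at the lane edges x_2 = ±T.

```python
import math

def lateral_hazard(x2, T):
    """y_2 = 0.5*cos((pi/T)*x2) - 0.5 for -T <= x2 <= T.
    0 at the centerline (x2 = 0), -1 at the lane edges (x2 = +-T)."""
    if not -T <= x2 <= T:
        raise ValueError("x2 outside the lane half-width T")
    return 0.5 * math.cos((math.pi / T) * x2) - 0.5
```
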

Step 5.3: Compute the current vehicle-road hazard degree H_x,y:

Figure BDA0003720378570000132

Figure BDA0003720378570000133
is the hazard-distance influence factor for different road segments, with range [1, 10]. A value of 1 means the current segment and driving state are the standard traffic segment and driving environment defined by the traffic regulations; a value of 10 indicates a hostile driving environment with extremely poor road capacity and frequent nearby rear-end collisions, e.g., heavy fog or icy segments.

Step 5.4: Compute the current vehicle-road hazard threshold for different scenes:

risk = ωγH_x,y

ω is the scene influence parameter and γ is the environmental response factor.

Step 6: From the collected driver state, driver intention, and real-time operation quantization parameter a_pre, combined with the environmental response factor γ, the current vehicle-road hazard degree, and the current vehicle-road hazard threshold, determine the ego vehicle's comprehensive driving-operation action index
Figure BDA0003720378570000141
by means of a piecewise function; the corresponding formula is:

Figure BDA0003720378570000142

z_1 is the driver state; different states respond to the environment to different degrees,

γ is the environmental response factor,

δ is the driver intention, representing the degree of agreement between the current operation and the recognized driver intention,

H_x,y is the current vehicle-road hazard degree,

σ is the road correction parameter,

a_pre is the real-time operation quantization parameter,

risk is the current vehicle-road hazard threshold.

Step 7: From the growth rate of the interaction forces, obtain the predicted vehicle-road hazard degree and the predicted vehicle-road hazard threshold;

Specifically, the interaction forces in step 3 are inversely related to distance: the faster a force's growth accelerates, the more likely danger becomes, which gives the following derivation:

Figure BDA0003720378570000143

Figure BDA0003720378570000144

Figure BDA0003720378570000145

Figure BDA0003720378570000146

A_arisk = ρa_risk

where v_f is the growth rate of a single unit's interaction force, v_risk is the predicted vehicle-road hazard degree, a_f is the growth acceleration of a single unit's interaction force, a_risk is the sum of the growth accelerations of all surrounding units' interaction forces, A_arisk is the predicted vehicle-road hazard threshold, and ρ is the vehicle-road hazard influence factor, determined by the current road complexity, with range [0, 1].
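Under the assumption that v_f and a_f are obtained as first and second finite differences of sampled force magnitudes over time (the text gives the relations only in formula images), the prediction step can be sketched as:

```python
def force_rate(forces, dt):
    """v_f: growth rate of one unit's interaction force, as a first
    finite difference over samples spaced dt apart."""
    return [(f1 - f0) / dt for f0, f1 in zip(forces, forces[1:])]

def predicted_threshold(unit_force_histories, dt, rho):
    """A_arisk = rho * a_risk, where a_risk sums the latest growth
    acceleration a_f over all surrounding units."""
    a_risk = 0.0
    for forces in unit_force_histories:
        v = force_rate(forces, dt)   # v_f samples
        a = force_rate(v, dt)        # a_f: growth acceleration
        a_risk += a[-1]              # latest acceleration of this unit
    return rho * a_risk
```
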

Step 8: From the driver information and the vehicle-road prediction information, compute the driving-operation action prediction index
Figure BDA0003720378570000151
where the driver information includes at least the driver state, driver intention, driver style, and the driver's subconscious-drive influence deviation, and the vehicle-road prediction information includes at least the predicted vehicle-road hazard degree and the predicted vehicle-road hazard threshold; the prediction index
Figure BDA0003720378570000152
is computed as:

Figure BDA0003720378570000153

where σ is the driver's subconscious-drive influence deviation, obtained as the historical deviation of the most similar scene found by comparing traffic scenes;

S is the driver style, a quantitative score in [0, 10] obtained from a driver-style test; a value below 1 indicates an extremely unsuitable driving style, which affects the driver intention and the driver-state operation-reaction delay.

Z_t is the driver-state operation-reaction delay, a preset value; the larger the delay, the smaller the driving-operation action prediction index;

δ is the driver intention; different intended paths strongly affect the subconscious-drive influence deviation.

Step 9: Input the driving-operation action prediction index
Figure BDA0003720378570000154
and the comprehensive driving-operation action index
Figure BDA0003720378570000155
into the reinforcement-learning-based human-machine co-driving control-right switching system, and compute the driving-right weights to assign to the driver and to the driving system.

Specifically, as shown in Figures 3 and 4, the driving-operation action prediction index
Figure BDA0003720378570000156
represents how the risk factors of the future time period affect operational safety, emphasizing the effect of other units on vehicle safety after the ego operation; the comprehensive driving-operation action index
Figure BDA0003720378570000157
represents how the risk factors at the current time affect operational safety, emphasizing whether the current position is safe and whether the current state permits effective driving;

Step 9.1: Standardize the current driving-operation action prediction index
Figure BDA0003720378570000161
and comprehensive driving-operation action index
Figure BDA0003720378570000162
with the Z-score formula, computing for each index its mean and standard deviation over this drive, from its start to the present.
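Step 9.1 is the standard running Z-score, computed over all index values from the start of the drive:

```python
import statistics

def z_score(history):
    """history: all values of one index from the start of the drive to now.
    Returns how many standard deviations the latest value lies from the
    mean of the history (population std over the drive so far)."""
    mu = statistics.mean(history)
    sd = statistics.pstdev(history)
    if sd == 0:
        return 0.0  # degenerate history: no spread yet
    return (history[-1] - mu) / sd
```
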

Step 9.2: Input the Z-score-standardized prediction index
Figure BDA0003720378570000165
and comprehensive index
Figure BDA0003720378570000166
together with their current means and standard deviations into the reinforcement-learning-based control-right switching system, and check whether a weight-allocation condition is met. If so, go to step 9.3; if not, re-acquire the driver information and the vehicle-road prediction information.

Specifically, a Z-score measures how many standard deviations a sample lies from the data mean. Taking the driving-operation action prediction index
Figure BDA0003720378570000167
as an example, the first parameter
Figure BDA0003720378570000168
is the number of standard deviations separating the currently input prediction index
Figure BDA0003720378570000169
(the sample) from the mean (the data mean) of all prediction indices
Figure BDA00037203785700001610
input from the start of the driving behavior to the present, the standard deviation
Figure BDA00037203785700001611
likewise being computed over that same history. The second parameter
Figure BDA00037203785700001612
is analogous and is not repeated here.

In this embodiment, the control-right switching system has three weight-allocation conditions:

(1) The first parameter
Figure BDA00037203785700001613
and the second parameter
Figure BDA00037203785700001614
are both less than or equal to the first trigger threshold five times in a row;

Specifically, check whether the first parameter
Figure BDA00037203785700001615
and the second parameter
Figure BDA00037203785700001616
are both less than or equal to the first trigger threshold, which may be set to -3, i.e., whether the input prediction index
Figure BDA00037203785700001617
and comprehensive index
Figure BDA00037203785700001618
both lie at least 3 standard deviations below their current respective means; the corresponding formula is:

Figure BDA00037203785700001619
and
Figure BDA00037203785700001620

This indicates that the current state is not safe and the future state will not be safe either, so the control-right switching system is triggered. Under this condition, when five consecutive input index pairs (
Figure BDA0003720378570000171
and
Figure BDA0003720378570000172
) all satisfy
Figure BDA0003720378570000173
and
Figure BDA0003720378570000174
the driving-right weights of the driver and the driving system must be adjusted.

(2) The second parameter
Figure BDA0003720378570000175
is less than or equal to the second trigger threshold three times in a row;

Specifically, the second trigger threshold may be set to -4. When the input comprehensive driving-operation action index
Figure BDA0003720378570000176
lies at least 4 standard deviations below its current mean, i.e., the second parameter satisfies
Figure BDA0003720378570000177
Figure BDA0003720378570000178
the current state is far from safe and the driving system must intervene urgently, so the control-right switching system is triggered. Under this condition, when three consecutive input indices
Figure BDA0003720378570000179
satisfy the second-parameter condition
Figure BDA00037203785700001710
the driving-right weights of the driver and the driving system are adjusted.

(3) The first parameter
Figure BDA00037203785700001711
is less than or equal to the second trigger threshold three times in a row;

Specifically, when the input driving-operation action prediction index
Figure BDA00037203785700001712
lies at least 4 standard deviations below its current mean, i.e., the first parameter satisfies
Figure BDA00037203785700001713
the future state is far from safe and even driver intervention cannot restore it to safety, so the control-right switching system is triggered. Under this condition, when three consecutive input indices
Figure BDA00037203785700001714
satisfy the first-parameter condition
Figure BDA00037203785700001715
the driving-right weights of the driver and the driving system are adjusted.

Step 9.3: Using the Q-learning algorithm, adjust the learning state with the input parameters, and assign the driver's driving-right weight from the action within the maximum value of the next state in Q-learning; the driving system's driving-right weight is 1 minus the driver's weight.

In this embodiment, the algorithm by which the control-right switching system assigns the driving-right weights of the driver and the driving system is Q-learning; the training process is as follows:

(1) The Q-learning transition rule is:

Q(state, action) = R(state, action) + Gamma * MaxQ(next state, all actions)


Gamma is the discount factor: the larger it is, the greater the role MaxQ plays. Here R can be understood as the immediate value and MaxQ as the remembered value, i.e., the maximum value over the actions of the next state in memory.
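The transition rule reads directly as code, with matrix Q stored sparsely as a dict keyed by (state, action):

```python
def q_update(Q, state, action, reward, next_state, gamma):
    """Q(s, a) = R(s, a) + gamma * MaxQ(next_state, all actions).
    Q: dict mapping (state, action) -> value; unseen next states
    contribute 0, matching a zero-initialized matrix Q."""
    Q[(state, action)] = reward + gamma * max(
        (q for (s, _), q in Q.items() if s == next_state), default=0.0
    )
    return Q[(state, action)]
```
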

(2) Add a "matrix Q" as the brain of the reinforcement-learning agent, i.e., the driving-right switching system: what it has learned through experience. The rows of matrix Q represent the system's current state, and the columns represent the possible actions toward the next state (links between nodes). The system starts at zero, i.e., matrix Q is initialized to zero, and it can begin from a single element; whenever a new state is found, matrix Q is updated, which is referred to here as unsupervised learning.

(3) The driver's driving-right weight is power(driver), and the driving system's weight is power(system) = 1 − power(driver). The control-right switching system adjusts power(driver) with the Q-learning algorithm; a Q-learning action directly assigns a value to power(driver), with values in [0, 1] at a step of 0.05.

(4) The Q-learning states are set to 0, 1, 2, 3, 4, 5, where:

0 denotes
Figure BDA0003720378570000181
and
Figure BDA0003720378570000182

1 denotes
Figure BDA0003720378570000183

2 denotes
Figure BDA0003720378570000184

3 denotes
Figure BDA0003720378570000185

4 denotes
Figure BDA0003720378570000186

5 denotes
Figure BDA0003720378570000187
and
Figure BDA0003720378570000188

(5) After the control-right switching system is triggered, the initial state is one of 0, 1, 2. If, after the action in (3) (adjusting the weight), the state remains one of 0, 1, 2, the reward is -1; matrix Q is updated by assigning -1 to the element for that state-action pair;

If the action in (3) brings the state to 3 or 4, the reward is 1; matrix Q is updated by assigning 1 to the element for that state-action pair;

If the action in (3) brings the state to 5, the reward is 100; matrix Q is updated by assigning 100 to the element for that state-action pair. State 5 is the goal state, and the resulting Q-table is shown in Figure 5 (elements unassigned).

(6) Select a road environment and apply (1) through (5) to obtain the Q-table of the initial matrix Q, which supplies MaxQ(next state, all actions) in the update Q(state, action) = R(state, action) + Gamma * MaxQ(next state, all actions). Gamma is chosen in [0, 1] according to road similarity, and R(state, action) is the value of the state obtained in the current road environment: reward -1 for states 0, 1, 2; reward 1 for states 3, 4; and reward 100 for state 5.

When computing the driving weight, the control-right switching system first pre-computes, from the Q-table of similar road segments, the adjustable weights and the rewards of the reachable states, the reward being MaxQ(next state, all actions) in the formula above. Q(state, action) is therefore the R(state, action) value in the current road environment plus MaxQ(next state, all actions).

When Q(state, action) is maximal, the action within MaxQ(next state, all actions) is the weight to adjust to, recorded as the driver's driving-right weight power(driver).
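The weight selection in (6) amounts to an argmax over the candidate weight actions from (3), i.e., the [0, 1] grid with step 0.05. `R_row` and `max_q_next` below are hypothetical lookup tables standing in for the current environment's rewards and the pre-computed Q-table of the similar road segment:

```python
def choose_power_driver(R_row, max_q_next, gamma):
    """Pick the weight action maximizing
    Q(state, action) = R(state, action) + gamma * MaxQ(next state, all actions).
    R_row: {weight: immediate reward in the current road environment};
    max_q_next: {weight: MaxQ of the state that weight leads to}.
    Weights absent from the tables default to the -1 reward of an
    unsafe state and a MaxQ of 0 (zero-initialized matrix Q)."""
    actions = [round(0.05 * k, 2) for k in range(21)]  # 0.00, 0.05, ..., 1.00
    return max(actions, key=lambda w: R_row.get(w, -1.0)
               + gamma * max_q_next.get(w, 0.0))
```
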

(7) Update the Q-table with the computed Q(state, action) value.

(8) Stop switching weights once state 5 is reached.

The technical solution of this application has been described above in detail with reference to the drawings. This application proposes a reinforcement-learning-based method for switching control rights in human-machine co-driving, suitable for a reinforcement-learning-based control-right switching system that allocates driving weight between the driver and the driving system. The method comprises: computing a driving-operation action prediction index from driver information and vehicle-road prediction information; and inputting the prediction index and the comprehensive driving-operation action index into the control-right switching system to compute the driving weight between the driver and the driving system. The solution effectively addresses the combined longitudinal and lateral risk of the vehicle, attenuates the influence of the uncertainty introduced by the driver, and, by considering the driver from multiple angles, reduces the error in judging the driver.

The steps in this application may be reordered, combined, or deleted according to actual needs.

The units in the device of this application may be combined, divided, or deleted according to actual needs.

Although the present application has been disclosed in detail with reference to the accompanying drawings, it should be understood that these descriptions are merely illustrative and are not intended to limit the application of the present application. The protection scope of the present application is defined by the appended claims and may include various changes, modifications, and equivalent solutions made to the invention without departing from the scope and spirit of the present application.

Claims (7)

1. A reinforcement-learning-based human-machine co-driving control-right switching method, characterized in that the method is applicable to a reinforcement-learning-based human-machine co-driving control-right switching system for allocating driving weight between a driver and a driving system, the method comprising:

calculating a driving operation action prediction index according to driver information and vehicle-road prediction information;

inputting the driving operation action prediction index and a comprehensive driving operation action index into the control-right switching system, and calculating the driving weight between the driver and the driving system;

wherein the driver information at least includes the driver's state, the driver's intention, the driver's style, and the driver's subconscious driving influence deviation, and the vehicle-road prediction information at least includes a predicted vehicle-road risk degree and a predicted vehicle-road risk threshold;

the driving operation action prediction index represents the degree to which risk factors in the coming time period affect operation safety, emphasizing the influence of other traffic participants on vehicle safety after the driver's own operation;

the comprehensive driving operation action index represents the degree to which risk factors at the current time affect operation safety, emphasizing whether the current position is safe and whether effective driving is possible in the current state;

the driving operation action prediction index is calculated as:
Figure FDA0004066029330000011
where Figure FDA0004066029330000012 denotes the driving operation action prediction index, Z_t is the driver's state operation reaction delay, σ is the driver's subconscious driving influence deviation, δ is the driver's intention, S is the driver's style, v_risk is the predicted vehicle-road risk degree, and A_risk is the predicted vehicle-road risk threshold.
2. The reinforcement-learning-based human-machine co-driving control-right switching method according to claim 1, characterized in that the driver's subconscious driving influence deviation σ is calculated as:
Figure FDA0004066029330000013
Figure FDA0004066029330000014
R_d = |d − q_ki|

where σ is the driver's subconscious driving influence deviation, sum is the number of collected traffic scenes, D_i is the series of subconscious driving intensities within one traffic-scene time period, ρ′ and the omitted symbols are undetermined parameters, α is the subconscious emphasis weight, β is the driver's personal safety-tendency weight, d is the current lateral position of the vehicle, q_ki is the fitted lateral position of the vehicle in this scene, a is the vehicle acceleration, and R_d is the position parameter.
3. The reinforcement-learning-based human-machine co-driving control-right switching method according to claim 1, characterized in that the driver information at least includes the driver's state, the driver's intention, and the driver's style, and the calculation of the comprehensive driving operation action index specifically comprises:

determining current vehicle-road information according to the current position of the vehicle on the road, wherein the current vehicle-road information at least includes a current vehicle-road risk degree and a current vehicle-road risk threshold;

determining the comprehensive driving operation action index from the driver information and the current vehicle-road information, combined with an environmental response factor, by means of a piecewise function, wherein the comprehensive driving operation action index is calculated as:
Figure FDA0004066029330000021
where Figure FDA0004066029330000022 denotes the comprehensive driving operation action index, z_1 is the driver's state, γ is the environmental response factor, H_x,y is the current vehicle-road risk degree, σ is the road correction parameter, a_pre is the real-time operation quantization parameter, and risk is the current vehicle-road risk threshold.
4. The reinforcement-learning-based human-machine co-driving control-right switching method according to claim 3, characterized in that determining the current vehicle-road information according to the current position of the vehicle on the road specifically comprises:

determining the current position of the vehicle on the road, including at least the distance between the current vehicle and the preceding vehicle and the lateral position of the current vehicle;

determining a longitudinal vehicle-road risk value according to the distance between the current vehicle and the preceding vehicle;

determining a lateral vehicle-road risk value according to the lateral position of the current vehicle;

calculating the current vehicle-road risk degree from the longitudinal vehicle-road risk value and the lateral vehicle-road risk value, with the corresponding formula:
Figure FDA0004066029330000031
where H_x,y is the current vehicle-road risk degree, Figure FDA0004066029330000032 is the risk-distance influence factor for different road sections, whose value range is [1, 10], y_1 is the longitudinal vehicle-road risk value, and y_2 is the lateral vehicle-road risk value;

calculating the current vehicle-road risk thresholds for different scenarios from the current vehicle-road risk degree, and recording the current vehicle-road risk threshold and the current vehicle-road risk degree as the current vehicle-road information.
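Claim 4's combining formula is reproduced only as an image, so the sketch below assumes a simple additive combination of the longitudinal and lateral risk values scaled by the road-section factor; the additive form, the example values, and the function name are assumptions, not the patent's actual formula. Only the [1, 10] range of the influence factor is taken from the claim.

```python
def vehicle_road_risk(y1, y2, road_factor=1.0):
    """Assumed combination of longitudinal (y1) and lateral (y2) risk values.

    road_factor is the risk-distance influence factor for the road section,
    constrained to [1, 10] as stated in claim 4. The additive form here is an
    assumption; the patented formula is shown only as an image.
    """
    if not 1.0 <= road_factor <= 10.0:
        raise ValueError("risk-distance influence factor must lie in [1, 10]")
    return road_factor * (y1 + y2)

# Illustrative values only.
H = vehicle_road_risk(y1=0.4, y2=0.2, road_factor=2.0)
```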
5. The reinforcement-learning-based human-machine co-driving control-right switching method according to claim 3, characterized in that the environmental response factor γ is calculated as:
Figure FDA0004066029330000033
where m is the vehicle mass, M is the vehicle type and purpose correction parameter, k_1 is the dynamics correction parameter, Figure FDA0004066029330000034 represents the vehicle's desired speed and speed direction, v_limleast(t) is the minimum speed value, k_2 is the traffic-scene correction parameter, Figure FDA0004066029330000035 is the vehicle interaction force parameter, k_3 is the correction parameter for the degree to which pedestrians obey traffic rules, Figure FDA0004066029330000036 is the pedestrian interaction force parameter, k_4 is the correction parameter for the complexity of the surrounding physical environment, Figure FDA0004066029330000037 is the environmental interaction force parameter, k_5 is the correction parameter for the degree of influence of traffic rules, and Figure FDA0004066029330000038 is the rule parameter.
6. The reinforcement-learning-based human-machine co-driving control-right switching method according to any one of claims 1 to 5, characterized in that calculating the driving weight between the driver and the driving system specifically comprises:

Step 9.1: using the Z-score standardization formula, standardize the driving operation action prediction index and the comprehensive driving operation action index at the current moment, and compute the mean and standard deviation of each index over the current drive, from its beginning to the present;

Step 9.2: input the Z-score-standardized driving operation action prediction index and comprehensive driving operation action index, together with their respective current means and standard deviations, into the reinforcement-learning-based human-machine co-driving control-right switching system, and judge whether the weight-allocation condition is satisfied; if it is, execute step 9.3; if not, re-acquire the driver information and vehicle-road prediction information;

Step 9.3: based on the Q-learning algorithm, use the input parameters to adjust the learning state, and assign the driver's driving-right weight according to the action achieving the maximum value of the next state in the Q-learning algorithm, wherein the driving system's driving-right weight is 1 minus the driver's driving-right weight.
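Step 9.1 above is an online Z-score standardization of each index against its history over the current drive. A minimal sketch follows; the index streams and their values are illustrative data, not taken from the patent.

```python
import statistics

def z_score(history, current):
    """Z-score of the current index value against this drive's history (step 9.1)."""
    mean = statistics.mean(history)
    std = statistics.pstdev(history)   # population std over the drive so far
    return (current - mean) / std if std > 0 else 0.0

# Illustrative index streams recorded since the start of this drive:
# the prediction index and the comprehensive index at successive moments.
pred_index_history = [0.8, 0.9, 1.1, 1.0]
comp_index_history = [0.5, 0.6, 0.55, 0.65]

z_pred = z_score(pred_index_history, pred_index_history[-1])
z_comp = z_score(comp_index_history, comp_index_history[-1])
```

The standardized values, together with the running means and standard deviations, are what step 9.2 feeds into the switching system.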
7. The reinforcement-learning-based human-machine co-driving control-right switching method according to claim 6, characterized in that the weight-allocation condition specifically comprises:

the first parameter and the second parameter are both less than or equal to the first trigger threshold for 5 consecutive times; or

the second parameter is less than or equal to the second trigger threshold for 3 consecutive times; or

the first parameter is less than or equal to the second trigger threshold for 3 consecutive times;

wherein the first parameter is the number of standard deviations by which the currently input driving operation action prediction index differs from the mean of all driving operation action prediction indices input from the start of the driving behavior to the current moment, and the second parameter is the number of standard deviations by which the currently input comprehensive driving operation action index differs from the mean of all comprehensive driving operation action indices input from the start of the driving behavior to the current moment.
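The three alternative trigger conditions of claim 7 amount to a simple predicate over the most recent standardized parameter values. The sketch below checks them; the buffer length, the threshold values, and the example data are assumptions for illustration.

```python
from collections import deque

def weight_allocation_triggered(p1_recent, p2_recent, thr1, thr2):
    """Check claim 7's weight-allocation condition.

    p1_recent / p2_recent: recent first / second parameter values (numbers of
    standard deviations from the running mean), ordered oldest to newest.
    thr1 / thr2: first and second trigger thresholds (values assumed here).
    """
    def last_n_leq(xs, n, thr):
        xs = list(xs)
        return len(xs) >= n and all(x <= thr for x in xs[-n:])

    # Condition 1: both parameters <= first threshold 5 consecutive times.
    if last_n_leq(p1_recent, 5, thr1) and last_n_leq(p2_recent, 5, thr1):
        return True
    # Conditions 2 and 3: either parameter <= second threshold 3 consecutive times.
    return (last_n_leq(p2_recent, 3, thr2) or last_n_leq(p1_recent, 3, thr2))

# Illustrative recent values (bounded buffers, newest last).
p1 = deque([0.4, 0.3, 0.2], maxlen=5)
p2 = deque([1.5, 1.4, 1.6], maxlen=5)
triggered = weight_allocation_triggered(p1, p2, thr1=0.5, thr2=0.5)
```

Here the first parameter has stayed at or below the second threshold for 3 consecutive readings, so condition 3 fires and the system proceeds to step 9.3.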
CN202210758672.3A 2022-06-29 2022-06-29 A Control Switching Method for Human-Machine Co-driving Based on Reinforcement Learning Active CN115071758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210758672.3A CN115071758B (en) 2022-06-29 2022-06-29 A Control Switching Method for Human-Machine Co-driving Based on Reinforcement Learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210758672.3A CN115071758B (en) 2022-06-29 2022-06-29 A Control Switching Method for Human-Machine Co-driving Based on Reinforcement Learning

Publications (2)

Publication Number Publication Date
CN115071758A CN115071758A (en) 2022-09-20
CN115071758B true CN115071758B (en) 2023-03-21

Family

ID=83254772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210758672.3A Active CN115071758B (en) 2022-06-29 2022-06-29 A Control Switching Method for Human-Machine Co-driving Based on Reinforcement Learning

Country Status (1)

Country Link
CN (1) CN115071758B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115534994A (en) * 2022-09-30 2022-12-30 北方工业大学 Man-machine driving sharing control right self-adaptive switching method based on cooperative sensing inside and outside vehicle
CN117681901B (en) * 2023-10-19 2025-03-18 杭州电子科技大学 A multi-objective human-vehicle sharing control method based on hierarchical deep reinforcement learning
CN119551015B (en) * 2025-01-26 2025-05-13 常熟理工学院 Method and device for allocating human-machine co-driving control rights considering driving group characteristics

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549367A (en) * 2018-04-09 2018-09-18 吉林大学 A kind of man-machine method for handover control based on prediction safety

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109070895B (en) * 2016-04-18 2021-08-31 本田技研工业株式会社 Vehicle control system, vehicle control method, and storage medium
US11940790B2 (en) * 2018-12-12 2024-03-26 Allstate Insurance Company Safe hand-off between human driver and autonomous driving system
JP7158352B2 (en) * 2019-08-08 2022-10-21 本田技研工業株式会社 DRIVING ASSIST DEVICE, VEHICLE CONTROL METHOD, AND PROGRAM
CN113341730B (en) * 2021-06-28 2022-08-30 上海交通大学 Vehicle steering control method under remote man-machine cooperation
CN113335291B (en) * 2021-07-27 2022-07-08 燕山大学 Man-machine driving-sharing control right decision method based on man-vehicle risk state

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549367A (en) * 2018-04-09 2018-09-18 吉林大学 A kind of man-machine method for handover control based on prediction safety

Also Published As

Publication number Publication date
CN115071758A (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN115071758B (en) A Control Switching Method for Human-Machine Co-driving Based on Reinforcement Learning
WO2021077725A1 (en) System and method for predicting motion state of surrounding vehicle based on driving intention
CN109727469B (en) Comprehensive risk degree evaluation method for automatically driven vehicles under multiple lanes
CN113946943B (en) Human-vehicle-road micro traffic system modeling and risk identification method and device
CN111104969A (en) A method for predicting the possibility of collision between unmanned vehicles and surrounding vehicles
JP2023524574A (en) A Unified Quantification Method for Driving Risk Comprehensively Considering Each Element of People, Vehicles, and Roads
CN112258841B (en) Intelligent vehicle risk assessment method based on vehicle track prediction
CN111565990A (en) Software validation for autonomous vehicles
CN110989568B (en) Automatic driving vehicle safe passing method and system based on fuzzy controller
CN114117829B (en) Dynamic modeling method and system for man-vehicle-road closed loop system under limit working condition
CN112249008B (en) Unmanned automobile early warning method aiming at complex dynamic environment
CN114162145A (en) Automatic vehicle driving method and device and electronic equipment
CN117125083B (en) Vehicle following behavior risk quantification method considering driving style inclination
CN116432448B (en) Variable speed limit optimization method based on intelligent network coupling and driver compliance
CN108711285B (en) Hybrid traffic simulation method based on road intersection
CN119417671B (en) Intelligent driving scene self-adaptive teaching method and system based on reinforcement learning
CN116476861A (en) Automatic driving decision system based on multi-mode sensing and layering actions
CN117208021B (en) Unmanned vehicle control method for complex road conditions
CN117208019B (en) Longitudinal decision-making method and system under perceived occlusion based on value distribution reinforcement learning
CN102609599A (en) Method for designing emulational underground road alignment and transverse clear distance based on multiple intelligent agents
CN114863708B (en) Road confluence area roadside real-time accurate induction method for commercial vehicles
CN116910597A (en) Road safety driving behavior identification method based on GM-HMM
CN119176148A (en) Probabilistic driving behavior modeling system for vehicle
Toledo et al. Alternative definitions of passing critical gaps
CN116588123A (en) Risk Perception Early Warning Strategy Method Based on Safety Potential Field Model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant