WO2023231569A1 - A vehicle-road cooperative decision-making algorithm for autonomous vehicle lane-changing behavior based on Bayesian game (一种基于贝叶斯博弈的自动驾驶车辆换道行为车路协同决策算法) - Google Patents

A vehicle-road cooperative decision-making algorithm for autonomous vehicle lane-changing behavior based on Bayesian game

Info

Publication number
WO2023231569A1
WO2023231569A1 · PCT/CN2023/086547 · CN2023086547W
Authority
WO
WIPO (PCT)
Prior art keywords: vehicle, lane, changing, behind, time
Prior art date
Application number
PCT/CN2023/086547
Other languages
English (en)
French (fr)
Inventor
宋康
郭帆
谢辉
Original Assignee
天津大学
Priority date
Filing date
Publication date
Application filed by 天津大学 (Tianjin University)
Publication of WO2023231569A1


Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • B60W60/0011 Planning or execution of driving tasks involving control alternatives for a single driving scenario, e.g. planning several paths to avoid obstacles
    • B60W30/00 Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units, or advanced driver assistance systems for ensuring comfort, stability and safety or drive control systems for propelling or retarding the vehicle
    • B60W30/18 Propelling the vehicle
    • B60W30/18009 Propelling the vehicle related to particular drive situations
    • B60W30/18163 Lane change; Overtaking manoeuvres
    • B60W40/00 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/10 Estimation or calculation of such parameters related to vehicle motion
    • B60W40/105 Speed
    • B60W40/107 Longitudinal acceleration
    • B60W2520/00 Input parameters relating to overall vehicle dynamics
    • B60W2554/00 Input parameters relating to objects
    • B60W2554/40 Dynamic objects, e.g. animals, windblown objects
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Definitions

  • The present invention relates to the technical field of autonomous driving behavioral decision-making, and in particular to a vehicle-road cooperative decision-making algorithm for autonomous-vehicle lane-changing behavior based on Bayesian game.
  • An autonomous vehicle is a highly intelligent system that integrates environmental state perception, behavioral decision-making, and planning control.
  • Mixed traffic scenarios including autonomous vehicles and manned vehicles will soon appear and will gradually become a common traffic scenario.
  • Scenarios such as vehicles going up and down ramps, lane merging, and forced lane changes when encountering road construction and avoiding obstacles are typical scenarios that autonomous vehicles often encounter.
  • A lane change is completed through continuous interaction between the ego vehicle and the adjacent vehicles.
  • The lane-changing decision is therefore a complex optimization problem involving large uncertainty and multi-agent interaction. How to design a sound decision-making algorithm for autonomous vehicles in such mixed scenarios, and thereby achieve efficient, safe, and comfortable lane-changing behavior, is one of the keys to autonomous driving technology.
  • Patent CN202011368453.1 discloses a V2V-based vehicle cooperative lane-changing control method. In the literature (Yang Y, Dang S, He Y, et al. Markov decision-based pilot optimization for 5G V2X vehicular communications[J]. IEEE Internet of Things Journal, 2018, 6(1):1090-1103.) the authors use 5G and V2X technology to assist autonomous-vehicle decision-making; in the literature (Hobert L, Festag A, Llatser I, et al. Enhancements of V2X communication in support of cooperative autonomous driving[J]. IEEE Communications Magazine, 2015, 53(12):64-70.) the authors likewise use V2X facilities to assist autonomous driving. Although these strategies can improve traffic safety and efficiency, they depend heavily on inter-vehicle communication equipment and roadside infrastructure, and remain difficult to deploy widely in the short term.
  • State machine models mainly include finite state machine models (FSM) and hierarchical state machine models (HSM).
  • Because of their simple structure and clear logic, such models have been adopted by many autonomous vehicles, for example the vehicle described in the literature (Bacha A, Bauman C, Faruque R, et al. Odin: Team VictorTango's entry in the DARPA Urban Challenge[J]. Journal of Field Robotics, 2008, 25(8):467-492.) and the DARPA competition champion car Junior.
  • However, this type of model does not consider the complex coupling and game process of the interaction between the ego vehicle and adjacent vehicles during a lane change, and is difficult to apply to lane-change decision tasks in structured road environments.
  • The inference decision-making model imitates the behavioral decision process of human drivers through a mapping from scene features to driving actions. This type of model stores driving knowledge in a knowledge base or neural network and infers driving actions from the knowledge base or trained network through a query mechanism; for example, the literature (Bojarski M, Del Testa D, Dworakowski D, et al. End to end learning for self-driving cars[J]. arXiv preprint arXiv:1604.07316, 2016.) learns the mapping from perceptual image features to specific driving-behavior controls. However, this type of method rarely considers the effect of interactivity in the lane-change decision process; it relies mainly on the fixed paradigm of the training data and does not account for the uncertainty of adjacent vehicles' driving styles and intentions.
  • The non-cooperative game is a vehicle-interaction model commonly used in industry, with optimal behavior determined by Nash equilibrium conditions. For example, in the literature (Pekkanen, J., Lappi, O., Rinkkala, P., Tuhkanen, S., Frantsi, R., Summala, H., 2018. A computational model for driver's cognitive state, visual perception, and intermittent attention in a distracted car following task. R. Soc. Open Sci. 5(9), 180194.) the authors apply game-theoretic methods to car-following strategies; however, the uncertainty of each player's driving style, risk preference, and environmental sensitivity is rarely considered in such games, which limits decision quality in complex scenarios.
  • The purpose of the present invention is to solve the problems in the prior art of difficult, risky, and inefficient lane-change decisions, caused by the difficulty of accurately judging adjacent vehicles' driving styles during the lane change of an autonomous vehicle, and to provide a vehicle cooperative decision-making algorithm for autonomous-vehicle lane-changing behavior based on Bayesian game.
  • A vehicle cooperative decision-making algorithm for autonomous-vehicle lane-changing behavior based on Bayesian game comprises the following steps:
  • Step 1 Establish the prior probability distribution of adjacent vehicles' driving styles: obtain vehicle driving data through intelligent connected roadside sensors, record and count the prior probability distribution of driving styles in different time periods and road sections, and define two adjacent-vehicle driving styles: aggressive (A) and non-aggressive (NA);
  • Step 2 The lane-changing willingness calculation module outputs the lane-changing willingness: collect information on the ego vehicle SV and surrounding vehicles through on-board sensors, define and calculate the predicted following distance in the original lane and the predicted lane-changing distance, construct cumulative distribution functions from the expected distance and variance to compute the necessity and safety of lane changing, and establish a fuzzy-logic lane-changing willingness output model; when the willingness reaches the set threshold, the following steps 3 to 7 are performed;
  • Step 3 Use Bayesian filtering to infer the posterior probability of the driving style of the rear vehicle RV in the target lane: when the ego vehicle SV generates a desire to change lanes, on-board sensors collect the acceleration of the RV behind in the target lane to obtain the likelihood function of its driving style. From this likelihood function and the prior distribution obtained in step 1, the posterior probability of the RV's driving style and the driver aggressiveness factor λ of the RV are obtained (the value range of the factor is [0,1]);
  • Step 4 Use a long short-term memory (LSTM) neural network and a vehicle kinematics model to predict the driving trajectory, speed, and acceleration of the ego vehicle SV and the rear vehicle RV in the target lane over the future derivation horizon;
  • Step 5 Establish the game payoff matrices and solve them to obtain the lane-change execution probability: non-cooperative payoff matrices are established between the ego vehicle SV and the aggressive and the non-aggressive rear vehicle RV in the target lane. The payoff function comprises safety, time, comfort, and cooperation prediction payoffs; solving the matrices yields the lane-change execution probability;
  • Step 6 Update the vehicle state: when the lane-change execution probability does not reach the execution threshold, the ego vehicle SV does not change lanes and only its longitudinal trajectory is updated; when the probability reaches the threshold, both the lane-changing trajectory and the longitudinal trajectory of the ego vehicle SV are updated;
  • Step 7 Cyclically execute the dynamic game decision: steps 3 to 6 are executed cyclically until the lane-change maneuver is completed or the lane-change willingness disappears.
  • In the above technical solution, in step 1, a clustering algorithm obtains the number n(A) of aggressive drivers and the number n(NA) of non-aggressive drivers in the set road section and time period, from which the prior probability distribution of driving style P0(X|(road, time)) = [p(A), 1 − p(A)] is solved, where road and time represent the road section and time period, p(A) = n(A)/(n(A) + n(NA)) represents the probability that the driving style is aggressive, and 1 − p(A) represents the probability that it is non-aggressive.
  • In the above technical solution, in step 2, the predicted following distance in the original lane d_min and the predicted lane-changing distance l_min are defined and calculated, where d_min is the minimum of all predicted following distances in the original lane over the future time t, and l_min is the minimum of all predicted lane-changing distances over the future time t.
  • Cumulative distribution functions are constructed from the expectation and variance of the predicted distances to calculate the necessity and safety of lane changing, where P_ne and P_sf represent the necessity and safety of lane changing, u_k and u_l represent the expectations of the predicted distance in the original lane and in lane changing, respectively, and σ² represents the variance.
  • Membership functions for lane-changing necessity, lane-changing safety, and lane-changing willingness are constructed, and the willingness is obtained from the fuzzy rule table with centroid-method defuzzification.
  • In the above technical solution, in step 3,
  • the on-board sensor measures the driving acceleration of the RV behind in the target lane with accuracy y; the likelihood vector over the styles (A, NA) is L = (y, 1 − y) when the RV is measured in a non-decelerating state and L = (1 − y, y) when it is measured in a decelerating state;
  • the posterior probability distribution of the RV's driving style is obtained from the prior distribution and the likelihood: P_t(Y|(road, time)) = normalize(P_0(X|(road, time)) ∘ L), where the style vector V_type is a unit vector.
  • In the above technical solution, in step 4, the speeds and accelerations of the ego vehicle and the rear vehicle in the target lane are predicted with the long short-term memory neural network; the future trajectories of the non-lane-changing behavior of both vehicles are deduced and predicted with the vehicle kinematics model, while the trajectory of the lane-changing behavior is predicted by combining the longitudinal trajectory deduced from the kinematics model with the lateral trajectory deduced from a fifth-degree polynomial curve.
  • In the above technical solution, in step 5, the payoff matrix of the ego vehicle SV and the aggressive rear vehicle RV in the target lane is composed as follows: U11, U12, U21, and U22 denote the ego vehicle's payoffs under the four strategy combinations [change lane, decelerate], [change lane, accelerate], [no lane change, decelerate], and [no lane change, accelerate]; O11, O12, O21, and O22 denote the corresponding payoffs of the aggressive rear vehicle under the same four strategy combinations.
  • The payoff matrix of the ego vehicle SV and the non-aggressive rear vehicle RV in the target lane is composed as follows: U33, U34, U43, and U44 denote the ego vehicle's payoffs under the four strategy combinations [change lane, decelerate], [change lane, accelerate], [no lane change, decelerate], and [no lane change, accelerate]; O33, O34, O43, and O44 denote the corresponding payoffs of the non-aggressive rear vehicle under the same four strategy combinations.
  • The payoff U of the ego vehicle SV and the payoff O of the rear vehicle RV in the target lane are calculated over the future horizon and include four parts:
  • v_SV(t′) and v_RV(t′) are the speeds of the ego vehicle SV and the rear vehicle RV in the target lane at predicted time t′; A_c(t′) is the overlapping area of the vehicle collision-determination regions and A_s(t′) is the overlapping area of the vehicle safety-reserve regions at predicted time t′; ω11 and ω12 are the collision weight and the safety-reserve weight; I(A_c) and I(A_s) are 0-1 indicator functions that take the value 1 when the corresponding areas overlap and 0 when they do not; v(t′) represents the speed of the rear vehicle in the target lane at the predicted moment of the game; Jerk(t′) represents the jerk (the derivative of acceleration) at the predicted time;
  • the payoffs of the ego vehicle SV and of the rear vehicle RV in the target lane are combined and weighted to form the total payoff of each vehicle; the aggressiveness factor λ_t is used to construct the weighting coefficient of the ego vehicle's payoff U; k = [k1, k2, k3, k4] represents the gain coefficients for the predicted payoffs.
  • The expected payoff E_p is obtained by weighting the game payoffs with the posterior style probabilities P_t(Y|(road, time)); the lane-changing probability at which the expected payoff E_p is maximal is then obtained.
  • V(Δx_j) is the optimal velocity function, where v_max represents the maximum vehicle speed, h_c is the safe inter-vehicle distance, and Δx_j(t) is the actual inter-vehicle distance at time t.
  • In the above technical solution, in step 6, when the lane-change execution probability reaches the probability threshold, the lane-changing trajectory and the longitudinal trajectory of the vehicle are updated simultaneously.
  • The Bayesian-game idea is adopted: by introducing a prior on the adjacent vehicle's driving style and a posterior estimate of that style, combined with probability estimates of whether the rear vehicle yields under different driving styles, a decision-making algorithm with interactive deduction and inference-learning capabilities is established, which is expected to produce safer, more efficient, and more comfortable lane-changing decisions in mixed scenarios of autonomous and manual driving.
  • Figure 1 is a framework diagram of the decision-making algorithm for lane-changing behavior of autonomous vehicles based on Bayesian game
  • Figure 2 is a schematic diagram of the vehicle area of interest division
  • Figure 3 is a schematic diagram of the original lane driving prediction distance calculation
  • Figure 4 is a schematic diagram for calculating the lane changing distance between vehicles
  • Figure 5 shows the membership degree of the necessity of changing lanes, the feasibility of changing lanes, and the willingness to change lanes
  • Figure 6 is the structure diagram of the long short-term memory (LSTM) neural network.
  • Figure 7 is a schematic diagram of the calculation of safety prediction income
  • Figure 8 is a schematic diagram of establishing a multi-group lane changing game.
  • Step 1 Establish a prior probability distribution of vehicle driving style: obtain vehicle driving data through intelligent network-connected road-side sensors, such as vision cameras and lidar.
  • Vehicle driving styles are defined to include aggressive (A) and non-aggressive (NA).
  • the clustering algorithm uses k-means, but is not limited to this.
  • The k-means clustering algorithm can be used to obtain the number n(A) of aggressive drivers and the number n(NA) of non-aggressive drivers in the set road section and time period.
  • From these counts, the prior probability distribution of vehicle driving style is expressed as Equation (1): P_0(X|(road, time)) = [p(A), 1 − p(A)], with p(A) = n(A)/(n(A) + n(NA)),
  • road and time represent the road section and time period respectively
  • p(A) represents the probability that the vehicle's driving style is aggressive A
  • 1-p(A) represents the probability that the vehicle's driving style is non-aggressive NA
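The clustering-and-counting step of Equation (1) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the 1-D k-means, the choice of mean longitudinal acceleration as the style feature, and the sample values are all assumptions for demonstration.

```python
def kmeans_1d(xs, iters=50):
    """Minimal 1-D k-means (k=2) on a driving-style feature such as mean
    longitudinal acceleration. Returns (labels, centers)."""
    centers = [min(xs), max(xs)]                 # deterministic initialisation
    labels = [0] * len(xs)
    for _ in range(iters):
        labels = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                  for x in xs]
        for j in (0, 1):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers

def style_prior(features):
    """Prior P0 = (p(A), p(NA)) for one (road, time) bucket, Equation (1):
    p(A) = n(A) / (n(A) + n(NA))."""
    labels, centers = kmeans_1d(features)
    aggressive = 0 if centers[0] > centers[1] else 1   # larger accel -> A
    n_A = sum(1 for lab in labels if lab == aggressive)
    n_NA = len(features) - n_A
    p_A = n_A / (n_A + n_NA)
    return (p_A, 1.0 - p_A)

# hypothetical roadside observations of mean longitudinal acceleration (m/s^2)
accel = [0.40, 0.50, 0.45, 1.80, 2.10, 0.50, 1.90, 0.42]
prior = style_prior(accel)   # -> (0.375, 0.625)
```

One such prior would be stored per (road, time) bucket, matching the roadside statistics described in step 1.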
  • Step 2 The lane-changing willingness calculation module outputs the lane-changing willingness: the necessity and feasibility of lane changing are necessary conditions for the generation of lane-changing willingness. Information about the ego vehicle and surrounding vehicles is collected, and the ego vehicle's region of interest is divided, as shown in Figure 2, into the front, left, and right regions of interest.
  • As shown in Figure 3, the predicted following distance in the original lane is defined and calculated as the minimum value d_min of all predicted distances [d_1, d_2, ..., d_t] over the future time t; as shown in Figure 4, the predicted lane-changing distance is calculated as the minimum value l_min of all predicted distances [l_1, l_2, ..., l_t] over the future time t. The necessity and safety of lane changing are then calculated from cumulative distribution functions constructed from the expectation and variance of the predicted distances.
  • The formulas are expressed as Equations (2)-(3), where P_ne and P_sf respectively represent the necessity and safety of lane changing, u_k and u_l respectively represent the expectations of the predicted distance in the original lane and in lane changing, and σ² represents the variance.
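Equations (2)-(3) themselves are not reproduced in this text, so the sketch below assumes Gaussian-distributed predicted gaps and builds the stated cumulative distribution function from the expectation and variance; the safe-gap thresholds d_safe and l_safe are illustrative parameters, not values from the patent.

```python
import math

def normal_cdf(x, mu, sigma):
    """Phi((x - mu) / sigma) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def lane_change_necessity(u_k, sigma_k, d_safe):
    """P_ne: probability that the predicted original-lane gap (mean u_k,
    std sigma_k) falls below the safe gap d_safe; a shrinking gap ahead
    makes leaving the lane more necessary."""
    return normal_cdf(d_safe, u_k, sigma_k)

def lane_change_safety(u_l, sigma_l, l_safe):
    """P_sf: probability that the predicted lane-change gap exceeds l_safe."""
    return 1.0 - normal_cdf(l_safe, u_l, sigma_l)

p_ne = lane_change_necessity(u_k=12.0, sigma_k=3.0, d_safe=15.0)  # ~0.84
p_sf = lane_change_safety(u_l=25.0, sigma_l=4.0, l_safe=15.0)     # ~0.99
```

Both outputs lie in [0, 1] and feed directly into the fuzzy-logic willingness model described next.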
  • A lane-changing willingness output model based on fuzzy logic is established. As shown in Figures 5(a), 5(b), and 5(c), the membership functions of lane-changing necessity, lane-changing feasibility, and lane-changing willingness are constructed, respectively. The fuzzy-set partition and rule formulation in Table 1 are recommended but not mandatory. The lane-changing willingness threshold is set manually and can be adjusted to the actual situation; once it is exceeded, the subsequent Bayesian-game lane-change decision is made.
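A compact Mamdani-style version of this willingness model can be sketched as below. The triangular membership functions, the 3x3 rule table, and the output discretisation are assumptions in the spirit of Figure 5 and Table 1 (which the text marks as recommended rather than mandatory); only the max-min inference and centroid defuzzification follow the stated method.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Low / Medium / High fuzzy sets on [0, 1] (shapes are illustrative)
SETS = {"L": (-0.5, 0.0, 0.5), "M": (0.0, 0.5, 1.0), "H": (0.5, 1.0, 1.5)}

# hypothetical rule table: willingness = RULES[necessity][safety]
RULES = {"L": {"L": "L", "M": "L", "H": "M"},
         "M": {"L": "L", "M": "M", "H": "H"},
         "H": {"L": "M", "M": "H", "H": "H"}}

def willingness(p_ne, p_sf):
    """Max-min (Mamdani) inference with centroid defuzzification."""
    xs = [i / 200.0 for i in range(201)]
    agg = [0.0] * len(xs)
    for n_lab, n_set in SETS.items():
        for s_lab, s_set in SETS.items():
            fire = min(tri(p_ne, *n_set), tri(p_sf, *s_set))   # rule strength
            out = SETS[RULES[n_lab][s_lab]]
            agg = [max(a, min(fire, tri(x, *out))) for a, x in zip(agg, xs)]
    total = sum(agg)
    return sum(x * a for x, a in zip(xs, agg)) / total if total else 0.0
```

High necessity and high safety then yield a willingness near the upper end of [0, 1], which is compared against the manually set threshold.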
  • Step 3 Use Bayesian filtering to infer the posterior probability of vehicle driving style:
  • The likelihood function of the acceleration of the RV behind in the target lane is obtained through the on-board sensor. The accuracy with which the sensor measures driving acceleration, available from its product design parameters, is y; the likelihood vector over the styles (A, NA) is then L = (y, 1 − y) when the vehicle is measured in a non-decelerating state and L = (1 − y, y) when it is measured in a decelerating state.
  • The posterior probability distribution P_t(Y|(road, time)) of the driving style is obtained from the prior distribution and the likelihood, expressed as Equation (4): P_t(Y|(road, time)) = normalize(P_0(X|(road, time)) ∘ L). The style vector V_type can be represented by a unit vector: for example, (1,0) indicates that the driving style is aggressive, and (0,1) indicates that it is non-aggressive.
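One Bayesian update of Equation (4) can be sketched as: the prior over (A, NA) is multiplied elementwise by the likelihood vector (y, 1 − y) or (1 − y, y) and renormalized. The numeric values below are illustrative.

```python
def style_posterior(prior, decelerating, y):
    """One Bayesian-filter step for the rear vehicle's style over (A, NA).

    prior        -- (p(A), p(NA)), from roadside statistics or the last step
    decelerating -- True if the RV is measured in a decelerating (yielding) state
    y            -- acceleration-measurement accuracy of the on-board sensor
    """
    likelihood = (1.0 - y, y) if decelerating else (y, 1.0 - y)
    unnorm = [l * p for l, p in zip(likelihood, prior)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# an RV that keeps accelerating shifts belief toward "aggressive"
post = style_posterior((0.3, 0.7), decelerating=False, y=0.9)
```

Feeding each new measurement's posterior back in as the next prior realizes the cyclic Bayesian filtering of step 3.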
  • Step 4 Vehicle behavior prediction: the purpose is to predict the driving trajectory, speed, and acceleration of the ego vehicle SV and the rear vehicle RV in the target lane over the future derivation horizon H, where the trajectory is represented by discrete path points (x_t′, y_t′) at future times, and the speed and acceleration are represented by v(t′) and a(t′).
  • The speeds and accelerations of the ego vehicle SV and the rear vehicle RV in the target lane are predicted using the long short-term memory (LSTM) neural network, whose structure is shown in Figure 6. The network consists of a forget gate, an input gate, an output gate, and a cell-state update; its gate mechanism controls the circulation and loss of features, which alleviates the long-term dependency problem of recurrent neural networks (RNN). The performance of LSTM is usually better than that of plain time-recurrent neural networks and hidden Markov models (HMM).
  • v represents the predicted speed.
  • The trajectory of the vehicle's lane-changing behavior is predicted by combining the longitudinal trajectory deduced from the vehicle kinematics model with the lateral trajectory deduced from a fifth-degree polynomial curve. The fifth-degree polynomial curve is also applied to the vehicle-state update of the lane-changing behavior, which is introduced in detail later.
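The combined rollout can be sketched as follows: the longitudinal coordinate comes from a constant-acceleration kinematic model and the lateral coordinate from a quintic polynomial. The zero lateral velocity/acceleration boundary conditions at both ends, which give the closed-form 10τ³ − 15τ⁴ + 6τ⁵ shape, are a standard choice assumed here; the patent instead works with general coefficients a_0..a_5, with a_5 chosen manually.

```python
def lane_change_trajectory(v0, a, lane_width, T, n=11, x0=0.0):
    """Sample n points of a lane-change path over duration T: longitudinal
    position from a constant-acceleration rollout, lateral position from a
    quintic polynomial with zero lateral speed and acceleration at both ends."""
    pts = []
    for i in range(n):
        t = T * i / (n - 1)
        x = x0 + v0 * t + 0.5 * a * t * t                          # kinematics
        tau = t / T
        y = lane_width * (10 * tau**3 - 15 * tau**4 + 6 * tau**5)  # quintic
        pts.append((x, y))
    return pts

path = lane_change_trajectory(v0=15.0, a=0.5, lane_width=3.5, T=4.0)
```

The path starts at (0, 0), ends one lane width (3.5 m) to the side, and crosses the lane boundary smoothly at the midpoint of the maneuver.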
  • Step 5 Establish a game profit matrix and solve it to obtain the lane-changing execution probability:
  • The payoff matrices comprise the matrix formed by the ego vehicle SV with the aggressive RV behind in the target lane and the matrix formed with the non-aggressive RV. The matrix for the aggressive RV is shown in Table 2, and the matrix for the non-aggressive RV is shown in Table 3.
  • The payoffs U of the ego vehicle SV and O of the rear vehicle RV in the target lane are calculated over a future horizon using the idea of model prediction, which improves the foresight and safety of the behavioral decision. The payoff calculation includes four parts:
  • Vehicle safety is one of the most important benefits of intelligent vehicle driving.
  • the thick solid line is the vehicle collision determination area
  • the thick dotted line is the safety reserved area.
  • v_SV(t′) and v_RV(t′) are the speeds of the ego vehicle SV and the rear vehicle RV in the target lane at predicted time t′; A_c(t′) is the overlapping area of the vehicle collision-determination regions and A_s(t′) is the overlapping area of the vehicle safety-reserve regions at predicted time t′. The positioning points of the two vehicles are obtained through on-board sensors and, combined with the manually set safety-reserve parameters w_s, l_sf, and l_sr, the overlapping area A_s(t′) of the safety-reserve regions can be solved. ω11 and ω12 are the collision weight and the safety-reserve weight. I(A_c) and I(A_s) are 0-1 indicator functions that take the value 1 when the corresponding safety areas overlap and 0 when they do not; the formula of I(A_c) is shown in Equation (8).
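The indicator-based safety payoff can be sketched with axis-aligned rectangles standing in for the collision-determination and safety-reserve regions; the box coordinates and the two weights (the patent's collision and safety-reserve weights) are illustrative values.

```python
def overlap_area(box1, box2):
    """Overlap of two axis-aligned rectangles (x_min, y_min, x_max, y_max)."""
    w = min(box1[2], box2[2]) - max(box1[0], box2[0])
    h = min(box1[3], box2[3]) - max(box1[1], box2[1])
    return max(w, 0.0) * max(h, 0.0)

def safety_payoff(collision_boxes, reserve_boxes, w_collision=10.0, w_reserve=1.0):
    """Safety prediction payoff: the 0-1 indicators I(A_c), I(A_s) switch on
    when the collision / safety-reserve regions of the two vehicles overlap."""
    i_c = 1.0 if overlap_area(*collision_boxes) > 0.0 else 0.0
    i_s = 1.0 if overlap_area(*reserve_boxes) > 0.0 else 0.0
    return -(w_collision * i_c + w_reserve * i_s)

# SV and RV: collision boxes separated, enlarged safety-reserve boxes overlapping
sv_col, rv_col = (0.0, 0.0, 4.8, 1.8), (6.0, 0.0, 10.8, 1.8)
sv_res, rv_res = (-1.0, -0.3, 5.8, 2.1), (5.0, -0.3, 11.8, 2.1)
payoff = safety_payoff((sv_col, rv_col), (sv_res, rv_res))   # -> -1.0
```

Intruding into the safety-reserve region costs a small penalty, while an actual collision-region overlap dominates the payoff, mirroring the two-weight design of Figure 7.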
  • The acceleration a_j(t′) of the adjacent vehicle at the predicted moment in the game is used as the quantitative indicator of the cooperation prediction payoff, where j represents the index of the vehicle behind in the target lane.
  • The total payoff combines the four indicators above, weighted to form the total payoff of the target vehicle. The calculation of each payoff term of the ego vehicle SV and of the RV behind in the target lane is shown in Equations (13)-(14), where k = [k1, k2, k3, k4] represents the gain coefficients of the predicted payoffs, whose purpose is to scale the predicted payoffs so that they remain of the same order of magnitude;
  • As shown in Figure 8, after the ego vehicle SV generates the intention to change lanes, it forms multiple games with the vehicles behind in the target lane. The games are numbered according to the relative longitudinal distance between those vehicles and the ego vehicle SV, forming Game 1, Game 2, ..., Game N in sequence.
  • The goal of the lane-change decision is to complete Game 1 first, which specifically means that the ego vehicle SV completes the lane change and merges into the target lane ahead of the rear vehicle of Game 1. If the lane change cannot be completed while solving the payoff matrix of Game 1, then Game 2 through Game N are renumbered as Game 1, Game 2, ..., and the lane-change probability of the new Game 1 is solved to continue the lane-change decision.
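The renumbering loop can be sketched as below. `solve_game` abstracts the whole of step 5 (building and solving the Game-1 payoff matrix and thresholding the resulting lane-change probability) into a single yes/no call; the gap-based toy solver is purely illustrative.

```python
def lane_change_games(rear_vehicles, solve_game):
    """Multi-game scheme of Figure 8: rear vehicles in the target lane are
    ordered by longitudinal distance to the SV and played one at a time.
    Returns the vehicle the SV merged ahead of, or None if every game failed
    (no lane change is executed)."""
    queue = sorted(rear_vehicles, key=lambda rv: rv["gap"])   # Game 1..N
    while queue:
        game_1 = queue[0]
        if solve_game(game_1):        # lane change completed ahead of Game 1
            return game_1
        queue = queue[1:]             # Games 2..N are renumbered as 1..N-1
    return None

rvs = [{"id": "RV2", "gap": 18.0}, {"id": "RV1", "gap": 7.0},
       {"id": "RV3", "gap": 30.0}]
# toy stand-in for step 5: succeed once the longitudinal gap is large enough
winner = lane_change_games(rvs, solve_game=lambda rv: rv["gap"] > 15.0)
```

With these toy numbers the nearest rear vehicle (gap 7 m) rejects the merge, so the next vehicle becomes Game 1 and the SV merges ahead of it.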
  • The expected payoff E_p is obtained by weighting the game payoffs with the posterior style probabilities P_t(Y|(road, time)), and the lane-change probability at which the expected payoff E_p is maximal is obtained.
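Since the concrete expressions for E_p are not reproduced here, the sketch below illustrates the idea with a toy 2x2 payoff matrix per style: the SV's lane-change probability q is grid-searched to maximize the expected payoff, mixing over the RV's posterior style probabilities and an assumed per-style probability of yielding (decelerating). All payoff and probability values are illustrative.

```python
def lane_change_probability(post, payoffs, yield_prob, steps=100):
    """Grid-search the SV lane-change probability q maximizing the expected
    payoff E_p. payoffs[style][i][j]: SV payoff with i = change(0)/keep(1)
    and j = RV decelerates(0)/accelerates(1)."""
    def expected(q):
        e = 0.0
        for style, p_style in enumerate(post):
            y = yield_prob[style]
            for i, p_i in ((0, q), (1, 1.0 - q)):
                for j, p_j in ((0, y), (1, 1.0 - y)):
                    e += p_style * p_i * p_j * payoffs[style][i][j]
        return e
    qs = [i / steps for i in range(steps + 1)]
    return max(qs, key=expected)

# toy SV payoffs: changing pays off only if the RV yields; keeping lane is neutral
U = [[4.0, -10.0], [0.0, 0.0]]
# mostly non-aggressive posterior; aggressive RVs rarely yield
q_star = lane_change_probability(post=[0.2, 0.8], payoffs=[U, U],
                                 yield_prob=[0.1, 0.9])   # -> 1.0
```

Shifting the posterior toward "aggressive" flips the optimum to q = 0 (do not change lanes), which is exactly the behavior the style estimation of step 3 is meant to drive.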
  • Step 6 vehicle status update:
  • V(Δx_j) is the optimal velocity function, with the formula shown in Equation (18), where v_max represents the maximum vehicle speed, h_c is the safe inter-vehicle distance, and Δx_j(t) is the actual inter-vehicle distance at time t.
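Equation (18) is not reproduced in this text; a common closed form matching the stated ingredients (v_max, safe distance h_c, actual gap Δx_j) is the Bando-style optimal-velocity function sketched below, which is an assumption rather than the patent's exact formula.

```python
import math

def optimal_velocity(dx, v_max=33.0, h_c=25.0):
    """Bando-style optimal velocity: near zero when the actual gap dx is far
    below the safe headway h_c, saturating toward v_max for large gaps."""
    return 0.5 * v_max * (math.tanh(dx - h_c) + math.tanh(h_c))
```

At a gap equal to the safe headway the function returns half of v_max, giving a smooth transition between stopping and free driving during the longitudinal-trajectory update.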
  • A = [a_0, a_1, a_2, a_3, a_4, a_5] are the polynomial coefficients, t is the sampling time during the lane-change process, and the value of a_5 is set manually.
  • Step 7 Cyclically execute dynamic game decisions:

Abstract

Provided is a vehicle-road cooperative decision-making algorithm for autonomous-vehicle lane-changing behavior based on Bayesian game. On one hand, intelligent connected road perception and big-data analysis summarize the statistical characteristics of adjacent vehicles' driving styles in different time periods and traffic-flow states, serving as a prior estimate of the adjacent vehicle's driving style; on the other hand, the dynamic interactive behavior of the two vehicles during the lane change is continuously observed, and a posterior correction of the adjacent vehicle's driving style improves the estimation accuracy. When the ego vehicle (SV) generates a lane-change intention, through iterative estimation of the driving style and of the probability that the adjacent vehicle yields under each style, a Bayesian game is used to solve the value return of the vehicle over the future driving segment, jointly considering style and driving-intention probabilities, and to output a lane-change probability; when the lane-change probability exceeds a threshold, a lane-change start command is issued.

Description

一种基于贝叶斯博弈的自动驾驶车辆换道行为车路协同决策算法 技术领域
本发明涉及自动驾驶行为决策技术领域,特别是涉及一种基于贝叶斯博弈的自动驾驶车辆换道行为车路协同决策算法。
背景技术
自动驾驶车辆是一种将环境状态感知、行为决策及规划控制集合于一体的高度智能化系统。随着自动驾驶技术的快速发展,包含自动驾驶车辆和有人驾驶车辆的混合交通场景将很快出现,并将逐步成为常见的交通场景。车辆上下匝道、车道合并,以及遇到道路施工及躲避障碍物等强制变道场景,是自动驾驶车辆经常遇到的典型场景。据统计,在人工驾驶中,因变更车道引发的交通事故占比达35%以上,而在所有换道事故中,大约有75%的交通事故是由于驾驶员对于换道决策的判断失误而发生的。可见,自动驾驶车辆如何作出科学合理的换道决策,是自动驾驶车辆决策算法的关键,也是后续自动驾驶车辆规划及控制算法的重要基础。
虽然交通法规对于车辆道路行驶制定了明确规范和要求,但是在实际道路行驶工况中,车辆换道的决策算法依然面临比较大的挑战,主要有以下几个方面原因:1)换道是在本车与旁车不断交互的过程中完成的,有复杂的耦合影响和博弈过程,与驾驶风格、驾驶意图关系密切;2)旁车的驾驶风格难以准确判断,很难准确估计旁车是配合还是抵制本车的换道行为;3)即使驾驶风格相对清晰,每一次换道中的具体驾驶意图也会受情绪、突发干扰等其他不确定性因素的影响。
因此,换道决策,是一个存在大量不确定性和多智能体交互影响条件下的复杂优化问题。如何在这种混杂场景下,做好自动驾驶车辆的决策算法,进而实现高效、安全、舒适的车辆换道行为,是自动驾驶技术的关键之一。
针对上述难题,有学者提出了基于车间协同与车路协同的方法,即运用车辆彼此之间的通讯以及与道路基础设施之间的通信来解决交通冲突,比如:专利(CN202011368453.1)公开一种基于V2V的车辆协同换道控制方法,文献(Yang Y,Dang S,He Y,et al.Markov decision-based pilot optimization for 5G V2X vehicular communications[J].IEEE Internet of Things Journal,2018,6(1):1090-1103.)中作者使用5G及V2X技术辅助自动驾驶车辆进行决策;文献(Hobert L,Festag A,Llatser I,et al.Enhancements of V2X communication in support of  cooperative autonomous driving[J].IEEE communications magazine,2015,53(12):64-70.)中作者同样使用V2X设施协助自动驾驶车辆行驶。虽然这些策略能够提高交通的安全性和效率,但其过度依赖于车间通讯设备以及路侧基础设施,在短期内仍然难以大范围推广。
In addition, much research has focused on single-vehicle decision algorithms, such as state-machine models, inference-based decision models, and game-theoretic decision methods.
State-machine models mainly include finite state machines (FSM) and hierarchical state machines (HSM). Owing to their simple structure and clear logic, they have been adopted by many autonomous vehicles, e.g., Odin, Team VictorTango's DARPA Urban Challenge entry (Bacha A, Bauman C, Faruque R, et al. Odin: Team VictorTango's entry in the DARPA Urban Challenge[J]. Journal of Field Robotics, 2008, 25(8):467-492.). However, such models do not consider the complex coupling and game process between the subject vehicle and surrounding vehicles during a lane change, and are hard to apply to lane-change decision tasks in structured road environments.
Inference-based decision models imitate a human driver's decision process through a "scene features to driving action" mapping: driving knowledge is stored in a knowledge base or a neural network, and driving actions are inferred by querying the knowledge base or the trained network. For example, (Bojarski M, Del Testa D, Dworakowski D, et al. End to end learning for self-driving cars[J]. arXiv preprint arXiv:1604.07316, 2016.) learns a mapping from perceived images to concrete driving commands. However, such methods pay little attention to the interactive nature of lane-change decisions, rely mainly on fixed patterns in the training data, and do not account for the uncertainty of surrounding vehicles' driving styles and intentions.
The above analysis shows that modeling multi-vehicle interaction is an important avenue for improving autonomous-vehicle decision-making, and game theory has accordingly received growing attention for modeling vehicle interaction. Non-cooperative games are the type of vehicle interaction commonly adopted in the industry, with the optimal behavior determined by Nash-equilibrium conditions. For example, in (Pekkanen, J., Lappi, O., Rinkkala, P., Tuhkanen, S., Frantsi, R., Summala, H., 2018. A computational model for driver's cognitive state, visual perception, and intermittent attention in a distracted car following task. R. Soc. Open Sci. 5(9), 180194.) the authors apply game theory to car-following strategies; the authors of (Q. Zhang, R. Langari, H. E. Tseng, D. Filev, S. Szwabowski and S. Coskun, "A Game Theoretic Model Predictive Controller With Aggressiveness Estimation for Mandatory Lane Change," in IEEE Transactions on Intelligent Vehicles, vol. 5, no. 1, pp. 75-89, March 2020.) apply game-theoretic model predictive control to vehicle decision-making. However, the uncertainty of each player's (vehicle's) driving style, risk preference, and environmental sensitivity is rarely considered in these games, so the assumed single driver payoff model does not match reality, which limits decision quality in complex scenarios.
In summary, estimating surrounding vehicles' driving styles and intentions during a lane change is crucial to the safety, efficiency, and comfort of the maneuver, but existing methods provide no decision algorithm specifically adapted to it. Therefore, developing a scientifically sound decision algorithm that, in mixed scenarios with uncertain surrounding-vehicle driving styles, leverages advanced sensing and data-processing technology, integrates advanced data methods, and accounts for multi-vehicle dynamic interaction games is of great significance for improving the behavior-decision quality of autonomous vehicles. To date, little such work has been published.
Summary of the Invention
The purpose of the present invention is to address the difficult, risky, and inefficient lane-change decisions that arise in the prior art because the driving style of surrounding vehicles is hard to judge accurately during a lane change, and to provide a Bayesian-game-based cooperative decision-making algorithm for autonomous-vehicle lane-changing behavior.
The technical solution adopted to achieve the purpose of the present invention is as follows:
A Bayesian-game-based cooperative decision-making algorithm for autonomous-vehicle lane-changing behavior, comprising the following steps:
Step 1: establish the prior probability distribution of surrounding vehicles' driving styles: acquire vehicle driving data through intelligent connected roadside sensors, record and compile the prior distribution of driving styles across different time periods and road segments, and define a surrounding vehicle's driving style as either aggressive (A) or non-aggressive (NA);
Step 2: the lane-change-intention module outputs the lane-change intention: collect information about the subject vehicle SV and its surrounding vehicles through on-board sensors; define and compute the predicted in-lane following distance and the predicted lane-change distance; compute the lane-change necessity and safety from cumulative distribution functions built from the expected distances and variance; and build a fuzzy-logic lane-change-intention output model. When the intention reaches a set threshold, execute Steps 3 to 7 below;
Step 3: infer the posterior driving-style probability of the target-lane rear vehicle RV with Bayesian filtering: once the subject vehicle SV (specified vehicle) develops a lane-change intention, collect the acceleration of the rear vehicle RV through on-board sensors to obtain the likelihood function of its driving style; from this likelihood and the prior distribution of Step 1, obtain the RV's posterior driving-style probability and its driver aggressiveness factor β (ranging over [0,1]);
Step 4: predict the trajectories, speeds, and accelerations of SV and RV over the future rollout horizon with a long short-term memory (LSTM) network and a vehicle kinematic model;
Step 5: build the game payoff matrices and solve for the lane-change execution probability: build non-cooperative-game payoff matrices, formed by SV with an aggressive RV and with a non-aggressive RV respectively; design the payoff functions to include safety, time, comfort, and cooperation prediction payoffs; and solve the matrices for the lane-change execution probability;
Step 6: update the vehicle state: when the lane-change execution probability has not reached the execution threshold, SV does not change lanes and only its longitudinal trajectory is updated; when it reaches the threshold, both SV's lane-change trajectory and longitudinal trajectory are updated;
Step 7: cyclically execute the dynamic game decision: execute Steps 3 to 6 cyclically until the lane-change strategy completes or the lane-change intention vanishes.
In the above technical solution, in Step 1, a clustering algorithm yields the numbers of aggressive drivers n(A) and non-aggressive drivers n(NA) for a given road segment and time period, from which the prior driving-style distribution is solved:
where road and time denote the road segment and time period, p(A) is the probability that the driving style is aggressive, and 1-p(A) is the probability that it is non-aggressive.
In the above technical solution, in Step 2, the predicted in-lane following distance dmin and the predicted lane-change distance lmin are defined and computed, with dmin the minimum of all predicted in-lane distances [d1, d2, ..., dt] over the future t instants and lmin the minimum of all predicted lane-change distances [l1, l2, ..., lt]; the lane-change necessity and safety are computed from cumulative distribution functions built from the expected distances and variance:

where Pne and Psf denote the lane-change necessity and safety respectively, uk and ul denote the expectations of the predicted in-lane and lane-change distances, and σ denotes the variance;
in the lane-change-intention output model, membership functions of lane-change necessity, safety, and intention are constructed, and the intention is obtained by defuzzification using the fuzzy rule table and the centroid method;
if lane-change intention > lane-change-intention threshold, the subsequent Bayesian-game lane-change decision is carried out.
In the above technical solution, in Step 3,
the on-board sensor measures the rear vehicle RV's acceleration with accuracy y; the likelihood of observing a non-decelerating RV is L(θ|A) = (y, 1-y), and the likelihood of observing a decelerating RV is L(θ|NA) = (1-y, y);
the posterior driving-style distribution of the RV is obtained from the prior distribution and the likelihood:
Pt(Y|(road,time)) = normalize(P0(X|(road,time)) · L(θ))
and the prior distribution at time t+1 carries over the posterior distribution at time t;
the driver aggressiveness factor βt of the RV at time t is obtained from the posterior:
βt = Vtype · Pt(Y|(road,time))
where Vtype is a unit vector.
In the above technical solution, in Step 4, the speeds and accelerations of the subject vehicle and the target-lane rear vehicle are predicted with a long short-term memory network; their non-lane-change trajectories over the future rollout horizon are predicted with a vehicle kinematic model; and the subject vehicle's lane-change trajectory is predicted by combining the kinematic rollout of the longitudinal trajectory with a quintic polynomial curve for the lateral trajectory.
In the above technical solution, in Step 5, the payoff matrix of the subject vehicle SV and an aggressive target-lane rear vehicle RV is:
where:
U11, U12, U21, U22 are the subject vehicle's payoffs under the four strategy pairs [change lane, decelerate], [change lane, accelerate], [keep lane, decelerate], and [keep lane, accelerate] against an aggressive RV;
Q11, Q12, Q21, Q22 are the corresponding payoffs of the aggressive RV under the same four strategy pairs.
The payoff matrix of SV and a non-aggressive RV is:
where:
U33, U34, U43, U44 are the subject vehicle's payoffs under the four strategy pairs [change lane, decelerate], [change lane, accelerate], [keep lane, decelerate], and [keep lane, accelerate] against a non-aggressive RV;
Q33, Q34, Q43, Q44 are the corresponding payoffs of the non-aggressive RV under the same four strategy pairs.
In the above technical solution, the payoff U of the subject vehicle SV and the payoff Q of the rear vehicle RV are computed over future instants and comprise four parts:
(1) Safety prediction payoff:
Term(sf) = -{ω11[Ac(t′) + vSV(t′)*vRV(t′)]*I(Ac) + ω12[As(t′) + vSV(t′)*vRV(t′)]*I(As)}
where vSV(t′) and vRV(t′) are the speeds of SV and RV at the predicted instant t′, Ac(t′) is the overlap area of the collision-determination regions at t′, As(t′) is the overlap area of the safety-reserve regions at t′, ω11 and ω12 are the collision and safety-reserve weights, and I(Ac) and I(As) are 0-1 indicator functions equal to 1 when the corresponding regions overlap and 0 otherwise;
(2) Time prediction payoff:
Term(time) = v(t′)
where v(t′) is the predicted speed of the vehicle concerned in the game;
(3) Comfort prediction payoff:
The jerk, the derivative of acceleration at the predicted instant, is used as the comfort payoff:
Term(cf) = -|Jerk(t′)|
where Jerk(t′) is the jerk at the predicted instant;
(4) Cooperation prediction payoff:
The predicted acceleration aj(t′) of the rear vehicle RV in the game is used to quantify the cooperation payoff:
Term(gt) = -|aj(t′)|
The total payoffs of SV and RV are formed by combining and weighting these terms:

where ω = [ω1, ω2, ω3, ω4] and σ = [σ1, σ2, σ3, σ4] are weighting coefficients;
the aggressiveness factor βt is used to construct the weighting coefficients of the subject vehicle's payoff U:
where k = [k1, k2, k3, k4] are gain coefficients for the prediction payoffs.
In the above technical solution, when deciding the lane-change probability, the subject vehicle SV considers eight cases according to the payoff matrices formed with the aggressive and non-aggressive RVs; the expected payoff Ep is:
Ep = Pt(Y|(road,time)) * [Pt(lc)*(U11+U12) + (1-Pt(lc))*(U21+U22)] + (1-Pt(Y|(road,time))) * [Pt(lc)*(U33+U34) + (1-Pt(lc))*(U43+U44)]
and the lane-change probability maximizing the expected payoff Ep is obtained.
In the above technical solution, in Step 6, when the lane-change probability has not reached the preset execution threshold, the subject vehicle only updates its longitudinal trajectory, using the full velocity difference (FVD) model:
aj(t) = ρ[V(Δ(xj)) - vj(t)] + λΔvj(t)
where j is the vehicle index, aj(t) the acceleration at time t, vj(t) the speed at time t, Δvj(t) the speed difference at time t, ρ and λ weighting coefficients, and V(Δ(xj)) the optimization speed function:
where vmax is the vehicle's maximum speed, hc the safe inter-vehicle distance, and Δxj(t) the actual inter-vehicle distance at time t.
In the above technical solution, in Step 6, when the lane-change execution probability reaches the threshold, both the lane-change trajectory and the longitudinal trajectory are updated; the longitudinal trajectory uses the full velocity difference (FVD) model, and a quintic polynomial is recommended for the lateral trajectory:
y(t) = a0 + a1t + a2t^2 + a3t^3 + a4t^4 + a5t^5
where A = [a0, a1, a2, a3, a4, a5] are the polynomial coefficients.
Compared with the prior art, the beneficial effects of the present invention are:
1. To address the risky, inefficient, and uncomfortable lane-change decisions caused by uncertain surrounding-vehicle driving styles, a Bayesian-game approach is adopted: by introducing prior and posterior estimates of the surrounding vehicle's driving style, combined with estimates of the probability of yielding or not under each style, a decision algorithm with interactive rollout and inferential learning capability is established, promising safer, more efficient, and more comfortable lane-change decisions in mixed autonomous/human-driven scenarios.
2. To address the uncertainty of surrounding vehicles' driving styles and intentions, which are affected by many factors such as time period and traffic congestion, a method based on intelligent connected roadside sensing and big-data analysis is proposed: prior estimates of surrounding-vehicle driving styles are compiled for different time periods and traffic-flow states, promising to uncover statistical regularities hidden in traffic big data and to serve vehicle decision-making effectively, improving decision efficiency and soundness.
Brief Description of the Drawings
Fig. 1 is the framework of the Bayesian-game-based lane-change decision algorithm for autonomous vehicles;
Fig. 2 is a schematic diagram of the division of the vehicle's regions of interest;
Fig. 3 is a schematic diagram of the calculation of the predicted in-lane following distance;
Fig. 4 is a schematic diagram of the calculation of the predicted lane-change distance;
Fig. 5 shows the membership functions of lane-change necessity, lane-change feasibility, and lane-change intention;
Fig. 6 is the structure of the long short-term memory (LSTM) network;
Fig. 7 is a schematic diagram of the safety-payoff calculation;
Fig. 8 is a schematic diagram of constructing multiple lane-change games.
Detailed Description of the Embodiments
The present invention is further described in detail below with reference to specific embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention, not to limit it.
As shown in Fig. 1, a Bayesian-game-based lane-change decision method for autonomous vehicles is explained in detail step by step:
Step 1: establish the prior probability distribution of driving styles: vehicle driving data are acquired through intelligent connected roadside sensors, e.g., vision cameras and lidar.
Driving style is defined as either aggressive (A) or non-aggressive (NA). Vehicle speeds and accelerations recorded over different time periods and road segments serve as feature values for driving-style classification; the clustering algorithm used is k-means, though not limited to it. The k-means clustering yields the number of aggressive drivers n(A) and non-aggressive drivers n(NA) for a given road segment and time period.
The prior probability distribution of driving style P0(X|(road,time)) is then solved from this information, as expressed in Equation (1):
where road and time denote the road segment and time period, p(A) is the probability that the driving style is aggressive, and 1-p(A) is the probability that it is non-aggressive.
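Equation (1) is a simple normalized count. The sketch below illustrates it (the bucket names in the example dictionary are hypothetical, not from the patent):

```python
# Step 1 sketch: roadside units count aggressive (A) and non-aggressive (NA)
# drivers per (road segment, time period) bucket; the prior
# P0(X | (road, time)) is the normalized count pair of Equation (1).

def prior_from_counts(n_a, n_na):
    """Return (p(A), p(NA)) from cluster sizes, as in Equation (1)."""
    total = n_a + n_na
    if total == 0:
        return (0.5, 0.5)  # uninformative prior when no data is available
    return (n_a / total, n_na / total)

# Example: priors for two hypothetical (road, time) buckets.
priors = {
    ("ring_road", "rush_hour"): prior_from_counts(120, 280),
    ("ring_road", "off_peak"): prior_from_counts(40, 360),
}
```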
Step 2: the lane-change-intention module outputs the lane-change intention. Lane-change necessity and feasibility are prerequisites for a lane-change intention. Information about the subject vehicle and its surrounding vehicles is collected, and the subject vehicle's regions of interest are divided, as shown in Fig. 2, into the front, left, and right regions of interest.
As shown in Fig. 3, the predicted in-lane following distance is defined and computed as the minimum dmin of all predicted distances [d1, d2, ..., dt] over the future t instants; as shown in Fig. 4, the predicted lane-change distance is defined and computed as the minimum lmin of all predicted distances [l1, l2, ..., lt] over the future t instants. The lane-change necessity and safety are computed from cumulative distribution functions built from the expectation and variance of the predicted distances, as expressed in Equations (2)-(3):
where Pne and Psf denote the lane-change necessity and feasibility respectively, uk and ul denote the expectations of the predicted in-lane and lane-change distances, and σ denotes the variance.
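Equations (2)-(3) did not survive extraction; the sketch below assumes the natural Gaussian-CDF reading of the surrounding text: necessity Pne grows as the predicted in-lane gap dmin falls below its expectation uk, and safety Psf grows as the predicted lane-change gap lmin exceeds its expectation ul. The exact functional forms are an assumption.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def lane_change_necessity(d_min, u_k, sigma):
    # Small current-lane gap relative to expectation -> strong necessity.
    return 1.0 - normal_cdf(d_min, u_k, sigma)

def lane_change_safety(l_min, u_l, sigma):
    # Large target-lane gap relative to expectation -> safe (feasible).
    return normal_cdf(l_min, u_l, sigma)
```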
A fuzzy-logic lane-change-intention output model is then built. First, its inputs and outputs are designed: as shown in Figs. 5(a), 5(b), and 5(c), membership functions are constructed for lane-change necessity, feasibility, and intention, where the input fuzzy sets for necessity and feasibility are {small, fairly small, medium, fairly large, large} and the output fuzzy set for intention is {weak, fairly weak, medium, fairly strong, strong}. The lane-change intention is finally obtained by defuzzification using the fuzzy rule table and the centroid method, as shown in Table 1; the fuzzy-set partition and rule-table design shown are recommended but not mandatory.
Table 1. Fuzzy rule table
If lane-change intention > lane-change-intention threshold (the threshold is set manually according to the actual situation), the subsequent Bayesian-game lane-change decision is carried out.
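A minimal Mamdani-style sketch of the fuzzy intention model follows. The patent uses five fuzzy sets per variable and the rule table of Table 1, neither of which survived extraction, so this sketch shrinks both inputs to three triangular sets and uses an illustrative rule table; only the mechanism (fuzzify, min inference, centroid defuzzification) follows the text.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

SETS = {"low": (-0.5, 0.0, 0.5), "mid": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}
OUT = {"weak": 0.1, "medium": 0.5, "strong": 0.9}   # illustrative output centroids
RULES = {("low", "low"): "weak", ("low", "mid"): "weak", ("low", "high"): "medium",
         ("mid", "low"): "weak", ("mid", "mid"): "medium", ("mid", "high"): "strong",
         ("high", "low"): "medium", ("high", "mid"): "strong", ("high", "high"): "strong"}

def lane_change_intention(p_ne, p_sf):
    """Fuzzify (necessity, safety), fire rules with min, defuzzify by centroid."""
    num = den = 0.0
    for (s_ne, s_sf), s_out in RULES.items():
        w = min(tri(p_ne, *SETS[s_ne]), tri(p_sf, *SETS[s_sf]))  # min inference
        num += w * OUT[s_out]   # centroid (weighted-average) defuzzification
        den += w
    return num / den if den > 0 else 0.0
```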
Step 3: infer the posterior probability of driving style with Bayesian filtering.
Once the subject vehicle develops a lane-change intention, the likelihood function of the target-lane rear vehicle RV's acceleration is obtained through the on-board sensors. The accuracy y of the acceleration measurement is available from the sensor's design specifications; the likelihood of observing a non-decelerating state is L(θ|A) = (y, 1-y), and the likelihood of observing a decelerating state is L(θ|NA) = (1-y, y).
The posterior driving-style distribution Pt(Y|(road,time)) is obtained from the prior distribution and the likelihood, as expressed in Equation (4):
Pt(Y|(road,time)) = normalize(P0(X|(road,time)) · L(θ)) #(4)
If the lane-change decision is not yet complete and the intention has not vanished, the posterior Pt(Y|(road,time)) is computed cyclically as the vehicle state is updated, with the prior at time t+1, Pt+1(Y|(road,time)), carried over from the posterior at time t. The driver aggressiveness factor βt at time t is obtained from the posterior, as expressed in Equation (5):
βt = Vtype · Pt(Y|(road,time)) #(5)
where Vtype can be expressed as a unit vector, e.g., (1,0) denotes an aggressive style and (0,1) a non-aggressive style.
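The update of Equations (4)-(5) can be sketched directly: the posterior over {A, NA} is the normalized elementwise product of the prior and the likelihood of the observed RV behavior, and the aggressiveness factor is the posterior mass on A (Vtype = (1, 0)).

```python
def bayes_update(prior, likelihood):
    """prior, likelihood: (value_for_A, value_for_NA); returns the posterior."""
    unnorm = (prior[0] * likelihood[0], prior[1] * likelihood[1])
    z = unnorm[0] + unnorm[1]
    return (unnorm[0] / z, unnorm[1] / z)

def aggressiveness(posterior):
    return posterior[0]  # beta_t = Vtype . posterior with Vtype = (1, 0)

y = 0.9                      # sensor accuracy for the acceleration measurement
L_no_decel = (y, 1.0 - y)    # L(theta|A): RV observed not decelerating
L_decel = (1.0 - y, y)       # L(theta|NA): RV observed decelerating

posterior = bayes_update((0.5, 0.5), L_no_decel)
beta = aggressiveness(posterior)  # the posterior also seeds the prior at t+1
```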
Step 4: vehicle behavior prediction.
The purpose of behavior prediction is to predict the trajectories, speeds, and accelerations of the subject vehicle SV and the rear vehicle RV over the future rollout horizon H, with the trajectory represented by discrete future path points (xt′, yt′) and the speed and acceleration denoted v(t′) and a(t′). The speeds and accelerations of SV and RV are predicted with a long short-term memory (LSTM) network, whose structure is shown in Fig. 6; it consists of a forget gate, an input gate, an output gate, and a cell-state update. Its gate mechanism controls the flow and loss of features and alleviates the long-term-dependency problem of recurrent neural networks (RNNs); LSTMs usually outperform plain recurrent networks and hidden Markov models (HMMs). Speed and acceleration features form one step of input data, and the network outputs the future speeds and accelerations (v(t′), a(t′)) over the horizon H.
The non-lane-change trajectories of SV and RV over the horizon H are rolled out with a vehicle kinematic model, as expressed in Equation (6):
where the angle variable denotes the vehicle's actual heading and v denotes the predicted speed. The subject vehicle's lane-change trajectory is predicted by combining the kinematic rollout of the longitudinal trajectory with a quintic polynomial curve for the lateral trajectory; the quintic polynomial is also used for the lane-change state update and is described in detail later.
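Equation (6) did not survive extraction; the sketch below assumes the usual discrete kinematic point model implied by the text: path points are advanced along the actual heading using the LSTM-predicted speeds.

```python
import math

def rollout(x0, y0, psi, speeds, dt):
    """Predict discrete path points (x_t', y_t') for non-lane-change motion.

    psi is the vehicle's actual heading [rad]; speeds are the predicted
    speeds over the rollout horizon, one per time step of length dt.
    """
    path, x, y = [], x0, y0
    for v in speeds:
        x += v * math.cos(psi) * dt
        y += v * math.sin(psi) * dt
        path.append((x, y))
    return path

# Straight-ahead rollout at predicted speeds 10, 10, 12 m/s with dt = 0.5 s.
path = rollout(0.0, 0.0, 0.0, [10.0, 10.0, 12.0], dt=0.5)
```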
Step 5: build the game payoff matrices and solve for the lane-change execution probability.
The payoff matrices comprise the matrix formed by the subject vehicle SV with an aggressive target-lane rear vehicle RV and the matrix formed with a non-aggressive RV; the matrix for SV and an aggressive RV is shown in Table 2:
Table 2. Payoff matrix of the game between SV and an aggressive (A) RV

The matrix for SV and a non-aggressive RV is shown in Table 3:
Table 3. Payoff matrix of the game between SV and a non-aggressive (NA) RV
The payoffs U of the subject vehicle SV and Q of the rear vehicle RV are computed in the spirit of model prediction, i.e., over future instants, which improves the foresight and safety of the behavior decision. The payoff computation comprises four parts:
1) Safety prediction payoff
Vehicle safety is one of the most important payoffs in intelligent-vehicle driving. In Fig. 7, the thick solid line marks the collision-determination region and the thick dashed line the safety-reserve region. The safety payoff is expressed as Equation (7):
Term(sf) = -{ω11[Ac(t′) + vSV(t′)*vRV(t′)]*I(Ac) + ω12[As(t′) + vSV(t′)*vRV(t′)]*I(As)} #(7)
where vSV(t′) and vRV(t′) are the speeds of SV and RV at the predicted instant t′, Ac(t′) is the overlap area of the collision-determination regions at t′, and As(t′) is the overlap area of the safety-reserve regions at t′. The positions of SV and RV are obtained from the on-board sensors; with the manually set safety-reserve parameters ws, lsf, lsr the overlap area As(t′) can be solved, and with the collision-region parameters wc, lcf, lcr the overlap area Ac(t′) can be solved. ω11 and ω12 are the collision weight and safety-reserve weight, and I(Ac) and I(As) are 0-1 indicator functions equal to 1 when the corresponding regions overlap and 0 otherwise. I(Ac) is expressed as Equation (8):
and I(As) as Equation (9):
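The safety payoff of Equations (7)-(9) can be sketched under one simplifying assumption: both vehicles' collision and safety-reserve regions are treated as axis-aligned rectangles (center cx, cy, width w, length l), so the overlap areas Ac and As reduce to rectangle intersection.

```python
def overlap_area(r1, r2):
    """Intersection area of two axis-aligned rectangles (cx, cy, w, l)."""
    cx1, cy1, w1, l1 = r1
    cx2, cy2, w2, l2 = r2
    dx = min(cx1 + l1 / 2, cx2 + l2 / 2) - max(cx1 - l1 / 2, cx2 - l2 / 2)
    dy = min(cy1 + w1 / 2, cy2 + w2 / 2) - max(cy1 - w1 / 2, cy2 - w2 / 2)
    return max(0.0, dx) * max(0.0, dy)

def safety_term(rc_sv, rc_rv, rs_sv, rs_rv, v_sv, v_rv, w11, w12):
    """Term(sf) of Equation (7) with indicator functions of Eqs. (8)-(9)."""
    a_c = overlap_area(rc_sv, rc_rv)      # collision-region overlap Ac(t')
    a_s = overlap_area(rs_sv, rs_rv)      # safety-reserve overlap As(t')
    i_c = 1.0 if a_c > 0 else 0.0         # indicator I(Ac)
    i_s = 1.0 if a_s > 0 else 0.0         # indicator I(As)
    return -(w11 * (a_c + v_sv * v_rv) * i_c +
             w12 * (a_s + v_sv * v_rv) * i_s)

# Collision regions disjoint, safety-reserve regions overlapping by 3 m^2.
term = safety_term((0, 0, 2, 4), (10, 0, 2, 4),
                   (0, 0, 3, 6), (5, 0, 3, 6),
                   10.0, 10.0, 1.0, 0.5)
```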
2) Time prediction payoff
Another important driving payoff is reaching the destination in a shorter time: a higher speed yields a larger time payoff. The speed v(t′) of the vehicle concerned at the predicted instant is therefore taken as the time payoff, as expressed in Equation (10):
Term(time) = v(t′) #(10)
3) Comfort prediction payoff
Passenger comfort is also part of the decision payoff. The jerk, the derivative of acceleration at the predicted instant, is used as the comfort payoff, as expressed in Equation (11):
Term(cf) = -|Jerk(t′)| #(11)
4) Cooperation prediction payoff
Considering the influence of the behavior decision on other traffic participants, the predicted acceleration aj(t′) of the surrounding vehicle in the game is used to quantify the cooperation payoff, where j is the index of the target-lane rear vehicle, as expressed in Equation (12):
Term(gt) = -|aj(t′)| #(12)
The total payoff combines and weights these four terms; the payoffs of SV and RV are computed as in Equations (13)-(14):
where ω = [ω1, ω2, ω3, ω4] and σ = [σ1, σ2, σ3, σ4] are weighting coefficients, H is the total rollout time, and t+1 is the next instant after the current one. Because the payoffs formed by SV with aggressive and with non-aggressive RVs differ, the aggressiveness factor βt is used to construct the SV weighting coefficients ω = [ω1, ω2, ω3, ω4], as expressed in Equation (15):
where k = [k1, k2, k3, k4] are gain coefficients for the prediction payoffs, intended to bring them to the same order of magnitude;
the parameters [σ1, σ2, σ3, σ4] are calibrated manually.
As shown in Fig. 8, after developing a lane-change intention the subject vehicle SV forms multiple games with several rear vehicles in the target lane; the games are numbered game 1, game 2, ..., game N in order of the rear vehicles' longitudinal distance to SV. The goal of the lane-change decision is to complete game 1 first, i.e., SV completes the lane change and ends up ahead of the rear vehicle of game 1. If the lane change cannot be completed while solving the payoff matrix of game 1, games 2 through N are renumbered as games 1 through N-1, and the lane-change probability of the new game 1 is solved to continue the lane-change decision.

When deciding the lane-change probability Pt(lc), the subject vehicle SV considers eight cases according to the payoff matrices formed with the aggressive and non-aggressive RVs; the expected payoff Ep of SV is expressed as Equation (16):
Ep = Pt(Y|(road,time)) * [Pt(lc)*(U11+U12) + (1-Pt(lc))*(U21+U22)] + (1-Pt(Y|(road,time))) * [Pt(lc)*(U33+U34) + (1-Pt(lc))*(U43+U44)] #(16)
Rearranging shows that the expected payoff Ep is a function of the lane-change probability Pt(lc), expressible as Ep = F(Pt(lc)), and the lane-change probability maximizing Ep is obtained.
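One reasoning step is worth making explicit: Equation (16) is linear in Pt(lc), i.e., Ep = c·Pt(lc) + d with slope c = p·(U11+U12-U21-U22) + (1-p)·(U33+U34-U43-U44), where p = Pt(Y|(road,time)), so the maximizer lies at an endpoint of [0, 1]: change lanes (Pt(lc) = 1) exactly when the slope c is positive. A sketch:

```python
def optimal_lane_change_prob(p_aggressive, U_a, U_na):
    """Maximize Ep of Equation (16) over Pt(lc) in [0, 1].

    U_a = (U11, U12, U21, U22), U_na = (U33, U34, U43, U44).
    Ep is linear in Pt(lc), so the optimum sits at 0 or 1.
    """
    slope = (p_aggressive * (U_a[0] + U_a[1] - U_a[2] - U_a[3])
             + (1.0 - p_aggressive) * (U_na[0] + U_na[1] - U_na[2] - U_na[3]))
    return 1.0 if slope > 0 else 0.0

p_lc = optimal_lane_change_prob(0.3, (4.0, 6.0, 5.0, 3.0), (7.0, 6.0, 2.0, 3.0))
```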
Step 6: vehicle state update.
Executing the decision is a dynamic game process: the vehicle state is updated in real time, and new decisions are made from the new state information until the whole decision process completes. The state update falls into two cases:
1) If the lane-change probability has not reached the preset execution threshold Pdes(lc), the subject vehicle does not change lanes and only its longitudinal trajectory is updated, using the full velocity difference (FVD) model. This model reproduces real traffic phenomena such as stop-and-go flow, sudden jams, and shock waves, and accounts for both positive and negative speed differences, as expressed in Equation (17):
aj(t) = ρ[V(Δ(xj)) - vj(t)] + λΔvj(t) #(17)
where j is the vehicle index, aj(t) the acceleration at time t, vj(t) the speed at time t, Δvj(t) the speed difference at time t, ρ and λ weighting coefficients, and V(Δ(xj)) the optimization speed function of Equation (18):
where vmax is the vehicle's maximum speed, hc the safe inter-vehicle distance, and Δxj(t) the actual inter-vehicle distance at time t.
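Equation (17) can be sketched directly. The optimization speed function V of Equation (18) did not survive extraction; a common choice consistent with the listed symbols (vmax and the safe distance hc) is assumed here: V(Δx) = (vmax/2)·(tanh(Δx - hc) + tanh(hc)).

```python
import math

def optimal_velocity(dx, v_max, h_c):
    """Assumed optimization speed function V of Equation (18)."""
    return 0.5 * v_max * (math.tanh(dx - h_c) + math.tanh(h_c))

def fvd_acceleration(dx, v, dv, rho, lam, v_max, h_c):
    """FVD model, Equation (17): a_j(t) = rho*[V(dx) - v_j(t)] + lambda*dv_j(t)."""
    return rho * (optimal_velocity(dx, v_max, h_c) - v) + lam * dv
```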
2) When the lane-change execution probability reaches the threshold Pdes(lc), both the lane-change trajectory and the longitudinal trajectory are updated. The longitudinal trajectory again uses the full velocity difference (FVD) model; for the lateral trajectory a quintic polynomial is recommended, though not required, as expressed in Equation (19):
y(t) = a0 + a1t + a2t^2 + a3t^3 + a4t^4 + a5t^5 #(19)
where A = [a0, a1, a2, a3, a4, a5] are the polynomial coefficients, t is the sampling time during the lane change, and the values of a0 through a5 are set manually.
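The quintic lateral trajectory of Equation (19) can be sketched with its coefficients fixed by the usual rest-to-rest boundary conditions (zero lateral speed and acceleration at both ends of a lane change of width w and duration T) rather than hand-tuned values; the closed form below is a standard consequence of those conditions, not taken from the patent: y(t) = w·(10s^3 - 15s^4 + 6s^5) with s = t/T, i.e., a0 = a1 = a2 = 0, a3 = 10w/T^3, a4 = -15w/T^4, a5 = 6w/T^5.

```python
def quintic_lateral(t, w, T):
    """Lateral offset of a rest-to-rest quintic lane change (Equation (19))."""
    s = t / T
    return w * (10 * s**3 - 15 * s**4 + 6 * s**5)

w, T = 3.5, 4.0                       # lane width [m], maneuver duration [s]
y_mid = quintic_lateral(T / 2, w, T)  # exactly half the lane width at mid-maneuver
```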
Step 7: cyclically execute the dynamic game decision.
Steps 3 to 6 are executed cyclically until the lane-change behavior decision completes or the lane-change intention vanishes.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims (10)

  1. A Bayesian-game-based cooperative decision-making algorithm for autonomous-vehicle lane-changing behavior, characterized by comprising the following steps:
    Step 1: establish the prior probability distribution of surrounding vehicles' driving styles: acquire vehicle driving data through intelligent connected roadside sensors, record and compile the prior distribution of driving styles across different time periods and road segments, and define a surrounding vehicle's driving style as either aggressive (A) or non-aggressive (NA);
    Step 2: a lane-change-intention module outputs the lane-change intention: collect information about the subject vehicle SV and its surrounding vehicles through on-board sensors, define and compute the predicted in-lane following distance and the predicted lane-change distance, compute the lane-change necessity and safety from cumulative distribution functions built from the expected distances and variance, and build a fuzzy-logic lane-change-intention output model; when the intention reaches a set threshold, execute Steps 3 to 7 below;
    Step 3: infer the posterior driving-style probability of the target-lane rear vehicle RV with Bayesian filtering: once the subject vehicle SV (specified vehicle) develops a lane-change intention, collect the acceleration of the rear vehicle RV through on-board sensors to obtain the likelihood function of its driving style, and from this likelihood and the prior distribution of Step 1 obtain the RV's posterior driving-style probability and its driver aggressiveness factor β (ranging over [0,1]);
    Step 4: predict the trajectories, speeds, and accelerations of SV and RV over the future rollout horizon with a long short-term memory (LSTM) network and a vehicle kinematic model;
    Step 5: build the game payoff matrices and solve for the lane-change execution probability: build non-cooperative-game payoff matrices formed by SV with an aggressive RV and with a non-aggressive RV respectively, design the payoff functions to include safety, time, comfort, and cooperation prediction payoffs, and solve the matrices for the lane-change execution probability;
    Step 6: update the vehicle state: when the lane-change execution probability has not reached the execution threshold, SV does not change lanes and only its longitudinal trajectory is updated; when it reaches the threshold, both SV's lane-change and longitudinal trajectories are updated;
    Step 7: cyclically execute the dynamic game decision: execute Steps 3 to 6 cyclically until the lane-change strategy completes or the lane-change intention vanishes.
  2. The algorithm of claim 1, characterized in that in Step 1 a clustering algorithm yields the numbers of aggressive drivers n(A) and non-aggressive drivers n(NA) for a given road segment and time period, from which the prior driving-style distribution is solved:
    where road and time denote the road segment and time period, p(A) is the probability that the driving style is aggressive, and 1-p(A) is the probability that it is non-aggressive.
  3. The algorithm of claim 1, characterized in that in Step 2 the predicted in-lane following distance dmin and the predicted lane-change distance lmin are defined and computed, with dmin the minimum of all predicted in-lane distances [d1, d2, ..., dt] over the future t instants and lmin the minimum of all predicted lane-change distances [l1, l2, ..., lt]; the lane-change necessity and safety are computed from cumulative distribution functions built from the expected distances and variance:

    where Pne and Psf denote the lane-change necessity and safety, uk and ul the expectations of the predicted in-lane and lane-change distances, and σ the variance;
    in the lane-change-intention output model, membership functions of lane-change necessity, safety, and intention are constructed, and the intention is obtained by defuzzification using the fuzzy rule table and the centroid method;
    if lane-change intention > lane-change-intention threshold, the subsequent Bayesian-game lane-change decision is carried out.
  4. The algorithm of claim 1, characterized in that in Step 3,
    the on-board sensor measures the rear vehicle RV's acceleration with accuracy y; the likelihood of observing a non-decelerating RV is L(θ|A) = (y, 1-y), and the likelihood of observing a decelerating RV is L(θ|NA) = (1-y, y);
    the posterior driving-style distribution of the RV is obtained from the prior distribution and the likelihood:
    Pt(Y|(road,time)) = normalize(P0(X|(road,time)) · L(θ))
    and the prior distribution at time t+1 carries over the posterior distribution at time t;
    the driver aggressiveness factor βt of the RV at time t is obtained from the posterior:
    βt = Vtype · Pt(Y|(road,time))
    where Vtype is a unit vector.
  5. The algorithm of claim 1, characterized in that in Step 4 the speeds and accelerations of the subject vehicle and the target-lane rear vehicle are predicted with a long short-term memory network, their non-lane-change trajectories over the future rollout horizon are predicted with a vehicle kinematic model, and the subject vehicle's lane-change trajectory is predicted by combining the kinematic rollout of the longitudinal trajectory with a quintic polynomial curve for the lateral trajectory.
  6. The algorithm of claim 1, characterized in that in Step 5 the payoff matrix of the subject vehicle SV and an aggressive target-lane rear vehicle RV is:
    where:
    U11, U12, U21, U22 are the subject vehicle's payoffs under the four strategy pairs [change lane, decelerate], [change lane, accelerate], [keep lane, decelerate], and [keep lane, accelerate] against an aggressive RV;
    Q11, Q12, Q21, Q22 are the corresponding payoffs of the aggressive RV under the same four strategy pairs;
    the payoff matrix of SV and a non-aggressive RV is:
    where:
    U33, U34, U43, U44 are the subject vehicle's payoffs under the four strategy pairs [change lane, decelerate], [change lane, accelerate], [keep lane, decelerate], and [keep lane, accelerate] against a non-aggressive RV;
    Q33, Q34, Q43, Q44 are the corresponding payoffs of the non-aggressive RV under the same four strategy pairs.
  7. The algorithm of claim 6, characterized in that the payoff U of the subject vehicle SV and the payoff Q of the rear vehicle RV are computed over future instants and comprise four parts:
    (1) Safety prediction payoff:
    Term(sf) = -{ω11[Ac(t′) + vSV(t′)*vRV(t′)]*I(Ac) + ω12[As(t′) + vSV(t′)*vRV(t′)]*I(As)}
    where vSV(t′) and vRV(t′) are the speeds of SV and RV at the predicted instant t′, Ac(t′) is the overlap area of the collision-determination regions at t′, As(t′) is the overlap area of the safety-reserve regions at t′, ω11 and ω12 are the collision and safety-reserve weights, and I(Ac) and I(As) are 0-1 indicator functions equal to 1 when the corresponding regions overlap and 0 otherwise;
    (2) Time prediction payoff:
    Term(time) = v(t′)
    where v(t′) is the predicted speed of the target-lane rear vehicle in the game;
    (3) Comfort prediction payoff:
    The jerk, the derivative of acceleration at the predicted instant, is used as the comfort payoff:
    Term(cf) = -|Jerk(t′)|
    where Jerk(t′) is the jerk at the predicted instant;
    (4) Cooperation prediction payoff:
    The predicted acceleration aj(t′) of the rear vehicle RV in the game is used to quantify the cooperation payoff:
    Term(gt) = -|aj(t′)|
    The total payoffs of SV and RV are formed by combining and weighting these terms:

    where ω = [ω1, ω2, ω3, ω4] and σ = [σ1, σ2, σ3, σ4] are weighting coefficients;
    the aggressiveness factor βt is used to construct the weighting coefficients of the subject vehicle's payoff U:
    where k = [k1, k2, k3, k4] are gain coefficients for the prediction payoffs.
  8. The algorithm of claim 7, characterized in that, when deciding the lane-change probability, the subject vehicle SV considers eight cases according to the payoff matrices formed with the aggressive and non-aggressive RVs; the expected payoff Ep is:
    Ep = Pt(Y|(road,time)) * [Pt(lc)*(U11+U12) + (1-Pt(lc))*(U21+U22)] + (1-Pt(Y|(road,time))) * [Pt(lc)*(U33+U34) + (1-Pt(lc))*(U43+U44)]
    and the lane-change probability maximizing the expected payoff Ep is obtained.
  9. The algorithm of claim 1, characterized in that in Step 6, when the lane-change probability has not reached the preset execution threshold, the subject vehicle only updates its longitudinal trajectory, using the full velocity difference (FVD) model:
    aj(t) = ρ[V(Δ(xj)) - vj(t)] + λΔvj(t)
    where j is the vehicle index, aj(t) the acceleration at time t, vj(t) the speed at time t, Δvj(t) the speed difference at time t, ρ and λ weighting coefficients, and V(Δ(xj)) the optimization speed function:
    where vmax is the vehicle's maximum speed, hc the safe inter-vehicle distance, and Δxj(t) the actual inter-vehicle distance at time t.
  10. The algorithm of claim 1, characterized in that in Step 6, when the lane-change execution probability reaches the threshold, both the lane-change trajectory and the longitudinal trajectory are updated; the longitudinal trajectory uses the full velocity difference (FVD) model, and a quintic polynomial is recommended for the lateral trajectory:
    y(t) = a0 + a1t + a2t^2 + a3t^3 + a4t^4 + a5t^5
    where A = [a0, a1, a2, a3, a4, a5] are the polynomial coefficients.
PCT/CN2023/086547 2022-05-30 2023-04-06 Bayesian-game-based vehicle-road cooperative decision-making algorithm for lane-changing behavior of autonomous vehicle WO2023231569A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210603381.7 2022-05-30
CN202210603381.7A CN115056798B (zh) 2022-05-30 2022-05-30 Bayesian-game-based vehicle-road cooperative decision-making algorithm for lane-changing behavior of autonomous vehicle

Publications (1)

Publication Number Publication Date
WO2023231569A1 true WO2023231569A1 (zh) 2023-12-07

Family

ID=83198484

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/086547 WO2023231569A1 (zh) 2022-05-30 2023-04-06 一种基于贝叶斯博弈的自动驾驶车辆换道行为车路协同决策算法

Country Status (2)

Country Link
CN (1) CN115056798B (zh)
WO (1) WO2023231569A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115056798B (zh) 2022-05-30 2024-04-09 天津大学 Bayesian-game-based vehicle-road cooperative decision-making algorithm for lane-changing behavior of autonomous vehicle
CN115731708B (zh) 2022-11-15 2023-10-17 东南大学 Real-time vehicle-trajectory lane-change-point monitoring method based on Bayesian theory
CN117284297B (zh) 2023-11-27 2024-02-27 福思(杭州)智能科技有限公司 Vehicle control method and apparatus, and domain controller

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014206654A1 (de) * 2013-06-27 2014-12-31 Bayerische Motoren Werke Aktiengesellschaft Prediction of driving paths of a vehicle
CN104391504A (zh) * 2014-11-25 2015-03-04 浙江吉利汽车研究院有限公司 Method and device for generating autonomous-driving control strategies based on the Internet of Vehicles
CN108595823A (zh) * 2018-04-20 2018-09-28 大连理工大学 Computation method for autonomous-vehicle lane-change strategies combining driving style and game theory
CN113095558A (zh) * 2021-04-01 2021-07-09 天津大学 Iteratively optimized multi-scale fusion speed-prediction algorithm for intelligent connected vehicles
CN114493191A (zh) * 2022-01-07 2022-05-13 东南大学 Driving-behavior modeling and analysis method based on ride-hailing data
CN115056798A (zh) * 2022-05-30 2022-09-16 天津大学 Bayesian-game-based vehicle-road cooperative decision-making algorithm for lane-changing behavior of autonomous vehicle

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8788134B1 (en) * 2013-01-04 2014-07-22 GM Global Technology Operations LLC Autonomous driving merge management system
KR102138979B1 (ko) * 2018-11-29 2020-07-29 한국과학기술원 차선 기반의 확률론적 주변 차량 거동 예측 및 이를 이용한 종방향 제어 방법
CN110471408B (zh) * 2019-07-03 2022-07-29 天津大学 基于决策过程的无人驾驶车辆路径规划方法
CN112298200B (zh) * 2019-07-26 2022-12-23 魔门塔(苏州)科技有限公司 一种车辆的换道方法和装置
CN111081065B (zh) * 2019-12-13 2021-03-30 北京理工大学 路段混行条件下的智能车辆协同换道决策模型
CN111775961B (zh) * 2020-06-29 2022-01-04 阿波罗智能技术(北京)有限公司 自动驾驶车辆规划方法、装置、电子设备及存储介质
US20220073098A1 (en) * 2020-09-09 2022-03-10 GM Global Technology Operations LLC Method and apparatus for predicting lateral acceleration prior to an automated lane change


Also Published As

Publication number Publication date
CN115056798B (zh) 2024-04-09
CN115056798A (zh) 2022-09-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23814765

Country of ref document: EP

Kind code of ref document: A1