CN114312830A - Intelligent vehicle coupling decision model and method considering dangerous driving conditions - Google Patents


Info

Publication number
CN114312830A
CN114312830A
Authority
CN
China
Prior art keywords
intelligent vehicle
driving
decision
model
scene
Prior art date
Legal status
Pending
Application number
CN202111526027.0A
Other languages
Chinese (zh)
Inventor
蔡英凤
张雪翔
滕成龙
王海
刘擎超
孙晓强
陈龙
李祎承
熊晓夏
Current Assignee
Jiangsu University
Original Assignee
Jiangsu University
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN202111526027.0A priority Critical patent/CN114312830A/en
Publication of CN114312830A publication Critical patent/CN114312830A/en

Abstract

The invention discloses an intelligent vehicle coupling decision model and method considering dangerous driving conditions. By coupling a self-learning algorithm with driving rules, the method overcomes the limitations, inflexibility, and unreliability of any single decision method and can effectively handle intelligent vehicle driving decisions in a variety of complex traffic scenes. The invention fully considers the collision risk and lane change risk during driving and assigns the corresponding decision algorithm on that basis, further improving the real-time performance of the intelligent vehicle's decisions and their reliability under dangerous driving conditions. The proposed transfer learning algorithm based on feature space mapping transfers the optimal value action knowledge of the intelligent vehicle from the simulation scene to the real scene, addresses the modeling error of real traffic scenes, verifies the effectiveness of the proposed coupling decision model in real driving scenes, and greatly improves the transfer learning capability of the intelligent vehicle.

Description

Intelligent vehicle coupling decision model and method considering dangerous driving conditions
Technical Field
The invention relates to the technical field of unmanned vehicle driving decision, in particular to an intelligent vehicle coupling decision model and method considering dangerous driving conditions.
Background
At the current stage of research, future intelligent driving technology is generally considered to play a crucial role in improving road safety, relieving traffic congestion, and reducing driver workload. One of the core challenges of intelligent driving technology is making safe and efficient driving decisions in highly complex traffic environments, based on uncertain multi-sensor fusion perception information and existing prior driving knowledge. The decision algorithm therefore needs to consider influencing factors such as the driver's personalized requirements (including safety, comfort, and efficiency), the road environment structure, traffic regulation constraints, vehicle dynamics, and regional driving habits; it also needs broad applicability and robustness to cope with the randomness of high-dimensional traffic environments, in particular decision failures caused by asynchronous information between the perception layer and the decision layer.
Existing decision algorithms fall mainly into three categories: driving decision algorithms based on reinforcement learning, driving decision algorithms based on driving rules, and driving decision algorithms that couple driving rules with a self-learning algorithm. The coupled approach is gradually becoming a research hotspot because its decision process is partially interpretable and applicable to high-dimensional random dynamic environments. However, because of the sampling efficiency and decision safety of experimental samples in real traffic scenes, existing driving decision algorithms mostly perform driving data analysis, decision model training, and decision model verification in a constructed simulation environment; whether the optimal value decisions of an intelligent vehicle in the simulation environment suit the real traffic driving environment cannot be verified, so decision knowledge transfer from the simulator to the real environment is not realized. Moreover, constructed simulation environments mostly consider single driving scenarios such as expressways; the decision reliability of intelligent vehicle driving decision algorithms under dangerous driving conditions is rarely considered, and decision research on collision risk and lane change risk during driving is especially scarce.
Disclosure of Invention
To solve this technical problem, the invention constructs an intelligent vehicle coupling decision model considering dangerous driving conditions. In constructing the intelligent vehicle simulated driving scene model, the invention considers the position, speed, and orientation angle information of the intelligent vehicle and the surrounding traffic participants, the lane environment structure information, and the traffic rule information, and models the constructed traffic scene as a Markov Decision Process (MDP). For the input information of the intelligent vehicle driving condition evaluation model, information about the vehicle, surrounding traffic participants, lane environment, and driving rule constraints is acquired through the GPS positioning device, speed and acceleration sensors, laser radar, camera, and other devices installed on the intelligent vehicle; the driving conditions are then divided into general driving conditions and dangerous driving conditions based on the collision risk between the intelligent vehicle and the surrounding traffic participants and the lane change risk of the intelligent vehicle.
In selecting the intelligent vehicle behavior decision model algorithm, considering the limitations of a driving rule base and its lack of flexibility in handling random scenes, the invention mainly adopts a decision mode coupling rules with a deep reinforcement learning algorithm. On one hand, a decision method based on driving rules is constructed from driving safety rules, danger obstacle avoidance rules, highest-priority yield-to-pedestrian rules, and the like, which effectively handles driving decisions under general driving conditions and improves the interpretability of the decision process. On the other hand, for dangerous driving conditions, a Deep Q Network (DQN) model with a constrained action space is mainly adopted so that the intelligent vehicle autonomously learns the optimal driving action strategy in the interactive scene. In the knowledge transfer of the intelligent vehicle's optimal value action from the simulation to the real environment, the feature probability distribution of the optimal driving decision state mapping space should be the same whether in the simulated or the real driving scene, provided the intelligent vehicle makes decisions with the same actions, the same reward function, and similar driving scenes; therefore the optimal value action state of the intelligent vehicle in the real traffic scene can be solved by implicitly learning the correspondence between the feature spaces of the different domains.
The technical scheme adopted by the intelligent vehicle coupling decision method considering dangerous driving conditions comprises the following steps in sequence:
step 1) building an intelligent vehicle simulation driving scene, and modeling the scene into a Markov decision process;
step 2) collecting the information of the self vehicle and the driving scene through a GPS, a laser radar, a speed sensor, a camera and other sensors which are arranged on the intelligent vehicle, and taking the information as the input of a driving condition evaluation model;
step 3) constructing a collision risk model δ and a lane change risk model η between the intelligent vehicle and the surrounding traffic participants based on the information acquired by the multiple sensors in step 2), and dividing the driving conditions of the intelligent vehicle accordingly, as in the following formula (1):

D_c = D_d, if δ ≥ 1 or η ≥ 1;  D_c = D_g, otherwise  (1)

where D_c represents the set of intelligent vehicle driving conditions; D_d indicates a dangerous driving condition; D_g indicates a general driving condition.
Step 4) setting the hyper-parameters of the DQN model for training the intelligent vehicle decision model, including the learning rate β of the model, the number of training rounds N, the discount rate γ of the model, and the initial speed ranges of vehicles and pedestrians;
step 5) randomly initializing the weight parameter ω of the Q network, the weight parameter ω⁻ = ω of the TD (temporal difference algorithm) target network Q̂, and a storage space V for model training samples;
step 6) during the N training rounds of the model, at each time step t = 0, 1, 2, …, the intelligent vehicle observes the state space s(t) of the traffic scene through the multiple sensors and constructs the reward function r(t) corresponding to the current state;
step 7) evaluating the driving condition of the intelligent vehicle according to step 3); when it is a general driving condition, adopting the decision algorithm based on driving rules to realize the lateral and longitudinal decisions of the intelligent vehicle and to generate the corresponding expected action space and the decision action a(t) of the intelligent vehicle;
step 8) storing the decision action a(t) selected by the intelligent vehicle at time t, the reward function r(t), the scene state s(t), and the scene state s(t+1) at time t+1 in V in the form of the quadruple (s(t), a(t), r(t), s(t+1));
step 9) at each iteration, randomly sampling 64 groups of sample data from the storage space V to train the DQN model, calculating the reward values of all decision actions of the intelligent vehicle in each state, selecting the action with the optimal value as the decision action of the intelligent vehicle in the current scene state, and synchronously updating the weight parameter ω of the Q network and the weight parameter ω⁻ = ω of the TD (temporal difference algorithm) target network Q̂ during the training iterations of the DQN model;
Step 10) if the driving condition evaluated in step 3) is a dangerous driving condition, adopting the DQN decision algorithm: randomly selecting an initial decision action a(t) of the intelligent vehicle and repeating step 8) and step 9);
step 11) according to the optimal value action state of the intelligent vehicle in the simulation scene solved at time t, and combining the transfer learning algorithm based on feature space mapping, finally obtaining the optimal value action state of the intelligent vehicle in the real driving scene.
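As a rough, hedged sketch of how steps 6)-10) fit together (the environment dynamics, rule base, and tabular Q-function below are toy stand-ins invented for illustration — the patent's actual sensors, DQN, and driving rules are not reproduced):

```python
import random
from collections import deque

GAMMA, EPSILON, BATCH = 0.9, 0.1, 4   # discount rate; exploration rate and batch size shrunk for the sketch

def condition(delta, eta):
    """Driving-condition split of step 3: dangerous if either risk model fires."""
    return "dangerous" if delta >= 1 or eta >= 1 else "general"

def rule_based_action(state):
    """Toy stand-in for the IF-THEN driving-rule decision of step 7."""
    return "keep_lane" if state["front_gap"] > 20 else "brake"

def dqn_action(q_table, s_key, actions):
    """Epsilon-greedy choice over a toy Q-table standing in for the DQN of step 10."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table.get((s_key, a), 0.0))

random.seed(0)
actions = ["keep_lane", "brake", "change_lane"]
q_table, replay = {}, deque(maxlen=1000)          # replay stands in for the storage space V
state = {"front_gap": 30.0, "delta": 0.2, "eta": 0.1}
for t in range(50):                               # one training episode (step 6)
    if condition(state["delta"], state["eta"]) == "general":
        a = rule_based_action(state)              # step 7: rule-based decision
    else:
        a = dqn_action(q_table, round(state["front_gap"]), actions)  # step 10: DQN decision
    nxt = {"front_gap": max(state["front_gap"] + random.uniform(-3, 1), 0.0),
           "delta": random.uniform(0.0, 1.5), "eta": random.uniform(0.0, 1.5)}
    r = -1.0 if nxt["front_gap"] < 5 else 0.1     # toy reward r(t)
    replay.append((round(state["front_gap"]), a, r, round(nxt["front_gap"])))  # step 8
    if len(replay) >= BATCH:                      # step 9: sample a batch, TD-update
        for s_k, a_k, r_k, s2_k in random.sample(list(replay), BATCH):
            old = q_table.get((s_k, a_k), 0.0)
            target = r_k + GAMMA * max(q_table.get((s2_k, b), 0.0) for b in actions)
            q_table[(s_k, a_k)] = old + 0.1 * (target - old)
    state = nxt
```

In a full implementation the Q-table would be replaced by the DQN's neural network and its separately updated target network.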
Further, modeling the simulated traffic scene in step 1) as a Markov decision process means constructing the state space s(t) of the scene, the decision action a(t) of the intelligent vehicle, the reward function r(t), and the random state transition function p(s(t+1)|s(t), a(t)) of the scene at time t+1. The state space s(t) of the traffic scene consists of the state information s_AV(t) of the intelligent vehicle, the state information s_OA(t) of the surrounding traffic participants, and the lane structure and traffic rules s_TR(t). The driving decision of the intelligent vehicle is made by controlling its longitudinal acceleration a_L(t) and front wheel angle a_T(t), which correspondingly form the decision action set a(t). The reward function r(t) is constructed by considering the navigation target point constraint r_1(t), the driving safety index r_2(t), the drivable area constraint r_3(t), and the lane constraint r_4(t). Finally, the random state transition function consists of the state transition probability distribution p(s_AV(t+1)|s_AV(t), a(t)) of the intelligent vehicle and the state transition probability distribution p(s_OA(t+1)|s(t)) of the surrounding traffic participants.
Further, the input information of the driving condition evaluation model in step 2) comprises the speed v_AV(t) of the intelligent vehicle at time t, the speed v_FV(t) of the front traffic participant, the speed v_OV(t) of the vehicle in the adjacent lane, the collision reaction time ρ of the intelligent vehicle, the actual distance D_h(t) between the intelligent vehicle and the front traffic participants, the longitudinal and lateral speeds of the intelligent vehicle during lane change, the lateral distance d_AL between the intelligent vehicle and the lane boundary line during lane change, and the lane width w_k.
Further, the collision risk model δ in step 3) mainly uses the Time Headway (TH) and Time To Collision (TTC) indexes to compare the actual distance D_h(t) between the intelligent vehicle and the front traffic participant with the safety distance D_s(t), where the safety distance is calculated mainly from the braking distance v_AV(t)·ρ of the intelligent vehicle, the final following distance, and the longitudinal displacement of the front traffic participant.
The lane change risk model η mainly compares the distance D_LF between the two vehicles after the lane change of the intelligent vehicle with the adaptive braking distance D_b of the rear vehicle, where the adaptive braking distance of the rear vehicle is obtained mainly by accumulating the driving distance D_1 of the rear vehicle during the driver reaction stage, the driving distance D_2 during the braking response stage, the driving distance D_3 during the braking force increase stage, and the driving distance D_4 during the continuous braking stage.
Further, the initial learning rate β of the DQN model in step 4) is set to 0.002; the model is a five-layer fully connected network in which each hidden layer contains 100 neuron nodes; the initial number of training rounds N and the discount rate γ of the model are set to 10000 and 0.9, respectively. The initial speed ranges of vehicles and pedestrians in the simulation scene are [15, 65] km/h and [0, 5] km/h, respectively.
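Under the stated hyper-parameters, the Q network could be sketched as follows (a minimal NumPy forward pass; the input and output dimensions, the weight initialization, and the reading of "five layers" as three 100-node hidden layers between input and output are assumptions, not taken from the patent):

```python
import numpy as np

BETA, N_ROUNDS, GAMMA = 0.002, 10000, 0.9   # learning rate, training rounds, discount rate (step 4)
STATE_DIM, N_ACTIONS = 8, 5                 # assumed input/output sizes; the patent does not state them

rng = np.random.default_rng(0)
# Five fully connected layers, read here as: input, three 100-node hidden layers, output.
sizes = [STATE_DIM, 100, 100, 100, N_ACTIONS]
weights = [rng.normal(0.0, 0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def q_forward(s):
    """Q(s, ·|ω): ReLU hidden layers, linear output giving one Q-value per action."""
    x = np.asarray(s, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ W + b, 0.0)
    return x @ weights[-1] + biases[-1]
```

Training would update `weights` with the learning rate β = 0.002 using the TD loss described below in the text.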
Further, the decision algorithm based on driving rules in step 7) mainly combines driving safety rules, danger obstacle avoidance rules, highest-priority yield-to-pedestrian rules, and the like with IF-THEN information-triggered events; based on the special position information P*(t) of the intelligent vehicle (e.g., the vicinity of an intersection), the position information of the navigation target point, and the current state information of the intelligent vehicle, it generates the expected action space and the decision action a(t) of the intelligent vehicle, thereby reducing the dimensional requirements of the perception task and improving the real-time performance and reliability of the decision.
Further, the model training in step 9) trains the DQN model mainly through the temporal difference (TD) algorithm, and the general process is as follows: first, the optimal value action function Q*(s(t), a(t)) is solved based on the sample data (s(t), a(t), r(t), s(t+1)) and the optimal Bellman equation, and is approximated by the neural network Q(s(t), a(t)|ω); then, the TD error is calculated as the difference between the TD target y(t) = r(t) + γ·max_a Q̂(s(t+1), a|ω⁻) of the target network and Q(s(t), a(t)|ω), and the training loss function L(ω) of the DQN model is constructed from this error.
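The TD computation described above can be illustrated as follows (a simplified sketch in which plain Python callables stand in for the Q network Q(·|ω) and the target network Q̂(·|ω⁻)):

```python
GAMMA = 0.9  # discount rate from step 4

def td_target(r, q_next_values, gamma=GAMMA):
    """TD target of the target network: y(t) = r(t) + γ · max_a Q̂(s(t+1), a | ω⁻)."""
    return r + gamma * max(q_next_values)

def td_loss(batch, q, q_hat, gamma=GAMMA):
    """L(ω): mean squared TD error over a sampled batch of (s(t), a(t), r(t), s(t+1))."""
    errors = [td_target(r, q_hat(s2), gamma) - q(s, a) for (s, a, r, s2) in batch]
    return sum(e * e for e in errors) / len(errors)
```

Gradient descent on L(ω) would then update ω, with ω⁻ synchronized periodically as in step 9).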
Further, the transfer learning algorithm based on feature space mapping in step 11) mainly considers that, whether in the simulated driving scene or the real driving scene, the feature probability distribution of the optimal driving decision state mapping space of the intelligent vehicle should be the same when the intelligent vehicle makes decisions with the same actions, the same reward function, and similar driving scenes; that is, p(f(s*_sim(t))) = p(g(s*_real(t))), where f and g represent the neural network functions of the feature space mapping.
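The patent does not detail the mapping networks f and g; as a loudly labeled stand-in, the following sketch aligns real-scene features to simulation-scene features by simple per-dimension moment matching — the same idea of making the mapped feature distributions agree, with much simpler machinery:

```python
from statistics import mean, stdev

def fit_alignment(sim_feats, real_feats):
    """Fit a per-dimension affine map g so that g(real) matches the mean and
    standard deviation of the simulation-scene features — a moment-matching
    stand-in for the feature-space mapping networks f and g."""
    params = []
    for d in range(len(sim_feats[0])):
        s_col = [x[d] for x in sim_feats]
        r_col = [x[d] for x in real_feats]
        scale = stdev(s_col) / stdev(r_col)
        params.append((scale, mean(s_col) - scale * mean(r_col)))
    return params

def apply_alignment(params, feat):
    return [a * x + b for (a, b), x in zip(params, feat)]

sim = [[0.0, 10.0], [2.0, 14.0], [4.0, 18.0]]      # features of the simulation scene (toy data)
real = [[100.0, 1.0], [110.0, 1.4], [120.0, 1.8]]  # raw features of the real scene (toy data)
params = fit_alignment(sim, real)
mapped = [apply_alignment(params, x) for x in real]
```

After alignment, the per-dimension means and spreads of `mapped` match those of `sim`, illustrating the distribution-matching goal.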
The invention has the beneficial effects that:
1. The intelligent vehicle coupling decision model considering dangerous driving conditions adopts a decision method coupling self-learning with driving rules, overcoming the limitations, inflexibility, and unreliability of any single decision method, and can effectively handle intelligent vehicle driving decisions in a variety of complex traffic scenes.
2. The proposed intelligent vehicle coupling decision model fully considers the collision risk and lane change risk during driving and assigns the corresponding decision algorithm on that basis, further improving the real-time performance of the intelligent vehicle's decisions and their reliability under dangerous driving conditions.
3. The proposed transfer learning algorithm based on feature space mapping transfers the optimal value action knowledge of the intelligent vehicle from the simulation scene to the real scene, addresses the modeling error of real traffic scenes, verifies the effectiveness of the proposed coupling decision model in real driving scenes, and greatly improves the transfer learning capability of the intelligent vehicle.
Drawings
FIG. 1 is the research route of the present invention
FIG. 2 is a view of the simulated driving scene of the intelligent vehicle
FIG. 3 is a schematic diagram of the collision risk of the intelligent vehicle of the present invention
FIG. 4 is a schematic diagram of the lane change risk of the intelligent vehicle according to the present invention
FIG. 5 is a schematic diagram of the adaptive braking safety distance of the present invention
FIG. 6 is a flowchart of a transfer learning algorithm based on feature space mapping according to the present invention
Detailed Description
The invention will be further explained with reference to the drawings.
As shown in FIG. 1, the invention provides an intelligent vehicle coupling decision model and method considering dangerous driving conditions. The technical scheme of the invention comprises the following steps in sequence:
step 1): firstly, a model of the simulated driving scene of the intelligent vehicle is constructed, as shown in FIG. 2, and the simulated driving scene is modeled as a Markov decision process consisting of the state space s(t) of the traffic scene, the decision action a(t) of the intelligent vehicle, the reward function r(t), and the random state transition function p(s(t+1)|s(t), a(t)) of the scene at time t+1.
1) State space s(t) of the traffic scene
The state space s(t) of the traffic scene mainly consists of the state information s_AV(t) of the intelligent vehicle, the state information s_OA(t) of the surrounding traffic participants, and the lane structure and traffic rules s_TR(t). The state information s_AV(t) of the intelligent vehicle consists of its position p_AV(t), speed v_AV(t), and orientation information θ_AV(t), as shown in the following formula (2):

s_AV(t) = {p_AV(t), v_AV(t), θ_AV(t)} (2)

where p_AV(t) is the position coordinate (x_AV, y_AV) of the intelligent vehicle at time t.
The state information s_OA(t) of the surrounding traffic participants includes their position, speed, orientation, and category information, as shown in the following formula (3):

s_OA(t) = {p_OA^(i,j)(t), v_OA^(i,j)(t), θ_OA^(i,j)(t)} (3)

where p_OA^(i,j)(t) represents the position coordinates (x_OV, y_OV) of the surrounding traffic participant at time t; i represents the i-th traffic participant in the scene; j represents the category information of the surrounding traffic participant, where j = 1 represents a vehicle and j = 0 represents a pedestrian.
The lane structure and traffic regulation information s_TR(t) is shown in the following formula (4):

s_TR(t) = {k, C_k, W_k, θ_k, V_min,k, V_max,k, g_s, p_goal, τ_s} (4)

where k represents the number of the current (k-th) lane; C_k is the position vector of the lane centerline points; W_k is the width of the lane; θ_k represents the tangent direction angle of the lane centerline points; V_min,k is the minimum speed limit of the lane; V_max,k is the maximum speed limit of the lane; g_s denotes the traffic signal, which determines by a (0,1) signal whether the vehicle needs to stop at the end; p_goal indicates the position of the navigation target point of the intelligent vehicle; τ_s represents the driving boundary of the traffic scene, which is formed by sequentially connected point rows, the points being connected by straight lines.
In summary, the state space s(t) of the traffic scene can be represented as:

s(t) = {s_AV(t), s_OA(t), s_TR(t)} (5)
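The state space assembled in formulas (2)-(5) can be mirrored in code as plain data containers (field names are illustrative; this is a sketch, not the patent's data format):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class AVState:              # s_AV(t), formula (2)
    position: Tuple[float, float]   # (x_AV, y_AV)
    speed: float                    # v_AV(t)
    heading: float                  # θ_AV(t)

@dataclass
class ParticipantState:     # one element of s_OA(t), formula (3)
    position: Tuple[float, float]
    speed: float
    heading: float
    category: int           # j = 1 vehicle, j = 0 pedestrian

@dataclass
class LaneInfo:             # part of s_TR(t), formula (4)
    lane_id: int            # k
    centerline: List[Tuple[float, float]]   # C_k
    width: float            # W_k
    v_min: float            # V_min,k
    v_max: float            # V_max,k

@dataclass
class SceneState:           # s(t) = {s_AV(t), s_OA(t), s_TR(t)}, formula (5)
    av: AVState
    participants: List[ParticipantState] = field(default_factory=list)
    lane: Optional[LaneInfo] = None
```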
2) Decision action a(t) of the intelligent vehicle
The future driving decision action set of the intelligent vehicle mainly consists of its longitudinal acceleration a_L(t) and front wheel angle a_T(t), as represented by the following formula (6):

a(t) = {a_L(t), a_T(t)} (6)

where, considering driving comfort, the longitudinal acceleration a_L(t) has a value range of [-3, 2] m/s², and the front wheel angle a_T(t) has a value range of [-40°, 40°].
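A minimal helper enforcing these action bounds (a sketch; the patent only states the ranges):

```python
def clip_action(a_l, a_t):
    """Clamp a decision action a(t) = {a_L(t), a_T(t)} to the stated ranges:
    longitudinal acceleration in [-3, 2] m/s², front wheel angle in [-40°, 40°]."""
    return max(-3.0, min(2.0, a_l)), max(-40.0, min(40.0, a_t))
```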
3) Reward function r(t)
In the reinforcement learning process, a reward function needs to be designed to reward or punish the operations of the intelligent vehicle during driving. The design of the reward function mainly considers the navigation target point constraint r_1(t), the driving safety index r_2(t), the drivable area constraint r_3(t), and the lane constraint r_4(t).
A) Navigation target point constraint r_1(t)
The motion decision of the intelligent vehicle during driving is, to a certain extent, guided by the navigation target point: the vehicle needs to plan a reasonable path within the drivable area to reach that point. The corresponding reward function r_1(t) is given by formula (7):

[formula (7), shown as an image in the original]
B) Driving safety index r_2(t)
Collision avoidance is the premise of the intelligent vehicle's driving decision; if the intelligent vehicle has a collision accident during model training, that training round ends. The driving safety index r_2(t) can be expressed as:

r_2(t) = -v_AV(t)² · φ{Collision} (8)

where φ{Collision} takes the value 1 when the intelligent vehicle has a collision accident and 0 otherwise. As can be seen from formula (8), the faster the intelligent vehicle's speed, the more serious the accident.
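Formula (8) translates directly to code (a sketch; `collision` stands in for the indicator φ{Collision}):

```python
def r2(v_av, collision):
    """Formula (8): r2(t) = -v_AV(t)² · φ{Collision}, with φ = 1 on collision and 0 otherwise,
    so the faster the vehicle, the larger the penalty for a crash."""
    return -(v_av ** 2) * (1.0 if collision else 0.0)
```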
C) Drivable area constraint r_3(t)
Similarly, the driving range of the intelligent vehicle should remain within the state set of the drivable area, and the intelligent vehicle is punished when it exceeds this range. In particular, when pedestrians are present, the intelligent vehicle needs to perform an avoidance behavior, so only the drivable area constraint, rather than the lane constraint, needs to be considered. The drivable area constraint r_3(t) is given by formula (9):

[formula (9), shown as an image in the original]
D) Lane constraint r_4(t)
According to the driving rules, the driving direction of the intelligent vehicle should mostly be consistent with the lane direction; otherwise the intelligent vehicle is punished. The lane constraint r_4(t) is expressed as:

r_4(t) = cos α(t) - sin α(t) (10)

where α represents the angle between the driving direction of the intelligent vehicle and the lane direction, as shown in FIG. 2.
In summary, the final reward function of the intelligent vehicle is the weighted sum of r_1(t), r_2(t), r_3(t), and r_4(t), as in the following formula (11):

r(t) = Σ_L ω_L · r_L(t), L = 1, …, 4 (11)

where ω_L represents the weight parameters.
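The weighted sum of formula (11) can be sketched as follows (the weight values in the example are invented for illustration; the patent does not list the ω_L values):

```python
def total_reward(rewards, weights):
    """Formula (11): r(t) = Σ_L ω_L · r_L(t) over the four terms r_1(t)..r_4(t)."""
    assert len(rewards) == len(weights) == 4
    return sum(w * r for w, r in zip(weights, rewards))

# Hypothetical weights for illustration only.
r_t = total_reward([1.0, -4.0, 0.0, 0.5], [1.0, 1.0, 1.0, 2.0])
```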
4) Random state transition function p(s(t+1)|s(t), a(t))
Considering the interaction among traffic participants, given the current state s(t) and the selected action a(t) of the intelligent vehicle, the random state transition function p(s(t+1)|s(t), a(t)) of the scene at time t+1 is the product of the state transition probability distribution p(s_AV(t+1)|s_AV(t), a(t)) of the intelligent vehicle and the state transition probability distribution p(s_OA(t+1)|s(t)) of the surrounding participants, as in the following formula (12):

p(s(t+1)|s(t), a(t)) = p(s_AV(t+1)|s_AV(t), a(t)) × p(s_OA(t+1)|s(t)) (12)
step 2): based on the driving simulation scene constructed above, the information of the self vehicle and its surrounding driving scene is acquired through the GPS, laser radar, speed sensor, camera, and other sensors installed on the intelligent vehicle, mainly comprising the speed v_AV(t) of the intelligent vehicle at time t, the speed v_FV(t) of the front traffic participant, the speed v_OV(t) of the vehicle in the adjacent lane, the collision reaction time ρ of the intelligent vehicle, the actual distance D_h(t) between the intelligent vehicle and the front traffic participants, the longitudinal and lateral speeds of the intelligent vehicle during lane change, the lateral distance d_AL between the intelligent vehicle and the lane boundary line during lane change, and the lane width w_k; this information serves as the input of the driving condition evaluation model.
Step 3): constructing a collision risk model delta and a lane change risk model eta of the intelligent vehicle and surrounding traffic participants based on the relevant information acquired by the multiple sensors in the step 2).
1) Collision risk model δ
As shown in FIG. 3, the collision risk model δ mainly uses the Time Headway (TH) and Time To Collision (TTC) indexes to compare the actual distance D_h(t) between the intelligent vehicle and the front traffic participant with the safety distance D_s(t), where the safety distance D_s(t) is calculated mainly from the braking distance v_AV(t)·ρ of the intelligent vehicle, the final following distance, and the longitudinal displacement of the front traffic participant, as in formula (12):

[formula (12), shown as an image in the original]

where v_AV(t) and v_FV(t) represent the speeds of the intelligent vehicle and the front traffic participant at time t, respectively; v'_AV(t) and v'_FV(t) represent their decelerations at time t, which are numerically equal for passenger vehicles; ρ represents the reaction time of the intelligent vehicle, including the system reaction time ρ_1 and the brake response time ρ_2.

The collision risk model δ of the intelligent vehicle with respect to the surrounding traffic participants can then be expressed as formula (13):

[formula (13), shown as an image in the original]

where δ ≥ 1 indicates that the intelligent vehicle has a collision risk, and δ < 1 indicates that it does not.

Finally, using the Time Headway (TH) and Time To Collision (TTC) indexes defined in formula (14), and combining formulas (12) and (13), the final intelligent vehicle collision risk model δ is given by formula (15):

TH = D_h(t) / v_AV(t),  TTC = D_h(t) / (v_AV(t) - v_FV(t)) (14)

[formula (15), shown as an image in the original]
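A hedged sketch of the comparison logic above: TH and TTC use their standard definitions, and δ is simplified here to the ratio of the safety distance to the actual gap (the patent's final formula, an image in the original, additionally combines TH and TTC):

```python
def time_headway(d_h, v_av):
    """Standard TH: gap divided by own speed, TH = D_h(t) / v_AV(t)."""
    return d_h / v_av

def time_to_collision(d_h, v_av, v_fv):
    """Standard TTC: gap divided by closing speed; infinite when not closing in."""
    return float("inf") if v_av <= v_fv else d_h / (v_av - v_fv)

def collision_risk(d_h, d_s):
    """Simplified δ: ratio of safety distance D_s(t) to actual gap D_h(t); δ ≥ 1 flags risk."""
    return d_s / d_h
```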
2) Lane change risk model η
As shown in FIG. 4 and FIG. 5, the lane change risk model η is mainly obtained by comparing the distance D_LF between the two vehicles after the lane change of the intelligent vehicle with the adaptive braking distance D_b of the rear vehicle, where the adaptive braking distance of the rear vehicle is obtained by accumulating the driving distance D_1 of the rear vehicle during the driver reaction stage, the driving distance D_2 during the braking response stage, the driving distance D_3 during the braking force increase stage, and the driving distance D_4 during the continuous braking stage.
A) Calculation of the distance D_LF between the two vehicles after the lane change of the intelligent vehicle
Before the intelligent vehicle changes lane, the longitudinal distance D_LB between the intelligent vehicle and the rear vehicle can be expressed as:

D_LB = y_AV - y_OV (16)
According to the lateral speed and lateral acceleration of the intelligent vehicle, the time t_LC for the intelligent vehicle to reach the centerline of the target lane is solved as in formula (17):

[formula (17), shown as an image in the original]

where w_k indicates the lane width and d_AL indicates the lateral distance between the intelligent vehicle and the lane boundary line.
Then at tLCThe longitudinal displacement of the rear vehicle over the time period may be expressed as:
Figure BDA0003410521370000096
Similarly, the longitudinal displacement of the intelligent vehicle over the time period tLC can be expressed as:
Figure BDA0003410521370000097
Finally, after the intelligent vehicle completes the lane change, the distance DLF between the intelligent vehicle and the rear vehicle can be expressed as:
Figure BDA0003410521370000098
B) Calculation of the adaptive braking distance Db of the rear vehicle
The calculation of the adaptive braking distance Db of the rear vehicle mainly takes into account the rear-vehicle speed, the braking performance of the rear vehicle, and the reaction times of the driver and the braking system, as detailed below:
step 1: suppose driver reaction time t1(1s), the driving distance D of the driver of the rear vehicle in the reaction stage1Then it is:
D1=vOV(t)×t1 (21)
Step 2: assuming a response time t2 (0.2 s) in the braking response stage of the rear vehicle, the driving distance D2 of the rear vehicle in this stage is:
D2=vOV(t)×t2 (22)
Step 3: during the braking force increasing stage t3 of the rear vehicle, the change in deceleration is approximately linear. Assuming the rear vehicle decelerates at a comfortable deceleration asoft, the driving distance D3 during the braking force increasing stage can be expressed as:
Figure BDA0003410521370000101
Step 4: in the continuous braking stage, the rear vehicle decelerates at asoft until its speed drops to zero; the driving distance D4 of the rear vehicle in this stage can be expressed as:
Figure BDA0003410521370000102
Step 5: finally, the driving distance D1 of the driver reaction stage, the driving distance D2 of the braking response stage, the driving distance D3 of the braking force increasing stage, and the driving distance D4 of the continuous braking stage are accumulated to solve the adaptive braking distance Db of the rear vehicle, as shown in the following formula:
Db=D1+D2+D3+D4 (25)
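The five-step accumulation can be sketched numerically. The sketch below is an illustration rather than the patent's implementation: the reaction time t1 = 1 s and response time t2 = 0.2 s come from steps 1 and 2, while the build-up duration t3 and the exact ramp kinematics used for D3 and D4 are assumptions, since formulas (23) to (25) appear only as images.

```python
def adaptive_braking_distance(v_ov, a_soft, t1=1.0, t2=0.2, t3=0.4):
    """Adaptive braking distance Db of the rear vehicle (steps 1-5).

    v_ov   : rear-vehicle speed vOV(t) in m/s
    a_soft : comfortable deceleration asoft in m/s^2
    t1, t2 : driver reaction time (1 s) and brake response time (0.2 s)
    t3     : assumed duration of the braking force increasing stage
    """
    d1 = v_ov * t1                            # step 1: driver reaction stage
    d2 = v_ov * t2                            # step 2: brake response stage
    # step 3: deceleration assumed to ramp linearly from 0 to a_soft over t3
    d3 = v_ov * t3 - a_soft * t3 ** 2 / 6.0
    # step 4: constant deceleration a_soft until the rear vehicle stops
    v3 = v_ov - a_soft * t3 / 2.0             # speed at the end of the ramp
    d4 = v3 ** 2 / (2.0 * a_soft)
    return d1 + d2 + d3 + d4                  # step 5: accumulate D1..D4
```

As expected, Db grows with the rear-vehicle speed and shrinks as the comfortable deceleration increases.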
In summary, the lane change risk model η of the intelligent vehicle can be expressed as:
η=DLF-Db (26)
In the formula, η ≤ 0 indicates that the intelligent vehicle has a lane change risk; otherwise there is no lane change risk.
Meanwhile, based on the constructed collision risk model δ and lane change risk model η between the intelligent vehicle and the surrounding traffic participants, the driving condition of the intelligent vehicle is evaluated according to the following criterion:
Dc = { Dd, if δ ≥ 1 or η ≤ 0; Dg, otherwise } (27)
In the formula, Dc represents the set of intelligent vehicle driving conditions, Dd indicates a dangerous driving condition, and Dg indicates a general driving condition.
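The evaluation criterion reduces to a single predicate. A minimal sketch, with the thresholds δ ≥ 1 and η ≤ 0 taken directly from the definitions of the two risk models above:

```python
def evaluate_driving_condition(delta, eta):
    """Classify the driving condition Dc from the two risk models.

    delta >= 1 means a collision risk exists; eta <= 0 means a lane
    change risk exists. Either one puts the vehicle in the dangerous
    driving condition Dd; otherwise it is the general condition Dg.
    """
    return "Dd" if (delta >= 1.0 or eta <= 0.0) else "Dg"
```

This predicate is the switch that routes the vehicle between the rule-based decision branch and the DQN decision branch described below.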
Step 4) setting the hyper-parameters of the DQN model used in training the intelligent vehicle decision model, including the learning rate β, the number of training rounds N, the discount rate γ, and the initial speed ranges of the vehicles and pedestrians. The initial learning rate β of the DQN model is set to 0.002; the model structure is a five-layer fully connected network in which each hidden layer contains 100 neuron nodes; the initial number of training rounds N and the discount rate γ of the model are set to 10000 and 0.9, respectively. The ranges of the initial speeds of vehicles and pedestrians in the simulation scene are [15,65] km/h and [0,5] km/h, respectively.
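The described network structure can be sketched with NumPy. This is a minimal sketch: how the five layers split into hidden layers, the ReLU activation, and the weight initialization are assumptions, since the text fixes only the layer count and the hidden width of 100 nodes.

```python
import numpy as np

def init_q_network(state_dim, action_dim, hidden=100, seed=0):
    """Five-layer fully connected Q network: input, three hidden layers
    of 100 neuron nodes each, and one output value per decision action."""
    rng = np.random.default_rng(seed)
    dims = [state_dim, hidden, hidden, hidden, action_dim]
    return [(rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(dims[:-1], dims[1:])]

def q_forward(params, state):
    """Q(s, . | omega): one estimated return value per decision action."""
    x = np.asarray(state, dtype=float)
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)   # ReLU on the hidden layers (assumed)
    w, b = params[-1]
    return x @ w + b                     # linear output layer
```

The state and action dimensions depend on the scene encoding and the constrained action space, so they are left as parameters here.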
Step 5) then randomly initializing the weight parameter ω of the Q network, the weight parameter ω⁻ = ω of the TD (temporal difference) target network, and a storage space V for model training samples.
Step 6) during the N rounds of model training, at each time step t = 0, 1, 2, …, the intelligent vehicle observes the state space s(t) of the traffic scene through its multiple sensors and constructs the reward function r(t) corresponding to the current state.
Step 7) evaluating the driving condition of the intelligent vehicle according to step 3); when the driving condition is a general driving condition, a decision algorithm based on driving rules is adopted to realize the lateral and longitudinal decisions of the intelligent vehicle and to generate the corresponding expected action space and the decision action a(t) of the intelligent vehicle.
The decision algorithm based on driving rules is realized mainly by setting a larger driving safety distance for the intelligent vehicle in the simulation scene and, from the perspectives of driving safety rules, obstacle avoidance rules and pedestrian-priority rules, by operations such as braking or steering when encountering a static obstacle, avoiding pedestrians, and following the normal driving rules when travelling straight or turning at an intersection. In order to reduce the dimensional requirements on complex environment perception, the decision algorithm based on driving rules mainly combines the IF-THEN information-triggered event mode and judges the special position information P*(t) of the intelligent vehicle (e.g. near an intersection), the navigation target point position information, and the current state information of the intelligent vehicle to generate the expected action space and the decision action a(t) of the intelligent vehicle, where the expected action space is represented as follows:
Figure BDA0003410521370000117
In the formula, the expected action space consists of the longitudinal action set and the lateral action set of the intelligent vehicle.
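An IF-THEN triggering scheme of this kind can be sketched as a priority-ordered rule list. The rule predicates and the (longitudinal, lateral) action labels below are illustrative placeholders, not the patent's actual action space; only the priority ordering (pedestrians first, then obstacles, then normal intersection driving) follows the rules above.

```python
def rule_based_decision(pedestrian_ahead, static_obstacle_ahead,
                        near_intersection, turn_planned):
    """Return a (longitudinal, lateral) action pair via IF-THEN rules."""
    if pedestrian_ahead:                      # pedestrians have highest priority
        return ("brake", "keep_lane")
    if static_obstacle_ahead:                 # brake or steer around obstacles
        return ("decelerate", "steer_around")
    if near_intersection:                     # follow normal rules at crossings
        return ("decelerate", "turn" if turn_planned else "keep_lane")
    return ("cruise", "keep_lane")            # default safe driving
```

Because each rule fires on cheap boolean triggers, this branch avoids the full perception dimensionality the learned policy would need.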
Step 8) storing the decision action a(t) selected by the intelligent vehicle at time t, the reward function r(t), the scene state s(t), and the scene state s(t+1) at time t+1 in V in the form of the quadruple (s(t), a(t), r(t), s(t+1));
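The storage space V and the batch sampling of the next step can be sketched as a standard replay buffer; the capacity below is an assumed value, while the batch size of 64 is taken from step 9).

```python
import random
from collections import deque

class ReplayBufferV:
    """Storage space V holding quadruples (s(t), a(t), r(t), s(t+1))."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # capacity is an assumption

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=64):
        """Randomly sample a training batch, as in step 9)."""
        return random.sample(self.buffer, batch_size)
```

Using a bounded deque means the oldest transitions are discarded automatically once the buffer fills.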
Step 9) randomly sampling 64 groups of sample data from the storage space V at each iteration to train the DQN model, calculating the reward values of all decision actions of the intelligent vehicle in each state, selecting the action with the optimal value as the decision action of the intelligent vehicle in the current scene state, and synchronously updating the weight parameter ω of the Q network and the weight parameter ω⁻ = ω of the TD (temporal difference) target network during the training iterations of the DQN model;
The DQN model is trained mainly by the temporal difference (TD) algorithm; the rough procedure is as follows:
A) First, the optimal value action function Q*(s(t), a(t)) is obtained by solving based on the training sample data (s(t), a(t), r(t), s(t+1)) in the storage space V in step 8) and the optimal Bellman equation, as shown in the following formula (29):
Q*(s(t),a(t)) = E[r(t) + γ·max Q*(s(t+1),a)], a∈A (29)
In the formula, E[·] represents the expectation of the accumulated reward of the intelligent vehicle at time t+1, and A represents the action space set of the intelligent vehicle.
B) Secondly, in practical problems it is not feasible to solve for the optimal strategy by iteration, especially when the state space is large and the computational cost is high. The optimal value function Q*(s(t), a(t)) is therefore replaced with a neural network Q(s(t), a(t) | ω), in the form:
Q(s(t),a(t)|ω) ≈ r(t) + γ·max Q(s(t+1),a|ω⁻) = ŷ(t), a∈A (30)
In the formula, Q(s(t), a(t) | ω) represents the neural network's prediction at time t of the maximum accumulated return value of all decision actions of the intelligent vehicle, and contains no factual component; while ŷ(t) (also denoted as the objective function of the TD algorithm) represents the prediction at time t+1 of the maximum accumulated return of all decision actions of the intelligent vehicle, which is partly based on the real observed reward r(t).
C) Then, considering that ŷ(t) and Q(s(t), a(t) | ω) are both estimates of the optimal action value Q*(s(t), a(t)), but ŷ(t) is partly based on fact, Q(s(t), a(t) | ω) should be as close as possible to ŷ(t). Therefore, the error of the TD algorithm is calculated as the difference between the objective function ŷ(t) of the TD algorithm and Q(s(t), a(t) | ω), and the training loss function L(ω) of the DQN model is constructed from this error:
L(ω) = (1/2)·(ŷ(t) - Q(s(t),a(t)|ω))² (31)
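For a sampled batch, the TD target, TD error, and loss can be computed as below. This is a sketch under the assumption that L(ω) takes the usual squared-error form; γ = 0.9 as set in step 4), and the target network values stand in for Q(·|ω⁻).

```python
import numpy as np

def td_loss(q_pred, q_target_next, actions, rewards, gamma=0.9):
    """TD error and mean squared training loss L(omega) for a batch.

    q_pred        : Q(s(t), . | omega), shape (B, num_actions)
    q_target_next : Q(s(t+1), . | omega^-) from the TD target network
    actions       : index of a(t) chosen in each sample, shape (B,)
    rewards       : observed rewards r(t), shape (B,)
    """
    y = rewards + gamma * q_target_next.max(axis=1)      # TD target y(t)
    q_sa = q_pred[np.arange(len(actions)), actions]      # Q(s(t), a(t) | omega)
    td_error = y - q_sa                                  # error of the TD algorithm
    return float(np.mean(td_error ** 2)), td_error
```

In step D) below, the weight update moves Q(s(t), a(t) | ω) toward the target in proportion to this TD error.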
D) Finally, the weight parameter ω is updated with the TD algorithm during the training iterations of the DQN model, as follows:
ω = ω + β·δTD(t)·∂Q(s(t),a(t)|ω)/∂ω (32)
In the formula, β represents the learning rate of the model; δTD(t) denotes the error of the TD algorithm; and ∂Q(s(t),a(t)|ω)/∂ω denotes the derivative of the neural network Q(s(t), a(t) | ω) with respect to the weight parameter ω.
Step 10) if the driving condition evaluated in step 3) is a dangerous driving condition, a DQN decision algorithm is adopted: the decision action a(t) of the intelligent vehicle is selected at random, and steps 8) and 9) are repeated;
Step 11) according to the solved optimal value action state of the intelligent vehicle in the simulation scene at time t, and combining the transfer learning algorithm based on feature space mapping, the optimal value action state of the intelligent vehicle in the real driving scene is finally obtained.
As shown in fig. 6, the transfer learning algorithm based on feature space mapping mainly considers that, whether the intelligent vehicle is in the simulated driving scene or the real driving scene, as long as it makes decisions with the same actions, the same reward function and similar driving scenes, the feature probability distributions of the mapped optimal driving decision state spaces should be the same, i.e.
P(f(XS)) = P(g(XT))
where f and g represent the neural network functions of the feature space mapping, which are optimized using a similarity measure (the 2-norm), as in the following formula:
min(ωf,ωg) ||f(XS)-g(XT)||₂² (33)
In the formula, XS represents the optimal value action state set of the intelligent vehicle in the simulation environment; XT represents the optimal value action state set of the intelligent vehicle in the real driving environment; f denotes the neural network function of the feature space mapping in the source domain (the simulated driving environment); g denotes the neural network function of the feature space mapping in the target domain (the real driving environment); and ωf and ωg represent the weight parameters of the neural network functions f and g, respectively.
Objectively speaking, the mapping functions f and g should be invertible. In order to keep f and g as invertible as possible and to preserve the invariant information of the respective domains, two decoder networks are trained to reconstruct the optimal value action state sets XS and XT from the mapped feature space.
The optimization objectives of the decoder network training are then as follows:
min(ωS) ||DecS(f(XS))-XS||₂²,  min(ωT) ||DecT(g(XT))-XT||₂² (34)
In the formula, DecS represents the reconstruction target of the decoder in the source domain; DecT represents the reconstruction target of the decoder in the target domain; and ωS and ωT represent the weight parameters of the two decoders, respectively.
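Both the alignment objective (33) and the decoder reconstruction objectives (34) reduce to squared 2-norm terms. The sketch below operates on already-mapped feature arrays and leaves the networks f, g and the decoders abstract; the function names are illustrative.

```python
import numpy as np

def alignment_loss(f_source, g_target):
    """Squared 2-norm between mapped source-domain and target-domain
    optimal-action features, as in the similarity measure of (33)."""
    return float(np.sum((f_source - g_target) ** 2))

def reconstruction_loss(decoded, original):
    """Decoder objective of (34): reconstruct the optimal value action
    states from the mapped feature space, keeping f and g invertible."""
    return float(np.sum((decoded - original) ** 2))
```

In training, the alignment term pulls the two domains' feature distributions together while the two reconstruction terms prevent f and g from collapsing the domain-specific information.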
In summary, the optimization objective of the transfer learning algorithm model based on feature space mapping is shown in formula (35) below. Meanwhile, on the premise that the optimal value action state of the intelligent vehicle in the simulation scene at time t has been obtained according to step 11), the optimal value action state of the intelligent vehicle in the real driving scene at time t can be solved by combining the feature space mapping neural network functions f and g, as shown in formula (36):
Figure BDA0003410521370000143
Figure BDA0003410521370000144
In the formula, ψ represents the reward weight of the optimal value decision migration of the intelligent vehicle.
The above-listed series of detailed descriptions are merely specific illustrations of possible embodiments of the present invention, and they are not intended to limit the scope of the present invention, and all equivalent means or modifications that do not depart from the technical spirit of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. An intelligent vehicle coupling decision model considering dangerous driving conditions, comprising: the system comprises a traffic scene model, a driving condition evaluation model and a behavior decision model;
the traffic scene model adopts a Markov model according to the position, speed and orientation angle information, lane environment structure information and traffic rule information of the intelligent vehicle and surrounding traffic participants;
the driving condition evaluation model divides driving conditions into general driving conditions and dangerous driving conditions based on the collision risk with surrounding traffic participants and the lane change risk of the intelligent vehicle when the intelligent vehicle runs;
the behavior decision model adopts a decision based on coupling of rules and a deep reinforcement learning algorithm, and on one hand, a decision algorithm based on driving rules is constructed from the angles of driving safety rules, danger obstacle avoidance rules and rules that pedestrians have the highest priority, so that the driving decision under the general driving working condition is processed; on the other hand, for dangerous driving conditions, a Deep Q Network (DQN) model with a constrained action space is adopted to enable the intelligent vehicle to autonomously learn the optimal driving action strategy in an interactive scene.
2. The intelligent vehicle coupling decision model considering dangerous driving conditions as claimed in claim 1, wherein the traffic scene model is specifically as follows:
the method comprises a state space s (t) of a scene, a decision action a (t) of the intelligent vehicle, a reward function r (t) and a random state transfer function p (s (t +1) | s (t), a (t)) of the scene at a time t +1, wherein the state space s (t) of the traffic scene is formed by state information s (t) of the intelligent vehicleAV(t) status information s of surrounding traffic participantsOA(t), Lane Structure and traffic rules sTR(t) information composition; the decision action a (t) of the intelligent vehicle is to control the longitudinal acceleration a of the intelligent vehicle according to the behavior decision modelL(t) and front wheel Angle aT(t) a decision action set of the intelligent vehicle is correspondingly formed; the design of the reward function r (t) fuses the constraint r of the navigation target point1(t) index of running safety r2(t) travelable area constraint r3(t) and lane constraint r4(t) information; the random state transfer function p (s (t +1) | s (t), a (t)) is the state transfer probability distribution p (s +1) | s (t)) of the intelligent vehicleAV(t+1)|sAV(t), a (t) and the state transition probability distribution p(s) of the surrounding traffic participantsOA(t +1) | s (t)).
3. The intelligent vehicle coupling decision model considering dangerous driving conditions as claimed in claim 1, wherein the input information of the driving condition evaluation model comprises the speed vAV(t) of the intelligent vehicle at time t, the speed vFV(t) of the front traffic participant, the speed vOV(t) of the vehicle in the adjacent lane, the collision reaction time ρ of the intelligent vehicle, the actual distance Dh(t) between the intelligent vehicle and the surrounding traffic participants in front, the longitudinal and lateral speeds of the intelligent vehicle during the lane change, the lateral distance dAL from the lane boundary line during the lane change, and the lane width wk.
4. An intelligent vehicle coupling decision model considering dangerous driving conditions according to claim 1, characterized by comprising a collision risk model δ and a lane change risk model η;
the collision risk model δ uses the time headway (TH) and time to collision (TTC) indexes to compare the actual distance Dh(t) between the intelligent vehicle and the front traffic participant with the safety distance Ds(t), wherein the safety distance is mainly calculated from the braking distance vAV(t)·ρ of the intelligent vehicle, the final following distance, and the longitudinal displacement of the front traffic participant;
the lane change risk model η judges the lane change risk by comparing the distance DLF between the two vehicles after the intelligent vehicle changes lane with the adaptive braking distance Db of the rear vehicle, wherein the adaptive braking distance of the rear vehicle is obtained mainly by accumulating the driving distance D1 of the rear-vehicle driver in the reaction stage, the driving distance D2 of the rear vehicle in the braking response stage, the driving distance D3 in the braking force increasing stage, and the driving distance D4 in the continuous braking stage.
5. The intelligent vehicle coupling decision model considering dangerous driving conditions as claimed in claim 4, wherein the driving condition evaluation model divides the driving condition of the intelligent vehicle according to the collision risk model δ and the lane change risk model η, as shown in the following formula (1):
Dc = { Dd, if δ ≥ 1 or η ≤ 0; Dg, otherwise } (1)
In the formula, Dc represents the set of intelligent vehicle driving conditions, Dd indicates a dangerous driving condition, and Dg indicates a general driving condition.
6. The intelligent vehicle coupling decision model considering dangerous driving conditions as claimed in claim 1, wherein the behavior decision model:
when the driving condition is a general driving condition, a decision algorithm based on driving rules is adopted to realize the lateral and longitudinal decisions of the intelligent vehicle and to generate the corresponding expected action space and decision action a(t) of the intelligent vehicle; the decision action a(t) selected by the intelligent vehicle at time t, the reward function r(t), the scene state s(t) and the scene state s(t+1) at time t+1 are stored in the form of the quadruple (s(t), a(t), r(t), s(t+1)); the DQN model is trained by randomly sampling several groups of sample data from the storage space V at each iteration to calculate the reward values of all decision actions of the intelligent vehicle in each state, the action with the optimal value is selected as the decision action of the intelligent vehicle in the current scene state, and the weight parameter ω of the Q network and the weight parameter ω⁻ = ω of the TD target network are synchronously updated during the training iterations of the DQN model;
when the driving condition is a dangerous driving condition, the decision action a(t) of the intelligent vehicle is selected at random, and the decision action a(t), the reward function r(t), the scene state s(t) and the scene state s(t+1) at time t+1 are stored in V in the form of the quadruple (s(t), a(t), r(t), s(t+1)); the DQN model is trained by randomly sampling 64 groups of sample data from the storage space V at each iteration to calculate the reward values of all decision actions of the intelligent vehicle in each state, the action with the optimal value is selected as the decision action of the intelligent vehicle in the current scene state, and the weight parameter ω of the Q network and the weight parameter ω⁻ = ω of the TD (temporal difference) target network are synchronously updated during the training iterations of the DQN model;
the optimal value action state of the intelligent vehicle in the simulation scene at time t is solved, and the optimal value action state of the intelligent vehicle in the real driving scene is finally obtained by combining the transfer learning algorithm based on feature space mapping; the feature space mapping transfer learning algorithm considers that, whether in a simulated driving scene or a real driving scene, when the intelligent vehicle makes decisions with the same actions, the same reward function and similar driving scenes, the feature probability distributions of the mapped optimal driving decision state spaces are the same,
where f and g represent the neural network functions of the feature space mapping, which are optimized using a similarity measure, as in the following formula:
Figure FDA0003410521360000036
In the formula, XS represents the optimal value action state set of the intelligent vehicle in the simulation environment; XT represents the optimal value action state set of the intelligent vehicle in the real driving environment; f denotes the neural network function of the feature space mapping in the source domain (the simulated driving environment); g denotes the neural network function of the feature space mapping in the target domain (the real driving environment); and ωf and ωg represent the weight parameters of the neural network functions f and g, respectively.
7. An intelligent vehicle coupling decision-making method considering dangerous driving conditions is characterized by comprising the following steps:
step 1) building an intelligent vehicle simulation driving scene, and modeling the scene into a Markov decision process;
step 2) collecting the information of the ego vehicle and the driving scene through the multiple sensors (GPS, laser radar, speed sensor and camera) mounted on the intelligent vehicle, and taking this information as the input of the driving condition evaluation model;
step 3) constructing a driving condition evaluation model based on the relevant information acquired by the multiple sensors in the step 2), wherein the driving condition evaluation model comprises a collision risk model delta and a lane change risk model eta of the intelligent vehicle and surrounding traffic participants, and dividing the driving conditions of the intelligent vehicle according to the collision risk model delta and the lane change risk model eta, as shown in the following formula (1):
Dc = { Dd, if δ ≥ 1 or η ≤ 0; Dg, otherwise } (1)
In the formula, Dc represents the set of intelligent vehicle driving conditions, Dd indicates a dangerous driving condition, and Dg indicates a general driving condition.
Step 4) training an intelligent vehicle decision model, firstly setting hyper-parameters of the DQN model, including the learning rate beta of the model, the training round N and the discount rate gamma of the model, and the initial speed ranges of vehicles and pedestrians;
step 5) randomly initializing the weight parameter ω of the Q network, the weight parameter ω⁻ = ω of the TD (temporal difference) target network, and a storage space V for model training samples;
step 6) during the N rounds of model training, at each time step t = 0, 1, 2, …, the intelligent vehicle observes the state space s(t) of the traffic scene through multiple sensors and constructs the reward function r(t) corresponding to the current state;
step 7) evaluating the driving condition of the intelligent vehicle according to step 3); when the driving condition is a general driving condition, a decision algorithm based on driving rules is adopted to realize the lateral and longitudinal decisions of the intelligent vehicle and to generate the corresponding expected action space and decision action a(t) of the intelligent vehicle;
the decision algorithm based on driving rules is based on the driving safety rules, the obstacle avoidance rules and the pedestrian-priority rules, combines IF-THEN information-triggered events, and generates the expected action space and the decision action a(t) of the intelligent vehicle from the special position information P*(t) of the intelligent vehicle, the navigation target point position information, and the current state information of the intelligent vehicle;
step 8), storing decision actions a (t), reward functions r (t), scene states s (t) and scene states s (t +1) at the time t +1, which are selected by the intelligent vehicle at the time t, in the form of quadruples (s (t), a (t), r (t) and s (t +1)) in V;
step 9) randomly sampling 64 groups of sample data from the storage space V at each iteration to train the DQN model, calculating the reward values of all decision actions of the intelligent vehicle in each state, selecting the action with the optimal value as the decision action of the intelligent vehicle in the current scene state, and synchronously updating the weight parameter ω of the Q network and the weight parameter ω⁻ = ω of the TD (temporal difference) target network during the training iterations of the DQN model;
step 10) if the driving condition evaluated in step 3) is a dangerous driving condition, a DQN decision algorithm is adopted: the decision action a(t) of the intelligent vehicle is selected at random, and steps 8) and 9) are repeated;
step 11) according to the solved optimal value action state of the intelligent vehicle in the simulation scene at time t, and combining the transfer learning algorithm based on feature space mapping, the optimal value action state of the intelligent vehicle in the real driving scene is finally obtained.
8. The intelligent vehicle coupling decision method considering dangerous driving conditions according to claim 7, wherein the modeling of the simulated traffic scene in step 1) is a Markov decision process, specifically as follows:
constructing a state space s(t) of the scene, a decision action a(t) of the intelligent vehicle, a reward function r(t), and a random state transfer function p(s(t+1) | s(t), a(t)) of the scene at time t+1, wherein the state space s(t) of the traffic scene is composed of the state information sAV(t) of the intelligent vehicle, the state information sOA(t) of the surrounding traffic participants, the lane structure and traffic rule information sTR(t), and the like; the driving decision of the intelligent vehicle is made by controlling the longitudinal acceleration aL(t) and the front wheel angle aT(t) of the intelligent vehicle, correspondingly forming the decision action set a(t) of the intelligent vehicle; in addition, the reward function r(t) is constructed by considering the navigation target point constraint r1(t), the driving safety index r2(t), the drivable area constraint r3(t), the lane constraint r4(t), and the like; finally, the random state transition function is composed of the state transition probability distribution p(sAV(t+1) | sAV(t), a(t)) of the intelligent vehicle and the state transition probability distribution p(sOA(t+1) | s(t)) of the surrounding traffic participants.
9. The intelligent vehicle coupling decision method considering dangerous driving conditions as claimed in claim 7, wherein the input information of the driving condition evaluation model in step 3) comprises the speed vAV(t) of the intelligent vehicle at time t, the speed vFV(t) of the front traffic participant, the speed vOV(t) of the vehicle in the adjacent lane, the collision reaction time ρ of the intelligent vehicle, the actual distance Dh(t) between the intelligent vehicle and the surrounding traffic participants in front, the longitudinal and lateral speeds of the intelligent vehicle during the lane change, the lateral distance dAL from the lane boundary line during the lane change, the lane width wk, and the like;
the collision risk model δ uses the time headway (TH), the time to collision (TTC) and other indexes to compare the actual distance Dh(t) between the intelligent vehicle and the front traffic participant with the safety distance Ds(t), wherein the safety distance is mainly calculated from the braking distance vAV(t)·ρ of the intelligent vehicle, the final following distance, and the longitudinal displacement of the front traffic participant;
the lane change risk model η judges the lane change risk by comparing the distance DLF between the two vehicles after the intelligent vehicle changes lane with the adaptive braking distance Db of the rear vehicle, the adaptive braking distance of the rear vehicle being obtained mainly by accumulating the driving distance D1 of the rear-vehicle driver in the reaction stage, the driving distance D2 of the rear vehicle in the braking response stage, the driving distance D3 in the braking force increasing stage, and the driving distance D4 in the continuous braking stage.
10. The intelligent vehicle coupling decision method considering dangerous driving conditions as claimed in claim 7, wherein the initial learning rate β of the DQN model in step 4) is set to 0.002, the model structure is a five-layer fully connected network in which each hidden layer contains 100 neuron nodes, the initial number of training rounds N and the discount rate γ of the model are set to 10000 and 0.9, respectively, and the ranges of the initial speeds of vehicles and pedestrians in the simulation scene are [15,65] km/h and [0,5] km/h, respectively;
the model training in step 9) is to train the DQN model by a time difference algorithm (TD), and the procedure is as follows: firstly, an optimal value action function Q is obtained by solving based on sample data (s (t), a (t), r (t), s (t +1)) and an optimal Bellman equation*(s (t), a (t)), and replacing it with a neural network Q (s (t), a (t) | ω); then, the objective function of the TD algorithm is set
Figure FDA0003410521360000061
Calculating the error of the TD algorithm by taking the difference with Q(s), (t), a (t) omega), and constructing a training loss function L (omega) of the DQN model according to the error;
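The TD target and loss described above can be sketched numerically. This is a hedged illustration of the standard DQN loss, not the patent's implementation: the Q-value matrices below stand in for the outputs of the five-layer network Q(s, a|ω), and the squared-error form of L(ω) is an assumption.

```python
import numpy as np

def td_targets(q_values_next, rewards, gamma=0.9):
    """TD objective function y(t) = r(t) + gamma * max_a Q(s(t+1), a | omega)."""
    return rewards + gamma * q_values_next.max(axis=1)

def dqn_loss(q_values, actions, q_values_next, rewards, gamma=0.9):
    """Loss L(omega): mean squared TD error between the target y(t)
    and the taken-action value Q(s(t), a(t) | omega)."""
    q_sa = q_values[np.arange(len(actions)), actions]  # Q(s(t), a(t) | omega)
    y = td_targets(q_values_next, rewards, gamma)
    return np.mean((y - q_sa) ** 2)
```

With the claim's settings, the network weights would then be updated by gradient descent on this loss with learning rate β = 0.002 and discount rate γ = 0.9.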
the transfer learning algorithm based on the feature space mapping in step 11) considers that, whether in a simulated driving scene or a real driving scene, when the intelligent vehicle makes decisions in driving scenes with the same actions, the same reward function and similar scene structure, the feature probability distributions of the optimal driving decision state mapping spaces are the same, namely P(f(s_sim)) = P(g(s_real)), wherein f and g represent the neural network functions of the feature space mapping for the simulated and real scenes, respectively.
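The distribution-matching condition P(f(s_sim)) = P(g(s_real)) can be checked empirically. The sketch below is an illustrative assumption, not the patent's algorithm: it compares first and second moments of the two mapped feature samples, where `f` and `g` are placeholders for the trained mapping networks (a kernel two-sample statistic such as MMD would be the stricter choice).

```python
import numpy as np

def feature_gap(sim_states, real_states, f, g):
    """Rough two-sample check that the mapped feature distributions match:
    compares empirical means and standard deviations of f(s_sim) and
    g(s_real). A gap near 0 is consistent with P(f(s_sim)) = P(g(s_real))."""
    fs = np.asarray([f(s) for s in sim_states])    # features of simulated states
    gr = np.asarray([g(s) for s in real_states])   # features of real states
    return (np.linalg.norm(fs.mean(axis=0) - gr.mean(axis=0))
            + np.linalg.norm(fs.std(axis=0) - gr.std(axis=0)))
```

A small `feature_gap` indicates the mappings have aligned the simulated and real state spaces, so value actions learned in simulation can be transferred to the real scene.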
CN202111526027.0A 2021-12-14 2021-12-14 Intelligent vehicle coupling decision model and method considering dangerous driving conditions Pending CN114312830A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111526027.0A CN114312830A (en) 2021-12-14 2021-12-14 Intelligent vehicle coupling decision model and method considering dangerous driving conditions

Publications (1)

Publication Number Publication Date
CN114312830A true CN114312830A (en) 2022-04-12

Family

ID=81050039

Country Status (1)

Country Link
CN (1) CN114312830A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239741A (en) * 2014-09-28 2014-12-24 清华大学 Travelling risk field-based automobile driving safety assistance method
US20160187880A1 (en) * 2014-12-25 2016-06-30 Automotive Research & Testing Center Driving control system and dynamic decision control method thereof
CN108332977A (en) * 2018-01-23 2018-07-27 常熟昆仑智能科技有限公司 A kind of classifying and analyzing method joining automotive test scene to intelligent network
CN112242059A (en) * 2020-09-30 2021-01-19 南京航空航天大学 Intelligent decision-making method for unmanned vehicle based on motivation and risk assessment
CN113253739A (en) * 2021-06-24 2021-08-13 深圳慧拓无限科技有限公司 Driving behavior decision method for expressway
CN113291308A (en) * 2021-06-02 2021-08-24 天津职业技术师范大学(中国职业培训指导教师进修中心) Vehicle self-learning lane-changing decision-making system and method considering driving behavior characteristics

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112346450A (en) * 2019-07-22 2021-02-09 沃尔沃汽车公司 Robust autonomous driving design
CN114880938A (en) * 2022-05-16 2022-08-09 重庆大学 Method for realizing decision of automatically driving automobile behavior
CN114880938B (en) * 2022-05-16 2023-04-18 重庆大学 Method for realizing decision of automatically driving automobile behavior
CN115630583A (en) * 2022-12-08 2023-01-20 西安深信科创信息技术有限公司 Method, device, equipment and medium for generating simulated vehicle driving state
CN116946162A (en) * 2023-09-19 2023-10-27 东南大学 Intelligent network combined commercial vehicle safe driving decision-making method considering road surface attachment condition
CN116946162B (en) * 2023-09-19 2023-12-15 东南大学 Intelligent network combined commercial vehicle safe driving decision-making method considering road surface attachment condition
CN117574111A (en) * 2024-01-15 2024-02-20 大秦数字能源技术股份有限公司 BMS algorithm selection method, device, equipment and medium based on scene state
CN117574111B (en) * 2024-01-15 2024-03-19 大秦数字能源技术股份有限公司 BMS algorithm selection method, device, equipment and medium based on scene state
CN117708999A (en) * 2024-02-06 2024-03-15 北京航空航天大学 Scene-oriented hybrid electric vehicle energy management strategy evaluation method
CN117708999B (en) * 2024-02-06 2024-04-09 北京航空航天大学 Scene-oriented hybrid electric vehicle energy management strategy evaluation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination