CN114312830A - Intelligent vehicle coupling decision model and method considering dangerous driving conditions - Google Patents
- Publication number
- CN114312830A (application CN202111526027.0A)
- Authority
- CN
- China
- Prior art keywords
- intelligent vehicle
- driving
- decision
- model
- scene
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses an intelligent vehicle coupling decision model and method that take dangerous driving conditions into account. By coupling a self-learning decision method with driving rules, the approach overcomes the limitations, inflexibility, and unreliability of any single decision method and effectively handles intelligent vehicle driving decisions across a variety of complex traffic scenes. The invention fully considers the collision risk and lane-change risk during driving and assigns a decision algorithm to each case, further improving both the real-time performance of intelligent vehicle decisions and their reliability under dangerous driving conditions. The proposed transfer learning algorithm based on feature space mapping transfers the intelligent vehicle's optimal-value action knowledge from a simulated scene to a real scene, mitigates the modeling error of real traffic scenes, verifies the effectiveness of the proposed coupling decision model in real driving scenes, and greatly improves the vehicle's transfer learning capability.
Description
Technical Field
The invention relates to the technical field of driving decisions for unmanned vehicles, and in particular to an intelligent vehicle coupling decision model and method that consider dangerous driving conditions.
Background
At the current stage of research, future intelligent driving technology is generally expected to play a crucial role in improving road safety, easing traffic congestion, and reducing driver workload. One of its core challenges is making safe and efficient driving decisions in highly complex traffic environments, based on uncertain multi-sensor fused perception information and existing driving prior knowledge. A decision algorithm therefore needs to account for factors such as the driver's individual requirements (safety, comfort, and efficiency), the road environment structure, traffic regulation constraints, vehicle dynamics, and regional driving habits, while remaining broadly applicable and robust to the randomness of high-dimensional traffic environments, particularly to decision failures caused by asynchronous information between the perception layer and the decision layer.
Existing decision algorithms fall into three main categories: driving decision algorithms based on reinforcement learning, algorithms based on driving rules, and algorithms that couple driving rules with a self-learning algorithm. The coupled approach is gradually becoming a research hotspot because its decision process is partly interpretable and it applies to high-dimensional, random, dynamic environments. However, because of the sampling efficiency and decision safety constraints of experiments in real traffic scenes, existing work mostly performs driving data analysis, decision model training, and decision model verification in a constructed simulation environment; it cannot verify whether the optimal-value decisions learned in simulation suit the real traffic driving environment, i.e., it cannot transfer decision knowledge from the simulator to the real environment. Moreover, simulated driving environments are mostly built around a single setting such as an expressway; the decision reliability of driving decision algorithms under dangerous conditions is rarely considered, and research that accounts for both collision risk and lane-change risk during driving is especially scarce.
Disclosure of Invention
To solve this technical problem, the invention constructs an intelligent vehicle coupling decision model that considers dangerous driving conditions. In building the simulated driving scene model, the invention considers the position, speed, and heading information of the intelligent vehicle and surrounding traffic participants, the lane environment structure, and traffic rule information, and models the constructed traffic scene as a Markov Decision Process (MDP). The input to the driving condition evaluation model, covering the ego vehicle, surrounding traffic participants, the lane environment, and driving rule constraints, is acquired through a GPS positioning device, speed and acceleration sensors, a lidar, cameras, and other sensors installed on the intelligent vehicle. Driving conditions are then divided, based on the collision risk with surrounding traffic participants and the lane-change risk of the vehicle, into general driving conditions and dangerous driving conditions.
In choosing the behavior decision algorithm, considering the limitations of a driving rule base and its lack of flexibility for random scenes, the method mainly adopts a decision mode that couples rules with a deep reinforcement learning algorithm. On one hand, a rule-based decision method is constructed from driving safety rules, hazard obstacle-avoidance rules, and a highest-priority yield-to-pedestrian rule; this handles decisions under general driving conditions effectively and improves the interpretability of the decision process. On the other hand, for dangerous driving conditions, a Deep Q-Network (DQN) model with a constrained action space lets the intelligent vehicle autonomously learn the optimal driving action strategy in an interactive scene. For knowledge transfer of the optimal-value action between simulation and the real environment, the feature probability distribution of the optimal driving decision state mapping space should be the same whenever the vehicle decides with the same actions, the same reward function, and similar driving scenes, whether simulated or real; the optimal-value action state in the real traffic scene can therefore be solved by implicitly learning the correspondence between the feature spaces of the two domains.
The intelligent vehicle coupling decision method considering dangerous driving conditions comprises, in order, the following steps:
step 1) building an intelligent vehicle simulation driving scene, and modeling the scene into a Markov decision process;
step 2) collecting information about the ego vehicle and the driving scene through the GPS, lidar, speed sensor, camera, and other sensors installed on the intelligent vehicle, and using it as the input of the driving condition evaluation model;
step 3) constructing a collision risk model δ and a lane-change risk model η between the intelligent vehicle and the surrounding traffic participants based on the information acquired by the sensors in step 2), and dividing the driving conditions of the intelligent vehicle according to δ and η, as in formula (1):

D_c = D_d if δ or η indicates risk, and D_c = D_g otherwise (1)

where D_c denotes the set of intelligent vehicle driving conditions, D_d a dangerous driving condition, and D_g a general driving condition.
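The condition split in formula (1) can be sketched as a small classifier. This is an illustration under the assumption, consistent with the collision risk model described later (δ ≥ 1 means risk), that both risk models fire at a threshold of 1; the patent does not state the exact thresholds.

```python
def classify_driving_condition(delta: float, eta: float,
                               threshold: float = 1.0) -> str:
    """Return 'dangerous' (D_d) if either risk model fires, else 'general' (D_g).

    The shared threshold of 1.0 is an assumption for illustration.
    """
    if delta >= threshold or eta >= threshold:
        return "dangerous"   # D_d: handled by the constrained-action DQN
    return "general"         # D_g: handled by the rule-based decision method
```

Under general conditions the rule-based method is used, and under dangerous conditions the DQN decision algorithm takes over, as in steps 7) and 10).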
step 4) setting the hyperparameters of the DQN model for training the intelligent vehicle decision model, including the learning rate β, the number of training rounds N, the discount rate γ, and the initial speed ranges of vehicles and pedestrians;
step 5) randomly initializing the weight parameters ω of the Q network, setting the TD (temporal difference) target network weights ω⁻ = ω, and allocating a storage space V for model training samples;
step 6) during the N training rounds, at each time step t = 0, 1, 2, …, the intelligent vehicle observes the state space s(t) of the traffic scene through its sensors and constructs the reward function r(t) corresponding to the current state;
step 7) evaluating the driving condition of the intelligent vehicle according to step 3); under a general driving condition, a decision algorithm based on driving rules realizes the lateral and longitudinal decisions of the intelligent vehicle and generates the corresponding expected action space and the decision action a(t);
step 8) storing the decision action a(t) selected by the intelligent vehicle at time t, the reward function r(t), the scene state s(t), and the scene state s(t + 1) at time t + 1 as a quadruple (s(t), a(t), r(t), s(t + 1)) in V;
step 9) at each iteration, randomly sampling 64 groups of sample data from the storage space V to train the DQN model, calculating the reward values of all candidate decision actions of the intelligent vehicle in each state, selecting the action with the optimal value as the decision action in the current scene state, and, during the training iterations of the DQN model, synchronously updating the Q-network weights ω and the TD target network weights ω⁻ = ω;
step 10) if the driving condition evaluated in step 3) is a dangerous driving condition, randomly selecting an initial decision action a(t) of the intelligent vehicle, adopting the DQN decision algorithm, and repeating steps 8) and 9);
step 11) from the optimal-value action state of the intelligent vehicle solved in the simulation scene at time t, obtaining the optimal-value action state in the real driving scene by applying the transfer learning algorithm based on feature space mapping.
Further, modeling the simulated traffic scene in step 1) as a Markov decision process means constructing the state space s(t) of the scene, the decision action a(t) of the intelligent vehicle, the reward function r(t), and the random state transition function p(s(t + 1) | s(t), a(t)) of the scene at time t + 1. The state space s(t) of the traffic scene comprises the state information s_AV(t) of the intelligent vehicle, the state information s_OA(t) of the surrounding traffic participants, and the lane structure and traffic rules s_TR(t). The driving decision is made by controlling the longitudinal acceleration a_L(t) and the front-wheel angle a_T(t) of the intelligent vehicle, which together form its decision action set a(t). The reward function r(t) is constructed from the navigation target point constraint r_1(t), the driving safety index r_2(t), the drivable area constraint r_3(t), and the lane constraint r_4(t). Finally, the random state transition function consists of the state transition probability distribution p(s_AV(t + 1) | s_AV(t), a(t)) of the intelligent vehicle and the state transition probability distribution p(s_OA(t + 1) | s(t)) of the surrounding traffic participants.
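The MDP elements named above can be sketched as plain containers; the field names here are illustrative stand-ins, not the patent's notation.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SceneState:
    """State space s(t): ego state, surrounding participants, lane/rule info."""
    ego: Dict[str, float]                  # s_AV(t): position, speed, heading
    participants: List[Dict[str, float]]   # s_OA(t): one entry per participant
    lane_rules: Dict[str, float]           # s_TR(t): lane structure, traffic rules

@dataclass
class Action:
    """Decision action a(t) = {a_L(t), a_T(t)}."""
    accel: float       # longitudinal acceleration a_L(t), m/s^2
    steer_deg: float   # front-wheel angle a_T(t), degrees
```

A transition function p(s(t + 1) | s(t), a(t)) would map a `SceneState` and an `Action` to a distribution over next states, factored into ego and participant parts as described above.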
Further, the input information of the driving condition evaluation model in step 2) comprises the speed v_AV(t) of the intelligent vehicle at time t, the speed v_FV(t) of the front traffic participant, the speed v_OV(t) of the vehicle in the adjacent lane, the collision reaction time ρ of the intelligent vehicle, the actual distance D_h(t) between the intelligent vehicle and the surrounding traffic participants ahead, the longitudinal and lateral speeds of the intelligent vehicle during a lane change, the lateral distance d_AL between the vehicle and the lane boundary line during the lane change, and the lane width w_k.
Further, the collision risk model δ in step 3) mainly uses the time headway (TH) and time to collision (TTC) indexes to compare the actual distance D_h(t) between the intelligent vehicle and the front traffic participant with a safety distance D_s(t), where the safety distance is calculated mainly from the braking distance term v_AV(t)·ρ of the intelligent vehicle, the final following distance, and the longitudinal displacement of the front traffic participant.
The lane-change risk model η mainly compares the distance D_LF between the two vehicles after the intelligent vehicle changes lanes with the adaptive braking distance D_b of the rear vehicle, which is obtained by accumulating the distance D_1 travelled by the rear vehicle during the driver's reaction stage, the distance D_2 during the braking response stage, the distance D_3 during the braking force build-up stage, and the distance D_4 during the continuous braking stage.
Further, the initial learning rate β of the DQN model in step 4) is set to 0.002; the model is a five-layer fully connected network whose hidden layers each contain 100 neuron nodes; and the number of training rounds N and the discount rate γ are set to 10000 and 0.9, respectively. The initial speed ranges of vehicles and pedestrians in the simulation scene are [15, 65] km/h and [0, 5] km/h, respectively.
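The quoted architecture (a five-layer fully connected network, 100 neurons per hidden layer) can be sketched with NumPy; the state and action dimensions, the initialization scale, and the ReLU activations are assumptions, since the text does not specify them.

```python
import numpy as np

def init_q_network(state_dim: int, n_actions: int, hidden: int = 100, seed: int = 0):
    """Weights for a 5-layer fully connected Q-network: input, 3 hidden, output."""
    rng = np.random.default_rng(seed)
    sizes = [state_dim, hidden, hidden, hidden, n_actions]
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def q_forward(params, state):
    """Forward pass returning one Q-value per discrete action."""
    x = np.asarray(state, dtype=float)
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x
```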
Further, the rule-based decision algorithm in step 7) is mainly built from driving safety rules, hazard obstacle-avoidance rules, and yield-to-pedestrian rules, combined with IF-THEN information-triggered events. Using the special position information P*(t) of the intelligent vehicle (e.g., the vicinity of an intersection), the position of the navigation target point, and the current state information of the vehicle, it generates the expected action space and the decision action a(t), which reduces the dimensional requirements of the perception task and improves the real-time performance and reliability of the decision.
Further, the model training in step 9) trains the DQN model mainly through the temporal difference (TD) algorithm. The general process is: first, the optimal-value action function Q*(s(t), a(t)) is solved from the sample data (s(t), a(t), r(t), s(t + 1)) and the optimal Bellman equation, and is approximated by a neural network Q(s(t), a(t) | ω); then the TD objective is set, its difference from Q(s(t), a(t) | ω) gives the TD error, and the training loss function L(ω) of the DQN model is constructed from this error.
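The TD target and loss for a single sample can be written out directly; this is a generic sketch of the standard DQN objective the paragraph describes, assuming discrete actions.

```python
import numpy as np

def td_target(r: float, q_next: np.ndarray, gamma: float = 0.9,
              done: bool = False) -> float:
    """y(t) = r(t) + gamma * max_a Q(s(t+1), a | omega^-); just r(t) at episode end."""
    return r if done else r + gamma * float(np.max(q_next))

def dqn_loss(q_sa: float, y: float) -> float:
    """Squared TD error for one sample: (y(t) - Q(s(t), a(t) | omega))^2."""
    return (y - q_sa) ** 2
```

In the full algorithm this loss is averaged over the 64 sampled transitions and minimized by gradient descent on ω, with ω⁻ periodically synchronized to ω.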
Further, the transfer learning algorithm based on feature space mapping in step 11) relies on the observation that, whether in the simulated or the real driving scene, the feature probability distribution of the optimal driving decision state mapping space should be the same when the intelligent vehicle decides with the same actions, the same reward function, and similar driving scenes; f and g denote the neural network functions of the feature space mapping for the two domains.
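A minimal numeric sketch of the idea: two mapping functions (linear maps standing in for the neural networks f and g, whose architectures the patent does not specify) should send simulation and real decision states to matching feature distributions, so a distribution-discrepancy term can drive their training.

```python
import numpy as np

def feature_alignment_loss(F: np.ndarray, G: np.ndarray,
                           sim_states: np.ndarray, real_states: np.ndarray) -> float:
    """Squared distance between the mean mapped features of the two domains.

    F and G are linear stand-ins for the mapping networks f and g.
    """
    sim_feat = sim_states @ F     # f(s_sim)
    real_feat = real_states @ G   # g(s_real)
    diff = sim_feat.mean(axis=0) - real_feat.mean(axis=0)
    return float(diff @ diff)
```

Minimizing such a term over F and G (together with the decision objective) is one way to learn the implicit correspondence between the two feature spaces.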
The invention has the beneficial effects that:
1. The intelligent vehicle coupling decision model considering dangerous driving conditions couples a self-learning decision method with driving rules, overcoming the limitations, inflexibility, and unreliability of any single decision method, and can effectively handle intelligent vehicle driving decisions in a variety of complex traffic scenes.
2. The proposed coupling decision model fully considers the collision risk and lane-change risk during driving and assigns a decision algorithm to each case, further improving the real-time performance of intelligent vehicle decisions and their reliability under dangerous driving conditions.
3. The proposed transfer learning algorithm based on feature space mapping transfers the optimal-value action knowledge of the intelligent vehicle from the simulated scene to the real scene, mitigates the modeling error of real traffic scenes, verifies the effectiveness of the coupling decision model in real driving scenes, and greatly improves the vehicle's transfer learning capability.
Drawings
FIG. 1 is a study route of the present invention
FIG. 2 is a view of the simulated driving scene of the intelligent vehicle
FIG. 3 is a schematic diagram of the collision risk of the intelligent vehicle of the present invention
FIG. 4 is a schematic diagram of the lane change risk of the intelligent vehicle according to the present invention
FIG. 5 is a schematic diagram of the adaptive braking safety distance of the present invention
FIG. 6 is a flowchart of a transfer learning algorithm based on feature space mapping according to the present invention
Detailed Description
The invention will be further explained with reference to the drawings.
As shown in FIG. 1, the invention provides an intelligent vehicle coupling decision model and method considering dangerous driving conditions. The technical scheme of the invention comprises the following steps in sequence,
step 1): first, a model of the simulated driving scene of the intelligent vehicle is constructed, as shown in fig. 2, and the scene is modeled as a Markov decision process composed of the state space s(t) of the traffic scene, the decision action a(t) of the intelligent vehicle, the reward function r(t), and the random state transition function p(s(t + 1) | s(t), a(t)) of the scene at time t + 1.
1) State space s (t) of traffic scene
The state space s(t) of the traffic scene is mainly composed of the state information s_AV(t) of the intelligent vehicle, the state information s_OA(t) of the surrounding traffic participants, and the lane structure and traffic rules s_TR(t). The state information s_AV(t) of the intelligent vehicle consists of its position p_AV(t), speed v_AV(t), and heading θ_AV(t), as in formula (2):

s_AV(t) = {p_AV(t), v_AV(t), θ_AV(t)} (2)

where p_AV(t) is the position coordinate (x_AV, y_AV) of the intelligent vehicle at time t.
The state information s_OA(t) of the surrounding traffic participants includes their position, speed, heading, and category information, as in formula (3):

s_OA^i(t) = {p_OA^i(t), v_OA^i(t), θ_OA^i(t), c_j} (3)

where p_OA^i(t) denotes the position coordinates (x_OV, y_OV) of a surrounding traffic participant at time t; i denotes the i-th traffic participant in the scene; and j denotes the participant's category information, with j = 1 a vehicle and j = 0 a pedestrian.
The lane structure and traffic rule information s_TR(t) is given by formula (4):

s_TR(t) = {k, C_k, W_k, θ_k, V_min,k, V_max,k, T_k, P_N(t), τ_s} (4)

where k is the number of the current lane (the k-th lane); C_k is the position vector of the lane centerline points; W_k is the width of the lane; θ_k is the tangent direction angle at the lane centerline points; V_min,k and V_max,k are the minimum and maximum speed limits of the lane; T_k is the traffic signal, a (0, 1) signal that determines whether the vehicle needs to stop at the end of the lane; P_N(t) is the position of the navigation target point of the intelligent vehicle; and τ_s is the driving boundary of the traffic scene, formed by sequentially connected point rows joined by straight lines.
In summary, the state space s(t) of the traffic scene can be represented as:

s(t) = {s_AV(t), s_OA(t), s_TR(t)} (5)
2) decision action a (t) of intelligent vehicle
The future driving decision action set of the intelligent vehicle mainly comprises its longitudinal acceleration a_L(t) and front-wheel angle a_T(t), as in formula (6):

a(t) = {a_L(t), a_T(t)} (6)

where, for driving comfort, the longitudinal acceleration a_L(t) ranges over [-3, 2] m/s² and the front-wheel angle a_T(t) over [-40°, 40°].
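The constrained action space quoted above can be enforced with a simple clip; this illustrates the ranges only, not the patent's constraint mechanism inside the DQN.

```python
def clip_action(accel_mps2: float, steer_deg: float):
    """Clamp a_L(t) to [-3, 2] m/s^2 and a_T(t) to [-40, 40] degrees."""
    a = min(max(accel_mps2, -3.0), 2.0)
    d = min(max(steer_deg, -40.0), 40.0)
    return a, d
```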
3) Reward function r (t)
In the reinforcement learning process, a reward function must be designed to reward or punish the vehicle's actions while driving. Its design mainly considers the navigation target point constraint r_1(t), the driving safety index r_2(t), the drivable area constraint r_3(t), and the lane constraint r_4(t).
A) Navigation target point constraint r_1(t)

The motion decisions of the intelligent vehicle are constrained to some extent by the navigation target point: the vehicle needs to plan a reasonable path within the drivable area to reach that point. Its reward function r_1(t) is given by formula (7).
B) Driving safety index r_2(t)

Collision avoidance is the premise of any driving decision of the intelligent vehicle; if the vehicle collides during model training, that training round ends. The driving safety index r_2(t) can be expressed as:

r_2(t) = -v_AV(t)² · φ{Collision} (8)

where φ{Collision} takes the value 1 when the intelligent vehicle has a collision accident and 0 otherwise. Formula (8) reflects that the faster the vehicle, the more serious the accident.
C) Drivable area constraint r_3(t)

Similarly, the driving range of the intelligent vehicle should stay within the state set of the drivable area, and the vehicle is punished when it leaves that set. In particular, when a pedestrian is ahead, the vehicle needs to perform an avoidance behavior, so only the drivable area constraint, and not the lane constraint, needs to be considered. The drivable area constraint r_3(t) is given by formula (9).
D) Lane constraint r_4(t)

According to the driving rules, the driving direction of the intelligent vehicle should mostly stay consistent with the lane direction; otherwise the vehicle is punished. The lane constraint r_4(t) is expressed as:

r_4(t) = cos α(t) - sin α(t) (10)
in the formula, α represents an included angle between the driving direction of the intelligent vehicle and the lane direction, as shown in fig. 2.
In summary, the final reward function of the intelligent vehicle is the weighted sum of r_1(t), r_2(t), r_3(t), and r_4(t), as in formula (11):

r(t) = Σ_L ω_L · r_L(t), L = 1, …, 4 (11)

where ω_L denotes the weight parameters.
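The weighted combination of the four reward terms is straightforward to compute; the weight values below are illustrative placeholders, since the patent does not state them.

```python
def total_reward(r_terms, weights=(1.0, 1.0, 1.0, 1.0)) -> float:
    """r(t) as the weighted sum of r_1(t)..r_4(t) with weights omega_L."""
    return sum(w * r for w, r in zip(weights, r_terms))
```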
4) Random state transfer function p (s (t +1) | s (t), a (t))
Considering the interaction among traffic participants, given the current state s(t) and the selected action a(t) of the intelligent vehicle, the random state transition function of the scene at time t + 1 is the product of the state transition probability distribution of the intelligent vehicle and that of the surrounding participants, as in formula (12):

p(s(t + 1) | s(t), a(t)) = p(s_AV(t + 1) | s_AV(t), a(t)) × p(s_OA(t + 1) | s(t)) (12)
step 2): based on the driving simulation scene constructed above, information about the ego vehicle and its surrounding driving scene is acquired through the GPS, lidar, speed sensor, camera, and other sensors installed on the intelligent vehicle. It mainly comprises the speed v_AV(t) of the intelligent vehicle at time t, the speed v_FV(t) of the front traffic participant, the speed v_OV(t) of the vehicle in the adjacent lane, the collision reaction time ρ of the intelligent vehicle, the actual distance D_h(t) between the intelligent vehicle and the surrounding traffic participants ahead, the longitudinal and lateral speeds of the intelligent vehicle during a lane change, the lateral distance d_AL from the lane boundary line during the lane change, and the lane width w_k. This information serves as the input of the driving condition evaluation model.
Step 3): constructing a collision risk model delta and a lane change risk model eta of the intelligent vehicle and surrounding traffic participants based on the relevant information acquired by the multiple sensors in the step 2).
1) Collision risk model delta
As shown in fig. 3, the collision risk model δ mainly uses the time headway (TH) and time to collision (TTC) indexes to compare the actual distance D_h(t) between the intelligent vehicle and the front traffic participant with the safety distance D_s(t). The safety distance D_s(t) is calculated mainly from the braking distance term v_AV(t)·ρ of the intelligent vehicle, the final following distance d_f, and the longitudinal displacement of the front traffic participant, as in formula (12):

D_s(t) = v_AV(t)·ρ + v_AV(t)² / (2 v'_AV(t)) + d_f − v_FV(t)² / (2 v'_FV(t)) (12)

where v_AV(t) and v_FV(t) denote the speeds of the intelligent vehicle and the front traffic participant at time t; v'_AV(t) and v'_FV(t) denote their decelerations, which for passenger vehicles are taken to be numerically equal; and ρ denotes the reaction time of the intelligent vehicle, comprising the system reaction time ρ_1 and the brake response time ρ_2.
The collision risk model δ of the intelligent vehicle with respect to the surrounding traffic participants can then be expressed as formula (13):

δ = D_s(t) / D_h(t) (13)

where δ ≥ 1 indicates that the intelligent vehicle has a collision risk, and otherwise it does not. Finally, using the time headway (TH) and time to collision (TTC) indexes defined in formula (14), and combining formulas (12) and (13), the final intelligent vehicle collision risk model δ is given by formula (15).
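The two indexes can be computed from their standard definitions, which match the quantities named in the text; treating a non-closing gap as infinite TTC is a common convention assumed here.

```python
def time_headway(gap_m: float, v_ego_mps: float) -> float:
    """TH: current gap divided by ego speed."""
    return gap_m / v_ego_mps

def time_to_collision(gap_m: float, v_ego_mps: float, v_lead_mps: float) -> float:
    """TTC: gap divided by closing speed; infinite if the gap is not closing."""
    closing = v_ego_mps - v_lead_mps
    return gap_m / closing if closing > 0 else float("inf")
```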
2) Lane change risk model η
As shown in figs. 4 and 5, the lane change risk model η is mainly obtained by comparing the distance D_LF between the two vehicles after the lane change of the intelligent vehicle with the adaptive braking distance D_b of the rear vehicle. The adaptive braking distance of the rear vehicle is obtained by accumulating the driving distance D_1 of the rear-vehicle driver during the reaction stage, the driving distance D_2 of the rear vehicle during the brake response stage, the driving distance D_3 during the braking force build-up stage, and the driving distance D_4 during the continuous braking stage.
A) Calculation of the distance D_LF between the two vehicles after the lane change of the intelligent vehicle
Before the intelligent vehicle changes lane, the longitudinal distance D_LB between the intelligent vehicle and the rear vehicle can be expressed as:
D_LB = y_AV − y_OV (16)
According to the lateral speed and lateral acceleration of the intelligent vehicle, the time t_LC for the intelligent vehicle to reach the center line of the target lane is solved:
In the formula, w_k denotes the lane width, and d_AL denotes the lateral distance of the intelligent vehicle from the lane boundary.
Then at tLCThe longitudinal displacement of the rear vehicle over the time period may be expressed as:
similarly, the intelligent vehicle is at tLCThe longitudinal displacement over the time period can then be expressed as:
finally, after the lane change of the intelligent vehicle is completed, the distance D between the intelligent vehicle and the rear vehicleLFIt can be expressed as:
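A minimal sketch of the D_LF calculation of part A), under stated assumptions: the formula images of (16)-(20) are not reproduced, so the lateral offset to the target lane centerline is taken as d_AL + w_k/2, both vehicles are assumed to hold constant longitudinal speed during the lane change, and the lane-change duration t_LC solves constant-lateral-acceleration kinematics.

```python
import math

def lane_change_gap(v_x_av, v_y_av, a_y_av, v_ov, y_av, y_ov, w_k, d_al):
    """Gap D_LF between the ego vehicle and the rear vehicle in the target
    lane after the lane change. Assumptions: lateral offset to the target
    centerline is d_al + w_k/2, and longitudinal speeds are constant."""
    d_lb = y_av - y_ov                  # initial longitudinal gap, eq. (16)
    d_lat = d_al + w_k / 2.0            # lateral offset to travel (assumption)
    # solve 0.5*a*t^2 + v_y*t = d_lat for the lane-change time t_LC, eq. (17)
    t_lc = (-v_y_av + math.sqrt(v_y_av ** 2 + 2.0 * a_y_av * d_lat)) / a_y_av
    d_rear = v_ov * t_lc                # rear-vehicle displacement, eq. (18)
    d_ego = v_x_av * t_lc               # ego displacement, eq. (19)
    return d_lb + d_ego - d_rear, t_lc  # eq. (20)
```

When the ego vehicle is faster than the rear vehicle, the post-lane-change gap is larger than the initial gap, as expected.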
B) Calculation of the adaptive braking distance D_b of the rear vehicle
The calculation of the adaptive braking distance D_b of the rear vehicle mainly takes into account the rear-vehicle speed, the rear-vehicle braking performance, and the reaction times of the driver and of the braking system, as detailed below:
Step 1: assuming a driver reaction time t_1 of 1 s, the driving distance D_1 of the rear-vehicle driver during the reaction stage is:
D_1 = v_OV(t) × t_1 (21)
Step 2: assuming a response time t_2 of 0.2 s for the brake response stage of the rear vehicle, the driving distance D_2 during this stage is:
D_2 = v_OV(t) × t_2 (22)
Step 3: during the braking force build-up stage t_3, the change in the deceleration of the rear vehicle is approximately linear. Assuming the rear vehicle decelerates at a comfortable deceleration a_soft, the driving distance D_3 during the build-up stage can be expressed as:
Step 4: during the continuous braking stage, the rear vehicle decelerates at a_soft until its speed drops to zero; the driving distance D_4 of the rear vehicle in this stage can be expressed as:
Step 5: finally, accumulating the driving distance D_1 of the driver reaction stage, the driving distance D_2 of the brake response stage, the driving distance D_3 of the braking force build-up stage and the driving distance D_4 of the continuous braking stage yields the adaptive braking distance D_b of the rear vehicle, as shown in the following formula:
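The four-stage accumulation of steps 1-5 can be sketched as follows. The build-up duration t_3 = 0.6 s is an illustrative assumption (the text fixes only t_1 = 1 s and t_2 = 0.2 s), and the build-up stage is modeled with the mean deceleration a_soft/2 implied by the linear-change approximation of step 3; the formula images of (21)-(25) are not reproduced.

```python
def adaptive_braking_distance(v_ov, a_soft, t1=1.0, t2=0.2, t3=0.6):
    """Adaptive braking distance D_b of the rear vehicle, accumulated over
    the four stages of steps 1-4. t3 and the mean-deceleration treatment of
    the build-up stage are assumptions."""
    d1 = v_ov * t1                            # driver reaction stage
    d2 = v_ov * t2                            # brake system response stage
    d3 = v_ov * t3 - a_soft * t3 ** 2 / 4.0   # build-up stage (mean decel a_soft/2)
    v3 = v_ov - a_soft * t3 / 2.0             # speed at end of build-up
    d4 = v3 ** 2 / (2.0 * a_soft)             # continuous braking to standstill
    return d1 + d2 + d3 + d4
```

For a rear vehicle at 15 m/s with a comfortable deceleration of 3 m/s², this accumulates to roughly 60 m.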
in summary, the lane change risk model η of the smart car can be expressed as:
η = D_LF − D_b (26)
In the formula, when η ≤ 0 the intelligent vehicle has a lane change risk; otherwise it does not.
Meanwhile, based on the constructed collision risk model δ and lane change risk model η of the intelligent vehicle with surrounding traffic participants, the driving condition of the intelligent vehicle is evaluated according to the following criterion:
In the formula, D_c represents the set of intelligent vehicle driving conditions; D_d denotes a dangerous driving condition; and D_g denotes a general driving condition.
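Combining formula (26) with the evaluation criterion above, the lane change risk test and the condition classification can be sketched as below; function and label names are illustrative, not taken from the patent.

```python
def lane_change_risk(d_lf, d_b):
    """Lane change risk eta = D_LF - D_b (formula (26)); eta <= 0 means the
    intelligent vehicle has a lane change risk."""
    return d_lf - d_b

def classify_condition(delta, eta):
    """Driving-condition criterion: dangerous (D_d) when the collision risk
    delta >= 1 or the lane change risk eta <= 0, otherwise general (D_g)."""
    return "dangerous" if delta >= 1.0 or eta <= 0.0 else "general"
```

The classification drives the switch between the rule-based branch and the DQN branch of the behavior decision model described in steps 7) and 10).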
Step 4): setting the hyper-parameters of the DQN model, including the learning rate β, the number of training rounds N, the discount rate γ, and the initial speed ranges of vehicles and pedestrians used in training the intelligent vehicle decision model. The initial learning rate β of the DQN model is set to 0.002; the model is a five-layer fully connected network with 100 neuron nodes in each hidden layer; and the initial number of training rounds N and the discount rate γ are set to 10000 and 0.9, respectively. The initial speed ranges of vehicles and pedestrians in the simulation scene are [15, 65] km/h and [0, 5] km/h, respectively.
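A sketch of the stated network shape and hyper-parameters, assuming the five layers are an input layer, three hidden layers of 100 ReLU nodes, and a linear output layer with one Q value per discrete action; the text does not state the activation function or the exact layer split, so both are assumptions.

```python
import numpy as np

HYPERPARAMS = {"beta": 0.002, "N": 10000, "gamma": 0.9}  # learning rate, rounds, discount

def init_q_network(state_dim, action_dim, hidden=100, layers=3, seed=0):
    """Fully connected Q network of step 4): three hidden layers of 100 nodes
    between input and output layers (layer split is an assumption)."""
    rng = np.random.default_rng(seed)
    dims = [state_dim] + [hidden] * layers + [action_dim]
    return [(rng.standard_normal((i, o)) * np.sqrt(2.0 / i), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def q_forward(params, s):
    """Forward pass: ReLU hidden layers, linear output Q(s, a | omega)."""
    x = np.asarray(s, dtype=float)
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b
```

The output vector holds one Q value per decision action, so action selection is an argmax over it.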
Step 5): randomly initializing the weight parameters ω of the Q network, the weight parameters ω⁻ = ω of the TD (temporal difference) target network, and a storage space V for model training samples.
Step 6): during the N rounds of model training, at each time step t = 0, 1, 2, …, the intelligent vehicle observes the state space s(t) of the traffic scene through its multiple sensors and constructs the reward function r(t) corresponding to the current state.
Step 7): evaluating the driving condition of the intelligent vehicle according to step 3); when the condition is a general driving condition, a decision algorithm based on driving rules is adopted to realize the lateral and longitudinal decisions of the intelligent vehicle and to generate the corresponding desired action space and the decision action a(t) of the intelligent vehicle.
The decision algorithm based on driving rules is mainly realized from the perspectives of driving safety rules, obstacle avoidance rules and yielding to pedestrians: setting a large driving safety distance for the intelligent vehicle in the simulation scene, braking or steering when encountering static obstacles, avoiding pedestrians, and following normal traffic rules when driving straight or turning at intersections. To reduce the dimensionality demanded of complex environment perception, the algorithm combines IF-THEN triggered events with judgments on the special position information P*(t) of the intelligent vehicle (e.g., the vicinity of an intersection), the position information of the navigation target point, and the current state information of the intelligent vehicle, thereby generating the desired action space and the decision action a(t) of the intelligent vehicle. The desired action space is represented as follows:
In the formula, the desired action space comprises the longitudinal action set of the intelligent vehicle and the lateral action set of the intelligent vehicle.
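A hedged sketch of the IF-THEN rule triggering described above; the rule order, thresholds and action names are all illustrative assumptions, not taken from the patent, with the pedestrian rule given the highest priority as the text requires.

```python
def rule_based_decision(near_intersection, pedestrian_ahead, static_obstacle,
                        gap_to_leader, safe_gap=30.0):
    """IF-THEN rule-based decision of step 7). Returns a (longitudinal,
    lateral) action pair; all names and the 30 m safe gap are assumptions."""
    if pedestrian_ahead:
        return "brake", "keep_lane"          # yielding-to-pedestrian rule (highest priority)
    if static_obstacle:
        return "decelerate", "steer_around"  # obstacle avoidance rule
    if near_intersection:
        return "decelerate", "keep_lane"     # normal rule near intersections
    if gap_to_leader < safe_gap:
        return "decelerate", "keep_lane"     # driving safety distance rule
    return "cruise", "keep_lane"
```

The returned pair mirrors the split of the desired action space into a longitudinal set and a lateral set.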
Step 8): storing the decision action a(t) selected by the intelligent vehicle at time t, the reward function r(t), the scene state s(t), and the scene state s(t+1) at time t+1 in V in the form of a quadruple (s(t), a(t), r(t), s(t+1));
step 9): at each iteration, randomly sampling 64 groups of sample data from the storage space V to train the DQN model, calculating the reward values of all decision actions of the intelligent vehicle in each state, selecting the action with the optimal value as the decision action of the intelligent vehicle in the current scene state, and synchronously updating the weight parameters ω of the Q network and the weight parameters ω⁻ = ω of the TD (temporal difference) target network during the training iterations of the DQN model;
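The storage space V and the 64-sample mini-batch of step 9) can be sketched with a plain replay buffer; the capacity value is an assumption.

```python
import random
from collections import deque

class ReplayBuffer:
    """Storage space V of steps 8)-9): quadruples (s, a, r, s') with uniform
    random mini-batches of 64 samples per training iteration."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest samples are evicted first

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=64):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

Uniform sampling decorrelates consecutive transitions, which is the usual motivation for experience replay in DQN training.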
The DQN model is mainly trained by the temporal difference (TD) algorithm, with the rough procedure as follows:
A) First, based on the training sample data (s(t), a(t), r(t), s(t+1)) in the storage space V of step 8) and the optimal Bellman equation, the optimal value action function Q*(s(t), a(t)) is obtained by solving, as shown in formula (29):
In the formula, the expectation term represents the expected accumulated reward of the intelligent vehicle at time t+1, and A represents the action space set of the intelligent vehicle.
B) Second, in practical problems it is infeasible to solve the optimal strategy by iteration, especially when the state space is large, because the computational cost is high. The optimal value function Q*(s(t), a(t)) is therefore replaced with a neural network Q(s(t), a(t) | ω), in the form:
In the formula, Q(s(t), a(t) | ω) represents the neural network's prediction at time t of the maximum accumulated return of all decision actions of the intelligent vehicle, with no factual component; the TD target (also denoted the objective function of the TD algorithm) represents the corresponding prediction at time t+1, which is partly based on the actually observed reward r(t).
C) Then, since both the TD target and Q(s(t), a(t) | ω) are estimates of the optimal action value Q*(s(t), a(t)), and the TD target is partly grounded in observed fact, Q(s(t), a(t) | ω) should be driven as close as possible to the TD target. The error of the TD algorithm is therefore calculated as the difference between the TD target and Q(s(t), a(t) | ω), and the training loss function L(ω) of the DQN model is constructed from this error:
D) Finally, the weight parameters ω are updated with the TD algorithm during the training iterations of the DQN model, as follows:
In the formula, β represents the learning rate of the model, the middle term is the error of the TD algorithm, and the last term is the derivative of the neural network Q(s(t), a(t) | ω) with respect to the weight parameters ω.
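Steps A)-D) reduce to a familiar TD update. The sketch below shows it on a tabular Q for clarity rather than the patent's neural network; for a table entry, the gradient in formula (32) is 1, so the update is just the learning rate times the TD error.

```python
def td_update(Q, s, a, r, s_next, beta=0.002, gamma=0.9):
    """One TD step of formulas (29)-(32) on a tabular Q (the patent uses
    Q(s, a | omega); a table is used here purely for illustration)."""
    y = r + gamma * max(Q[s_next])  # TD target (objective function of TD)
    td_error = y - Q[s][a]          # TD error, formula (31)
    Q[s][a] += beta * td_error      # gradient step, formula (32)
    return td_error
```

With β = 0.002 and a zero-initialized table, observing reward 1 nudges the visited entry up by exactly 0.002.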
Step 10): if the driving condition evaluated in step 3) is a dangerous driving condition, the DQN decision algorithm is adopted: a decision action a(t) of the intelligent vehicle is selected at random and steps 8) and 9) are repeated;
step 11): from the solved optimal value action state of the intelligent vehicle in the simulation scene at time t, the optimal value action state of the intelligent vehicle in the real driving scene is finally obtained by combining the transfer learning algorithm based on feature space mapping.
As shown in fig. 6, the transfer learning algorithm based on feature space mapping considers that, whether the intelligent vehicle is in the simulated driving scene or the real driving scene, as long as it makes decisions with the same actions, the same reward function and similar driving scenes, the feature probability distributions of the mapped optimal driving decision states should be the same. Here f and g denote the neural network functions of the feature space mapping, which are optimized with a similarity metric (the 2-norm), according to the following formula:
In the formula, the first set represents the optimal value action states of the intelligent vehicle in the simulation environment and the second those in the real driving environment; f represents the neural network function of the feature space mapping in the source domain (the simulated driving environment); g represents the neural network function in the target domain (the real driving environment); and ω_f and ω_g represent the weight parameters of the neural network functions f and g, respectively.
Objectively, the mapping functions f and g should be invertible. To keep f and g as close to invertible as possible and to preserve the invariant information of the respective domains, decoder networks are trained to reconstruct the optimal value action state sets from the mapped feature spaces. The optimization objective of the decoder network training is as follows:
In the formula, the two terms represent the reconstruction targets of the decoder in the source domain and in the target domain, and ω_S and ω_T represent the weight parameters of the two decoders, respectively.
In summary, the optimization objective of the transfer learning algorithm model based on feature space mapping is shown in formula (35). Meanwhile, given the optimal value action state of the intelligent vehicle in the simulation scene at time t obtained in step 11), the optimal value action state of the intelligent vehicle in the real driving scene at time t can be solved by combining the neural network functions f and g of the feature space mapping, as shown in formula (36):
in the formula, psi represents the reward weight of the intelligent vehicle optimal value decision migration.
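A minimal sketch of the two optimization terms described above: the 2-norm similarity metric aligning the mapped source-domain and target-domain states, and the decoder reconstruction objective. Pairing of source and target samples is an assumption, and the exact composition in formulas (35)-(36) sits in the unreproduced formula images.

```python
import math

def alignment_loss(f_src, g_tgt):
    """2-norm similarity metric of the feature-space alignment: sum of
    ||f(x_s) - g(x_t)||_2 over paired mapped samples (pairing assumed)."""
    return sum(math.sqrt(sum((a - b) ** 2 for a, b in zip(fs, gt)))
               for fs, gt in zip(f_src, g_tgt))

def reconstruction_loss(x, x_rec):
    """Decoder reconstruction objective: squared error between the optimal
    value action states and their reconstructions from the mapped space."""
    return sum(sum((a - b) ** 2 for a, b in zip(xi, ri))
               for xi, ri in zip(x, x_rec))
```

The total transfer-learning objective then combines both losses over the source and target domains, weighted as in formula (35).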
The above-listed series of detailed descriptions are merely specific illustrations of possible embodiments of the present invention, and they are not intended to limit the scope of the present invention, and all equivalent means or modifications that do not depart from the technical spirit of the present invention are intended to be included within the scope of the present invention.
Claims (10)
1. An intelligent vehicle coupling decision model considering dangerous driving conditions, comprising: the system comprises a traffic scene model, a driving condition evaluation model and a behavior decision model;
the traffic scene model adopts a Markov model based on the position, speed and heading angle information of the intelligent vehicle and surrounding traffic participants, the lane environment structure information, and the traffic rule information;
the driving condition evaluation model divides driving conditions into general driving conditions and dangerous driving conditions based on the collision risk with surrounding traffic participants and the lane change risk of the intelligent vehicle when the intelligent vehicle runs;
the behavior decision model adopts a decision coupling rules with a deep reinforcement learning algorithm: on one hand, a decision algorithm based on driving rules is constructed from the perspectives of driving safety rules, obstacle avoidance rules and the rule that pedestrians have the highest priority, to handle driving decisions under general driving conditions; on the other hand, for dangerous driving conditions, a deep Q network (DQN) model with a constrained action space is adopted so that the intelligent vehicle autonomously learns the optimal driving action strategy in the interactive scene.
2. The intelligent vehicle coupling decision model considering dangerous driving conditions as claimed in claim 1, wherein the traffic scene model is specifically as follows:
the model comprises a state space s(t) of the scene, a decision action a(t) of the intelligent vehicle, a reward function r(t), and a random state transfer function p(s(t+1) | s(t), a(t)) of the scene at time t+1, wherein the state space s(t) of the traffic scene is composed of the state information s_AV(t) of the intelligent vehicle, the state information s_OA(t) of surrounding traffic participants, and the lane structure and traffic rule information s_TR(t); the decision action a(t) of the intelligent vehicle controls the longitudinal acceleration a_L(t) and the front wheel angle a_T(t) according to the behavior decision model, correspondingly forming the decision action set of the intelligent vehicle; the design of the reward function r(t) fuses the navigation target point constraint r_1(t), the driving safety index r_2(t), the drivable area constraint r_3(t) and the lane constraint r_4(t); and the random state transfer function p(s(t+1) | s(t), a(t)) consists of the state transition probability distribution p(s_AV(t+1) | s_AV(t), a(t)) of the intelligent vehicle and the state transition probability distribution p(s_OA(t+1) | s(t)) of the surrounding traffic participants.
3. The intelligent vehicle coupling decision model considering dangerous driving conditions as claimed in claim 1, wherein the input information of the driving condition evaluation model comprises the speed v_AV(t) of the intelligent vehicle at time t, the speed v_FV(t) of the front traffic participant, the speed v_OV(t) of the vehicle in the adjacent lane, the collision reaction time ρ of the intelligent vehicle, the actual distance D_h(t) between the intelligent vehicle and the surrounding traffic participants in front, the longitudinal and lateral speeds of the intelligent vehicle during a lane change, the lateral distance d_AL from the lane boundary line during a lane change, and the lane width w_k.
4. An intelligent vehicle coupling decision model considering dangerous driving conditions according to claim 1, characterized by comprising a collision risk model δ and a lane change risk model η;
the collision risk model δ uses the time headway (TH) and time to collision (TTC) indexes to compare the actual distance D_h(t) between the intelligent vehicle and the front traffic participant with the safety distance D_s(t), wherein the safety distance is mainly calculated from the reaction-time travel v_AV(t)·ρ of the intelligent vehicle, the final following distance, and the longitudinal displacement of the front traffic participant;
the lane change risk model η judges the lane change risk by comparing the distance D_LF between the two vehicles after the lane change of the intelligent vehicle with the adaptive braking distance D_b of the rear vehicle, wherein the adaptive braking distance of the rear vehicle is mainly obtained by accumulating the driving distance D_1 of the rear-vehicle driver during the reaction stage, the driving distance D_2 during the brake response stage, the driving distance D_3 during the braking force build-up stage, and the driving distance D_4 during the continuous braking stage.
5. The intelligent vehicle coupling decision model considering the dangerous driving condition as claimed in claim 4, wherein the driving condition evaluation model divides the driving condition of the intelligent vehicle according to the collision risk model δ and the lane change risk model η as shown in the following formula (1):
In the formula, D_c represents the set of intelligent vehicle driving conditions; D_d denotes a dangerous driving condition; and D_g denotes a general driving condition.
6. The intelligent vehicle coupling decision model considering dangerous driving conditions as claimed in claim 1, wherein the behavior decision model:
when the driving condition is a general driving condition, a decision algorithm based on driving rules is adopted to realize the lateral and longitudinal decisions of the intelligent vehicle and to generate the corresponding desired action space and the decision action a(t) of the intelligent vehicle; the decision action a(t) selected by the intelligent vehicle at time t, the reward function r(t), the scene state s(t) and the scene state s(t+1) at time t+1 are stored in the form of a quadruple (s(t), a(t), r(t), s(t+1)); at each iteration, several groups of sample data are randomly sampled from the storage space V to train the DQN model, the reward values of all decision actions of the intelligent vehicle in each state are calculated, the action with the optimal value is selected as the decision action in the current scene state, and the weight parameters ω of the Q network and the weight parameters ω⁻ = ω of the TD target network are synchronously updated during the training iterations of the DQN model;
when the driving condition is a dangerous driving condition, a decision action a(t) of the intelligent vehicle is selected at random, and the decision action a(t), the reward function r(t), the scene state s(t) and the scene state s(t+1) at time t+1 are stored in V in the form of a quadruple (s(t), a(t), r(t), s(t+1)); at each iteration, 64 groups of sample data are randomly sampled from the storage space V to train the DQN model, the reward values of all decision actions of the intelligent vehicle in each state are calculated, the action with the optimal value is selected as the decision action in the current scene state, and the weight parameters ω of the Q network and the weight parameters ω⁻ = ω of the TD (temporal difference) target network are synchronously updated during the training iterations of the DQN model;
the optimal value action state of the intelligent vehicle in the simulation scene at time t is solved, and the optimal value action state of the intelligent vehicle in the real driving scene is finally obtained by combining the transfer learning algorithm based on feature space mapping;
the feature space mapping transfer learning algorithm considers that, whether in a simulated driving scene or a real driving scene, when the intelligent vehicle makes decisions with the same actions, the same reward function and similar driving scenes, the feature probability distributions of the mapped optimal driving decision states should be the same, where f and g denote the neural network functions of the feature space mapping, which are optimized with a similarity metric according to the following formula:
In the formula, the first set represents the optimal value action states of the intelligent vehicle in the simulation environment and the second those in the real driving environment; f represents the neural network function of the feature space mapping in the source domain (the simulated driving environment); g represents the neural network function in the target domain (the real driving environment); and ω_f and ω_g represent the weight parameters of the neural network functions f and g, respectively.
7. An intelligent vehicle coupling decision-making method considering dangerous driving conditions is characterized by comprising the following steps:
step 1) building an intelligent vehicle simulation driving scene, and modeling the scene into a Markov decision process;
step 2) collecting the information of the self vehicle and the driving scene through a GPS, a laser radar, a speed sensor and a camera multi-sensor which are arranged on the intelligent vehicle, and taking the information as the input of a driving condition evaluation model;
step 3) constructing a driving condition evaluation model based on the relevant information acquired by the multiple sensors in the step 2), wherein the driving condition evaluation model comprises a collision risk model delta and a lane change risk model eta of the intelligent vehicle and surrounding traffic participants, and dividing the driving conditions of the intelligent vehicle according to the collision risk model delta and the lane change risk model eta, as shown in the following formula (1):
In the formula, D_c represents the set of intelligent vehicle driving conditions; D_d denotes a dangerous driving condition; and D_g denotes a general driving condition.
Step 4) training an intelligent vehicle decision model, firstly setting hyper-parameters of the DQN model, including the learning rate beta of the model, the training round N and the discount rate gamma of the model, and the initial speed ranges of vehicles and pedestrians;
step 5) randomly initializing the weight parameters ω of the Q network, the weight parameters ω⁻ = ω of the TD (temporal difference) target network, and a storage space V for model training samples;
step 6) during the N rounds of model training, at each time step t = 0, 1, 2, …, the intelligent vehicle observes the state space s(t) of the traffic scene through multiple sensors and constructs the reward function r(t) corresponding to the current state;
step 7) evaluating the driving condition of the intelligent vehicle according to step 3); when the condition is a general driving condition, a decision algorithm based on driving rules is adopted to realize the lateral and longitudinal decisions of the intelligent vehicle and to generate the corresponding desired action space and the decision action a(t) of the intelligent vehicle;
the decision algorithm based on driving rules is based on the driving safety rules, the obstacle avoidance rules and the yielding-to-pedestrian rules, combines IF-THEN triggered events, and generates the desired action space and the decision action a(t) of the intelligent vehicle from the special position information P*(t) of the intelligent vehicle, the navigation target point position information, and the current state information of the intelligent vehicle;
step 8), storing decision actions a (t), reward functions r (t), scene states s (t) and scene states s (t +1) at the time t +1, which are selected by the intelligent vehicle at the time t, in the form of quadruples (s (t), a (t), r (t) and s (t +1)) in V;
step 9) at each iteration, randomly sampling 64 groups of sample data from the storage space V to train the DQN model, calculating the reward values of all decision actions of the intelligent vehicle in each state, selecting the action with the optimal value as the decision action of the intelligent vehicle in the current scene state, and synchronously updating the weight parameters ω of the Q network and the weight parameters ω⁻ = ω of the TD (temporal difference) target network during the training iterations of the DQN model;
Step 10) if the driving condition evaluated in step 3) is a dangerous driving condition, the DQN decision algorithm is adopted: a decision action a(t) of the intelligent vehicle is selected at random and steps 8) and 9) are repeated;
step 11) from the solved optimal value action state of the intelligent vehicle in the simulation scene at time t, the optimal value action state of the intelligent vehicle in the real driving scene is finally obtained by combining the transfer learning algorithm based on feature space mapping.
8. The intelligent vehicle coupling decision method considering dangerous driving conditions according to claim 7, wherein the modeling of the simulated traffic scene in step 1) is a Markov decision process, specifically as follows:
constructing a state space s(t) of the scene, a decision action a(t) of the intelligent vehicle, a reward function r(t), and a random state transfer function p(s(t+1) | s(t), a(t)) of the scene at time t+1, wherein the state space s(t) of the traffic scene is composed of the state information s_AV(t) of the intelligent vehicle, the state information s_OA(t) of surrounding traffic participants, and the lane structure and traffic rule information s_TR(t); the driving decision of the intelligent vehicle is made by controlling the longitudinal acceleration a_L(t) and the front wheel angle a_T(t), correspondingly forming the decision action set a(t) of the intelligent vehicle; in addition, the reward function r(t) is constructed by considering the navigation target point constraint r_1(t), the driving safety index r_2(t), the drivable area constraint r_3(t), the lane constraint r_4(t) and other information; finally, the random state transfer function consists of the state transition probability distribution p(s_AV(t+1) | s_AV(t), a(t)) of the intelligent vehicle and the state transition probability distribution p(s_OA(t+1) | s(t)) of the surrounding traffic participants.
9. The intelligent vehicle coupling decision method considering dangerous driving conditions as claimed in claim 7, wherein the input information of the driving condition evaluation model in step 3) comprises the speed v_AV(t) of the intelligent vehicle at time t, the speed v_FV(t) of the front traffic participant, the speed v_OV(t) of the vehicle in the adjacent lane, the collision reaction time ρ of the intelligent vehicle, the actual distance D_h(t) between the intelligent vehicle and the surrounding traffic participants in front, the longitudinal and lateral speeds of the intelligent vehicle during a lane change, the lateral distance d_AL from the lane boundary line during a lane change, the lane width w_k, and the like;
the collision risk model δ uses the time headway (TH), time to collision (TTC) and other indexes to compare the actual distance D_h(t) between the intelligent vehicle and the front traffic participant with the safety distance D_s(t), wherein the safety distance is mainly calculated from the reaction-time travel v_AV(t)·ρ of the intelligent vehicle, the final following distance, and the longitudinal displacement of the front traffic participant;
the lane change risk model η compares the distance D_LF between the two vehicles after the lane change of the intelligent vehicle with the adaptive braking distance D_b of the rear vehicle, wherein the adaptive braking distance of the rear vehicle is mainly obtained by accumulating the driving distance D_1 of the rear-vehicle driver during the reaction stage, the driving distance D_2 during the brake response stage, the driving distance D_3 during the braking force build-up stage, and the driving distance D_4 during the continuous braking stage.
10. The intelligent vehicle coupling decision method considering the dangerous driving condition as claimed in claim 7, wherein the initial learning rate β of the DQN model in step 4) is set to 0.002, the model structure is formed by a five-layer fully-connected network, and each hidden layer of the network contains 100 neuron nodes, while the initial training round N and the discount rate γ of the model are set to 10000 and 0.9, respectively, and the range of the initial speed of the vehicle and the pedestrian in the simulation scene is [15,65] km/h, [0,5] km/h, respectively;
the model training in step 9) trains the DQN model by the temporal difference (TD) algorithm as follows: firstly, the optimal value action function Q*(s(t), a(t)) is obtained by solving based on the sample data (s(t), a(t), r(t), s(t+1)) and the optimal Bellman equation, and is replaced with a neural network Q(s(t), a(t) | ω); then, the error of the TD algorithm is calculated as the difference between the objective function of the TD algorithm and Q(s(t), a(t) | ω), and the training loss function L(ω) of the DQN model is constructed from this error;
the transfer learning algorithm based on feature space mapping in step 11) considers that, whether in a simulated driving scene or a real driving scene, when the intelligent vehicle makes decisions with the same actions, the same reward function and similar driving scenes, the feature probability distribution of the optimal driving decision state mapping space is the same, where f and g represent the neural network functions of the feature space mapping.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111526027.0A CN114312830A (en) | 2021-12-14 | 2021-12-14 | Intelligent vehicle coupling decision model and method considering dangerous driving conditions |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114312830A true CN114312830A (en) | 2022-04-12 |
Family
ID=81050039
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111526027.0A Pending CN114312830A (en) | 2021-12-14 | 2021-12-14 | Intelligent vehicle coupling decision model and method considering dangerous driving conditions |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114312830A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104239741A (en) * | 2014-09-28 | 2014-12-24 | 清华大学 | Travelling risk field-based automobile driving safety assistance method |
US20160187880A1 (en) * | 2014-12-25 | 2016-06-30 | Automotive Research & Testing Center | Driving control system and dynamic decision control method thereof |
CN108332977A (en) * | 2018-01-23 | 2018-07-27 | 常熟昆仑智能科技有限公司 | A kind of classifying and analyzing method joining automotive test scene to intelligent network |
CN112242059A (en) * | 2020-09-30 | 2021-01-19 | 南京航空航天大学 | Intelligent decision-making method for unmanned vehicle based on motivation and risk assessment |
CN113253739A (en) * | 2021-06-24 | 2021-08-13 | 深圳慧拓无限科技有限公司 | Driving behavior decision method for expressway |
CN113291308A (en) * | 2021-06-02 | 2021-08-24 | 天津职业技术师范大学(中国职业培训指导教师进修中心) | Vehicle self-learning lane-changing decision-making system and method considering driving behavior characteristics |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112346450A (en) * | 2019-07-22 | 2021-02-09 | 沃尔沃汽车公司 | Robust autonomous driving design |
CN114880938A (en) * | 2022-05-16 | 2022-08-09 | 重庆大学 | Method for realizing decision of automatically driving automobile behavior |
CN114880938B (en) * | 2022-05-16 | 2023-04-18 | 重庆大学 | Method for realizing decision of automatically driving automobile behavior |
CN115630583A (en) * | 2022-12-08 | 2023-01-20 | 西安深信科创信息技术有限公司 | Method, device, equipment and medium for generating simulated vehicle driving state |
CN116946162A (en) * | 2023-09-19 | 2023-10-27 | 东南大学 | Intelligent network combined commercial vehicle safe driving decision-making method considering road surface attachment condition |
CN116946162B (en) * | 2023-09-19 | 2023-12-15 | 东南大学 | Intelligent network combined commercial vehicle safe driving decision-making method considering road surface attachment condition |
CN117574111A (en) * | 2024-01-15 | 2024-02-20 | 大秦数字能源技术股份有限公司 | BMS algorithm selection method, device, equipment and medium based on scene state |
CN117574111B (en) * | 2024-01-15 | 2024-03-19 | 大秦数字能源技术股份有限公司 | BMS algorithm selection method, device, equipment and medium based on scene state |
CN117708999A (en) * | 2024-02-06 | 2024-03-15 | 北京航空航天大学 | Scene-oriented hybrid electric vehicle energy management strategy evaluation method |
CN117708999B (en) * | 2024-02-06 | 2024-04-09 | 北京航空航天大学 | Scene-oriented hybrid electric vehicle energy management strategy evaluation method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114312830A (en) | Intelligent vehicle coupling decision model and method considering dangerous driving conditions | |
CN110745136B (en) | Driving self-adaptive control method | |
Huang et al. | Personalized trajectory planning and control of lane-change maneuvers for autonomous driving | |
CN112347567B (en) | Vehicle intention and track prediction method | |
CN112162555B (en) | Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet | |
CN112888612A (en) | Autonomous vehicle planning | |
CN107813820A (en) | A kind of unmanned vehicle lane-change paths planning method for imitating outstanding driver | |
Min et al. | Deep Q learning based high level driving policy determination | |
CN115257745A (en) | Automatic driving lane change decision control method based on rule fusion reinforcement learning | |
Sun et al. | DDPG-based decision-making strategy of adaptive cruising for heavy vehicles considering stability | |
CN110956851A (en) | Intelligent networking automobile cooperative scheduling lane changing method | |
Yu et al. | Autonomous overtaking decision making of driverless bus based on deep Q-learning method | |
CN114564016A (en) | Navigation obstacle avoidance control method, system and model combining path planning and reinforcement learning | |
CN115257819A (en) | Decision-making method for safe driving of large-scale commercial vehicle in urban low-speed environment | |
Sun et al. | Human-like highway trajectory modeling based on inverse reinforcement learning | |
Feng et al. | Active collision avoidance strategy considering motion uncertainty of the pedestrian | |
CN113255998B (en) | Expressway unmanned vehicle formation method based on multi-agent reinforcement learning | |
CN113200054B (en) | Path planning method and system for automatic driving take-over | |
CN114368387A (en) | Attention mechanism-based driver intention identification and vehicle track prediction method | |
Lodh et al. | Autonomous vehicular overtaking maneuver: A survey and taxonomy | |
Dubey et al. | Autonomous braking and throttle system: A deep reinforcement learning approach for naturalistic driving | |
US20230391371A1 (en) | Precise pull-over with mechanical simulation | |
Zhang et al. | Spatial attention for autonomous decision-making in highway scene | |
CN114802306A (en) | Intelligent vehicle integrated decision-making system based on man-machine co-driving concept | |
Siboo et al. | An Empirical Study of DDPG and PPO-Based Reinforcement Learning Algorithms for Autonomous Driving |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||