CN114312830A - Intelligent vehicle coupling decision model and method considering dangerous driving conditions - Google Patents
- Publication number
- CN114312830A (application CN202111526027.0A)
- Authority
- CN
- China
- Prior art keywords
- intelligent vehicle
- driving
- decision
- model
- scene
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses an intelligent vehicle coupling decision model and method that take dangerous driving conditions into account. By coupling a self-learning decision method with driving rules, the approach overcomes the limitations, inflexibility, and unreliability of any single decision method and effectively handles intelligent vehicle driving decisions across a variety of complex traffic scenes. The invention fully considers the collision risk and lane-change risk during driving and assigns a decision algorithm to each case, further improving both the real-time performance of intelligent vehicle decisions and their reliability under dangerous driving conditions. The proposed transfer learning algorithm based on feature space mapping transfers the intelligent vehicle's optimal-value action knowledge from a simulated scene to a real scene, mitigates the modeling error of real traffic scenes, verifies the effectiveness of the proposed coupling decision model in real driving scenes, and greatly improves the vehicle's transfer learning capability.
Description
Technical Field
The invention relates to the technical field of driving decisions for unmanned vehicles, and in particular to an intelligent vehicle coupling decision model and method that consider dangerous driving conditions.
Background
At the current stage of research, future intelligent driving technology is generally expected to play a crucial role in improving road safety, easing traffic congestion, and reducing driver workload. One of its core challenges is making safe and efficient driving decisions in highly complex traffic environments, based on uncertain multi-sensor fused perception information and existing driving prior knowledge. A decision algorithm therefore needs to account for factors such as the driver's individual requirements (safety, comfort, and efficiency), the road environment structure, traffic regulation constraints, vehicle dynamics, and regional driving habits, while remaining broadly applicable and robust to the randomness of high-dimensional traffic environments, particularly to decision failures caused by asynchronous information between the perception layer and the decision layer.
Existing decision algorithms fall into three main categories: driving decision algorithms based on reinforcement learning, algorithms based on driving rules, and algorithms that couple driving rules with a self-learning algorithm. The coupled approach is gradually becoming a research hotspot because its decision process is partly interpretable and it applies to high-dimensional, random, dynamic environments. However, because of the sampling efficiency and decision safety constraints of experiments in real traffic scenes, existing work mostly performs driving data analysis, decision model training, and decision model verification in a constructed simulation environment; it cannot verify whether the optimal-value decisions learned in simulation suit the real traffic driving environment, i.e., it cannot transfer decision knowledge from the simulator to the real environment. Moreover, simulated driving environments are mostly built around a single setting such as an expressway; the decision reliability of driving decision algorithms under dangerous conditions is rarely considered, and research that accounts for both collision risk and lane-change risk during driving is especially scarce.
Disclosure of Invention
To solve this technical problem, the invention constructs an intelligent vehicle coupling decision model that considers dangerous driving conditions. In building the simulated driving scene model, the invention considers the position, speed, and heading information of the intelligent vehicle and surrounding traffic participants, the lane environment structure, and traffic rule information, and models the constructed traffic scene as a Markov Decision Process (MDP). The input to the driving condition evaluation model, covering the ego vehicle, surrounding traffic participants, the lane environment, and driving rule constraints, is acquired through a GPS positioning device, speed and acceleration sensors, a lidar, cameras, and other sensors installed on the intelligent vehicle. Driving conditions are then divided, based on the collision risk with surrounding traffic participants and the lane-change risk of the vehicle, into general driving conditions and dangerous driving conditions.
In choosing the behavior decision algorithm, considering the limitations of a driving rule base and its lack of flexibility for random scenes, the method mainly adopts a decision mode that couples rules with a deep reinforcement learning algorithm. On one hand, a rule-based decision method is constructed from driving safety rules, hazard obstacle-avoidance rules, and a highest-priority yield-to-pedestrian rule; this handles decisions under general driving conditions effectively and improves the interpretability of the decision process. On the other hand, for dangerous driving conditions, a Deep Q-Network (DQN) model with a constrained action space lets the intelligent vehicle autonomously learn the optimal driving action strategy in an interactive scene. For knowledge transfer of the optimal-value action between simulation and the real environment, the feature probability distribution of the optimal driving decision state mapping space should be the same whenever the vehicle decides with the same actions, the same reward function, and similar driving scenes, whether simulated or real; the optimal-value action state in the real traffic scene can therefore be solved by implicitly learning the correspondence between the feature spaces of the two domains.
The intelligent vehicle coupling decision method considering dangerous driving conditions comprises, in order, the following steps:
step 1) building an intelligent vehicle simulation driving scene, and modeling the scene into a Markov decision process;
step 2) collecting information about the ego vehicle and the driving scene through the GPS, lidar, speed sensor, camera, and other sensors installed on the intelligent vehicle, and using it as the input of the driving condition evaluation model;
step 3) constructing a collision risk model δ and a lane-change risk model η between the intelligent vehicle and the surrounding traffic participants based on the information acquired by the sensors in step 2), and dividing the driving conditions of the intelligent vehicle according to δ and η, as in formula (1):

D_c = D_d if δ or η indicates risk, and D_c = D_g otherwise (1)

where D_c denotes the set of intelligent vehicle driving conditions, D_d a dangerous driving condition, and D_g a general driving condition.
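The condition split in formula (1) can be sketched as a small classifier. This is an illustration under the assumption, consistent with the collision risk model described later (δ ≥ 1 means risk), that both risk models fire at a threshold of 1; the patent does not state the exact thresholds.

```python
def classify_driving_condition(delta: float, eta: float,
                               threshold: float = 1.0) -> str:
    """Return 'dangerous' (D_d) if either risk model fires, else 'general' (D_g).

    The shared threshold of 1.0 is an assumption for illustration.
    """
    if delta >= threshold or eta >= threshold:
        return "dangerous"   # D_d: handled by the constrained-action DQN
    return "general"         # D_g: handled by the rule-based decision method
```

Under general conditions the rule-based method is used, and under dangerous conditions the DQN decision algorithm takes over, as in steps 7) and 10).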
step 4) setting the hyperparameters of the DQN model for training the intelligent vehicle decision model, including the learning rate β, the number of training rounds N, the discount rate γ, and the initial speed ranges of vehicles and pedestrians;
step 5) randomly initializing the weight parameters ω of the Q network, setting the TD (temporal difference) target network weights ω⁻ = ω, and allocating a storage space V for model training samples;
step 6) during the N training rounds, at each time step t = 0, 1, 2, …, the intelligent vehicle observes the state space s(t) of the traffic scene through its sensors and constructs the reward function r(t) corresponding to the current state;
step 7) evaluating the driving condition of the intelligent vehicle according to step 3); under a general driving condition, a decision algorithm based on driving rules realizes the lateral and longitudinal decisions of the intelligent vehicle and generates the corresponding expected action space and the decision action a(t);
step 8) storing the decision action a(t) selected by the intelligent vehicle at time t, the reward function r(t), the scene state s(t), and the scene state s(t + 1) at time t + 1 as a quadruple (s(t), a(t), r(t), s(t + 1)) in V;
step 9) at each iteration, randomly sampling 64 groups of sample data from the storage space V to train the DQN model, calculating the reward values of all candidate decision actions of the intelligent vehicle in each state, selecting the action with the optimal value as the decision action in the current scene state, and, during the training iterations of the DQN model, synchronously updating the Q-network weights ω and the TD target network weights ω⁻ = ω;
step 10) if the driving condition evaluated in step 3) is a dangerous driving condition, randomly selecting an initial decision action a(t) of the intelligent vehicle, adopting the DQN decision algorithm, and repeating steps 8) and 9);
step 11) from the optimal-value action state of the intelligent vehicle solved in the simulation scene at time t, obtaining the optimal-value action state in the real driving scene by applying the transfer learning algorithm based on feature space mapping.
Further, modeling the simulated traffic scene in step 1) as a Markov decision process means constructing the state space s(t) of the scene, the decision action a(t) of the intelligent vehicle, the reward function r(t), and the random state transition function p(s(t + 1) | s(t), a(t)) of the scene at time t + 1. The state space s(t) of the traffic scene comprises the state information s_AV(t) of the intelligent vehicle, the state information s_OA(t) of the surrounding traffic participants, and the lane structure and traffic rules s_TR(t). The driving decision is made by controlling the longitudinal acceleration a_L(t) and the front-wheel angle a_T(t) of the intelligent vehicle, which together form its decision action set a(t). The reward function r(t) is constructed from the navigation target point constraint r_1(t), the driving safety index r_2(t), the drivable area constraint r_3(t), and the lane constraint r_4(t). Finally, the random state transition function consists of the state transition probability distribution p(s_AV(t + 1) | s_AV(t), a(t)) of the intelligent vehicle and the state transition probability distribution p(s_OA(t + 1) | s(t)) of the surrounding traffic participants.
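The MDP elements named above can be sketched as plain containers; the field names here are illustrative stand-ins, not the patent's notation.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SceneState:
    """State space s(t): ego state, surrounding participants, lane/rule info."""
    ego: Dict[str, float]                  # s_AV(t): position, speed, heading
    participants: List[Dict[str, float]]   # s_OA(t): one entry per participant
    lane_rules: Dict[str, float]           # s_TR(t): lane structure, traffic rules

@dataclass
class Action:
    """Decision action a(t) = {a_L(t), a_T(t)}."""
    accel: float       # longitudinal acceleration a_L(t), m/s^2
    steer_deg: float   # front-wheel angle a_T(t), degrees
```

A transition function p(s(t + 1) | s(t), a(t)) would map a `SceneState` and an `Action` to a distribution over next states, factored into ego and participant parts as described above.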
Further, the input information of the driving condition evaluation model in step 2) comprises the speed v_AV(t) of the intelligent vehicle at time t, the speed v_FV(t) of the front traffic participant, the speed v_OV(t) of the vehicle in the adjacent lane, the collision reaction time ρ of the intelligent vehicle, the actual distance D_h(t) between the intelligent vehicle and the surrounding traffic participants ahead, the longitudinal and lateral speeds of the intelligent vehicle during a lane change, the lateral distance d_AL between the vehicle and the lane boundary line during the lane change, and the lane width w_k.
Further, the collision risk model δ in step 3) mainly uses the time headway (TH) and time to collision (TTC) indexes to compare the actual distance D_h(t) between the intelligent vehicle and the front traffic participant with a safety distance D_s(t), where the safety distance is calculated mainly from the braking distance term v_AV(t)·ρ of the intelligent vehicle, the final following distance, and the longitudinal displacement of the front traffic participant.
The lane-change risk model η mainly compares the distance D_LF between the two vehicles after the intelligent vehicle changes lanes with the adaptive braking distance D_b of the rear vehicle, which is obtained by accumulating the distance D_1 travelled by the rear vehicle during the driver's reaction stage, the distance D_2 during the braking response stage, the distance D_3 during the braking force build-up stage, and the distance D_4 during the continuous braking stage.
Further, the initial learning rate β of the DQN model in step 4) is set to 0.002; the model is a five-layer fully connected network whose hidden layers each contain 100 neuron nodes; and the number of training rounds N and the discount rate γ are set to 10000 and 0.9, respectively. The initial speed ranges of vehicles and pedestrians in the simulation scene are [15, 65] km/h and [0, 5] km/h, respectively.
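The quoted architecture (a five-layer fully connected network, 100 neurons per hidden layer) can be sketched with NumPy; the state and action dimensions, the initialization scale, and the ReLU activations are assumptions, since the text does not specify them.

```python
import numpy as np

def init_q_network(state_dim: int, n_actions: int, hidden: int = 100, seed: int = 0):
    """Weights for a 5-layer fully connected Q-network: input, 3 hidden, output."""
    rng = np.random.default_rng(seed)
    sizes = [state_dim, hidden, hidden, hidden, n_actions]
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def q_forward(params, state):
    """Forward pass returning one Q-value per discrete action."""
    x = np.asarray(state, dtype=float)
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x
```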
Further, the rule-based decision algorithm in step 7) is mainly built from driving safety rules, hazard obstacle-avoidance rules, and yield-to-pedestrian rules, combined with IF-THEN information-triggered events. Using the special position information P*(t) of the intelligent vehicle (e.g., the vicinity of an intersection), the position of the navigation target point, and the current state information of the vehicle, it generates the expected action space and the decision action a(t), which reduces the dimensional requirements of the perception task and improves the real-time performance and reliability of the decision.
Further, the model training in step 9) trains the DQN model mainly through the temporal difference (TD) algorithm. The general process is: first, the optimal-value action function Q*(s(t), a(t)) is solved from the sample data (s(t), a(t), r(t), s(t + 1)) and the optimal Bellman equation, and is approximated by a neural network Q(s(t), a(t) | ω); then the TD objective is set, its difference from Q(s(t), a(t) | ω) gives the TD error, and the training loss function L(ω) of the DQN model is constructed from this error.
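The TD target and loss for a single sample can be written out directly; this is a generic sketch of the standard DQN objective the paragraph describes, assuming discrete actions.

```python
import numpy as np

def td_target(r: float, q_next: np.ndarray, gamma: float = 0.9,
              done: bool = False) -> float:
    """y(t) = r(t) + gamma * max_a Q(s(t+1), a | omega^-); just r(t) at episode end."""
    return r if done else r + gamma * float(np.max(q_next))

def dqn_loss(q_sa: float, y: float) -> float:
    """Squared TD error for one sample: (y(t) - Q(s(t), a(t) | omega))^2."""
    return (y - q_sa) ** 2
```

In the full algorithm this loss is averaged over the 64 sampled transitions and minimized by gradient descent on ω, with ω⁻ periodically synchronized to ω.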
Further, the transfer learning algorithm based on feature space mapping in step 11) relies on the observation that, whether in the simulated or the real driving scene, the feature probability distribution of the optimal driving decision state mapping space should be the same when the intelligent vehicle decides with the same actions, the same reward function, and similar driving scenes; f and g denote the neural network functions of the feature space mapping for the two domains.
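A minimal numeric sketch of the idea: two mapping functions (linear maps standing in for the neural networks f and g, whose architectures the patent does not specify) should send simulation and real decision states to matching feature distributions, so a distribution-discrepancy term can drive their training.

```python
import numpy as np

def feature_alignment_loss(F: np.ndarray, G: np.ndarray,
                           sim_states: np.ndarray, real_states: np.ndarray) -> float:
    """Squared distance between the mean mapped features of the two domains.

    F and G are linear stand-ins for the mapping networks f and g.
    """
    sim_feat = sim_states @ F     # f(s_sim)
    real_feat = real_states @ G   # g(s_real)
    diff = sim_feat.mean(axis=0) - real_feat.mean(axis=0)
    return float(diff @ diff)
```

Minimizing such a term over F and G (together with the decision objective) is one way to learn the implicit correspondence between the two feature spaces.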
The invention has the beneficial effects that:
1. The intelligent vehicle coupling decision model considering dangerous driving conditions couples a self-learning decision method with driving rules, overcoming the limitations, inflexibility, and unreliability of any single decision method, and can effectively handle intelligent vehicle driving decisions in a variety of complex traffic scenes.
2. The proposed coupling decision model fully considers the collision risk and lane-change risk during driving and assigns a decision algorithm to each case, further improving the real-time performance of intelligent vehicle decisions and their reliability under dangerous driving conditions.
3. The proposed transfer learning algorithm based on feature space mapping transfers the optimal-value action knowledge of the intelligent vehicle from the simulated scene to the real scene, mitigates the modeling error of real traffic scenes, verifies the effectiveness of the coupling decision model in real driving scenes, and greatly improves the vehicle's transfer learning capability.
Drawings
FIG. 1 is a study route of the present invention
FIG. 2 is a view of the simulated driving scene of the intelligent vehicle
FIG. 3 is a schematic diagram of the collision risk of the intelligent vehicle of the present invention
FIG. 4 is a schematic diagram of the lane change risk of the intelligent vehicle according to the present invention
FIG. 5 is a schematic diagram of the adaptive braking safety distance of the present invention
FIG. 6 is a flowchart of a transfer learning algorithm based on feature space mapping according to the present invention
Detailed Description
The invention will be further explained with reference to the drawings.
As shown in FIG. 1, the invention provides an intelligent vehicle coupling decision model and method considering dangerous driving conditions. The technical scheme of the invention comprises the following steps in sequence,
step 1): first, a model of the simulated driving scene of the intelligent vehicle is constructed, as shown in fig. 2, and the scene is modeled as a Markov decision process composed of the state space s(t) of the traffic scene, the decision action a(t) of the intelligent vehicle, the reward function r(t), and the random state transition function p(s(t + 1) | s(t), a(t)) of the scene at time t + 1.
1) State space s (t) of traffic scene
The state space s(t) of the traffic scene is mainly composed of the state information s_AV(t) of the intelligent vehicle, the state information s_OA(t) of the surrounding traffic participants, and the lane structure and traffic rules s_TR(t). The state information s_AV(t) of the intelligent vehicle consists of its position p_AV(t), speed v_AV(t), and heading θ_AV(t), as in formula (2):

s_AV(t) = {p_AV(t), v_AV(t), θ_AV(t)} (2)

where p_AV(t) is the position coordinate (x_AV, y_AV) of the intelligent vehicle at time t.
The state information s_OA(t) of the surrounding traffic participants includes their position, speed, heading, and category information, as in formula (3):

s_OA^i(t) = {p_OA^i(t), v_OA^i(t), θ_OA^i(t), c_j} (3)

where p_OA^i(t) denotes the position coordinates (x_OV, y_OV) of a surrounding traffic participant at time t; i denotes the i-th traffic participant in the scene; and j denotes the participant's category information, with j = 1 a vehicle and j = 0 a pedestrian.
The lane structure and traffic rule information s_TR(t) is given by formula (4):

s_TR(t) = {k, C_k, W_k, θ_k, V_min,k, V_max,k, T_k, P_N(t), τ_s} (4)

where k is the number of the current lane (the k-th lane); C_k is the position vector of the lane centerline points; W_k is the width of the lane; θ_k is the tangent direction angle at the lane centerline points; V_min,k and V_max,k are the minimum and maximum speed limits of the lane; T_k is the traffic signal, a (0, 1) signal that determines whether the vehicle needs to stop at the end of the lane; P_N(t) is the position of the navigation target point of the intelligent vehicle; and τ_s is the driving boundary of the traffic scene, formed by sequentially connected point rows joined by straight lines.
In summary, the state space s(t) of the traffic scene can be represented as:

s(t) = {s_AV(t), s_OA(t), s_TR(t)} (5)
2) decision action a (t) of intelligent vehicle
The future driving decision action set of the intelligent vehicle mainly comprises its longitudinal acceleration a_L(t) and front-wheel angle a_T(t), as in formula (6):

a(t) = {a_L(t), a_T(t)} (6)

where, for driving comfort, the longitudinal acceleration a_L(t) ranges over [-3, 2] m/s² and the front-wheel angle a_T(t) over [-40°, 40°].
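The constrained action space quoted above can be enforced with a simple clip; this illustrates the ranges only, not the patent's constraint mechanism inside the DQN.

```python
def clip_action(accel_mps2: float, steer_deg: float):
    """Clamp a_L(t) to [-3, 2] m/s^2 and a_T(t) to [-40, 40] degrees."""
    a = min(max(accel_mps2, -3.0), 2.0)
    d = min(max(steer_deg, -40.0), 40.0)
    return a, d
```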
3) Reward function r (t)
In the reinforcement learning process, a reward function must be designed to reward or punish the vehicle's actions while driving. Its design mainly considers the navigation target point constraint r_1(t), the driving safety index r_2(t), the drivable area constraint r_3(t), and the lane constraint r_4(t).
A) Navigation target point constraint r_1(t)

The motion decisions of the intelligent vehicle are constrained to some extent by the navigation target point: the vehicle needs to plan a reasonable path within the drivable area to reach that point. Its reward function r_1(t) is given by formula (7).
B) Driving safety index r_2(t)

Collision avoidance is the premise of any driving decision of the intelligent vehicle; if the vehicle collides during model training, that training round ends. The driving safety index r_2(t) can be expressed as:

r_2(t) = -v_AV(t)² · φ{Collision} (8)

where φ{Collision} takes the value 1 when the intelligent vehicle has a collision accident and 0 otherwise. Formula (8) reflects that the faster the vehicle, the more serious the accident.
C) Drivable area constraint r_3(t)

Similarly, the driving range of the intelligent vehicle should stay within the state set of the drivable area, and the vehicle is punished when it leaves that set. In particular, when a pedestrian is ahead, the vehicle needs to perform an avoidance behavior, so only the drivable area constraint, and not the lane constraint, needs to be considered. The drivable area constraint r_3(t) is given by formula (9).
D) Lane constraint r_4(t)

According to the driving rules, the driving direction of the intelligent vehicle should mostly stay consistent with the lane direction; otherwise the vehicle is punished. The lane constraint r_4(t) is expressed as:

r_4(t) = cos α(t) - sin α(t) (10)
in the formula, α represents an included angle between the driving direction of the intelligent vehicle and the lane direction, as shown in fig. 2.
In summary, the final reward function of the intelligent vehicle is the weighted sum of r_1(t), r_2(t), r_3(t), and r_4(t), as in formula (11):

r(t) = Σ_L ω_L · r_L(t), L = 1, …, 4 (11)

where ω_L denotes the weight parameters.
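The weighted combination of the four reward terms is straightforward to compute; the weight values below are illustrative placeholders, since the patent does not state them.

```python
def total_reward(r_terms, weights=(1.0, 1.0, 1.0, 1.0)) -> float:
    """r(t) as the weighted sum of r_1(t)..r_4(t) with weights omega_L."""
    return sum(w * r for w, r in zip(weights, r_terms))
```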
4) Random state transfer function p (s (t +1) | s (t), a (t))
Considering the interaction among traffic participants, given the current state s(t) and the selected action a(t) of the intelligent vehicle, the random state transition function of the scene at time t + 1 is the product of the state transition probability distribution of the intelligent vehicle and that of the surrounding participants, as in formula (12):

p(s(t + 1) | s(t), a(t)) = p(s_AV(t + 1) | s_AV(t), a(t)) × p(s_OA(t + 1) | s(t)) (12)
step 2): based on the driving simulation scene constructed above, information about the ego vehicle and its surrounding driving scene is acquired through the GPS, lidar, speed sensor, camera, and other sensors installed on the intelligent vehicle. It mainly comprises the speed v_AV(t) of the intelligent vehicle at time t, the speed v_FV(t) of the front traffic participant, the speed v_OV(t) of the vehicle in the adjacent lane, the collision reaction time ρ of the intelligent vehicle, the actual distance D_h(t) between the intelligent vehicle and the surrounding traffic participants ahead, the longitudinal and lateral speeds of the intelligent vehicle during a lane change, the lateral distance d_AL from the lane boundary line during the lane change, and the lane width w_k. This information serves as the input of the driving condition evaluation model.
Step 3): constructing a collision risk model delta and a lane change risk model eta of the intelligent vehicle and surrounding traffic participants based on the relevant information acquired by the multiple sensors in the step 2).
1) Collision risk model delta
As shown in fig. 3, the collision risk model δ mainly uses the time headway (TH) and time to collision (TTC) indexes to compare the actual distance D_h(t) between the intelligent vehicle and the front traffic participant with the safety distance D_s(t). The safety distance D_s(t) is calculated mainly from the braking distance term v_AV(t)·ρ of the intelligent vehicle, the final following distance d_f, and the longitudinal displacement of the front traffic participant, as in formula (12):

D_s(t) = v_AV(t)·ρ + v_AV(t)² / (2 v'_AV(t)) + d_f − v_FV(t)² / (2 v'_FV(t)) (12)

where v_AV(t) and v_FV(t) denote the speeds of the intelligent vehicle and the front traffic participant at time t; v'_AV(t) and v'_FV(t) denote their decelerations, which for passenger vehicles are taken to be numerically equal; and ρ denotes the reaction time of the intelligent vehicle, comprising the system reaction time ρ_1 and the brake response time ρ_2.
The collision risk model δ of the intelligent vehicle with respect to the surrounding traffic participants can then be expressed as formula (13):

δ = D_s(t) / D_h(t) (13)

where δ ≥ 1 indicates that the intelligent vehicle has a collision risk, and otherwise it does not. Finally, using the time headway (TH) and time to collision (TTC) indexes defined in formula (14), and combining formulas (12) and (13), the final intelligent vehicle collision risk model δ is given by formula (15).
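The two indexes can be computed from their standard definitions, which match the quantities named in the text; treating a non-closing gap as infinite TTC is a common convention assumed here.

```python
def time_headway(gap_m: float, v_ego_mps: float) -> float:
    """TH: current gap divided by ego speed."""
    return gap_m / v_ego_mps

def time_to_collision(gap_m: float, v_ego_mps: float, v_lead_mps: float) -> float:
    """TTC: gap divided by closing speed; infinite if the gap is not closing."""
    closing = v_ego_mps - v_lead_mps
    return gap_m / closing if closing > 0 else float("inf")
```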
2) Lane change risk model η
As shown in figs. 4 and 5, the lane change risk model η is mainly obtained by comparing the distance D_LF between the two vehicles after the lane change of the intelligent vehicle with the adaptive braking distance D_b of the rear vehicle. The adaptive braking distance of the rear vehicle is obtained by accumulating the driving distance D_1 of the rear-vehicle driver during the reaction stage, the driving distance D_2 of the rear vehicle during the brake response stage, the driving distance D_3 during the braking force build-up stage, and the driving distance D_4 during the continuous braking stage.
A) Calculation of the distance D_LF between the two vehicles after the lane change of the intelligent vehicle
Before the intelligent vehicle changes lane, the longitudinal distance D_LB between the intelligent vehicle and the rear vehicle can be expressed as:
D_LB = y_AV − y_OV (16)
According to the lateral speed and lateral acceleration of the intelligent vehicle, the time t_LC for the intelligent vehicle to reach the center line of the target lane is solved:
In the formula, w_k denotes the lane width, and d_AL denotes the lateral distance of the intelligent vehicle from the lane boundary.
Then at tLCThe longitudinal displacement of the rear vehicle over the time period may be expressed as:
similarly, the intelligent vehicle is at tLCThe longitudinal displacement over the time period can then be expressed as:
finally, after the lane change of the intelligent vehicle is completed, the distance D between the intelligent vehicle and the rear vehicleLFIt can be expressed as:
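A minimal sketch of the D_LF calculation of part A), under stated assumptions: the formula images of (16)-(20) are not reproduced, so the lateral offset to the target lane centerline is taken as d_AL + w_k/2, both vehicles are assumed to hold constant longitudinal speed during the lane change, and the lane-change duration t_LC solves constant-lateral-acceleration kinematics.

```python
import math

def lane_change_gap(v_x_av, v_y_av, a_y_av, v_ov, y_av, y_ov, w_k, d_al):
    """Gap D_LF between the ego vehicle and the rear vehicle in the target
    lane after the lane change. Assumptions: lateral offset to the target
    centerline is d_al + w_k/2, and longitudinal speeds are constant."""
    d_lb = y_av - y_ov                  # initial longitudinal gap, eq. (16)
    d_lat = d_al + w_k / 2.0            # lateral offset to travel (assumption)
    # solve 0.5*a*t^2 + v_y*t = d_lat for the lane-change time t_LC, eq. (17)
    t_lc = (-v_y_av + math.sqrt(v_y_av ** 2 + 2.0 * a_y_av * d_lat)) / a_y_av
    d_rear = v_ov * t_lc                # rear-vehicle displacement, eq. (18)
    d_ego = v_x_av * t_lc               # ego displacement, eq. (19)
    return d_lb + d_ego - d_rear, t_lc  # eq. (20)
```

When the ego vehicle is faster than the rear vehicle, the post-lane-change gap is larger than the initial gap, as expected.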
B) Calculation of the adaptive braking distance D_b of the rear vehicle
The calculation of the adaptive braking distance D_b of the rear vehicle mainly takes into account the rear-vehicle speed, the rear-vehicle braking performance, and the reaction times of the driver and of the braking system, as detailed below:
Step 1: assuming a driver reaction time t_1 of 1 s, the driving distance D_1 of the rear-vehicle driver during the reaction stage is:
D_1 = v_OV(t) × t_1 (21)
Step 2: assuming a response time t_2 of 0.2 s for the brake response stage of the rear vehicle, the driving distance D_2 during this stage is:
D_2 = v_OV(t) × t_2 (22)
Step 3: during the braking force build-up stage t_3, the change in the deceleration of the rear vehicle is approximately linear. Assuming the rear vehicle decelerates at a comfortable deceleration a_soft, the driving distance D_3 during the build-up stage can be expressed as:
Step 4: during the continuous braking stage, the rear vehicle decelerates at a_soft until its speed drops to zero; the driving distance D_4 of the rear vehicle in this stage can be expressed as:
Step 5: finally, accumulating the driving distance D_1 of the driver reaction stage, the driving distance D_2 of the brake response stage, the driving distance D_3 of the braking force build-up stage and the driving distance D_4 of the continuous braking stage yields the adaptive braking distance D_b of the rear vehicle, as shown in the following formula:
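The four-stage accumulation of steps 1-5 can be sketched as follows. The build-up duration t_3 = 0.6 s is an illustrative assumption (the text fixes only t_1 = 1 s and t_2 = 0.2 s), and the build-up stage is modeled with the mean deceleration a_soft/2 implied by the linear-change approximation of step 3; the formula images of (21)-(25) are not reproduced.

```python
def adaptive_braking_distance(v_ov, a_soft, t1=1.0, t2=0.2, t3=0.6):
    """Adaptive braking distance D_b of the rear vehicle, accumulated over
    the four stages of steps 1-4. t3 and the mean-deceleration treatment of
    the build-up stage are assumptions."""
    d1 = v_ov * t1                            # driver reaction stage
    d2 = v_ov * t2                            # brake system response stage
    d3 = v_ov * t3 - a_soft * t3 ** 2 / 4.0   # build-up stage (mean decel a_soft/2)
    v3 = v_ov - a_soft * t3 / 2.0             # speed at end of build-up
    d4 = v3 ** 2 / (2.0 * a_soft)             # continuous braking to standstill
    return d1 + d2 + d3 + d4
```

For a rear vehicle at 15 m/s with a comfortable deceleration of 3 m/s², this accumulates to roughly 60 m.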
in summary, the lane change risk model η of the smart car can be expressed as:
η = D_LF − D_b (26)
In the formula, when η ≤ 0 the intelligent vehicle has a lane change risk; otherwise it does not.
Meanwhile, based on the constructed collision risk model δ and lane change risk model η of the intelligent vehicle with surrounding traffic participants, the driving condition of the intelligent vehicle is evaluated according to the following criterion:
In the formula, D_c represents the set of intelligent vehicle driving conditions; D_d denotes a dangerous driving condition; and D_g denotes a general driving condition.
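Combining formula (26) with the evaluation criterion above, the lane change risk test and the condition classification can be sketched as below; function and label names are illustrative, not taken from the patent.

```python
def lane_change_risk(d_lf, d_b):
    """Lane change risk eta = D_LF - D_b (formula (26)); eta <= 0 means the
    intelligent vehicle has a lane change risk."""
    return d_lf - d_b

def classify_condition(delta, eta):
    """Driving-condition criterion: dangerous (D_d) when the collision risk
    delta >= 1 or the lane change risk eta <= 0, otherwise general (D_g)."""
    return "dangerous" if delta >= 1.0 or eta <= 0.0 else "general"
```

The classification drives the switch between the rule-based branch and the DQN branch of the behavior decision model described in steps 7) and 10).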
Step 4): setting the hyper-parameters of the DQN model, including the learning rate β, the number of training rounds N, the discount rate γ, and the initial speed ranges of vehicles and pedestrians used in training the intelligent vehicle decision model. The initial learning rate β of the DQN model is set to 0.002; the model is a five-layer fully connected network with 100 neuron nodes in each hidden layer; and the initial number of training rounds N and the discount rate γ are set to 10000 and 0.9, respectively. The initial speed ranges of vehicles and pedestrians in the simulation scene are [15, 65] km/h and [0, 5] km/h, respectively.
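A sketch of the stated network shape and hyper-parameters, assuming the five layers are an input layer, three hidden layers of 100 ReLU nodes, and a linear output layer with one Q value per discrete action; the text does not state the activation function or the exact layer split, so both are assumptions.

```python
import numpy as np

HYPERPARAMS = {"beta": 0.002, "N": 10000, "gamma": 0.9}  # learning rate, rounds, discount

def init_q_network(state_dim, action_dim, hidden=100, layers=3, seed=0):
    """Fully connected Q network of step 4): three hidden layers of 100 nodes
    between input and output layers (layer split is an assumption)."""
    rng = np.random.default_rng(seed)
    dims = [state_dim] + [hidden] * layers + [action_dim]
    return [(rng.standard_normal((i, o)) * np.sqrt(2.0 / i), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def q_forward(params, s):
    """Forward pass: ReLU hidden layers, linear output Q(s, a | omega)."""
    x = np.asarray(s, dtype=float)
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)
    w, b = params[-1]
    return x @ w + b
```

The output vector holds one Q value per decision action, so action selection is an argmax over it.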
Step 5): randomly initializing the weight parameters ω of the Q network, the weight parameters ω⁻ = ω of the TD (temporal difference) target network, and a storage space V for model training samples.
Step 6): during the N rounds of model training, at each time step t = 0, 1, 2, …, the intelligent vehicle observes the state space s(t) of the traffic scene through its multiple sensors and constructs the reward function r(t) corresponding to the current state.
Step 7): evaluating the driving condition of the intelligent vehicle according to step 3); when the condition is a general driving condition, a decision algorithm based on driving rules is adopted to realize the lateral and longitudinal decisions of the intelligent vehicle and to generate the corresponding desired action space and the decision action a(t) of the intelligent vehicle.
The decision algorithm based on driving rules is mainly realized from the perspectives of driving safety rules, obstacle avoidance rules and yielding to pedestrians: setting a large driving safety distance for the intelligent vehicle in the simulation scene, braking or steering when encountering static obstacles, avoiding pedestrians, and following normal traffic rules when driving straight or turning at intersections. To reduce the dimensionality demanded of complex environment perception, the algorithm combines IF-THEN triggered events with judgments on the special position information P*(t) of the intelligent vehicle (e.g., the vicinity of an intersection), the position information of the navigation target point, and the current state information of the intelligent vehicle, thereby generating the desired action space and the decision action a(t) of the intelligent vehicle. The desired action space is represented as follows:
In the formula, the desired action space comprises the longitudinal action set of the intelligent vehicle and the lateral action set of the intelligent vehicle.
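A hedged sketch of the IF-THEN rule triggering described above; the rule order, thresholds and action names are all illustrative assumptions, not taken from the patent, with the pedestrian rule given the highest priority as the text requires.

```python
def rule_based_decision(near_intersection, pedestrian_ahead, static_obstacle,
                        gap_to_leader, safe_gap=30.0):
    """IF-THEN rule-based decision of step 7). Returns a (longitudinal,
    lateral) action pair; all names and the 30 m safe gap are assumptions."""
    if pedestrian_ahead:
        return "brake", "keep_lane"          # yielding-to-pedestrian rule (highest priority)
    if static_obstacle:
        return "decelerate", "steer_around"  # obstacle avoidance rule
    if near_intersection:
        return "decelerate", "keep_lane"     # normal rule near intersections
    if gap_to_leader < safe_gap:
        return "decelerate", "keep_lane"     # driving safety distance rule
    return "cruise", "keep_lane"
```

The returned pair mirrors the split of the desired action space into a longitudinal set and a lateral set.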
Step 8): storing the decision action a(t) selected by the intelligent vehicle at time t, the reward function r(t), the scene state s(t), and the scene state s(t+1) at time t+1 in V in the form of a quadruple (s(t), a(t), r(t), s(t+1));
step 9): at each iteration, randomly sampling 64 groups of sample data from the storage space V to train the DQN model, calculating the reward values of all decision actions of the intelligent vehicle in each state, selecting the action with the optimal value as the decision action of the intelligent vehicle in the current scene state, and synchronously updating the weight parameters ω of the Q network and the weight parameters ω⁻ = ω of the TD (temporal difference) target network during the training iterations of the DQN model;
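The storage space V and the 64-sample mini-batch of step 9) can be sketched with a plain replay buffer; the capacity value is an assumption.

```python
import random
from collections import deque

class ReplayBuffer:
    """Storage space V of steps 8)-9): quadruples (s, a, r, s') with uniform
    random mini-batches of 64 samples per training iteration."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest samples are evicted first

    def store(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=64):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```

Uniform sampling decorrelates consecutive transitions, which is the usual motivation for experience replay in DQN training.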
The DQN model is mainly trained by the temporal difference (TD) algorithm, with the rough procedure as follows:
A) First, based on the training sample data (s(t), a(t), r(t), s(t+1)) in the storage space V of step 8) and the optimal Bellman equation, the optimal value action function Q*(s(t), a(t)) is obtained by solving, as shown in formula (29):
In the formula, the expectation term represents the expected accumulated reward of the intelligent vehicle at time t+1, and A represents the action space set of the intelligent vehicle.
B) Second, in practical problems it is infeasible to solve the optimal strategy by iteration, especially when the state space is large, because the computational cost is high. The optimal value function Q*(s(t), a(t)) is therefore replaced with a neural network Q(s(t), a(t) | ω), in the form:
In the formula, Q(s(t), a(t) | ω) represents the neural network's prediction at time t of the maximum accumulated return of all decision actions of the intelligent vehicle, with no factual component; the TD target (also denoted the objective function of the TD algorithm) represents the corresponding prediction at time t+1, which is partly based on the actually observed reward r(t).
C) Then, since both the TD target and Q(s(t), a(t) | ω) are estimates of the optimal action value Q*(s(t), a(t)), and the TD target is partly grounded in observed fact, Q(s(t), a(t) | ω) should be driven as close as possible to the TD target. The error of the TD algorithm is therefore calculated as the difference between the TD target and Q(s(t), a(t) | ω), and the training loss function L(ω) of the DQN model is constructed from this error:
D) Finally, the weight parameters ω are updated with the TD algorithm during the training iterations of the DQN model, as follows:
In the formula, β represents the learning rate of the model, the middle term is the error of the TD algorithm, and the last term is the derivative of the neural network Q(s(t), a(t) | ω) with respect to the weight parameters ω.
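Steps A)-D) reduce to a familiar TD update. The sketch below shows it on a tabular Q for clarity rather than the patent's neural network; for a table entry, the gradient in formula (32) is 1, so the update is just the learning rate times the TD error.

```python
def td_update(Q, s, a, r, s_next, beta=0.002, gamma=0.9):
    """One TD step of formulas (29)-(32) on a tabular Q (the patent uses
    Q(s, a | omega); a table is used here purely for illustration)."""
    y = r + gamma * max(Q[s_next])  # TD target (objective function of TD)
    td_error = y - Q[s][a]          # TD error, formula (31)
    Q[s][a] += beta * td_error      # gradient step, formula (32)
    return td_error
```

With β = 0.002 and a zero-initialized table, observing reward 1 nudges the visited entry up by exactly 0.002.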
Step 10): if the driving condition evaluated in step 3) is a dangerous driving condition, the DQN decision algorithm is adopted: a decision action a(t) of the intelligent vehicle is selected at random and steps 8) and 9) are repeated;
step 11): from the solved optimal value action state of the intelligent vehicle in the simulation scene at time t, the optimal value action state of the intelligent vehicle in the real driving scene is finally obtained by combining the transfer learning algorithm based on feature space mapping.
As shown in fig. 6, the transfer learning algorithm based on feature space mapping considers that, whether the intelligent vehicle is in the simulated driving scene or the real driving scene, as long as it makes decisions with the same actions, the same reward function and similar driving scenes, the feature probability distributions of the mapped optimal driving decision states should be the same. Here f and g denote the neural network functions of the feature space mapping, which are optimized with a similarity metric (the 2-norm), according to the following formula:
In the formula, the first set represents the optimal value action states of the intelligent vehicle in the simulation environment and the second those in the real driving environment; f represents the neural network function of the feature space mapping in the source domain (the simulated driving environment); g represents the neural network function in the target domain (the real driving environment); and ω_f and ω_g represent the weight parameters of the neural network functions f and g, respectively.
Objectively, the mapping functions f and g should be invertible. To keep f and g as close to invertible as possible and to preserve the invariant information of the respective domains, decoder networks are trained to reconstruct the optimal value action state sets from the mapped feature spaces. The optimization objective of the decoder network training is as follows:
In the formula, the two terms represent the reconstruction targets of the decoder in the source domain and in the target domain, and ω_S and ω_T represent the weight parameters of the two decoders, respectively.
In summary, the optimization objective of the transfer learning algorithm model based on feature space mapping is shown in formula (35). Meanwhile, given the optimal value action state of the intelligent vehicle in the simulation scene at time t obtained in step 11), the optimal value action state of the intelligent vehicle in the real driving scene at time t can be solved by combining the neural network functions f and g of the feature space mapping, as shown in formula (36):
in the formula, psi represents the reward weight of the intelligent vehicle optimal value decision migration.
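A minimal sketch of the two optimization terms described above: the 2-norm similarity metric aligning the mapped source-domain and target-domain states, and the decoder reconstruction objective. Pairing of source and target samples is an assumption, and the exact composition in formulas (35)-(36) sits in the unreproduced formula images.

```python
import math

def alignment_loss(f_src, g_tgt):
    """2-norm similarity metric of the feature-space alignment: sum of
    ||f(x_s) - g(x_t)||_2 over paired mapped samples (pairing assumed)."""
    return sum(math.sqrt(sum((a - b) ** 2 for a, b in zip(fs, gt)))
               for fs, gt in zip(f_src, g_tgt))

def reconstruction_loss(x, x_rec):
    """Decoder reconstruction objective: squared error between the optimal
    value action states and their reconstructions from the mapped space."""
    return sum(sum((a - b) ** 2 for a, b in zip(xi, ri))
               for xi, ri in zip(x, x_rec))
```

The total transfer-learning objective then combines both losses over the source and target domains, weighted as in formula (35).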
The above-listed series of detailed descriptions are merely specific illustrations of possible embodiments of the present invention, and they are not intended to limit the scope of the present invention, and all equivalent means or modifications that do not depart from the technical spirit of the present invention are intended to be included within the scope of the present invention.
Claims (10)
1. An intelligent vehicle coupling decision model considering dangerous driving conditions, comprising: the system comprises a traffic scene model, a driving condition evaluation model and a behavior decision model;
the traffic scene model adopts a Markov model based on the position, speed and heading angle information of the intelligent vehicle and surrounding traffic participants, the lane environment structure information, and the traffic rule information;
the driving condition evaluation model divides driving conditions into general driving conditions and dangerous driving conditions based on the collision risk with surrounding traffic participants and the lane change risk of the intelligent vehicle when the intelligent vehicle runs;
the behavior decision model adopts a decision coupling rules with a deep reinforcement learning algorithm: on one hand, a decision algorithm based on driving rules is constructed from the perspectives of driving safety rules, obstacle avoidance rules and the rule that pedestrians have the highest priority, to handle driving decisions under general driving conditions; on the other hand, for dangerous driving conditions, a deep Q network (DQN) model with a constrained action space is adopted so that the intelligent vehicle autonomously learns the optimal driving action strategy in the interactive scene.
2. The intelligent vehicle coupling decision model considering dangerous driving conditions as claimed in claim 1, wherein the traffic scene model is specifically as follows:
the model comprises a state space s(t) of the scene, a decision action a(t) of the intelligent vehicle, a reward function r(t), and a random state transfer function p(s(t+1) | s(t), a(t)) of the scene at time t+1, wherein the state space s(t) of the traffic scene is composed of the state information s_AV(t) of the intelligent vehicle, the state information s_OA(t) of surrounding traffic participants, and the lane structure and traffic rule information s_TR(t); the decision action a(t) of the intelligent vehicle controls the longitudinal acceleration a_L(t) and the front wheel angle a_T(t) according to the behavior decision model, correspondingly forming the decision action set of the intelligent vehicle; the design of the reward function r(t) fuses the navigation target point constraint r_1(t), the driving safety index r_2(t), the drivable area constraint r_3(t) and the lane constraint r_4(t); and the random state transfer function p(s(t+1) | s(t), a(t)) consists of the state transition probability distribution p(s_AV(t+1) | s_AV(t), a(t)) of the intelligent vehicle and the state transition probability distribution p(s_OA(t+1) | s(t)) of the surrounding traffic participants.
3. The intelligent vehicle coupling decision model considering dangerous driving conditions as claimed in claim 1, wherein the input information of the driving condition evaluation model comprises the speed v_AV(t) of the intelligent vehicle at time t, the speed v_FV(t) of the front traffic participant, the speed v_OV(t) of the vehicle in the adjacent lane, the collision reaction time ρ of the intelligent vehicle, the actual distance D_h(t) between the intelligent vehicle and the surrounding traffic participants in front, the longitudinal and lateral speeds of the intelligent vehicle during a lane change, the lateral distance d_AL from the lane boundary line during a lane change, and the lane width w_k.
4. An intelligent vehicle coupling decision model considering dangerous driving conditions according to claim 1, characterized by comprising a collision risk model δ and a lane change risk model η;
the collision risk model δ uses the time headway (TH) and time to collision (TTC) indexes to compare the actual distance D_h(t) between the intelligent vehicle and the front traffic participant with the safety distance D_s(t), wherein the safety distance is mainly calculated from the reaction-time travel v_AV(t)·ρ of the intelligent vehicle, the final following distance, and the longitudinal displacement of the front traffic participant;
the lane change risk model η judges the lane change risk by comparing the distance D_LF between the two vehicles after the lane change of the intelligent vehicle with the adaptive braking distance D_b of the rear vehicle, wherein the adaptive braking distance of the rear vehicle is mainly obtained by accumulating the driving distance D_1 of the rear-vehicle driver during the reaction stage, the driving distance D_2 during the brake response stage, the driving distance D_3 during the braking force build-up stage, and the driving distance D_4 during the continuous braking stage.
5. The intelligent vehicle coupling decision model considering the dangerous driving condition as claimed in claim 4, wherein the driving condition evaluation model divides the driving condition of the intelligent vehicle according to the collision risk model δ and the lane change risk model η as shown in the following formula (1):
In the formula, D_c represents the set of intelligent vehicle driving conditions; D_d denotes a dangerous driving condition; and D_g denotes a general driving condition.
6. The intelligent vehicle coupling decision model considering dangerous driving conditions as claimed in claim 1, wherein the behavior decision model:
when the driving condition is a general driving condition, a decision algorithm based on driving rules is adopted to realize the lateral and longitudinal decisions of the intelligent vehicle and to generate the corresponding desired action space and the decision action a(t) of the intelligent vehicle; the decision action a(t) selected by the intelligent vehicle at time t, the reward function r(t), the scene state s(t) and the scene state s(t+1) at time t+1 are stored in the form of a quadruple (s(t), a(t), r(t), s(t+1)); at each iteration, several groups of sample data are randomly sampled from the storage space V to train the DQN model, the reward values of all decision actions of the intelligent vehicle in each state are calculated, the action with the optimal value is selected as the decision action in the current scene state, and the weight parameters ω of the Q network and the weight parameters ω⁻ = ω of the TD target network are synchronously updated during the training iterations of the DQN model;
when the driving condition is a dangerous driving condition, a decision action a(t) of the intelligent vehicle is selected at random, and the decision action a(t), the reward function r(t), the scene state s(t) and the scene state s(t+1) at time t+1 are stored in V in the form of a quadruple (s(t), a(t), r(t), s(t+1)); at each iteration, 64 groups of sample data are randomly sampled from the storage space V to train the DQN model, the reward values of all decision actions of the intelligent vehicle in each state are calculated, the action with the optimal value is selected as the decision action in the current scene state, and the weight parameters ω of the Q network and the weight parameters ω⁻ = ω of the TD (temporal difference) target network are synchronously updated during the training iterations of the DQN model;
the optimal value action state of the intelligent vehicle in the simulation scene at time t is solved, and the optimal value action state of the intelligent vehicle in the real driving scene is finally obtained by combining the transfer learning algorithm based on feature space mapping;
the feature space mapping transfer learning algorithm considers that, whether in a simulated driving scene or a real driving scene, when the intelligent vehicle makes decisions with the same actions, the same reward function and similar driving scenes, the feature probability distributions of the mapped optimal driving decision states should be the same, where f and g denote the neural network functions of the feature space mapping, which are optimized with a similarity metric according to the following formula:
In the formula, the first set represents the optimal value action states of the intelligent vehicle in the simulation environment and the second those in the real driving environment; f represents the neural network function of the feature space mapping in the source domain (the simulated driving environment); g represents the neural network function in the target domain (the real driving environment); and ω_f and ω_g represent the weight parameters of the neural network functions f and g, respectively.
7. An intelligent vehicle coupling decision-making method considering dangerous driving conditions is characterized by comprising the following steps:
step 1) building an intelligent vehicle simulation driving scene, and modeling the scene into a Markov decision process;
step 2) collecting the information of the self vehicle and the driving scene through a GPS, a laser radar, a speed sensor and a camera multi-sensor which are arranged on the intelligent vehicle, and taking the information as the input of a driving condition evaluation model;
step 3) constructing a driving condition evaluation model based on the relevant information acquired by the multiple sensors in the step 2), wherein the driving condition evaluation model comprises a collision risk model delta and a lane change risk model eta of the intelligent vehicle and surrounding traffic participants, and dividing the driving conditions of the intelligent vehicle according to the collision risk model delta and the lane change risk model eta, as shown in the following formula (1):
In the formula, D_c represents the set of intelligent vehicle driving conditions; D_d denotes a dangerous driving condition; and D_g denotes a general driving condition.
Step 4) training an intelligent vehicle decision model, firstly setting hyper-parameters of the DQN model, including the learning rate beta of the model, the training round N and the discount rate gamma of the model, and the initial speed ranges of vehicles and pedestrians;
step 5) randomly initializing the weight parameters ω of the Q network, the weight parameters ω⁻ = ω of the TD (temporal difference) target network, and a storage space V for model training samples;
step 6) during the N rounds of model training, at each time step t = 0, 1, 2, …, the intelligent vehicle observes the state space s(t) of the traffic scene through multiple sensors and constructs the reward function r(t) corresponding to the current state;
step 7) evaluating the driving condition of the intelligent vehicle according to step 3); when the condition is a general driving condition, a decision algorithm based on driving rules is adopted to realize the lateral and longitudinal decisions of the intelligent vehicle and to generate the corresponding desired action space and the decision action a(t) of the intelligent vehicle;
the decision algorithm based on driving rules is based on the driving safety rules, the obstacle avoidance rules and the yielding-to-pedestrian rules, combines IF-THEN triggered events, and generates the desired action space and the decision action a(t) of the intelligent vehicle from the special position information P*(t) of the intelligent vehicle, the navigation target point position information, and the current state information of the intelligent vehicle;
step 8), storing decision actions a (t), reward functions r (t), scene states s (t) and scene states s (t +1) at the time t +1, which are selected by the intelligent vehicle at the time t, in the form of quadruples (s (t), a (t), r (t) and s (t +1)) in V;
step 9) at each iteration, randomly sampling 64 groups of sample data from the storage space V to train the DQN model, calculating the reward values of all decision actions of the intelligent vehicle in each state, selecting the action with the optimal value as the decision action of the intelligent vehicle in the current scene state, and synchronously updating the weight parameters ω of the Q network and the weight parameters ω⁻ = ω of the TD (temporal difference) target network during the training iterations of the DQN model;
Step 10) if the driving condition evaluated in step 3) is a dangerous driving condition, the DQN decision algorithm is adopted: a decision action a(t) of the intelligent vehicle is selected at random and steps 8) and 9) are repeated;
step 11) from the solved optimal value action state of the intelligent vehicle in the simulation scene at time t, the optimal value action state of the intelligent vehicle in the real driving scene is finally obtained by combining the transfer learning algorithm based on feature space mapping.
8. The intelligent vehicle coupling decision method considering dangerous driving conditions according to claim 7, wherein the modeling of the simulated traffic scene in step 1) is a Markov decision process, specifically as follows:
constructing a state space s(t) of the scene, a decision action a(t) of the intelligent vehicle, a reward function r(t), and a random state transfer function p(s(t+1) | s(t), a(t)) of the scene at time t+1, wherein the state space s(t) of the traffic scene is composed of the state information s_AV(t) of the intelligent vehicle, the state information s_OA(t) of surrounding traffic participants, and the lane structure and traffic rule information s_TR(t); the driving decision of the intelligent vehicle is made by controlling the longitudinal acceleration a_L(t) and the front wheel angle a_T(t), correspondingly forming the decision action set a(t) of the intelligent vehicle; in addition, the reward function r(t) is constructed by considering the navigation target point constraint r_1(t), the driving safety index r_2(t), the drivable area constraint r_3(t), the lane constraint r_4(t) and other information; finally, the random state transfer function consists of the state transition probability distribution p(s_AV(t+1) | s_AV(t), a(t)) of the intelligent vehicle and the state transition probability distribution p(s_OA(t+1) | s(t)) of the surrounding traffic participants.
9. The intelligent vehicle coupling decision method considering dangerous driving conditions as claimed in claim 7, wherein the input information of the driving condition evaluation model in step 3) comprises the speed v_AV(t) of the intelligent vehicle at time t, the speed v_FV(t) of the front traffic participant, the speed v_OV(t) of the vehicle in the adjacent lane, the collision reaction time ρ of the intelligent vehicle, the actual distance D_h(t) between the intelligent vehicle and the surrounding traffic participants in front, the longitudinal and lateral speeds of the intelligent vehicle during a lane change, the lateral distance d_AL from the lane boundary line during a lane change, the lane width w_k, and the like;
the collision risk model δ uses the time headway (TH), time to collision (TTC) and other indexes to compare the actual distance D_h(t) between the intelligent vehicle and the front traffic participant with the safety distance D_s(t), wherein the safety distance is mainly calculated from the reaction-time travel v_AV(t)·ρ of the intelligent vehicle, the final following distance, and the longitudinal displacement of the front traffic participant;
the lane change risk model η compares the distance D_LF between the two vehicles after the lane change of the intelligent vehicle with the adaptive braking distance D_b of the rear vehicle, wherein the adaptive braking distance of the rear vehicle is mainly obtained by accumulating the driving distance D_1 of the rear-vehicle driver during the reaction stage, the driving distance D_2 during the brake response stage, the driving distance D_3 during the braking force build-up stage, and the driving distance D_4 during the continuous braking stage.
10. The intelligent vehicle coupling decision method considering the dangerous driving condition as claimed in claim 7, wherein the initial learning rate β of the DQN model in step 4) is set to 0.002, the model structure is formed by a five-layer fully-connected network, and each hidden layer of the network contains 100 neuron nodes, while the initial training round N and the discount rate γ of the model are set to 10000 and 0.9, respectively, and the range of the initial speed of the vehicle and the pedestrian in the simulation scene is [15,65] km/h, [0,5] km/h, respectively;
the model training in step 9) trains the DQN model by the temporal difference (TD) algorithm as follows: firstly, the optimal value action function Q*(s(t), a(t)) is obtained by solving based on the sample data (s(t), a(t), r(t), s(t+1)) and the optimal Bellman equation, and is replaced with a neural network Q(s(t), a(t) | ω); then, the error of the TD algorithm is calculated as the difference between the objective function of the TD algorithm and Q(s(t), a(t) | ω), and the training loss function L(ω) of the DQN model is constructed from this error;
the transfer learning algorithm based on feature space mapping in step 11) considers that, whether in a simulated driving scene or a real driving scene, when the intelligent vehicle makes decisions with the same actions, the same reward function and similar driving scenes, the feature probability distribution of the optimal driving decision state mapping space is the same, where f and g represent the neural network functions of the feature space mapping.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111526027.0A CN114312830A (en) | 2021-12-14 | 2021-12-14 | Intelligent vehicle coupling decision model and method considering dangerous driving conditions |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114312830A true CN114312830A (en) | 2022-04-12 |
Family
ID=81050039
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111526027.0A Pending CN114312830A (en) | 2021-12-14 | 2021-12-14 | Intelligent vehicle coupling decision model and method considering dangerous driving conditions |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114312830A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104239741A (en) * | 2014-09-28 | 2014-12-24 | 清华大学 | Travelling risk field-based automobile driving safety assistance method |
US20160187880A1 (en) * | 2014-12-25 | 2016-06-30 | Automotive Research & Testing Center | Driving control system and dynamic decision control method thereof |
CN108332977A (en) * | 2018-01-23 | 2018-07-27 | 常熟昆仑智能科技有限公司 | A kind of classifying and analyzing method joining automotive test scene to intelligent network |
CN112242059A (en) * | 2020-09-30 | 2021-01-19 | 南京航空航天大学 | Intelligent decision-making method for unmanned vehicle based on motivation and risk assessment |
CN113253739A (en) * | 2021-06-24 | 2021-08-13 | 深圳慧拓无限科技有限公司 | Driving behavior decision method for expressway |
CN113291308A (en) * | 2021-06-02 | 2021-08-24 | 天津职业技术师范大学(中国职业培训指导教师进修中心) | Vehicle self-learning lane-changing decision-making system and method considering driving behavior characteristics |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112346450A (en) * | 2019-07-22 | 2021-02-09 | 沃尔沃汽车公司 | Robust autonomous driving design |
CN114880938A (en) * | 2022-05-16 | 2022-08-09 | 重庆大学 | Method for realizing decision of automatically driving automobile behavior |
CN114880938B (en) * | 2022-05-16 | 2023-04-18 | 重庆大学 | Method for realizing decision of automatically driving automobile behavior |
CN115630583A (en) * | 2022-12-08 | 2023-01-20 | 西安深信科创信息技术有限公司 | Method, device, equipment and medium for generating simulated vehicle driving state |
CN116946162A (en) * | 2023-09-19 | 2023-10-27 | 东南大学 | Intelligent network combined commercial vehicle safe driving decision-making method considering road surface attachment condition |
CN116946162B (en) * | 2023-09-19 | 2023-12-15 | 东南大学 | Intelligent network combined commercial vehicle safe driving decision-making method considering road surface attachment condition |
CN117574111A (en) * | 2024-01-15 | 2024-02-20 | 大秦数字能源技术股份有限公司 | BMS algorithm selection method, device, equipment and medium based on scene state |
CN117574111B (en) * | 2024-01-15 | 2024-03-19 | 大秦数字能源技术股份有限公司 | BMS algorithm selection method, device, equipment and medium based on scene state |
CN117708999A (en) * | 2024-02-06 | 2024-03-15 | 北京航空航天大学 | Scene-oriented hybrid electric vehicle energy management strategy evaluation method |
CN117708999B (en) * | 2024-02-06 | 2024-04-09 | 北京航空航天大学 | Scene-oriented hybrid electric vehicle energy management strategy evaluation method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114312830A (en) | Intelligent vehicle coupling decision model and method considering dangerous driving conditions | |
CN110745136B (en) | Driving self-adaptive control method | |
Huang et al. | Personalized trajectory planning and control of lane-change maneuvers for autonomous driving | |
CN112347567B (en) | Vehicle intention and track prediction method | |
CN112162555B (en) | Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet | |
CN112888612A (en) | Autonomous vehicle planning | |
CN107813820A (en) | A kind of unmanned vehicle lane-change paths planning method for imitating outstanding driver | |
Min et al. | Deep Q learning based high level driving policy determination | |
CN115257745A (en) | Automatic driving lane change decision control method based on rule fusion reinforcement learning | |
Sun et al. | DDPG-based decision-making strategy of adaptive cruising for heavy vehicles considering stability | |
CN110956851A (en) | Intelligent networking automobile cooperative scheduling lane changing method | |
Yu et al. | Autonomous overtaking decision making of driverless bus based on deep Q-learning method | |
CN114564016A (en) | Navigation obstacle avoidance control method, system and model combining path planning and reinforcement learning | |
CN115257819A (en) | Decision-making method for safe driving of large-scale commercial vehicle in urban low-speed environment | |
Sun et al. | Human-like highway trajectory modeling based on inverse reinforcement learning | |
Feng et al. | Active collision avoidance strategy considering motion uncertainty of the pedestrian | |
CN113255998B (en) | Expressway unmanned vehicle formation method based on multi-agent reinforcement learning | |
CN113200054B (en) | Path planning method and system for automatic driving take-over | |
CN114368387A (en) | Attention mechanism-based driver intention identification and vehicle track prediction method | |
Lodh et al. | Autonomous vehicular overtaking maneuver: A survey and taxonomy | |
Dubey et al. | Autonomous braking and throttle system: A deep reinforcement learning approach for naturalistic driving | |
US20230391371A1 (en) | Precise pull-over with mechanical simulation | |
Zhang et al. | Spatial attention for autonomous decision-making in highway scene | |
CN114802306A (en) | Intelligent vehicle integrated decision-making system based on man-machine co-driving concept | |
Siboo et al. | An Empirical Study of DDPG and PPO-Based Reinforcement Learning Algorithms for Autonomous Driving |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||