WO2022126940A1 - A rearward collision-avoidance driving decision method for heavy commercial vehicles - Google Patents

一种重型营运车辆的后向防撞驾驶决策方法 (A rearward collision-avoidance driving decision method for heavy commercial vehicles)

Info

Publication number
WO2022126940A1
Authority
WO
WIPO (PCT)
Prior art keywords: vehicle, decision, sub, collision, heavy
Application number
PCT/CN2021/086570
Other languages
English (en)
French (fr)
Inventor
李旭
胡玮明
胡锦超
祝雪芬
Original Assignee
东南大学 (Southeast University)
Application filed by 东南大学 (Southeast University)
Priority to US17/766,870 (US11964655B2)
Publication of WO2022126940A1

Classifications

    • B60W30/0956 Predicting travel path or likelihood of collision, the prediction being responsive to traffic or environmental parameters
    • G06N3/045 Combinations of networks
    • B60W30/08 Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/092 Reinforcement learning
    • G07C5/0808 Diagnosing performance data
    • B60W2300/125 Heavy duty trucks
    • B60W2420/408 Radar; Laser, e.g. lidar
    • B60W2520/06 Direction of travel
    • B60W2554/4041 Position (of dynamic objects)
    • B60W2554/80 Spatial relation or speed relative to objects
    • G06N3/048 Activation functions

Definitions

  • The invention relates to collision-avoidance driving decision-making methods, in particular to a rearward collision-avoidance driving decision-making method for heavy-duty commercial vehicles, and belongs to the technical field of automobile safety.
  • Existing driving decision-making methods for passenger vehicles are difficult to apply to heavy-duty commercial vehicles.
  • Existing research does not address rearward collision-avoidance driving decision-making for heavy-duty commercial vehicles; in particular, methods that are effective, reliable, and adaptive to the traffic environment are lacking.
  • the present invention discloses a rearward collision avoidance driving decision-making method of a heavy-duty commercial vehicle.
  • The method overcomes the deficiency of existing methods, which lack a decision-making strategy for rearward collision avoidance of heavy-duty commercial vehicles, and can quantitatively output reasonable steering wheel angle and throttle opening control amounts, providing drivers with effective and reliable rearward collision-avoidance driving suggestions and enabling efficient, reliable, and environment-adaptive rearward collision-avoidance driving decisions for heavy-duty commercial vehicles.
  • the present invention proposes a backward collision avoidance driving decision-making method based on deep reinforcement learning for heavy-duty commercial vehicles, such as semi-trailer tankers and semi-trailer trains.
  • a virtual traffic environment model is established to collect the motion state information of heavy-duty commercial vehicles and the vehicles behind them.
  • a backward collision risk assessment model based on backward distance collision time is established to accurately quantify the backward collision risk.
  • The rearward collision-avoidance driving decision problem is described as a Markov decision process under a certain reward function, and a rearward collision-avoidance driving decision model based on deep reinforcement learning is established to obtain an effective, reliable, and adaptive rearward collision-avoidance driving decision strategy.
  • Step 1 Establish a virtual traffic environment model
  • the present invention proposes a decision-making method for rearward collision avoidance driving.
  • the driver should be provided with decision-making strategies such as acceleration and steering in an effective and timely manner to avoid the occurrence of collision accidents.
  • Real-vehicle collision-avoidance testing of heavy-duty commercial vehicles is costly and dangerous.
  • Therefore, targeting high-grade highways, the present invention establishes a virtual traffic environment model, i.e., a three-lane virtual environment model including straight road segments and curves.
  • The heavy-duty commercial vehicle moves in the traffic environment model, and the target vehicle (one of three types: small, medium, or large) follows it under four driving conditions: acceleration, deceleration, constant speed, and lane change.
  • the motion status information can be obtained in real time, including: the position, speed, acceleration, relative distance, and relative speed of the two vehicles;
  • the visual sensor at the rear of the vehicle can obtain the type of the target vehicle in real time;
  • the driver's manipulation information can be read through the CAN bus, including: the throttle opening of the vehicle and the steering wheel angle.
  • The target vehicle is the nearest vehicle behind the heavy commercial vehicle that travels on the same road, in the same lane, and in the same direction.
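  For illustration, the target-vehicle definition above can be sketched as a selection over hypothetical (lane, position, speed) tuples, with position increasing along the travel direction; the tuple layout is an assumption, not the patent's data format:

```python
def select_target(ego_lane, ego_s, vehicles):
    """Return the closest vehicle behind the ego in the same lane, or None.

    `vehicles` is a list of hypothetical (lane, s_position, speed) tuples,
    where s_position grows in the direction of travel, so "behind" means
    a smaller s_position than the ego's.
    """
    behind = [v for v in vehicles if v[0] == ego_lane and v[1] < ego_s]
    # The closest follower is the one with the largest s_position behind us.
    return max(behind, key=lambda v: v[1], default=None)
```

  A same-direction check is implicit here because the virtual environment only generates same-direction traffic on the three-lane road.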
  • Step 2 Establish a rear collision risk assessment model
  • RTTC(t) represents the rearward distance-based time-to-collision at time t, in seconds
  • x c (t) is the inter-vehicle distance, in meters
  • v F (t) and v R (t) represent the speeds of the heavy commercial vehicle and the target vehicle, respectively
  • v r (t) is the relative speed of the two vehicles; all speeds are in meters per second, and v r (t) = v F (t) − v R (t).
  • The degree of rear collision risk is then calculated. According to the national standard "Performance Requirements and Test Procedures for Rearward Collision Warning Systems for Commercial Vehicles", a rearward collision warning system passes the test if it issues a warning when the rearward distance-based time-to-collision is not less than 2.1 seconds and not greater than 4.4 seconds. Based on this, the degree of rear collision risk is quantified:
  • ρ w is the quantified value of the rear collision risk.
  • When ρ w ≥ 1, there is no danger of rear collision;
  • when 0.5 ≤ ρ w < 1, there is a danger of rear collision;
  • when 0 ≤ ρ w < 0.5, the degree of danger of rear collision is very high.
  • Step 3 Establish a decision-making model for rear collision avoidance driving for heavy-duty commercial vehicles
  • The present invention comprehensively considers the influence of the traffic environment, the vehicle operating state, the rear vehicle type, and the degree of rear collision risk on rear collisions, and establishes a rearward collision-avoidance driving decision model for heavy-duty commercial vehicles.
  • Common driving decision-making methods include rule-based and data-based decision-making algorithms.
  • the rule-based decision-making algorithm uses a limited directed connected graph to describe different driving states and the transition relationship between states, so as to generate driving actions according to the transition of driving states.
  • Because there are uncertainties in vehicle motion parameters, road conditions, and rear traffic conditions, it is difficult for hand-made rules to cover all scenarios, and hence difficult to ensure the effectiveness and adaptability of decision-making.
  • Decision-making algorithms based on data learning imitate the human process of learning knowledge or skills and continuously improve their own performance through an interactive self-learning mechanism.
  • The method based on deep reinforcement learning combines the perception ability of deep learning with the decision-making ability of reinforcement learning and is adaptable to uncertain problems, so that collision-avoidance decision-making can adapt to the traffic environment and driving conditions. Therefore, the present invention adopts a deep reinforcement learning algorithm to establish the rearward collision-avoidance driving decision model.
  • Decision methods based on deep reinforcement learning mainly include three categories: value function-based, policy search-based and Actor-Critic architecture-based decision-making methods.
  • Value-function-based deep reinforcement learning algorithms cannot handle continuous outputs, and therefore cannot meet the need for continuously valued driving strategies in collision-avoidance decision-making.
  • the method based on policy search has some shortcomings, such as being sensitive to the step size and difficult to select the step size.
  • the decision method based on Actor-Critic architecture combines value function estimation and policy search, and has a fast update speed.
  • Proximal Policy Optimization (PPO) solves the problems of slow parameter updates and difficult step-size selection, and has achieved good results in continuous action spaces.
  • the present invention adopts the PPO algorithm to establish a decision-making model for backward collision avoidance, and obtains the optimal decision for backward collision avoidance through interactive iterative learning with the random process model of the target vehicle motion. Specifically, it includes the following four sub-steps:
  • Sub-step 1 Define the basic parameters of the backward collision avoidance driving decision model
  • The rearward collision-avoidance driving decision problem is described as a Markov decision process (S, A, P, r) under a certain reward function, where S is the state space, A is the rearward collision-avoidance action decision, P represents the state transition probability arising from the uncertainty of the target vehicle's motion, and r is the reward function.
  • S t is the state space at time t
  • v F_lon and v r_lon represent the longitudinal speed of the vehicle and the relative longitudinal speed of the two vehicles, respectively, in meters per second
  • a F_lon and a r_lon represent the longitudinal acceleration of the vehicle and the relative longitudinal acceleration of the two vehicles, respectively, in meters per second squared
  • δ str is the steering wheel angle of the vehicle, in degrees
  • p thr is the throttle opening
  • L r is the relative distance between the vehicles, in meters
  • ρ w and T m represent the degree of rear collision risk and the target vehicle type, respectively
  • the present invention uses steering wheel angle and throttle opening as control quantities to define the driving strategy output by the decision-making model, that is, action decision:
  • a t is the action decision at time t
  • ⁇ str_out represents the normalized steering wheel angle control
  • the range is [-1, 1]
  • p thr_out represents the normalized throttle opening control Amount in the range [0,1].
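  The normalization of the two control quantities can be sketched as below. The maximum steering-wheel travel is not stated in the chunk; the 780° limit (roughly two turns lock-to-lock each way, typical for a heavy truck) is an assumption used only to make the example concrete:

```python
MAX_STEER_DEG = 780.0  # assumed steering-wheel travel to full lock, in degrees

def normalize_action(delta_str_deg, p_thr):
    """Map a raw steering-wheel angle and throttle opening to the action
    ranges used by the decision model: steering to [-1, 1], throttle to [0, 1]."""
    delta_out = max(-1.0, min(1.0, delta_str_deg / MAX_STEER_DEG))
    p_out = max(0.0, min(1.0, p_thr))
    return delta_out, p_out
```

  For example, a 390° steering input with half throttle normalizes to (0.5, 0.5) under the assumed limit.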
  • the evaluation is made concrete and numerical by establishing a reward function.
  • the backward collision avoidance driving decision is a multi-objective optimization problem involving objectives such as safety and comfort
  • the present invention designs the reward function as:
  • r t is the reward function at time t
  • r 1 is the safety distance reward function
  • r 2 is the comfort reward function
  • r 3 is the penalty function.
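  The chunk names the three reward components (r 1 safety distance, r 2 comfort, r 3 penalty) but does not give their closed forms. The weighted sum below is an illustrative sketch: the RTTC-based safety term, the acceleration-based comfort term, the collision penalty of -100, and the weights are all assumptions:

```python
def reward(rttc_t, a_lon, collided, w=(1.0, 0.1, 1.0)):
    """Illustrative reward r_t = w1*r1 + w2*r2 + w3*r3 for one time step.

    rttc_t   -- rearward time-to-collision, seconds
    a_lon    -- longitudinal acceleration of the ego vehicle, m/s^2
    collided -- whether a rear collision occurred this step
    """
    r1 = min(rttc_t / 4.4, 1.0)       # safety: reward RTTC above the warning window
    r2 = -abs(a_lon)                  # comfort: penalize harsh longitudinal acceleration
    r3 = -100.0 if collided else 0.0  # terminal penalty on collision (assumed magnitude)
    return w[0] * r1 + w[1] * r2 + w[2] * r3
```

  A safe, smooth step (RTTC 4.4 s, zero acceleration) then earns the maximum per-step reward of 1.0, while a collision dominates everything else, which is the usual shaping choice in collision-avoidance RL.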
  • ⁇ * is the policy with the largest expectation
  • is the backward collision avoidance decision strategy
  • is the discount factor
  • ⁇ (0,1), ⁇ ( ⁇ ) represents the trajectory distribution under the strategy ⁇ .
  • Sub-step 2 Design the network architecture of the backward collision avoidance driving decision model
  • the Actor network takes the state space information as input, and outputs the action decision, that is, the throttle opening and steering wheel angle control amount of the heavy-duty commercial vehicle.
  • The Critic network takes the state space information and the action decision as input, and outputs the current "state-action" value. Specifically:
  • a hierarchical encoder structure is established, and features are extracted for various types of information in the state space.
  • First, three serially connected convolutional layers (C F1 , C F2 , C F3 ) and one max-pooling layer (P 1 ) are constructed to extract features from the motion state information of the vehicle (longitudinal velocity, longitudinal acceleration, steering wheel angle, throttle opening), encoding it as an intermediate feature vector h 1 . Using the same structure, i.e., three serially connected convolutional layers (C R1 , C R2 , C R3 ) and one max-pooling layer (P 2 ), features are extracted from the relative motion state information of the two vehicles (relative longitudinal speed, relative longitudinal acceleration, relative distance between vehicles), encoded as an intermediate feature vector h 2 . The convolutional layer C W1 and the max-pooling layer P 3 extract features from the degree of collision risk and the target vehicle type, encoded as an intermediate feature vector h 3 .
  • a Critic network is established by using a neural network with multiple hidden layer structures. First, the state space S t is input into the hidden layer FC C1 ; meanwhile, the action decision A t is input into the hidden layer FC C2 . Second, the hidden layers FC C1 and FC C2 are merged by tensor addition. Finally, after passing through the fully connected layers FC C3 and FC C4 in turn, the value of the Critic network is output.
  • the number of neurons in the FC C1 layer and the FC C2 layer is set to 400, the number of neurons in the remaining hidden layers is 200, and the activation function of each layer is ReLU.
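  The Critic's structure (400-unit FC C1 and FC C2 branches merged by tensor addition, then 200-unit FC C3 and FC C4, ReLU throughout) can be sketched as a forward pass in plain Python. The 9-element state (v F_lon, v r_lon, a F_lon, a r_lon, δ str, p thr, L r, ρ w, T m), the random initialization, and the final scalar output head are assumptions for illustration; a real model would be trained:

```python
import random

random.seed(0)

STATE_DIM, ACTION_DIM = 9, 2  # S_t fields; (steering, throttle) action

def dense(n_in, n_out):
    """Random weight matrix and zero bias (stand-ins for trained parameters)."""
    w = [[random.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return w, [0.0] * n_out

def forward(x, w, b, act=True):
    """One fully connected layer, optionally followed by ReLU."""
    y = [sum(xi * w[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]
    return [max(0.0, v) for v in y] if act else y

W1, b1 = dense(STATE_DIM, 400)   # FC_C1: state branch (400 neurons)
W2, b2 = dense(ACTION_DIM, 400)  # FC_C2: action branch (400 neurons)
W3, b3 = dense(400, 200)         # FC_C3 (200 neurons)
W4, b4 = dense(200, 200)         # FC_C4 (200 neurons)
Wv, bv = dense(200, 1)           # assumed scalar "state-action" value head

def critic(state, action):
    """Merge the two branches by element-wise (tensor) addition, then FC_C3/FC_C4."""
    h = [a + b for a, b in zip(forward(state, W1, b1), forward(action, W2, b2))]
    h = forward(h, W3, b3)
    h = forward(h, W4, b4)
    return forward(h, Wv, bv, act=False)[0]
```

  Tensor addition (rather than concatenation) is why both branches must share the same 400-unit width before merging.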
  • Sub-step 3 Train the backward collision avoidance driving decision model
  • Sub-step 3.1 Initialize Actor network and Critic network
  • Sub-step 3.2 Perform an iterative solution; each iteration includes sub-steps 3.21 to 3.23, specifically:
  • Sub-step 3.21 Perform an iterative solution, each iteration includes sub-steps 3.211 to 3.213, specifically:
  • Sub-step 3.211 use the virtual traffic environment model of step 1 to obtain the motion control operation of the vehicle;
  • Sub-step 3.212 Obtain sample data (S t , A t , r t ) by using Actor network;
  • Sub-step 3.213 End the loop and get the sample point set [(S 1 ,A 1 ,r 1 ),(S 2 ,A 2 ,r 2 ),...,(S t ,A t ,r t )];
  • Substep 3.22 Calculate the advantage function
  • V(S t ) represents the value function of the state S t ; a positive advantage indicates that the probability of taking the current action should be increased, and a negative advantage indicates that it should be reduced.
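  The chunk does not state which advantage estimator is used; generalized advantage estimation (GAE) is the common choice alongside PPO and matches the sign interpretation above, so the sketch below uses it (γ and λ values are the usual defaults, assumed here):

```python
def advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one episode.

    rewards -- per-step rewards r_t
    values  -- per-step value estimates V(S_t) from the Critic
    Returns A_t for each step; positive A_t means the action's probability
    should be increased, negative means it should be reduced.
    """
    adv, gae = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        v_next = values[t + 1] if t + 1 < len(values) else 0.0  # bootstrap 0 at episode end
        delta = rewards[t] + gamma * v_next - values[t]         # TD residual
        gae = delta + gamma * lam * gae                         # exponentially weighted sum
        adv[t] = gae
    return adv
```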
  • Sub-step 3.23 Perform iterative solution, each iteration includes sub-steps 3.231 to 3.233, specifically:
  • Sub-step 3.231 Calculate the objective function of the Actor network
  • p t ( ⁇ ) represents the ratio of the new policy ⁇ ⁇ to the old policy ⁇ ⁇ _old on the action decision distribution during the policy update process
  • clip( ⁇ ) represents a clipping function
  • Substep 3.233 Update the Critic network parameters by minimizing the loss J critic :
  • Substep 3.234 End the loop.
  • Sub-step 3.3 Iteratively update according to the method provided in sub-step 3.2, so that the Actor network and the Critic network gradually converge.
  • the current round will be terminated and a new round will be started for training.
  • the training ends when the iteration reaches the maximum number of steps or the model is able to stably and accurately make backward collision avoidance driving decisions.
  • Sub-step 4 Use the backward collision avoidance decision model to output the decision policy
  • The information obtained from centimeter-level high-precision differential GPS, an inertial measurement unit, millimeter-wave radar, and the CAN bus is input into the trained rearward collision-avoidance decision model, which can quantitatively output reasonable steering wheel angle and throttle opening control amounts.
  • the method proposed by the present invention realizes the decision-making of backward collision avoidance driving of heavy-duty commercial vehicles, and can provide the driver with an effective and reliable backward collision avoidance driving decision-making strategy;
  • The method proposed by the present invention comprehensively considers the influence of the traffic environment, the vehicle operating state, the rear vehicle type, and the degree of rear collision risk on rear collisions, and quantifies driving strategies such as steering wheel angle and throttle opening as precise numerical values; the output driving strategy can be adaptively adjusted according to the traffic environment and the driver's operation, which improves the effectiveness, reliability, and environmental adaptability of rear collision-avoidance driving decisions for heavy-duty commercial vehicles;
  • Fig. 1 is a schematic diagram of the technical route of the present invention.
  • FIG. 2 is a schematic diagram of the network architecture of the backward collision avoidance driving decision-making model established by the present invention.
  • the invention proposes a backward collision avoidance driving decision-making method based on deep reinforcement learning for heavy-duty operating vehicles, such as semi-trailer tankers and semi-trailer trains.
  • a virtual traffic environment model is established to collect the motion state information of heavy-duty commercial vehicles and the vehicles behind them.
  • a backward collision risk assessment model based on backward distance collision time is established to accurately quantify the backward collision risk.
  • the backward collision avoidance driving decision problem is described as a Markov decision process under a certain reward function, and a backward collision avoidance driving decision model based on deep reinforcement learning is established to obtain an effective, reliable and adaptive backward prevention Crash driving decision-making strategies.
  • the technical route of the present invention is shown in Figure 1, and the concrete steps are as follows:
  • Step 1 Establish a virtual traffic environment model
  • the present invention proposes a decision-making method for rearward collision avoidance driving.
  • a decision-making method for rearward collision avoidance driving.
  • the driver should be provided with decision-making strategies such as acceleration and steering in an effective and timely manner to avoid the occurrence of collision accidents.
  • the collision avoidance test of heavy-duty commercial vehicles has high test costs and dangers.
  • the present invention faces high-grade highways and establishes a virtual traffic environment model, that is, a three-lane virtual environment model including straight roads and curves.
  • the heavy-duty operating vehicle moves in the traffic environment model, and the target vehicle (including 3 types of small, medium and large vehicles) follows the vehicle to move, including acceleration, deceleration, constant speed, and lane change 4 different driving conditions.
  • the motion status information can be obtained in real time, including: the position, speed, acceleration, relative distance, and relative speed information of the two vehicles;
  • the visual sensor installed at the rear of the vehicle can obtain the type of the target vehicle in real time;
  • the driver's manipulation information can be read through the CAN bus, including: the throttle opening of the vehicle and the steering wheel angle.
  • the target vehicle refers to the vehicle that is located behind the road where the heavy commercial vehicle travels, is located in the same lane, travels in the same direction, and has the closest distance.
  • Step 2 Establish a rear collision risk assessment model
  • RTTC(t) represents the backward distance collision time at time t, in seconds
  • x c (t) is the inter-vehicle distance, in meters
  • v F (t) represent respectively The speed of the heavy service vehicle and the target vehicle
  • v r (t) is the relative speed of the two vehicles, both in meters per second
  • v r (t) v F (t)-v R (t).
  • the degree of rear collision risk is calculated. According to the national standard "Performance Requirements and Test Procedures for Rearward Collision Warning System for Commercial Vehicles", when the backward distance collision time is not less than 2.1 seconds and not greater than 4.4 seconds, a rearward collision warning is issued, indicating that the rearward collision warning system has passed the test. Based on this, the degree of rear collision risk is quantified:
  • ⁇ w is the quantified value of the rear collision risk.
  • ⁇ w ⁇ 1 it means that there is no danger of rear collision;
  • 0.5 ⁇ w ⁇ 1 it means that there is a danger of rear collision;
  • 0 ⁇ w ⁇ 0.5 it means that the degree of danger of rear collision is very high.
  • Step 3 Establish a decision-making model for rear collision avoidance driving for heavy-duty commercial vehicles
  • the present invention comprehensively considers the influence of traffic environment, vehicle operating state, rear vehicle type, and rear collision risk degree on rear collision, and establishes a heavy operating vehicle Rear collision avoidance driving decision model.
  • Common driving decision-making methods include rule-based and data-based decision-making algorithms.
  • the rule-based decision-making algorithm uses a limited directed connected graph to describe different driving states and the transition relationship between states, so as to generate driving actions according to the transition of driving states.
  • the rule-based decision-making algorithm uses a limited directed connected graph to describe different driving states and the transition relationship between states, so as to generate driving actions according to the transition of driving states.
  • there are uncertainties in vehicle motion parameters, road conditions and rear traffic conditions and it is difficult to make rules to traverse all scenarios, and it is difficult to ensure the effectiveness and adaptability of decision-making.
  • Decision-making algorithms based on data learning use algorithms to imitate the learning process of human knowledge or skills, and achieve continuous improvement of their own learning performance through an interactive self-learning mechanism.
  • the method based on deep reinforcement learning combines the perception ability of deep learning and the decision-making ability of reinforcement learning, and has the characteristics of adaptability to uncertain problems, so as to meet the adaptability of collision avoidance decision-making to the traffic environment and driving conditions. Therefore, the present invention adopts a deep reinforcement learning algorithm to establish a decision-making model for backward collision avoidance driving.
  • Decision methods based on deep reinforcement learning mainly include three categories: value function-based, policy search-based and Actor-Critic architecture-based decision-making methods.
  • the value-based deep reinforcement learning algorithm cannot deal with the problem of continuous output, and cannot meet the needs of continuous output driving strategy in collision avoidance decision-making.
  • the method based on policy search has some shortcomings, such as being sensitive to the step size and difficult to select the step size.
  • the decision method based on Actor-Critic architecture combines value function estimation and policy search, and has a fast update speed.
  • Proximal Policy Optimization PPO solves the problems of slow parameter update and difficult step size determination, and has achieved good results in outputting continuous action space.
  • the present invention adopts the PPO algorithm to establish a decision-making model for backward collision avoidance, and obtains the optimal decision for backward collision avoidance through interactive iterative learning with the random process model of the target vehicle motion. Specifically, it includes the following four sub-steps:
  • Sub-step 1 Define the basic parameters of the backward collision avoidance driving decision model
  • the backward collision avoidance driving decision problem is described as a Markov decision process (S, A, P, r) under a certain reward function, where S is the state space, A is the backward collision avoidance action decision, and P represents The state transition probability due to the uncertainty of the target vehicle motion, r is the reward function.
  • S is the state space
  • A is the backward collision avoidance action decision
  • P represents The state transition probability due to the uncertainty of the target vehicle motion
  • r is the reward function.
  • S t is the state space at time t
  • v F_lon , v r_lon represent the longitudinal speed of the vehicle and the relative longitudinal speed of the two vehicles, respectively, in meters per second
  • a F_lon , a r_lon respectively represent the longitudinal direction of the vehicle Acceleration and the relative longitudinal acceleration of the two vehicles, in meters per second squared
  • ⁇ str is the steering wheel angle of the vehicle, in degrees
  • p thr is the throttle opening
  • L r is the relative distance between vehicles, in units is m
  • ⁇ w , T m respectively represent the degree of rear collision risk and the target vehicle type
  • the present invention uses steering wheel angle and throttle opening as control quantities to define the driving strategy output by the decision-making model, that is, action decision:
  • a t is the action decision at time t
  • ⁇ str_out represents the normalized steering wheel angle control
  • the range is [-1, 1]
  • p thr_out represents the normalized throttle opening control Amount in the range [0,1].
  • the evaluation is made concrete and numerical by establishing a reward function.
  • the backward collision avoidance driving decision is a multi-objective optimization problem involving objectives such as safety and comfort
  • the present invention designs the reward function as:
  • r_t is the reward function at time t
  • r_1 is the safety distance reward function
  • r_2 is the comfort reward function
  • r_3 is the penalty function.
  • π* is the policy with the largest expected return
  • π is the backward collision avoidance decision strategy
  • γ is the discount factor
  • γ ∈ (0, 1), and τ(π) represents the trajectory distribution under the strategy π.
  • Sub-step 2 Design the network architecture of the backward collision avoidance driving decision model
  • the Actor network takes the state space information as input and outputs the action decision, that is, the throttle opening and steering wheel angle control quantities of the heavy commercial vehicle.
  • the Critic network takes the state space information and the action decision as input and outputs the value of the current "state-action" pair.
  • the network architecture is shown in Figure 2, specifically:
  • a hierarchical encoder structure is established, and features are extracted for various types of information in the state space.
  • First, construct 3 serially connected convolutional layers (C_F1, C_F2, C_F3) and 1 max-pooling layer (P_1) to extract features from the motion state information of the vehicle (longitudinal velocity, longitudinal acceleration, steering wheel angle, throttle opening) and encode it as an intermediate feature vector h_1; using the same structure, namely 3 serially connected convolutional layers (C_R1, C_R2, C_R3) and 1 max-pooling layer (P_2), extract features from the relative motion state information of the front and rear vehicles (relative longitudinal speed, relative longitudinal acceleration, relative distance between vehicles) and encode it as an intermediate feature vector h_2; using the convolutional layer C_W1 and the max-pooling layer P_3, extract features from the degree of collision risk and the target vehicle type and encode them as an intermediate feature vector h_3.
  • a Critic network is established by using a neural network with multiple hidden layer structures. First, the state space S t is input into the hidden layer FC C1 ; meanwhile, the action decision A t is input into the hidden layer FC C2 . Second, the hidden layers FC C1 and FC C2 are merged by tensor addition. Finally, after passing through the fully connected layers FC C3 and FC C4 in turn, the value of the Critic network is output.
  • the number of neurons in the FC C1 layer and the FC C2 layer is set to 400, the number of neurons in the remaining hidden layers is 200, and the activation function of each layer is ReLU.
  • Sub-step 3 Train the backward collision avoidance driving decision model
  • Sub-step 3.1 Initialize Actor network and Critic network
  • Sub-step 3.2 Perform an iterative solution; each iteration includes sub-steps 3.21 to 3.23, specifically:
  • Sub-step 3.21 Perform an iterative solution, each iteration includes sub-steps 3.211 to 3.213, specifically:
  • Sub-step 3.211 use the virtual traffic environment model of step 1 to obtain the motion control operation of the vehicle;
  • Sub-step 3.212 Obtain sample data (S t , A t , r t ) by using Actor network;
  • Sub-step 3.213 End the loop and get the sample point set [(S 1 ,A 1 ,r 1 ),(S 2 ,A 2 ,r 2 ),...,(S t ,A t ,r t )];
  • Substep 3.22 Calculate the advantage function
  • V(S_t) represents the value function of the state S_t; a positive advantage indicates that the probability of taking the current action should be increased, and a negative advantage indicates that the probability of taking this action should be reduced.
  • Sub-step 3.23 Perform iterative solution, each iteration includes sub-steps 3.231 to 3.233, specifically:
  • Sub-step 3.231 Calculate the objective function of the Actor network
  • p_t(θ) represents the ratio of the new policy π_θ to the old policy π_θ_old over the action-decision distribution during the policy update
  • clip( ⁇ ) represents a clipping function
  • Sub-step 3.233 Update the Critic network parameter J_critic:
  • Substep 3.234 End the loop.
  • Sub-step 3.3 Iteratively update according to the method provided in sub-step 3.2, so that the Actor network and the Critic network gradually converge.
  • if the vehicle suffers a backward collision or rollover during training, the current round will be terminated and a new round will be started for training.
  • the training ends when the iteration reaches the maximum number of steps or the model is able to stably and accurately make backward collision avoidance driving decisions.
  • Sub-step 4 Use the backward collision avoidance decision model to output the decision policy
  • the information acquired by the centimeter-level high-precision differential GPS, inertial measurement unit, millimeter-wave radar and CAN bus is input into the trained backward collision avoidance decision-making model, which can quantitatively output reasonable steering wheel angle and throttle opening control quantities.


Abstract

A backward anti-collision driving decision-making method for a heavy commercial vehicle. First, a traffic environment model is built to collect the motion-state information of the heavy commercial vehicle and the vehicle behind it. Second, a backward collision risk assessment model based on the rear time-to-collision is established to precisely quantify the backward collision risk. Finally, the backward anti-collision driving decision problem is described as a Markov decision process under a given reward function, and a deep-reinforcement-learning-based backward anti-collision driving decision model is built, yielding an effective, reliable and adaptive backward anti-collision driving decision policy. The method overcomes the lack of research on backward anti-collision driving decisions for heavy commercial vehicles in existing approaches; it can quantitatively output reasonable steering-wheel angle and throttle-opening control quantities, provide the driver with effective and reliable backward anti-collision driving advice, and reduce the occurrence of backward collision accidents.

Description

Backward anti-collision driving decision-making method for a heavy commercial vehicle. Technical field
The present invention relates to an anti-collision driving decision-making method, in particular to a backward anti-collision driving decision-making method for heavy commercial vehicles, and belongs to the technical field of automotive safety.
Background art
As the main carriers of road transportation, the safety condition of commercial vehicles directly affects road transportation safety in China. Vehicle collision is the most common accident form in road transportation. Heavy commercial vehicles, typified by tank trucks transporting dangerous goods, mostly carry flammable, explosive or highly toxic chemicals (such as methanol and acrylonitrile) in their tanks. Compared with a forward collision, a backward collision is more likely to rupture the tank and then cause leakage, combustion or explosion of the dangerous goods inside; the resulting secondary damage far exceeds the damage caused by the collision itself, so backward collisions are more dangerous. Driving decision-making is an important link in the active prevention and control of backward collisions: if the driver can be warned before a backward collision occurs and prompted to take reasonable measures such as accelerating or changing lanes, the frequency of traffic accidents caused by backward collisions can be greatly reduced, or the harm they cause mitigated. Therefore, studying backward anti-collision driving decision-making methods for heavy commercial vehicles has important social significance and practical value for ensuring road traffic safety.
At present, standards, patents and literature have studied backward collision avoidance for vehicles. In terms of standards, the Ministry of Transport has issued the transportation industry standard "Performance requirements and test procedures for commercial vehicle rear-end collision warning systems", which specifies the performance of rear-end collision warning systems installed on commercial vehicles, but it is limited to collision warning and does not involve backward anti-collision driving decisions. In terms of patents and literature, most backward collision avoidance research targets small passenger vehicles. Compared with passenger vehicles, heavy commercial vehicles have a higher center of mass and a larger load; during sharp turns or emergency lane changes, the sway of the tank or trailer further increases the vehicle's instability, making it prone to losing stability and rolling over. Therefore, driving decision methods designed for passenger vehicles are difficult to apply to heavy commercial vehicles. In general, existing research does not address backward anti-collision driving decisions for heavy commercial vehicles, and in particular lacks research on effective, reliable backward anti-collision driving decisions that adapt to traffic environment characteristics.
Summary of the invention
Purpose of the invention: In order to realize an effective, reliable backward anti-collision driving decision-making method for heavy commercial vehicles that adapts to traffic environment characteristics, the present invention discloses a backward anti-collision driving decision-making method for heavy commercial vehicles. The method overcomes the lack of backward anti-collision decision strategies for heavy commercial vehicles in existing methods; it can quantitatively output reasonable steering-wheel angle and throttle-opening control quantities and provide the driver with effective and reliable backward anti-collision driving advice, thereby realizing effective, reliable and traffic-environment-adaptive backward anti-collision driving decisions for heavy commercial vehicles.
Technical solution: Aiming at heavy commercial vehicles such as semi-trailer tank trucks and semi-trailer trains, the present invention proposes a backward anti-collision driving decision method based on deep reinforcement learning. First, a virtual traffic environment model is built to collect the motion-state information of the heavy commercial vehicle and the vehicle behind it. Second, a backward collision risk assessment model based on the rear time-to-collision is established to precisely quantify the backward collision risk. Finally, the backward anti-collision driving decision problem is described as a Markov decision process under a given reward function, and a deep-reinforcement-learning-based backward anti-collision driving decision model is built, yielding an effective, reliable and adaptive backward anti-collision driving decision policy. The method comprises the following steps:
Step 1: Build a virtual traffic environment model
In order to reduce the frequency of traffic accidents caused by backward collisions and improve the safety of heavy commercial vehicles, the present invention proposes a backward anti-collision driving decision method. Its applicable scenario is: while the heavy commercial vehicle is travelling, with no obstacles or other interfering factors ahead of it, decision strategies such as accelerating and steering should be provided to the driver effectively and in time to prevent a backward collision with the vehicle behind.
In actual road tests, tests involving heavy commercial vehicles carry high cost and risk. To reduce test cost and risk while maintaining test efficiency, the present invention targets high-grade highways and builds a virtual traffic environment model, i.e. a three-lane virtual environment model containing straight and curved sections. The heavy commercial vehicle moves in the traffic environment model, and the target vehicle (of three types: small, medium and large) follows it, covering four driving conditions: acceleration, deceleration, constant speed and lane change.
Through the centimeter-level high-precision differential GPS, inertial measurement unit and millimeter-wave radar installed on each vehicle, motion-state information can be acquired in real time, including the position, speed, acceleration, relative distance and relative speed of the two vehicles; through the vision sensor installed at the rear of the vehicle, the type of the target vehicle can be acquired in real time; and through the CAN bus, driver operation information can be read, including the vehicle's throttle opening and steering-wheel angle.
In the present invention, the target vehicle refers to the nearest vehicle behind the heavy commercial vehicle that is in the same lane and travelling in the same direction.
Step 2: Build a backward collision risk assessment model
To output backward anti-collision decision strategies reasonably and effectively, the degree of backward collision risk of the heavy commercial vehicle must be assessed accurately and in real time. First, the time required for a collision between the heavy commercial vehicle and the target vehicle is calculated:
Figure PCTCN2021086570-appb-000001
In Eq. (1), RTTC(t) denotes the rear time-to-collision at time t, in seconds; x_c(t) is the inter-vehicle distance, in meters; v_F(t) and v_R(t) denote the speeds of the heavy commercial vehicle and the target vehicle respectively, and v_r(t) is the relative speed of the two vehicles, all in meters per second, with v_r(t) = v_F(t) - v_R(t).
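The rear time-to-collision of Eq. (1) can be sketched as follows. Since Eq. (1) itself is published as an image, the exact form shown here (gap divided by the closing speed of the rear vehicle) is an assumption consistent with the surrounding definitions.

```python
def rttc(x_c: float, v_f: float, v_r_rear: float) -> float:
    """Rear time-to-collision sketch for Eq. (1).

    x_c: inter-vehicle distance in meters; v_f: heavy commercial vehicle
    speed; v_r_rear: rear target-vehicle speed, both in m/s. When the
    rear vehicle is not closing in, no collision is possible and
    infinity is returned. The published formula is an image, so this
    form is an assumption.
    """
    closing_speed = v_r_rear - v_f  # rear vehicle approaching when positive
    if closing_speed <= 0.0:
        return float("inf")
    return x_c / closing_speed
```

For example, a 30 m gap closed at 10 m/s gives a rear time-to-collision of 3 s, which lies inside the 2.1 s to 4.4 s warning band discussed below.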
Second, the degree of backward collision risk is calculated. According to the national standard "Performance requirements and test procedures for commercial vehicle rear-end collision warning systems", a backward collision alarm issued when the rear time-to-collision is no less than 2.1 s and no more than 4.4 s means the rear-end collision warning system passes the test. On this basis, the degree of backward collision risk is quantified:
Figure PCTCN2021086570-appb-000002
In Eq. (2), δ_w is the quantized value of the backward collision risk. δ_w ≥ 1 indicates no backward collision risk; 0.5 ≤ δ_w ≤ 1 indicates a backward collision risk; and 0 ≤ δ_w ≤ 0.5 indicates a very high degree of backward collision risk.
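A sketch of the risk quantification described above. Eq. (2) is published as an image, so the linear normalization against the 4.4 s warning threshold below is only an assumption that reproduces the stated bands (δ_w ≥ 1 no risk, 0.5 to 1 risk, 0 to 0.5 very high risk).

```python
def collision_risk(rttc_s: float, t_warn_max: float = 4.4) -> float:
    """Assumed quantization of the backward collision risk delta_w.

    rttc_s: rear time-to-collision in seconds. Values are clipped to
    [0, 1]; an RTTC at or above the 4.4 s warning threshold maps to 1
    (no risk) and an RTTC around 2.2 s maps to 0.5, matching the bands
    described in the text. The published Eq. (2) may differ.
    """
    return min(max(rttc_s / t_warn_max, 0.0), 1.0)
```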
Step 3: Build the backward anti-collision driving decision model for the heavy commercial vehicle
To realize effective, reliable and traffic-environment-adaptive backward anti-collision driving decisions, the present invention comprehensively considers the influence of the traffic environment, the vehicle's operating state, the type of the rear vehicle and the degree of backward collision risk on backward collisions, and builds a backward anti-collision driving decision model for heavy commercial vehicles.
Common driving decision methods fall into two categories: rule-based and data-learning-based decision algorithms. (1) Rule-based decision algorithms use a finite directed connected graph to describe different driving states and the transitions between them, and generate driving actions according to the transitions of the driving state. However, during vehicle motion, the vehicle's motion parameters, road conditions and rear traffic state are all uncertain; formulated rules can hardly cover all scenarios, so the effectiveness and adaptability of the decisions are hard to guarantee. (2) Data-learning-based decision algorithms use algorithms to imitate the human process of learning knowledge or skills, and continuously improve their own learning performance through an interactive self-learning mechanism. Among them, deep-reinforcement-learning-based methods combine the perception capability of deep learning with the decision capability of reinforcement learning; their adaptability to uncertain problems meets the requirement that anti-collision decisions adapt to the traffic environment and driving conditions. Therefore, the present invention uses a deep reinforcement learning algorithm to build the backward anti-collision driving decision model.
Deep-reinforcement-learning-based decision methods mainly fall into three categories: value-function-based, policy-search-based, and Actor-Critic-based. Value-based deep reinforcement learning algorithms cannot handle continuous outputs and thus cannot meet the need of anti-collision decision-making for continuously output driving strategies. Policy-search-based methods suffer from sensitivity to the step size and difficulty in choosing it. Actor-Critic-based decision methods combine value-function estimation with policy search and update quickly. Among them, Proximal Policy Optimization (PPO) solves the problems of slow parameter updates and hard-to-determine step sizes, and achieves good results in outputting continuous action spaces. Therefore, the present invention uses the PPO algorithm to build the backward anti-collision driving decision model, and obtains the optimal backward anti-collision decision through interactive iterative learning with the stochastic process model of the target vehicle's motion. This involves the following four sub-steps:
Sub-step 1: Define the basic parameters of the backward anti-collision driving decision model
First, the backward anti-collision driving decision problem is described as a Markov decision process (S, A, P, r) under a given reward function, where S is the state space, A is the backward anti-collision action decision, P denotes the state transition probability caused by the uncertainty of the target vehicle's motion, and r is the reward function. Second, the basic parameters of the Markov decision process are defined, specifically:
(1) Define the state space
Using the vehicle motion-state information output by Step 1 and the degree of backward collision risk output by Step 2, the state space is expressed as:
S_t = (v_F_lon, a_F_lon, v_r_lon, a_r_lon, θ_str, p_thr, L_r, δ_w, T_m)    (3)
In Eq. (3), S_t is the state space at time t; v_F_lon and v_r_lon denote the longitudinal speed of the vehicle and the relative longitudinal speed of the two vehicles respectively, in meters per second; a_F_lon and a_r_lon denote the longitudinal acceleration of the vehicle and the relative longitudinal acceleration of the two vehicles respectively, in meters per second squared; θ_str is the steering-wheel angle of the vehicle, in degrees; p_thr is the throttle opening, as a percentage; L_r is the relative inter-vehicle distance, in meters; δ_w and T_m denote the degree of backward collision risk and the target vehicle type respectively, with m = 1, 2, 3 indicating a large, medium or small target vehicle; in the present invention, T_m = m.
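The nine-component state vector of Eq. (3) can be represented as a simple structure; the field names below are illustrative, and the sample values are invented for demonstration only.

```python
from collections import namedtuple

# State space S_t of Eq. (3): own-vehicle motion, relative motion,
# driver controls, collision risk level and target-vehicle type.
State = namedtuple("State", [
    "v_f_lon",    # longitudinal speed of the vehicle, m/s
    "a_f_lon",    # longitudinal acceleration, m/s^2
    "v_r_lon",    # relative longitudinal speed, m/s
    "a_r_lon",    # relative longitudinal acceleration, m/s^2
    "theta_str",  # steering-wheel angle, degrees
    "p_thr",      # throttle opening, percentage
    "l_r",        # relative inter-vehicle distance, m
    "delta_w",    # quantized backward collision risk
    "t_m",        # target vehicle type, T_m = m in {1, 2, 3}
])

# Example state at one time step (values invented for illustration).
s_t = State(22.0, 0.3, -1.5, 0.1, 2.0, 0.15, 35.0, 0.8, 3)
```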
(2) Define the action decision
To comprehensively consider the influence of lateral and longitudinal motion on backward collisions, the present invention takes the steering-wheel angle and throttle opening as control quantities and defines the driving strategy output by the decision model, i.e. the action decision:
A_t = [θ_str_out, p_thr_out]    (4)
In Eq. (4), A_t is the action decision at time t; θ_str_out denotes the normalized steering-wheel angle control quantity, in the range [-1, 1]; p_thr_out denotes the normalized throttle-opening control quantity, in the range [0, 1]. p_thr_out = 0 means the vehicle is not accelerating, and p_thr_out = 1 means the vehicle accelerates at its maximum acceleration.
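Mapping the normalized action A_t of Eq. (4) back to physical control quantities might look as follows; the maximum steering-wheel angle used here is an illustrative value, not one taken from the patent.

```python
def denormalize_action(theta_str_out: float, p_thr_out: float,
                       max_steer_deg: float = 540.0):
    """Convert A_t = [theta_str_out, p_thr_out] to physical controls.

    theta_str_out in [-1, 1] scales to a steering-wheel angle in
    degrees; p_thr_out in [0, 1] scales to a throttle opening in
    percent. max_steer_deg is an assumed steering-wheel limit.
    """
    theta = max(-1.0, min(1.0, theta_str_out))  # clip to valid range
    p = max(0.0, min(1.0, p_thr_out))
    return theta * max_steer_deg, p * 100.0
```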
(3) Establish the reward function
To evaluate the quality of action decisions, the evaluation is made concrete and numerical by establishing a reward function. Considering that backward anti-collision driving decision-making is a multi-objective optimization problem involving objectives such as safety and comfort, the present invention designs the reward function as:
r_t = r_1 + r_2 + r_3    (5)
In Eq. (5), r_t is the reward function at time t, r_1 is the safety-distance reward function, r_2 is the comfort reward function, and r_3 is the penalty function.
First, design the safety-distance reward function r_1:
Figure PCTCN2021086570-appb-000003
In Eq. (6), L_r and L_s denote the relative inter-vehicle distance and the safety-distance threshold respectively, and ω_d is the safety-distance weight coefficient; in the present invention, ω_d = 0.85.
Second, to ensure driving comfort, excessive jerk should be avoided as much as possible; the comfort reward function r_2 is designed as:
r_2 = ω_j |a_F_lon(t+1) - a_F_lon(t)|    (7)
In Eq. (7), ω_j is the comfort weight coefficient; in the present invention, ω_j = 0.95.
Finally, design the penalty function r_3:
Figure PCTCN2021086570-appb-000004
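Composing the reward of Eq. (5) can be sketched as below. Eqs. (6) and (8) are published as images, so the forms of r_1 and r_3 here are assumptions: r_1 rewards keeping the gap above the safety threshold with the stated weight ω_d = 0.85, r_2 penalizes longitudinal jerk per Eq. (7) with ω_j = 0.95 (the sign is assumed, since comfort should discourage jerk), and r_3 applies a fixed penalty on a backward collision or rollover.

```python
def reward(l_r: float, l_s: float, a_now: float, a_next: float,
           terminal_failure: bool, w_d: float = 0.85,
           w_j: float = 0.95) -> float:
    """Composite reward r_t = r1 + r2 + r3 of Eq. (5) (assumed forms)."""
    r1 = w_d if l_r >= l_s else -w_d          # assumed shape of Eq. (6)
    r2 = -w_j * abs(a_next - a_now)           # Eq. (7); penalty sign assumed
    r3 = -10.0 if terminal_failure else 0.0   # assumed penalty magnitude
    return r1 + r2 + r3
```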
(4) Design the policy with the largest expected return
Figure PCTCN2021086570-appb-000005
In Eq. (9), π* is the policy with the largest expected return, π is the backward anti-collision decision policy, γ is the discount factor with γ ∈ (0, 1), and τ(π) denotes the trajectory distribution under policy π.
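The expectation maximized in Eq. (9) is taken over discounted trajectory returns; a minimal sketch of the discounted return follows, with an illustrative γ (the patent only constrains γ ∈ (0, 1)).

```python
def discounted_return(rewards, gamma: float = 0.99) -> float:
    """Discounted return sum_t gamma**t * r_t appearing in Eq. (9)."""
    g = 0.0
    for r in reversed(rewards):  # fold from the last step backwards
        g = r + gamma * g
    return g
```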
Sub-step 2: Design the network architecture of the backward anti-collision driving decision model
The backward anti-collision driving decision network is built on the "Actor-Critic" framework and comprises two parts: the Actor network and the Critic network. The Actor network takes the state-space information as input and outputs the action decision, i.e. the throttle-opening and steering-wheel-angle control quantities of the heavy commercial vehicle. The Critic network takes the state-space information and the action decision as input and outputs the value of the current state-action pair. Specifically:
(1) Design the Actor network
A hierarchical encoder structure is established to extract features from each type of information in the state space. First, three serially connected convolutional layers (C_F1, C_F2, C_F3) and one max-pooling layer (P_1) are constructed to extract features from the vehicle's motion-state information (longitudinal speed, longitudinal acceleration, steering-wheel angle, throttle opening) and encode it as an intermediate feature vector h_1; the same structure, i.e. three serially connected convolutional layers (C_R1, C_R2, C_R3) and one max-pooling layer (P_2), extracts features from the relative motion-state information of the two vehicles (relative longitudinal speed, relative longitudinal acceleration, relative inter-vehicle distance) and encodes it as an intermediate feature vector h_2; the convolutional layer C_W1 and max-pooling layer P_3 extract features from the degree of collision risk and the target vehicle type and encode them as an intermediate feature vector h_3. Second, the features h_1, h_2 and h_3 are combined and connected to the fully connected layers FC_4 and FC_5, which output the action decision.
The numbers of neurons in the convolutional layers C_F1, C_F2, C_F3, C_R1, C_R2, C_R3, C_W1 are set to 20, 20, 10, 20, 20, 10 and 20 respectively, and the numbers of neurons in the fully connected layers FC_4 and FC_5 are set to 200. The activation function of every convolutional and fully connected layer is the Rectified Linear Unit (ReLU), expressed as f(x) = max(0, x).
(2) Design the Critic network
The Critic network is built from a neural network with multiple hidden layers. First, the state space S_t is input into the hidden layer FC_C1, and simultaneously the action decision A_t is input into the hidden layer FC_C2. Second, the hidden layers FC_C1 and FC_C2 are merged by tensor addition. Finally, after passing through the fully connected layers FC_C3 and FC_C4 in turn, the value of the Critic network is output.
The number of neurons in the FC_C1 and FC_C2 layers is set to 400, the number of neurons in the remaining hidden layers is 200, and the activation function of each layer is ReLU.
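The layer widths stated for the Actor and Critic networks can be collected into a simple specification for sanity checking; this is bookkeeping of the neuron counts given in the text plus the stated activation, not an executable network.

```python
# Layer widths as stated in the text (Actor encoders and fusion layers).
ACTOR_SPEC = {
    "C_F1": 20, "C_F2": 20, "C_F3": 10,   # vehicle-state encoder convs
    "C_R1": 20, "C_R2": 20, "C_R3": 10,   # relative-state encoder convs
    "C_W1": 20,                           # risk/type encoder conv
    "FC_4": 200, "FC_5": 200,             # fusion fully connected layers
}

# Critic widths: state and action inputs, then two hidden layers.
CRITIC_SPEC = {
    "FC_C1": 400, "FC_C2": 400,
    "FC_C3": 200, "FC_C4": 200,
}

def relu(x: float) -> float:
    """Activation used by every layer: f(x) = max(0, x)."""
    return max(0.0, x)
```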
Sub-step 3: Train the backward anti-collision driving decision model
The network parameters are updated by gradient descent using the loss functions J_actor and J_critic. The specific training process is as follows:
Sub-step 3.1: Initialize the Actor network and the Critic network;
Sub-step 3.2: Solve iteratively; each iteration comprises sub-steps 3.21 to 3.23, specifically:
Sub-step 3.21: Solve iteratively; each iteration comprises sub-steps 3.211 to 3.213, specifically:
Sub-step 3.211: Use the virtual traffic environment model of Step 1 to obtain the vehicle's motion control operations;
Sub-step 3.212: Use the Actor network to obtain the sample data (S_t, A_t, r_t);
Sub-step 3.213: End the loop and obtain the sample point set [(S_1, A_1, r_1), (S_2, A_2, r_2), ..., (S_t, A_t, r_t)];
Sub-step 3.22: Calculate the advantage function;
Figure PCTCN2021086570-appb-000006
In Eq. (10),
Figure PCTCN2021086570-appb-000007
is the advantage function, and V(S_t) denotes the value function of state S_t;
Figure PCTCN2021086570-appb-000008
indicates that the probability of taking the current action should be increased, and
Figure PCTCN2021086570-appb-000009
indicates that the probability of taking that action should be reduced.
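Eq. (10) is published as an image; a one-step temporal-difference estimate is a common form consistent with the surrounding description (positive values raise the action's probability, negative values lower it) and is shown here as an assumption.

```python
def advantage(r_t: float, v_s: float, v_s_next: float,
              gamma: float = 0.99) -> float:
    """Assumed one-step advantage A_t = r_t + gamma*V(S_{t+1}) - V(S_t).

    A_t > 0: the current action should be made more likely;
    A_t < 0: it should be made less likely.
    """
    return r_t + gamma * v_s_next - v_s
```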
Sub-step 3.23: Solve iteratively; each iteration comprises sub-steps 3.231 to 3.233, specifically:
Sub-step 3.231: Calculate the objective function of the Actor network;
Sub-step 3.232: Update the Actor network parameters J_actor:
Figure PCTCN2021086570-appb-000010
In Eq. (11), p_t(θ) denotes the ratio of the new policy π_θ to the old policy π_θ_old over the action-decision distribution during the policy update, with
Figure PCTCN2021086570-appb-000011
clip(·) denoting the clipping function and ε a constant; in the present invention, ε = 0.25.
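The clipped surrogate of Eq. (11) follows the standard PPO objective; the per-sample term, using the patent's ε = 0.25, can be sketched as:

```python
def ppo_clip_term(p_t: float, adv: float, eps: float = 0.25) -> float:
    """Per-sample PPO surrogate: min(p_t*A, clip(p_t, 1-eps, 1+eps)*A).

    p_t is the probability ratio of the new policy pi_theta to the old
    policy pi_theta_old for the sampled action; adv is the advantage.
    """
    clipped_ratio = min(max(p_t, 1.0 - eps), 1.0 + eps)
    return min(p_t * adv, clipped_ratio * adv)
```

Clipping removes the incentive to move the ratio outside [1 - ε, 1 + ε], which is what makes large, destabilizing policy updates unattractive.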
Sub-step 3.233: Update the Critic network parameters J_critic:
Figure PCTCN2021086570-appb-000012
Sub-step 3.234: End the loop.
Sub-step 3.3: Update iteratively according to the method provided in sub-step 3.2 so that the Actor network and the Critic network gradually converge. During training, if the vehicle suffers a backward collision or rolls over, the current episode is terminated and a new episode is started for training. Training ends when the iteration reaches the maximum number of steps or the model can make backward anti-collision driving decisions stably and accurately.
Sub-step 4: Use the backward anti-collision decision model to output the decision policy
The information acquired by the centimeter-level high-precision differential GPS, the inertial measurement unit, the millimeter-wave radar and the CAN bus is input into the trained backward anti-collision driving decision model, which can quantitatively output reasonable steering-wheel angle and throttle-opening control quantities and provide the driver with effective and reliable backward anti-collision driving advice, thereby realizing effective, reliable and adaptive backward anti-collision driving decisions for heavy commercial vehicles.
Beneficial effects
Compared with the prior art, the technical solution of the present invention has the following beneficial technical effects:
(1) The proposed method realizes backward anti-collision driving decision-making for heavy commercial vehicles and can provide the driver with effective and reliable backward anti-collision driving decision strategies;
(2) The proposed method comprehensively considers the influence of the traffic environment, the vehicle's operating state, the type of the rear vehicle and the degree of backward collision risk on backward collisions, precisely quantifies driving strategies such as steering-wheel angle and throttle opening in numerical form, and outputs driving strategies that adapt to the traffic environment and the driver's operations, improving the effectiveness, reliability and environmental adaptability of backward anti-collision driving decisions for heavy commercial vehicles;
(3) The proposed method does not need to consider complex vehicle dynamics equations or body parameters, and its calculation method is simple and clear.
Brief description of the drawings
Fig. 1 is a schematic diagram of the technical route of the present invention;
Fig. 2 is a schematic diagram of the network architecture of the backward anti-collision driving decision model established by the present invention.
Detailed description of the embodiments
The technical solution of the present invention is further described below with reference to the accompanying drawings.
To establish an effective, reliable, traffic-environment-adaptive backward anti-collision decision strategy and realize backward anti-collision driving decisions for heavy commercial vehicles, filling the gap in practical backward anti-collision driving decision technology for heavy commercial vehicles, the present invention proposes a deep-reinforcement-learning-based backward anti-collision driving decision method for heavy commercial vehicles such as semi-trailer tank trucks and semi-trailer trains. First, a virtual traffic environment model is built to collect the motion-state information of the heavy commercial vehicle and the vehicle behind it. Second, a backward collision risk assessment model based on the rear time-to-collision is established to precisely quantify the backward collision risk. Finally, the backward anti-collision driving decision problem is described as a Markov decision process under a given reward function, and a deep-reinforcement-learning-based backward anti-collision driving decision model is built, yielding an effective, reliable and adaptive backward anti-collision driving decision policy. The technical route of the present invention is shown in Fig. 1, and the specific steps are as follows:
Step 1: Build a virtual traffic environment model
In order to reduce the frequency of traffic accidents caused by backward collisions and improve the safety of heavy commercial vehicles, the present invention proposes a backward anti-collision driving decision method. Its applicable scenario is: while the heavy commercial vehicle is travelling, with no obstacles or other interfering factors ahead of it, decision strategies such as accelerating and steering should be provided to the driver effectively and in time to prevent a backward collision with the vehicle behind.
In actual road tests, anti-collision tests of heavy commercial vehicles carry high cost and risk. To reduce test cost and risk while maintaining test efficiency, the present invention targets high-grade highways and builds a virtual traffic environment model, i.e. a three-lane virtual environment model containing straight and curved sections. The heavy commercial vehicle moves in the traffic environment model, and the target vehicle (of three types: small, medium and large) follows it, covering four driving conditions: acceleration, deceleration, constant speed and lane change.
Through the centimeter-level high-precision differential GPS, inertial measurement unit and millimeter-wave radar installed on each vehicle, motion-state information can be acquired in real time, including the position, speed, acceleration, relative distance and relative speed of the two vehicles; through the vision sensor installed at the rear of the vehicle, the type of the target vehicle can be acquired in real time; and through the CAN bus, driver operation information can be read, including the vehicle's throttle opening and steering-wheel angle.
In the present invention, the target vehicle refers to the nearest vehicle behind the heavy commercial vehicle that is in the same lane and travelling in the same direction.
Step 2: Build a backward collision risk assessment model
To output backward anti-collision decision strategies reasonably and effectively, the degree of backward collision risk of the heavy commercial vehicle must be assessed accurately and in real time. First, the time required for a collision between the heavy commercial vehicle and the target vehicle is calculated:
Figure PCTCN2021086570-appb-000013
In Eq. (1), RTTC(t) denotes the rear time-to-collision at time t, in seconds; x_c(t) is the inter-vehicle distance, in meters; v_F(t) and v_R(t) denote the speeds of the heavy commercial vehicle and the target vehicle respectively, and v_r(t) is the relative speed of the two vehicles, all in meters per second, with v_r(t) = v_F(t) - v_R(t).
Second, the degree of backward collision risk is calculated. According to the national standard "Performance requirements and test procedures for commercial vehicle rear-end collision warning systems", a backward collision alarm issued when the rear time-to-collision is no less than 2.1 s and no more than 4.4 s means the rear-end collision warning system passes the test. On this basis, the degree of backward collision risk is quantified:
Figure PCTCN2021086570-appb-000014
In Eq. (2), δ_w is the quantized value of the backward collision risk. δ_w ≥ 1 indicates no backward collision risk; 0.5 ≤ δ_w ≤ 1 indicates a backward collision risk; and 0 ≤ δ_w ≤ 0.5 indicates a very high degree of backward collision risk.
Step 3: Build the backward anti-collision driving decision model for the heavy commercial vehicle
To realize effective, reliable and traffic-environment-adaptive backward anti-collision driving decisions, the present invention comprehensively considers the influence of the traffic environment, the vehicle's operating state, the type of the rear vehicle and the degree of backward collision risk on backward collisions, and builds a backward anti-collision driving decision model for heavy commercial vehicles.
Common driving decision methods fall into two categories: rule-based and data-learning-based decision algorithms. (1) Rule-based decision algorithms use a finite directed connected graph to describe different driving states and the transitions between them, and generate driving actions according to the transitions of the driving state. However, during vehicle motion, the vehicle's motion parameters, road conditions and rear traffic state are all uncertain; formulated rules can hardly cover all scenarios, so the effectiveness and adaptability of the decisions are hard to guarantee. (2) Data-learning-based decision algorithms use algorithms to imitate the human process of learning knowledge or skills, and continuously improve their own learning performance through an interactive self-learning mechanism. Among them, deep-reinforcement-learning-based methods combine the perception capability of deep learning with the decision capability of reinforcement learning; their adaptability to uncertain problems meets the requirement that anti-collision decisions adapt to the traffic environment and driving conditions. Therefore, the present invention uses a deep reinforcement learning algorithm to build the backward anti-collision driving decision model.
Deep-reinforcement-learning-based decision methods mainly fall into three categories: value-function-based, policy-search-based, and Actor-Critic-based. Among them, value-based deep reinforcement learning algorithms cannot handle continuous outputs and thus cannot meet the need of anti-collision decision-making for continuously output driving strategies. Policy-search-based methods suffer from sensitivity to the step size and difficulty in choosing it. Actor-Critic-based decision methods combine value-function estimation with policy search and update quickly. Among them, Proximal Policy Optimization (PPO) solves the problems of slow parameter updates and hard-to-determine step sizes, and achieves good results in outputting continuous action spaces. Therefore, the present invention uses the PPO algorithm to build the backward anti-collision driving decision model, and obtains the optimal backward anti-collision decision through interactive iterative learning with the stochastic process model of the target vehicle's motion. This involves the following four sub-steps:
Sub-step 1: Define the basic parameters of the backward anti-collision driving decision model
First, the backward anti-collision driving decision problem is described as a Markov decision process (S, A, P, r) under a given reward function, where S is the state space, A is the backward anti-collision action decision, P denotes the state transition probability caused by the uncertainty of the target vehicle's motion, and r is the reward function. Second, the basic parameters of the Markov decision process are defined, specifically:
(1) Define the state space
Using the vehicle motion-state information output by Step 1 and the degree of backward collision risk output by Step 2, the state space is expressed as:
S_t = (v_F_lon, a_F_lon, v_r_lon, a_r_lon, θ_str, p_thr, L_r, δ_w, T_m)      (3)
In Eq. (3), S_t is the state space at time t; v_F_lon and v_r_lon denote the longitudinal speed of the vehicle and the relative longitudinal speed of the two vehicles respectively, in meters per second; a_F_lon and a_r_lon denote the longitudinal acceleration of the vehicle and the relative longitudinal acceleration of the two vehicles respectively, in meters per second squared; θ_str is the steering-wheel angle of the vehicle, in degrees; p_thr is the throttle opening, as a percentage; L_r is the relative inter-vehicle distance, in meters; δ_w and T_m denote the degree of backward collision risk and the target vehicle type respectively, with m = 1, 2, 3 indicating a large, medium or small target vehicle; in the present invention, T_m = m.
(2) Define the action decision
To comprehensively consider the influence of lateral and longitudinal motion on backward collisions, the present invention takes the steering-wheel angle and throttle opening as control quantities and defines the driving strategy output by the decision model, i.e. the action decision:
A_t = [θ_str_out, p_thr_out]    (4)
In Eq. (4), A_t is the action decision at time t; θ_str_out denotes the normalized steering-wheel angle control quantity, in the range [-1, 1]; p_thr_out denotes the normalized throttle-opening control quantity, in the range [0, 1]. p_thr_out = 0 means the vehicle is not accelerating, and p_thr_out = 1 means the vehicle accelerates at its maximum acceleration.
(3) Establish the reward function
To evaluate the quality of action decisions, the evaluation is made concrete and numerical by establishing a reward function. Considering that backward anti-collision driving decision-making is a multi-objective optimization problem involving objectives such as safety and comfort, the present invention designs the reward function as:
r_t = r_1 + r_2 + r_3     (5)
In Eq. (5), r_t is the reward function at time t, r_1 is the safety-distance reward function, r_2 is the comfort reward function, and r_3 is the penalty function.
First, design the safety-distance reward function r_1:
Figure PCTCN2021086570-appb-000015
In Eq. (6), L_r and L_s denote the relative inter-vehicle distance and the safety-distance threshold respectively, and ω_d is the safety-distance weight coefficient; in the present invention, ω_d = 0.85.
Second, to ensure driving comfort, excessive jerk should be avoided as much as possible; the comfort reward function r_2 is designed as:
r_2 = ω_j |a_F_lon(t+1) - a_F_lon(t)|       (7)
In Eq. (7), ω_j is the comfort weight coefficient; in the present invention, ω_j = 0.95.
Finally, design the penalty function r_3:
Figure PCTCN2021086570-appb-000016
(4) Design the policy with the largest expected return
Figure PCTCN2021086570-appb-000017
In Eq. (9), π* is the policy with the largest expected return, π is the backward anti-collision decision policy, γ is the discount factor with γ ∈ (0, 1), and τ(π) denotes the trajectory distribution under policy π.
Sub-step 2: Design the network architecture of the backward anti-collision driving decision model
The backward anti-collision driving decision network is built on the "Actor-Critic" framework and comprises two parts: the Actor network and the Critic network. The Actor network takes the state-space information as input and outputs the action decision, i.e. the throttle-opening and steering-wheel-angle control quantities of the heavy commercial vehicle. The Critic network takes the state-space information and the action decision as input and outputs the value of the current state-action pair. The network architecture is shown in Fig. 2. Specifically:
(1) Design the Actor network
A hierarchical encoder structure is established to extract features from each type of information in the state space. First, three serially connected convolutional layers (C_F1, C_F2, C_F3) and one max-pooling layer (P_1) are constructed to extract features from the vehicle's motion-state information (longitudinal speed, longitudinal acceleration, steering-wheel angle, throttle opening) and encode it as an intermediate feature vector h_1; the same structure, i.e. three serially connected convolutional layers (C_R1, C_R2, C_R3) and one max-pooling layer (P_2), extracts features from the relative motion-state information of the two vehicles (relative longitudinal speed, relative longitudinal acceleration, relative inter-vehicle distance) and encodes it as an intermediate feature vector h_2; the convolutional layer C_W1 and max-pooling layer P_3 extract features from the degree of collision risk and the target vehicle type and encode them as an intermediate feature vector h_3. Second, the features h_1, h_2 and h_3 are combined and connected to the fully connected layers FC_4 and FC_5, which output the action decision.
The numbers of neurons in the convolutional layers C_F1, C_F2, C_F3, C_R1, C_R2, C_R3, C_W1 are set to 20, 20, 10, 20, 20, 10 and 20 respectively, and the numbers of neurons in the fully connected layers FC_4 and FC_5 are set to 200. The activation function of every convolutional and fully connected layer is the Rectified Linear Unit (ReLU), expressed as f(x) = max(0, x).
(2) Design the Critic network
The Critic network is built from a neural network with multiple hidden layers. First, the state space S_t is input into the hidden layer FC_C1, and simultaneously the action decision A_t is input into the hidden layer FC_C2. Second, the hidden layers FC_C1 and FC_C2 are merged by tensor addition. Finally, after passing through the fully connected layers FC_C3 and FC_C4 in turn, the value of the Critic network is output.
The number of neurons in the FC_C1 and FC_C2 layers is set to 400, the number of neurons in the remaining hidden layers is 200, and the activation function of each layer is ReLU.
Sub-step 3: Train the backward anti-collision driving decision model
The network parameters are updated by gradient descent using the loss functions J_actor and J_critic. The specific training process is as follows:
Sub-step 3.1: Initialize the Actor network and the Critic network;
Sub-step 3.2: Solve iteratively; each iteration comprises sub-steps 3.21 to 3.23, specifically:
Sub-step 3.21: Solve iteratively; each iteration comprises sub-steps 3.211 to 3.213, specifically:
Sub-step 3.211: Use the virtual traffic environment model of Step 1 to obtain the vehicle's motion control operations;
Sub-step 3.212: Use the Actor network to obtain the sample data (S_t, A_t, r_t);
Sub-step 3.213: End the loop and obtain the sample point set [(S_1, A_1, r_1), (S_2, A_2, r_2), ..., (S_t, A_t, r_t)];
Sub-step 3.22: Calculate the advantage function;
Figure PCTCN2021086570-appb-000018
In Eq. (10),
Figure PCTCN2021086570-appb-000019
is the advantage function at time t, and V(S_t) denotes the value function of state S_t;
Figure PCTCN2021086570-appb-000020
indicates that the probability of taking the current action should be increased, and
Figure PCTCN2021086570-appb-000021
indicates that the probability of taking that action should be reduced.
Sub-step 3.23: Solve iteratively; each iteration comprises sub-steps 3.231 to 3.233, specifically:
Sub-step 3.231: Calculate the objective function of the Actor network;
Sub-step 3.232: Update the Actor network parameters J_actor:
Figure PCTCN2021086570-appb-000022
In Eq. (11), p_t(θ) denotes the ratio of the new policy π_θ to the old policy π_θ_old over the action-decision distribution during the policy update, with
Figure PCTCN2021086570-appb-000023
clip(·) denoting the clipping function and ε a constant; in the present invention, ε = 0.25.
Sub-step 3.233: Update the Critic network parameters J_critic:
Figure PCTCN2021086570-appb-000024
Sub-step 3.234: End the loop.
Sub-step 3.3: Update iteratively according to the method provided in sub-step 3.2 so that the Actor network and the Critic network gradually converge. During training, if the vehicle suffers a backward collision or rolls over, the current episode is terminated and a new episode is started for training. Training ends when the iteration reaches the maximum number of steps or the model can make backward anti-collision driving decisions stably and accurately.
Sub-step 4: Use the backward anti-collision decision model to output the decision policy
The information acquired by the centimeter-level high-precision differential GPS, the inertial measurement unit, the millimeter-wave radar and the CAN bus is input into the trained backward anti-collision driving decision model, which can quantitatively output reasonable steering-wheel angle and throttle-opening control quantities and provide the driver with effective and reliable backward anti-collision driving advice, thereby realizing effective, reliable and adaptive backward anti-collision driving decisions for heavy commercial vehicles.

Claims (1)

  1. A backward anti-collision driving decision-making method for a heavy commercial vehicle, characterized in that the method comprises the following steps:
    Step 1: build a virtual traffic environment model: targeting high-grade highways, build a virtual traffic environment model, i.e. a three-lane virtual environment model containing straight and curved sections; the heavy commercial vehicle moves in the traffic environment model, and the target vehicle follows the heavy commercial vehicle, covering four driving conditions: acceleration, deceleration, constant speed and lane change;
    in the process of building the virtual traffic environment model, vehicle motion-state information is acquired in real time through the centimeter-level high-precision differential GPS, inertial measurement unit and millimeter-wave radar installed on each vehicle, including the position, speed, acceleration, relative distance and relative speed of the two vehicles; the type of the target vehicle is acquired in real time through the vision sensor installed at the rear of the vehicle; and driver operation information is read through the CAN bus, including the vehicle's throttle opening and steering-wheel angle;
    the target vehicle refers to the nearest vehicle behind the heavy commercial vehicle that is in the same lane and travelling in the same direction, and is of one of three types: small, medium and large;
    Step 2: build a backward collision risk assessment model, specifically comprising:
    first, calculating the time required for a collision between the heavy commercial vehicle and the target vehicle:
    Figure PCTCN2021086570-appb-100001
    in Eq. (1), RTTC(t) denotes the rear time-to-collision at time t, in seconds; x_c(t) is the inter-vehicle distance, in meters; v_F(t) and v_R(t) denote the speeds of the heavy commercial vehicle and the target vehicle respectively, and v_r(t) is the relative speed of the two vehicles, all in meters per second, with v_r(t) = v_F(t) - v_R(t);
    second, calculating the degree of backward collision risk; a backward collision alarm issued when the rear time-to-collision is no less than 2.1 s and no more than 4.4 s means the rear-end collision warning system passes the test; on this basis, the degree of backward collision risk is quantified:
    Figure PCTCN2021086570-appb-100002
    in Eq. (2), δ_w is the quantized value of the backward collision risk; δ_w ≥ 1 indicates no backward collision risk; 0.5 ≤ δ_w ≤ 1 indicates a backward collision risk; and 0 ≤ δ_w ≤ 0.5 indicates a very high degree of backward collision risk;
    Step 3: build a backward anti-collision driving decision model for the heavy commercial vehicle: comprehensively considering the influence of the traffic environment, the vehicle's operating state, the type of the rear vehicle and the degree of backward collision risk on backward collisions, build the backward anti-collision driving decision model for the heavy commercial vehicle using the PPO algorithm, and obtain the optimal backward anti-collision decision through interactive iterative learning with the stochastic process model of the target vehicle's motion, specifically comprising the following four sub-steps:
    Sub-step 1: define the basic parameters of the backward anti-collision driving decision model
    first, the backward anti-collision driving decision problem is described as a Markov decision process (S, A, P, r) under a given reward function, where S is the state space, A is the backward anti-collision action decision, P denotes the state transition probability caused by the uncertainty of the target vehicle's motion, and r is the reward function; second, the basic parameters of the Markov decision process are defined, specifically:
    (1) define the state space
    using the vehicle motion-state information output by Step 1 and the degree of backward collision risk output by Step 2, the state space is expressed as:
    S_t = (v_F_lon, a_F_lon, v_r_lon, a_r_lon, θ_str, p_thr, L_r, δ_w, T_m)  (3)
    in Eq. (3), S_t is the state space at time t; v_F_lon and v_r_lon denote the longitudinal speed of the heavy commercial vehicle and the relative longitudinal speed of the two vehicles respectively, in meters per second; a_F_lon and a_r_lon denote the longitudinal acceleration of the heavy commercial vehicle and the relative longitudinal acceleration of the two vehicles respectively, in meters per second squared; θ_str is the steering-wheel angle of the vehicle, in degrees; p_thr is the throttle opening, as a percentage; L_r is the relative inter-vehicle distance, in meters; δ_w and T_m denote the degree of backward collision risk and the target vehicle type respectively, with m = 1, 2, 3 indicating a large, medium or small target vehicle; in the present invention, T_m = m;
    (2) define the action decision
    to comprehensively consider the influence of lateral and longitudinal motion on backward collisions, the present invention takes the steering-wheel angle and throttle opening as control quantities to define the driving strategy output by the decision model, i.e. the action decision:
    A_t = [θ_str_out, p_thr_out]  (4)
    in Eq. (4), A_t is the action decision at time t; θ_str_out denotes the normalized steering-wheel angle control quantity, in the range [-1, 1]; p_thr_out denotes the normalized throttle-opening control quantity, in the range [0, 1]; p_thr_out = 0 means the vehicle is not accelerating, and p_thr_out = 1 means the vehicle accelerates at its maximum acceleration;
    (3) establish the reward function
    to evaluate the quality of action decisions, the evaluation is made concrete and numerical by establishing a reward function; considering that backward anti-collision driving decision-making is a multi-objective optimization problem involving objectives such as safety and comfort, the present invention designs the reward function as:
    r_t = r_1 + r_2 + r_3  (5)
    in Eq. (5), r_t is the reward function at time t, r_1 is the safety-distance reward function, r_2 is the comfort reward function, and r_3 is the penalty function;
    first, design the safety-distance reward function r_1:
    Figure PCTCN2021086570-appb-100003
    in Eq. (6), L_r and L_s denote the relative inter-vehicle distance and the safety-distance threshold respectively, and ω_d is the safety-distance weight coefficient; in the present invention, ω_d = 0.85;
    second, design the comfort reward function r_2:
    r_2 = ω_j |a_F_lon(t+1) - a_F_lon(t)|  (7)
    in Eq. (7), ω_j is the comfort weight coefficient; in the present invention, ω_j = 0.95;
    finally, design the penalty function r_3:
    Figure PCTCN2021086570-appb-100004
    (4) design the policy with the largest expected return
    Figure PCTCN2021086570-appb-100005
    in Eq. (9), π* is the policy with the largest expected return, π is the backward anti-collision decision policy, γ is the discount factor with γ ∈ (0, 1), and τ(π) denotes the trajectory distribution under policy π;
    Sub-step 2: design the network architecture of the backward anti-collision driving decision model
    the backward anti-collision driving decision network is built on the "Actor-Critic" framework and comprises two parts: the Actor network and the Critic network; the Actor network takes the state-space information as input and outputs the action decision, i.e. the throttle-opening and steering-wheel-angle control quantities of the heavy commercial vehicle; the Critic network takes the state-space information and the action decision as input and outputs the value of the current state-action pair; specifically:
    (1) design the Actor network
    a hierarchical encoder structure is established to extract features from each type of information in the state space; first, three serially connected convolutional layers (C_F1, C_F2, C_F3) and one max-pooling layer (P_1) are constructed to extract features from the vehicle's motion-state information (longitudinal speed, longitudinal acceleration, steering-wheel angle, throttle opening) and encode it as an intermediate feature vector h_1; the same structure, i.e. three serially connected convolutional layers (C_R1, C_R2, C_R3) and one max-pooling layer (P_2), extracts features from the relative motion-state information of the two vehicles (relative longitudinal speed, relative longitudinal acceleration, relative inter-vehicle distance) and encodes it as an intermediate feature vector h_2; the convolutional layer C_W1 and max-pooling layer P_3 extract features from the degree of collision risk and the target vehicle type and encode them as an intermediate feature vector h_3; second, the features h_1, h_2 and h_3 are combined and connected to the fully connected layers FC_4 and FC_5, which output the action decision;
    the numbers of neurons in the convolutional layers C_F1, C_F2, C_F3, C_R1, C_R2, C_R3, C_W1 are set to 20, 20, 10, 20, 20, 10 and 20 respectively, and the numbers of neurons in the fully connected layers FC_4 and FC_5 are set to 200; the activation function of every convolutional and fully connected layer is the Rectified Linear Unit (ReLU), expressed as f(x) = max(0, x);
    (2) design the Critic network
    the Critic network is built from a neural network with multiple hidden layers; first, the state space S_t is input into the hidden layer FC_C1, and simultaneously the action decision A_t is input into the hidden layer FC_C2; second, the hidden layers FC_C1 and FC_C2 are merged by tensor addition; finally, after passing through the fully connected layers FC_C3 and FC_C4 in turn, the value of the Critic network is output;
    the number of neurons in the FC_C1 and FC_C2 layers is set to 400, the number of neurons in the remaining hidden layers is 200, and the activation function of each layer is ReLU;
    Sub-step 3: train the backward anti-collision driving decision model
    the network parameters are updated by gradient descent using the loss functions J_actor and J_critic; the specific training process is as follows:
    Sub-step 3.1: initialize the Actor network and the Critic network;
    Sub-step 3.2: solve iteratively; each iteration comprises sub-steps 3.21 to 3.23, specifically:
    Sub-step 3.21: solve iteratively; each iteration comprises sub-steps 3.211 to 3.213, specifically:
    Sub-step 3.211: use the virtual traffic environment model of Step 1 to obtain the vehicle's motion control operations;
    Sub-step 3.212: use the Actor network to obtain the sample data (S_t, A_t, r_t);
    Sub-step 3.213: end the loop and obtain the sample point set [(S_1, A_1, r_1), (S_2, A_2, r_2), ..., (S_t, A_t, r_t)];
    Sub-step 3.22: calculate the advantage function;
    Figure PCTCN2021086570-appb-100006
    in Eq. (10),
    Figure PCTCN2021086570-appb-100007
    is the advantage function, and V(S_t) denotes the value function of state S_t;
    Figure PCTCN2021086570-appb-100008
    indicates that the probability of taking the current action should be increased, and
    Figure PCTCN2021086570-appb-100009
    indicates that the probability of taking that action should be reduced;
    Sub-step 3.23: solve iteratively; each iteration comprises sub-steps 3.231 to 3.233, specifically:
    Sub-step 3.231: calculate the objective function of the Actor network;
    Sub-step 3.232: update the Actor network parameters J_actor:
    Figure PCTCN2021086570-appb-100010
    in Eq. (11), p_t(θ) denotes the ratio of the new policy π_θ to the old policy π_θ_old over the action-decision distribution during the policy update, with
    Figure PCTCN2021086570-appb-100011
    clip(·) denoting the clipping function and ε a constant; in the present invention, ε = 0.25;
    Sub-step 3.233: update the Critic network parameters J_critic:
    Figure PCTCN2021086570-appb-100012
    Sub-step 3.234: end the loop;
    Sub-step 3.3: update iteratively according to the method provided in sub-step 3.2 so that the Actor network and the Critic network gradually converge; during training, if the vehicle suffers a backward collision or rolls over, the current episode is terminated and a new episode is started for training; training ends when the iteration reaches the maximum number of steps or the model can make backward anti-collision driving decisions stably and accurately;
    Sub-step 4: use the backward anti-collision decision model to output the decision policy
    the information acquired by the centimeter-level high-precision differential GPS, the inertial measurement unit, the millimeter-wave radar and the CAN bus is input into the trained backward anti-collision driving decision model, which can quantitatively output reasonable steering-wheel angle and throttle-opening control quantities and provide the driver with effective and reliable backward anti-collision driving advice, thereby realizing effective, reliable and adaptive backward anti-collision driving decisions for the heavy commercial vehicle.
PCT/CN2021/086570 2020-12-20 2021-04-12 Backward anti-collision driving decision-making method for a heavy commercial vehicle WO2022126940A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/766,870 US11964655B2 (en) 2020-12-20 2021-04-12 Backward anti-collision driving decision-making method for heavy commercial vehicle

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011512719.5 2020-12-20
CN202011512719.5A CN112633474B (zh) 2020-12-20 2022-04-05 Backward anti-collision driving decision-making method for a heavy commercial vehicle

Publications (1)

Publication Number Publication Date
WO2022126940A1 true WO2022126940A1 (zh) 2022-06-23

Family

ID=75317728

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/086570 WO2022126940A1 (zh) 2020-12-20 2021-04-12 一种重型营运车辆的后向防撞驾驶决策方法

Country Status (3)

Country Link
US (1) US11964655B2 (zh)
CN (1) CN112633474B (zh)
WO (1) WO2022126940A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273502A (zh) * 2022-07-28 2022-11-01 Xidian University Traffic signal cooperative control method
CN116823812A (zh) * 2023-08-25 2023-09-29 China Agricultural University In-field life detection method for silage maize

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633474B (zh) 2020-12-20 2022-04-05 Southeast University Backward anti-collision driving decision-making method for a heavy commercial vehicle
CN113460090B (zh) * 2021-08-18 2023-09-12 Tsinghua University T-type emergency collision-avoidance control method, system, medium and device for autonomous vehicles
CN113954837B (zh) * 2021-11-06 2023-03-14 Research Institute of Highway, Ministry of Transport Deep-learning-based lane-change decision method for large commercial vehicles
CN114407931B (zh) * 2022-02-21 2024-05-03 Southeast University Highly human-like safe driving decision method for autonomous commercial vehicles
CN114379540B (zh) * 2022-02-21 2024-04-30 Southeast University Anti-rollover driving decision method for large commercial vehicles considering the influence of obstacles ahead
CN114863708B (zh) * 2022-05-09 2023-04-18 Southeast University Roadside real-time precise guidance method for commercial vehicles in road merging areas
CN115123159A (zh) * 2022-06-27 2022-09-30 Chongqing University of Posts and Telecommunications DDPG deep-reinforcement-learning-based AEB control method and system
CN115291616B (zh) * 2022-07-25 2023-05-26 Jiangsu Ocean University AUV dynamic obstacle avoidance method based on a proximal policy optimization algorithm
CN116946162B (zh) * 2023-09-19 2023-12-15 Southeast University Safe driving decision method for intelligent connected commercial vehicles considering road adhesion conditions

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108583571A (zh) * 2018-04-28 2018-09-28 Shenzhen SenseTime Technology Co., Ltd. Collision control method and apparatus, electronic device and storage medium
CN110027553A (zh) * 2019-04-10 2019-07-19 Hunan University Anti-collision control method based on deep reinforcement learning
US20200239029A1 (en) * 2019-01-30 2020-07-30 StradVision, Inc. Learning method and learning device for determining whether to switch mode of vehicle from manual driving mode to autonomous driving mode by performing trajectory-based behavior analysis on recent driving route
CN111696387A (zh) * 2020-05-21 2020-09-22 Southeast University Adaptive graded anti-collision warning method based on forward obstacle recognition
CN112633474A (zh) * 2020-12-20 2021-04-09 Southeast University Backward anti-collision driving decision-making method for a heavy commercial vehicle

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6690413B1 (en) * 1999-04-21 2004-02-10 Michael S. Moore Tractor-trailer viewing system
US6150932A (en) * 1999-10-04 2000-11-21 General Motors Corporation Vehicle operator alert process
JP4055656B2 (ja) * 2003-05-30 2008-03-05 Toyota Motor Corporation Collision prediction device
US8812226B2 (en) * 2009-01-26 2014-08-19 GM Global Technology Operations LLC Multiobject fusion module for collision preparation system
CN103531042B (zh) * 2013-10-25 2015-08-19 Jilin University Vehicle rear-end collision warning method based on driver type
WO2018011953A1 (ja) * 2016-07-14 2018-01-18 Gigaphoton Inc. Optical element angle adjustment device and extreme ultraviolet light generation device
CN108725440B (zh) * 2018-04-20 2020-11-27 Shenzhen SenseTime Technology Co., Ltd. Forward collision control method and apparatus, electronic device, program and medium
US11242050B2 (en) * 2019-01-31 2022-02-08 Honda Motor Co., Ltd. Reinforcement learning with scene decomposition for navigating complex environments
EP3800521B1 (en) * 2019-10-01 2023-07-26 Elektrobit Automotive GmbH Deep learning based motion control of a vehicle
CN110969848B (zh) * 2019-11-26 2022-06-17 Wuhan University of Technology Reinforcement-learning-based autonomous-driving overtaking decision method on two-lane roads with oncoming traffic
CN111238825B (zh) * 2020-01-10 2021-05-18 Southeast University Intelligent-driving automatic emergency braking performance test method for combined test road surfaces
KR20220051070A (ko) * 2020-10-16 2022-04-26 Hyundai Motor Company Vehicle control method for left and right turns of an autonomous vehicle at an intersection
KR20220081521A (ko) * 2020-12-09 2022-06-16 Hyundai Motor Company High-precision localization method via road-shape-classification-based map matching, and autonomous vehicle


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DING NAN, MENG XIANGHUA, XIA WEIGUO, WU DI, XU LI, CHEN BINGCAI: "Multivehicle Coordinated Lane Change Strategy in the Roundabout Under Internet of Vehicles Based on Game Theory and Cognitive Computing", IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, vol. 16, no. 8, 1 August 2020 (2020-08-01), US , pages 5435 - 5443, XP055944455, ISSN: 1551-3203, DOI: 10.1109/TII.2019.2959795 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115273502A (zh) * 2022-07-28 2022-11-01 Xidian University Traffic signal cooperative control method
CN115273502B (zh) * 2022-07-28 2023-06-30 Xidian University Traffic signal cooperative control method
CN116823812A (zh) * 2023-08-25 2023-09-29 China Agricultural University In-field life detection method for silage maize
CN116823812B (zh) * 2023-08-25 2023-10-31 China Agricultural University In-field life detection method for silage maize

Also Published As

Publication number Publication date
US11964655B2 (en) 2024-04-23
US20230182725A1 (en) 2023-06-15
CN112633474A (zh) 2021-04-09
CN112633474B (zh) 2022-04-05

Similar Documents

Publication Publication Date Title
WO2022126940A1 (zh) Backward anti-collision driving decision-making method for a heavy commercial vehicle
CN112622886B (zh) Anti-collision warning method for heavy commercial vehicles comprehensively considering front and rear obstacles
CN112580148B (zh) Deep-reinforcement-learning-based anti-rollover driving decision method for heavy commercial vehicles
CN113753026B (zh) Anti-rollover decision method for large commercial vehicles considering road adhesion conditions
WO2023155231A1 (zh) Highly human-like safe driving decision method for autonomous commercial vehicles
CN113291308B (zh) Vehicle self-learning lane-change decision system and method considering driving behavior characteristics
CN106874597A (zh) Highway overtaking behavior decision method applied to autonomous vehicles
CN113954837B (zh) Deep-learning-based lane-change decision method for large commercial vehicles
CN114379540B (zh) Anti-rollover driving decision method for large commercial vehicles considering the influence of obstacles ahead
Zhang et al. Multi-agent DRL-based lane change with right-of-way collaboration awareness
WO2023231569A1 (zh) Bayesian-game-based vehicle-road cooperative decision algorithm for lane-change behavior of autonomous vehicles
CN108860149A (zh) Trajectory design method for minimum-time free lane change of intelligent vehicles
CN115257789A (zh) Lateral anti-collision driving decision method for commercial vehicles in low-speed urban environments
Zheng et al. Research on control target of truck platoon based on maximizing fuel saving rate
Lai et al. Simulation analysis of automatic emergency braking system under constant steer conditions
Lin et al. Adaptive prediction-based control for an ecological cruise control system on curved and hilly roads
Guo et al. Variable time headway autonomous emergency braking control algorithm based on model predictive control
Suh et al. Stochastic predictive control based motion planning for lane change decision using a vehicle traffic simulator
Zhang et al. Simulation research on driving behaviour of autonomous vehicles on expressway ramp under the background of vehicle-road coordination
US11794780B2 (en) Reward function for vehicles
Ma et al. Legal Decision-making for Highway Automated Driving
Zhao et al. Adaptive Drift Control of Autonomous Electric Vehicles After Brake System Failures
Zhan et al. Risk-aware lane-change trajectory planning with rollover prevention for autonomous light trucks on curved roads
Guo et al. Intelligent Vehicle Path Planning Considering Side Slip of Surrounding Vehicles in Icy and Snowy Environment
CN114312830B (zh) Coupled decision model and method for intelligent vehicles considering dangerous driving conditions

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21904879

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21904879

Country of ref document: EP

Kind code of ref document: A1