CN115257789A - Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment

Publication number
CN115257789A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211070522.XA
Other languages
Chinese (zh)
Inventor
李旭
胡玮明
胡锦超
胡悦
孔栋
徐启敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Application filed by Southeast University
Priority to CN202211070522.XA
Publication of CN115257789A

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00 - Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units, or advanced driver assistance systems for ensuring comfort, stability and safety or drive control systems for propelling or retarding the vehicle
    • B60W30/08 - Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 - Computer-aided design [CAD]
    • G06F30/20 - Design optimisation, verification or simulation
    • G06F30/27 - Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/088 - Non-supervised learning, e.g. competitive learning
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 - Details of the control system
    • B60W2050/0019 - Control system elements or transfer functions
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 - Details of the control system
    • B60W2050/0019 - Control system elements or transfer functions
    • B60W2050/0028 - Mathematical models, e.g. for simulation
    • B60W2050/0029 - Mathematical model of the driver
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2300/00 - Indexing codes relating to the type of vehicle
    • B60W2300/12 - Trucks; Load vehicles

Abstract

The invention discloses a decision-making method for lateral collision-avoidance driving of commercial vehicles in an urban low-speed environment. First, an urban traffic scene is constructed on a hardware-in-the-loop driving simulation platform, and safe driving behaviors under different driving conditions and operating conditions are simulated and collected. Second, the safe driving behavior of human drivers is imitated through imitation learning, using the Dataset Aggregation (DAgger) algorithm. Finally, the lateral collision-avoidance strategy is further learned in an unsupervised manner with the proximal policy optimization (PPO) algorithm, producing high-level decision output for the lateral collision-avoidance driving behavior of the commercial vehicle. The method imitates the safe driving behavior of human drivers, accounts for the influence of factors such as visual blind spots and traffic participant types on driving safety, provides a more reasonable and effective collision-avoidance driving strategy for large commercial vehicles, and realizes lateral collision-avoidance driving decisions for commercial vehicles in urban low-speed environments.

Description

Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment
Technical Field
The invention relates to a decision-making method for driving of commercial vehicles, in particular to a decision-making method for lateral anti-collision driving of the commercial vehicles in an urban low-speed environment, and belongs to the technical field of automobile safety.
Background
Among traffic accidents involving commercial vehicles in urban environments, those caused by visual blind spots account for the highest share. The cause lies in several factors, such as the long vehicle body, the high driving position, the large difference between the inner and outer wheel paths, and the large right-turn radius: when the vehicle turns, particularly to the right, a crescent-shaped dynamic blind spot forms, and pedestrians and non-motorized vehicles inside it are easily struck or even run over. The right side of a commercial vehicle is therefore one of the most dangerous of all blind-spot areas and is the main area where serious accidents such as side collisions and run-overs occur. In urban traffic, where traffic participants are varied and dense, and especially when the vehicle is moving at low speed (starting off, turning right, and so on), avoiding lateral collisions caused by the blind spots of commercial vehicles has become a core problem for road transportation safety.
If the driver can be warned before a collision or run-over accident occurs and prompted to slow down, steer, or take similar action, the frequency of blind-spot accidents, or the harm they cause, can be greatly reduced. Researching an effective and reliable lateral collision-avoidance driving decision method for commercial vehicles in urban low-speed environments with mixed motorized and non-motorized traffic therefore plays an important role in reducing the frequency of lateral collisions and improving road traffic safety.
In general, although existing methods provide some early warning, they still fall short in the effectiveness and reliability of lateral collision avoidance. They do not address lateral collision-avoidance driving strategies that give specific driving suggestions such as driving speed and steering, and in particular an effective and reliable decision-making method for lateral collision-avoidance driving of commercial vehicles in urban low-speed environments is lacking.
Disclosure of Invention
Purpose of the invention: to realize lateral collision-avoidance driving decisions for commercial vehicles in urban low-speed environments and to ensure driving safety, the invention provides a lateral collision-avoidance driving decision method for large commercial vehicles such as large buses, trucks, and urban logistics vehicles. The method imitates the safe driving behavior of human drivers and accounts for the influence of factors such as driving conditions, visual blind spots, and traffic participant types on driving safety, so it can provide a more reasonable and effective collision-avoidance driving strategy for large commercial vehicles. The method also does not require complex vehicle dynamics equations or vehicle body parameters; the computation is simple and clear, the lateral collision-avoidance decision strategy can be output in real time, and the sensors used are inexpensive, which makes the method suitable for large-scale adoption.
Technical scheme: to achieve the purpose of the invention, the following technical scheme is adopted: a decision-making method for lateral collision-avoidance driving of commercial vehicles in an urban low-speed environment. First, an urban traffic scene is constructed on a hardware-in-the-loop driving simulation platform, and safe driving behaviors under different driving conditions and operating conditions are simulated and collected. Second, the safe driving behavior of human drivers is imitated through imitation learning, using the Dataset Aggregation (DAgger) algorithm. Finally, the lateral collision-avoidance strategy is further learned in an unsupervised manner with the proximal policy optimization (PPO) algorithm, producing high-level decision output for the lateral collision-avoidance driving behavior of the commercial vehicle. The method imitates the safe driving behavior of human drivers, accounts for the influence of factors such as visual blind spots and traffic participant types on driving safety, provides a more reasonable and effective collision-avoidance driving strategy for large commercial vehicles, and realizes lateral collision-avoidance driving decisions for commercial vehicles in urban low-speed environments. The method comprises the following three steps:
Step one: Construct an urban traffic scene using a driving simulation platform
To reduce the frequency of lateral collision accidents of commercial vehicles caused by factors such as visual blind spots and to improve driving safety, the invention provides a lateral collision-avoidance driving decision method for commercial vehicles in an urban low-speed environment. It applies to the following scenario: a commercial vehicle is driving in an urban low-speed environment, other traffic participants (motor vehicles, non-motorized vehicles, or pedestrians) are present on its left or right side, and an effective and reliable lateral collision-avoidance driving strategy must be provided to the driver to avoid a lateral collision.
For this scenario, an urban traffic scene is first constructed using the driving simulation platform, with highly randomized traffic flow and traffic participants, covering straight roads, curves, and intersections. Next, several drivers control the commercial vehicle through a driving simulator (steering wheel, accelerator, and brake pedal), and safe driving behaviors are collected under eight driving conditions: lane change, lane keeping, car following, left turn, right turn, acceleration, deceleration, and constant speed. Finally, a safe driving behavior database D is built from the collected safe driving behaviors.
Step two: Imitate the driver's safe driving behavior using imitation learning
Dataset Aggregation (DAgger) is a more advanced behavior cloning method. It can actively select strategies from the safe driving behavior database, matches the safe driving behavior of human drivers more easily during subsequent training, and has a stronger imitation learning capability. The invention therefore uses the DAgger algorithm to imitate the safe driving behavior of human drivers. At each time step i, the safe driving behavior database D aggregates a new data set $D_i$. The specific training process is as follows:
Substep 1: initialize the parameter $\phi$;
Substep 2: initialize the strategy $\pi$;
Substep 3: perform a loop of N time steps, each iteration comprising substeps 3.1 to 3.5:
Substep 3.1: update the strategy using the following equation:
$$\pi_i = \beta_i \pi^{*} + (1 - \beta_i)\,\hat{\pi}_i \qquad (1)$$
where $\pi_i$ denotes the strategy at time i, $\pi^{*}$ denotes the expert strategy, $\beta_i$ denotes the parameter for the soft update of the strategy at the i-th iteration, and $\hat{\pi}_i$ denotes the optimal strategy at time i;
Substep 3.2: sample trajectories using $\pi_i$;
Substep 3.3: output the data set $D_i = \{(S_t, \pi^{*}(S_t))\}$, composed of the states visited by $\pi_i$ and the actions given by the expert, where $S_t$ denotes the state at time t;
Substep 3.4: aggregate the data sets: $D \leftarrow D \cup D_i$;
Substep 3.5: train the strategy $\hat{\pi}_{i+1}$ on the data set D, where $\hat{\pi}_{i+1}$ denotes the optimal strategy at time i+1;
Substep 4: finally, return the optimal strategy at time N+1, $\hat{\pi}_{N+1}$.
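The DAgger loop in step two can be illustrated with a small, self-contained sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the one-dimensional toy environment, the hypothetical `expert_policy`, the lookup-table learner in `train`, and the schedule used for the mixing parameter beta. Only the loop structure (mix expert and learner, label visited states with expert actions, aggregate, retrain) follows the substeps described in the text.

```python
import random
from collections import Counter, defaultdict


def expert_policy(state):
    """Hypothetical expert: steer back toward lateral offset 0."""
    if state > 0:
        return -1
    if state < 0:
        return 1
    return 0


def step(state, action):
    """Toy transition: shift the offset by the action, clamped to [-5, 5]."""
    return max(-5, min(5, state + action))


def train(dataset):
    """Fit a lookup-table policy: most frequent expert action per state."""
    votes = defaultdict(Counter)
    for s, a in dataset:
        votes[s][a] += 1
    table = {s: c.most_common(1)[0][0] for s, c in votes.items()}
    return lambda s: table.get(s, 0)  # unseen states default to action 0


def dagger(n_iters=5, horizon=20, seed=0):
    rng = random.Random(seed)
    dataset = []                  # aggregated data set D
    learner = lambda s: 0         # initial learned policy
    for i in range(1, n_iters + 1):
        beta = 0.5 ** (i - 1)     # assumed schedule: beta_1 = 1, then decays

        def mixed(s, beta=beta, learner=learner):
            # Stochastic mixture of expert and learner, weighted by beta
            return expert_policy(s) if rng.random() < beta else learner(s)

        state = rng.randint(-5, 5)
        d_i = []
        for _ in range(horizon):
            # Label each visited state with the expert's action
            d_i.append((state, expert_policy(state)))
            state = step(state, mixed(state))
        dataset.extend(d_i)       # aggregate: D <- D union D_i
        learner = train(dataset)  # retrain the learner on all of D
    return learner, dataset
```

Because the states are visited by the mixed policy rather than by the expert alone, the learner receives expert labels for the states its own mistakes lead to, which is the property that distinguishes DAgger from plain behavior cloning.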
Step three: further learning collision avoidance strategies using unsupervised learning methods
In an actual driving decision task, it is difficult to effectively and accurately process driving conditions and driving conditions not involved in a safe driving behavior database by the lack of sufficient generalization ability of driving decisions based on the imitation learning. In order to further improve the effectiveness and reliability of the lateral collision avoidance decision, a decision network needs to be further trained. Deep reinforcement learning is used as an unsupervised learning method, understanding of traffic environment can be obtained through continuous exploration and trial and error, and improvement of a strategy network is guided by reward of environment feedback, so that maximum return is obtained. The near-end strategy optimization algorithm uses a trust domain strategy optimization algorithm for reference, and a new balance is obtained among sampling efficiency, algorithm performance and complexity of realization and debugging by using first-order optimization. Therefore, the method utilizes the near-end strategy optimization algorithm to construct the anti-collision decision model, and trains the anti-collision decision model on the basis of the step two.
First, the lateral collision avoidance decision problem of the commercial vehicle is converted into a markov decision process under a certain reward function, which can be described as (S, a, P, R). Where S is a state space, A is a driving action, P represents a probability of state transition due to uncertainty in the motion of the target vehicle, and R is a reward function. Secondly, defining basic parameters of a Markov decision process, specifically:
(1) Establishing a state space
First, the state space is constructed from the motion state of the ego vehicle and the relative motion state between the ego vehicle and the surrounding traffic participants:
[Equation (2): definition of the state space $S_t$; the formula image is not recoverable from this text]
where $p_x, p_y$ denote the lateral and longitudinal position of the ego vehicle, in meters; $v_x, v_y$ denote its lateral and longitudinal speed, in meters per second; $a_x, a_y$ denote its lateral and longitudinal acceleration, in meters per second squared; and $\theta_s$ denotes its heading angle, in degrees. The remaining entries are the relative distance and relative speed between the ego vehicle and the j-th surrounding traffic participant, in meters and meters per second respectively, where j = 1, 2, 3, 4, 5, 6 indexes the participants by their position relative to the ego vehicle, including front, left front, left rear, right rear, and right front. Because the number of traffic participants in a real traffic scene is not fixed, when the sensors observe only i (i < j) traffic participants, the last j - i rows of the state space are filled with zeros.
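The zero-padding rule for a variable number of observed traffic participants can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `build_state`, the constants, and the layout (a list of ego features plus a list of per-participant rows) are hypothetical, since the exact arrangement of the state in equation (2) is not recoverable from the text.

```python
MAX_PARTICIPANTS = 6  # j = 1..6 in the text
EGO_FEATURES = 7      # p_x, p_y, v_x, v_y, a_x, a_y, theta_s


def build_state(ego, participants):
    """Return (ego_features, rows), where rows holds one
    (relative_distance_m, relative_speed_mps) pair per participant slot,
    zero-padded up to MAX_PARTICIPANTS. The layout is an assumption."""
    if len(ego) != EGO_FEATURES:
        raise ValueError("expected 7 ego-vehicle features")
    if len(participants) > MAX_PARTICIPANTS:
        raise ValueError("at most 6 surrounding participants are supported")
    rows = [(float(d), float(v)) for d, v in participants]
    # Fill the slots of unobserved participants with zeros
    rows += [(0.0, 0.0)] * (MAX_PARTICIPANTS - len(rows))
    return [float(x) for x in ego], rows
```

Padding to a fixed size keeps the input dimension of the decision network constant regardless of how many participants the sensors currently detect.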
(2) Establishing an action space
To output high-level driving decisions, the invention defines the action space as discrete lateral and longitudinal actions:
$$A_t = [a_1, a_2, a_3, a_4, a_5, a_6] \qquad (3)$$
where $A_t$ denotes the action space at time t; $a_1, a_2, a_3$ denote turning left, turning right, and going straight, respectively; and $a_4, a_5, a_6$ denote accelerating, decelerating, and holding the speed constant, respectively.
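One simple way to encode the six discrete actions in code is an enumeration. The class name `Action`, the member names, and the helper `is_lateral` are illustrative assumptions; only the six actions and their numbering come from the text.

```python
from enum import IntEnum


class Action(IntEnum):
    """Discrete high-level actions a_1..a_6 (member names are illustrative)."""
    TURN_LEFT = 1
    TURN_RIGHT = 2
    GO_STRAIGHT = 3
    ACCELERATE = 4
    DECELERATE = 5
    KEEP_SPEED = 6


def is_lateral(action):
    """True for the lateral actions a_1..a_3, False for the longitudinal ones."""
    return Action(action) in (Action.TURN_LEFT, Action.TURN_RIGHT, Action.GO_STRAIGHT)
```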
(3) Establishing a reward function
To quantitatively evaluate the quality of a collision-avoidance strategy, the invention establishes a collision-avoidance reward function that accounts for the influence of traffic participant type on driving safety:
[Equation (4): definition of the reward function $R_t$; the formula image is not recoverable from this text]
where $R_t$ denotes the reward at time t, and $x_{min\_1}, x_{min\_2}$ denote lateral safety distance thresholds, in meters. In the invention, $x_{min\_1} = 2$ and $x_{min\_2} = 2.5$.
In addition, negative feedback is applied to decisions that cause a lateral collision: when the output decision strategy results in a lateral collision, 50 is subtracted from the reward obtained at the current time step.
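The general shape of a type-dependent reward with a collision penalty can be sketched as below. This is a hypothetical illustration, not the patent's reward function: the text gives only the two thresholds (2 m and 2.5 m) and the collision penalty of 50, so the mapping of thresholds to participant types and the distance-shortfall shaping term are both assumptions.

```python
COLLISION_PENALTY = 50.0

# Assumed mapping: the text does not say which threshold applies to which type.
SAFE_DISTANCE_M = {
    "motor_vehicle": 2.0,  # x_min_1 (assumed assignment)
    "vulnerable": 2.5,     # x_min_2, pedestrians and non-motorized vehicles (assumed)
}


def reward(lateral_distance_m, participant_type, collided):
    """Hypothetical reward: penalize the shortfall below the type-specific
    lateral safety distance, and subtract 50 on a lateral collision."""
    threshold = SAFE_DISTANCE_M[participant_type]
    # Assumed shaping term: 0 at or beyond the threshold, negative inside it
    r = min(0.0, lateral_distance_m - threshold)
    if collided:
        r -= COLLISION_PENALTY
    return r
```

For example, under these assumptions a pedestrian at 1.5 m lateral distance yields a reward of -1.0, while the same distance to a motor vehicle yields -0.5.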
Second, the constructed collision-avoidance decision model is trained through the following substeps:
Substep 1: initialize the policy parameters $\theta_0$ and the value function parameters $\phi_0$;
Substep 2: perform a loop of T time steps, each iteration comprising substeps 2.1 to 2.5:
Substep 2.1: run the policy $\pi_k = \pi(\theta_k)$ in the environment;
Substep 2.2: compute the optimal reward value at time t;
Substep 2.3: based on the current value function, compute an estimate of the advantage function;
Substep 2.4: update the policy using the following equation:
[Equation (5): policy update; the formula image is not recoverable from this text]
where $\theta_{k+1}$ denotes the policy network parameters at time k+1; $\epsilon$ denotes a hyperparameter; $\pi_\theta$ denotes the policy network with parameters $\theta$; clip(·) denotes a truncation function that limits its argument to the interval $[1-\epsilon, 1+\epsilon]$; $\tau$ denotes a hyperparameter that determines the magnitude of the soft update; argmax(·) denotes the variable value that maximizes the objective function; and the advantage term denotes the advantage value of a state-action pair.
Substep 2.5: update the value function using the following equation:
[Equation (6): value function update; the formula image is not recoverable from this text]
where $\phi_{k+1}$ denotes the value function parameters at time k+1, and $V_\phi(S_t)$ denotes the value function in state $S_t$.
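The quantities used in substeps 2.2 to 2.5 can be illustrated with the standard PPO-clip formulation. This sketch is an assumption about the general technique, not a reconstruction of the patent's equations (5) and (6): it uses the widely known clipped surrogate objective and a mean-squared-error value loss, and it omits the soft-update hyperparameter tau, whose role in the patent's formula is not recoverable. The function names and default values are hypothetical.

```python
import math


def rewards_to_go(rewards, gamma=0.99):
    """Discounted return from each time step to the end of the trajectory."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    return out[::-1]


def advantages(returns, values):
    """Simple advantage estimate: return minus the value baseline."""
    return [g - v for g, v in zip(returns, values)]


def clipped_surrogate(logp_new, logp_old, advs, eps=0.2):
    """Standard PPO-clip objective, averaged over samples (to be maximized)."""
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advs):
        ratio = math.exp(ln - lo)  # pi_theta(a|s) / pi_theta_k(a|s)
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advs)


def value_loss(values, returns):
    """Mean squared error between value predictions and returns (to be minimized)."""
    return sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)
```

In a full training loop, an optimizer would take gradient steps that increase `clipped_surrogate` with respect to the policy parameters and decrease `value_loss` with respect to the value function parameters; the clipping removes the incentive to move the new policy far from the one that collected the data.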
Finally, once the collision-avoidance decision model has been trained, the motion state of the ego vehicle and the relative motion state between the ego vehicle and the surrounding traffic participants are fed into the model, which outputs driving suggestions such as accelerating, decelerating, and changing lanes, realizing effective and reliable lateral collision-avoidance driving decisions for large commercial vehicles.
Beneficial effects: compared with general driving decision methods, the proposed method is more effective and reliable, specifically:
(1) The method imitates the safe driving behavior of human drivers, provides a more reasonable and safer lateral collision-avoidance decision strategy for large commercial vehicles, realizes lateral collision-avoidance driving decisions for commercial vehicles in urban low-speed environments, and helps ensure their driving safety.
(2) The method comprehensively considers the influence of factors such as driving conditions, visual blind spots, and traffic participant types on driving safety, sets a refined reward function for different traffic participant types, realizes lateral collision-avoidance driving decisions under different driving conditions, and further improves the effectiveness and reliability of the decisions.
(3) The method does not require complex vehicle dynamics equations or vehicle body parameters; the computation is simple and clear, the lateral collision-avoidance strategy for large commercial vehicles can be output in real time, and the sensors used are inexpensive, which makes the method suitable for large-scale adoption.
Drawings
FIG. 1 is a technical roadmap for the present invention.
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The invention provides a decision-making method for lateral collision-avoidance driving in an urban low-speed environment, aimed at large commercial vehicles such as large buses, trucks, and urban logistics vehicles. First, an urban traffic scene is constructed on a hardware-in-the-loop driving simulation platform, and safe driving behaviors under different driving conditions and operating conditions are simulated and collected. Second, the safe driving behavior of human drivers is imitated through imitation learning, using the Dataset Aggregation (DAgger) algorithm. Finally, the lateral collision-avoidance strategy is further learned in an unsupervised manner with the proximal policy optimization (PPO) algorithm, producing high-level decision output for the lateral collision-avoidance driving behavior of the commercial vehicle. The method imitates the safe driving behavior of human drivers, accounts for the influence of factors such as visual blind spots and traffic participant types on driving safety, provides a more reasonable and effective collision-avoidance driving strategy for large commercial vehicles, and realizes lateral collision-avoidance driving decisions for commercial vehicles in urban low-speed environments. The technical route of the invention is shown in FIG. 1, and the specific steps are as follows:
Step one: Construct an urban traffic scene using a driving simulation platform
To reduce the frequency of lateral collision accidents of commercial vehicles caused by factors such as visual blind spots and to improve driving safety, the invention provides a lateral collision-avoidance driving decision method for commercial vehicles in an urban low-speed environment. It applies to the following scenario: a commercial vehicle is driving in an urban low-speed environment, other traffic participants (motor vehicles, non-motorized vehicles, or pedestrians) are present on its left or right side, and an effective and reliable lateral collision-avoidance driving strategy must be provided to the driver to avoid a lateral collision.
For this scenario, an urban traffic scene is first constructed using the driving simulation platform, with highly randomized traffic flow and traffic participants, covering straight roads, curves, and intersections. Next, several drivers control the commercial vehicle through a driving simulator (steering wheel, accelerator, and brake pedal), and safe driving behaviors are collected under eight driving conditions: lane change, lane keeping, car following, left turn, right turn, acceleration, deceleration, and constant speed. Finally, a safe driving behavior database D is built from the collected safe driving behaviors.
Step two: Imitate the driver's safe driving behavior using imitation learning
Dataset Aggregation (DAgger) is a more advanced behavior cloning method. It can actively select strategies from the safe driving behavior database, matches the safe driving behavior of human drivers more easily during subsequent training, and has a stronger imitation learning capability. The invention therefore uses the DAgger algorithm to imitate the safe driving behavior of human drivers. At each time step i, the safe driving behavior database D aggregates a new data set $D_i$. The specific training process is as follows:
Substep 1: initialize the parameter $\phi$;
Substep 2: initialize the strategy $\pi$;
Substep 3: perform a loop of N time steps, each iteration comprising substeps 3.1 to 3.5:
Substep 3.1: update the strategy using the following equation:
$$\pi_i = \beta_i \pi^{*} + (1 - \beta_i)\,\hat{\pi}_i \qquad (1)$$
where $\pi_i$ denotes the strategy at time i, $\pi^{*}$ denotes the expert strategy, $\beta_i$ denotes the parameter for the soft update of the strategy at the i-th iteration, and $\hat{\pi}_i$ denotes the optimal strategy at time i;
Substep 3.2: sample trajectories using $\pi_i$;
Substep 3.3: output the data set $D_i = \{(S_t, \pi^{*}(S_t))\}$, composed of the states visited by $\pi_i$ and the actions given by the expert, where $S_t$ denotes the state at time t;
Substep 3.4: aggregate the data sets: $D \leftarrow D \cup D_i$;
Substep 3.5: train the strategy $\hat{\pi}_{i+1}$ on the data set D, where $\hat{\pi}_{i+1}$ denotes the optimal strategy at time i+1;
Substep 4: finally, return the optimal strategy at time N+1, $\hat{\pi}_{N+1}$.
Step three: further learning of collision avoidance strategies using unsupervised learning methods
In an actual driving decision task, it is difficult to effectively and accurately handle driving conditions and driving conditions not involved in the safe driving behavior database by the lack of sufficient generalization capability of the driving decision based on the imitation learning. In order to further improve the effectiveness and reliability of the lateral collision avoidance decision, a decision network needs to be trained further. Deep reinforcement learning is used as an unsupervised learning method, understanding of traffic environment can be obtained through continuous exploration and trial and error, and improvement of a strategy network is guided by reward of environment feedback, so that maximum return is obtained. The near-end strategy optimization algorithm uses a trust domain strategy optimization algorithm for reference, and a new balance is obtained among sampling efficiency, algorithm performance and complexity of realization and debugging by using first-order optimization. Therefore, the method utilizes the near-end strategy optimization algorithm to construct the anti-collision decision model, and trains the anti-collision decision model on the basis of the step two.
First, the lateral collision avoidance decision problem of the operating vehicle is converted into a markov decision process under a certain reward function, which can be described as (S, a, P, R). Where S is a state space, a is a driving action, P represents a state transition probability due to uncertainty of the motion of the target vehicle, and R is a reward function. Secondly, defining basic parameters of the Markov decision process, specifically:
(1) Establishing a state space
First, the state space is constructed from the motion state of the ego vehicle and the relative motion state between the ego vehicle and the surrounding traffic participants:
[Equation (2): definition of the state space $S_t$; the formula image is not recoverable from this text]
where $p_x, p_y$ denote the lateral and longitudinal position of the ego vehicle, in meters; $v_x, v_y$ denote its lateral and longitudinal speed, in meters per second; $a_x, a_y$ denote its lateral and longitudinal acceleration, in meters per second squared; and $\theta_s$ denotes its heading angle, in degrees. The remaining entries are the relative distance and relative speed between the ego vehicle and the j-th surrounding traffic participant, in meters and meters per second respectively, where j = 1, 2, 3, 4, 5, 6 indexes the participants by their position relative to the ego vehicle, including front, left front, left rear, right rear, and right front. Because the number of traffic participants in a real traffic scene is not fixed, when the sensors observe only i (i < j) traffic participants, the last j - i rows of the state space are filled with zeros.
(2) Establishing an action space
To output high-level driving decisions, the invention defines the action space as discrete lateral and longitudinal actions:
$$A_t = [a_1, a_2, a_3, a_4, a_5, a_6] \qquad (3)$$
where $A_t$ denotes the action space at time t; $a_1, a_2, a_3$ denote turning left, turning right, and going straight, respectively; and $a_4, a_5, a_6$ denote accelerating, decelerating, and holding the speed constant, respectively.
(3) Establishing a reward function
To quantitatively evaluate the quality of a collision-avoidance strategy, the invention establishes a collision-avoidance reward function that accounts for the influence of traffic participant type on driving safety:
[Equation (4): definition of the reward function $R_t$; the formula image is not recoverable from this text]
where $R_t$ denotes the reward at time t, and $x_{min\_1}, x_{min\_2}$ denote lateral safety distance thresholds, in meters. In the invention, $x_{min\_1} = 2$ and $x_{min\_2} = 2.5$.
In addition, negative feedback is applied to decisions that cause a lateral collision: when the output decision strategy results in a lateral collision, 50 is subtracted from the reward obtained at the current time step.
Secondly, training the constructed collision avoidance decision model, and specifically comprising the following substeps:
substep 1: initializing a policy parameter θ 0 Sum function parameter phi 0
Substep 2: a loop of T time steps is performed, each loop comprising sub-step 2.1 to sub-step 2.5, in particular:
substep 2.1: running policy π in the Environment k =π(θ k );
Substep 2.2: calculating the optimal prize value at time t
Figure BDA0003829934830000091
Substep 2.3: Based on the current value function
Figure BDA0003829934830000092
calculate an estimate of the advantage function
Figure BDA0003829934830000093
Substep 2.4: Update the policy using the following equation:
Figure BDA0003829934830000094
In the formula, θ_{k+1} denotes the policy network parameters at time k+1, ε denotes a hyper-parameter, π_θ denotes the policy network with parameters θ, and clip(·) denotes a truncation function that restricts
Figure BDA0003829934830000095
to the interval [1−ε, 1+ε], where τ denotes a hyper-parameter that determines the magnitude of the soft update, argmax(·) returns the variable value that maximizes the objective function, and
Figure BDA0003829934830000096
denotes the advantage value of a state-action pair.
Substep 2.5: Update the value function using the following equation:
Figure BDA0003829934830000097
In the formula, φ_{k+1} denotes the value function parameters at time k+1, and V_φ(S_t) denotes the value function of the state S_t.
Finally, once the anti-collision decision model has been trained, the motion state information of the ego vehicle and the relative motion state information between the ego vehicle and surrounding traffic participants are input into the model, which outputs driving suggestions such as accelerating, decelerating and changing lanes, thereby realizing effective and reliable lateral anti-collision driving decisions for large commercial vehicles.
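The quantities computed in substeps 2.2 to 2.5 can be sketched in plain Python. This is a minimal illustration of the commonly used clipped form of proximal policy optimization, not a reproduction of the patent's equations: the discount factor `gamma`, the simple advantage estimate (return minus value), the mean squared error value loss and all function names are assumptions, and the soft update hyper-parameter τ mentioned in the text is not modeled.

```python
from typing import List, Sequence


def rewards_to_go(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Discounted sum of future rewards for every time step (assumed form)."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantage_estimates(returns: Sequence[float],
                        values: Sequence[float]) -> List[float]:
    """Simple advantage estimate: return minus current value estimate."""
    return [r - v for r, v in zip(returns, values)]


def clipped_surrogate(ratios: Sequence[float],
                      advantages: Sequence[float],
                      epsilon: float = 0.2) -> float:
    """Mean clipped objective; each ratio is limited to [1 - eps, 1 + eps]."""
    if not ratios:
        raise ValueError("at least one sample is required")
    total = 0.0
    for ratio, adv in zip(ratios, advantages):
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        total += min(ratio * adv, clipped * adv)
    return total / len(ratios)


def value_loss(values: Sequence[float], returns: Sequence[float]) -> float:
    """Mean squared error between value estimates and returns."""
    if not values:
        raise ValueError("at least one sample is required")
    return sum((v - r) ** 2 for v, r in zip(values, returns)) / len(values)
```

In a full training loop, the policy parameters would be adjusted to increase `clipped_surrogate` and the value function parameters to decrease `value_loss`; the clipping keeps each policy update close to the policy that collected the data.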

Claims (1)

1. A decision-making method for side anti-collision driving of a commercial vehicle in an urban low-speed environment, wherein: firstly, a hardware-in-the-loop driving simulation platform is used to construct an urban traffic scene, and safe driving behaviors under different driving conditions and operating conditions are simulated and collected; secondly, the dataset aggregation algorithm is used to imitate the safe driving behavior of the driver by means of imitation learning; finally, the proximal policy optimization algorithm is used to further learn a lateral anti-collision strategy by means of unsupervised learning, so as to provide an anti-collision driving strategy for large commercial vehicles and realize side anti-collision driving decisions for commercial vehicles in an urban low-speed environment; the method is characterized in that:
Step one: Constructing an urban traffic scene using a driving simulation platform
The commercial vehicle runs in a low-speed urban environment, and other traffic participants, including motor vehicles, non-motor vehicles or pedestrians, are present on its left or right side;
according to the scene described above, firstly, a driving simulation platform is used to construct an urban traffic scene that covers straight roads, curved roads and intersections, with highly random traffic flows and traffic participants; secondly, a plurality of drivers operate the commercial vehicle through a driving simulator equipped with a steering wheel, an accelerator pedal and a brake pedal, and safe driving behaviors are collected under 8 driving conditions: lane change, lane keeping, car following, left steering, right steering, acceleration, deceleration and constant speed; finally, a safe driving behavior database D is constructed from the collected safe driving behaviors;
Step two: Imitating the safe driving behavior of the driver using an imitation learning method
The safe driving behaviors of human drivers are imitated using the dataset aggregation algorithm; at each time step i, the safe driving behavior database D continuously aggregates a new data set D_i; the specific training process is as follows:
substep 1: initialize the parameter φ;
substep 2: initialize the strategy π;
substep 3: run a loop of N time steps, each iteration comprising substep 3.1 to substep 3.5, specifically:
substep 3.1: update the strategy using the following equation:
Figure FDA0003829934820000011
in the formula, π_i denotes the strategy at the i-th time, π* denotes the expert strategy, β_i denotes the parameter for the soft update of the strategy at the i-th iteration, and
Figure FDA0003829934820000012
denotes the optimal strategy at the i-th time;
substep 3.2: sample expert trajectories using π_i;
substep 3.3: output the data set D_i = {(S_t, π*(S_t))} composed of the states visited by π_i and the actions given by the expert, where S_t denotes the state space at time t;
substep 3.4: aggregate the data sets: D ← D ∪ D_i;
substep 3.5: train a strategy on the data set D
Figure FDA0003829934820000021
where
Figure FDA0003829934820000022
denotes the optimal strategy at time i+1;
substep 4: finally, return the optimal strategy at time N+1
Figure FDA0003829934820000023
Step three: Further learning the anti-collision strategy using an unsupervised learning method
An anti-collision decision model is constructed using the proximal policy optimization algorithm and is trained on the basis of step two; firstly, the lateral collision avoidance decision problem of the commercial vehicle is formulated as a Markov decision process under a given reward function, described as (S, A, P, R), where S is the state space, A is the driving action, P denotes the state transition probability arising from the uncertainty of the motion of the target vehicle, and R is the reward function; secondly, the basic parameters of the Markov decision process are defined, specifically:
(1) Establishing a state space
Firstly, a state space is constructed from the motion state information of the ego vehicle and the relative motion state information between the ego vehicle and surrounding traffic participants:
Figure FDA0003829934820000024
in the formula, S_t denotes the state space at time t; p_x and p_y denote the lateral and longitudinal positions of the ego vehicle, in meters; v_x and v_y denote its lateral and longitudinal speeds, in meters per second; a_x and a_y denote its lateral and longitudinal accelerations, in meters per second squared; and θ_s denotes its heading angle, in degrees;
Figure FDA0003829934820000025
respectively denote the relative distance and the relative speed between the ego vehicle and the j-th surrounding traffic participant, in meters and meters per second; here j = 1, 2, 3, 4, 5, 6 indexes the surrounding traffic participants by their position relative to the ego vehicle, including the front, left front, right rear and right front; because the number of traffic participants in an actual traffic scene is not fixed, when the sensors observe only i traffic participants (i < 6), the last 6 − i rows of the state space are filled with zeros;
(2) Establishing an action space
To output high-level driving decisions, the action space is defined as discrete lateral and longitudinal actions:
A_t = [a_1, a_2, a_3, a_4, a_5, a_6] (3)
in the formula, A_t denotes the action space at time t; a_1, a_2 and a_3 denote turning left, turning right and going straight, respectively; a_4, a_5 and a_6 denote accelerating, decelerating and holding the speed constant, respectively;
(3) Establishing a reward function
To quantitatively evaluate the quality of the anti-collision strategy, an anti-collision reward function that takes into account the influence of the traffic participant type on driving safety is established:
Figure FDA0003829934820000031
in the formula, R_t denotes the reward function at time t, and x_min_1 and x_min_2 denote the lateral safety distance thresholds, in meters; in the present invention, x_min_1 = 2 and x_min_2 = 2.5;
in addition, negative feedback is applied to decisions that cause a side collision: when the output decision strategy results in a side collision, 50 is subtracted from the reward value obtained at the current time;
next, the constructed collision avoidance decision model is trained, which specifically comprises the following substeps:
substep 1: initialize the policy parameters θ_0 and the value function parameters φ_0;
substep 2: run a loop of T time steps, each iteration comprising substep 2.1 to substep 2.5, specifically:
substep 2.1: run the policy π_k = π(θ_k) in the environment, where θ_k denotes the policy network parameters at time k;
substep 2.2: calculate the optimal reward value at time t
Figure FDA0003829934820000032
substep 2.3: based on the current value function
Figure FDA0003829934820000033
calculate an estimate of the advantage function
Figure FDA0003829934820000034
substep 2.4: update the policy using the following equation:
Figure FDA0003829934820000035
in the formula, θ_{k+1} denotes the policy network parameters at time k+1, ε denotes a hyper-parameter, π_θ denotes the policy network with parameters θ, and clip(·) denotes a truncation function that restricts
Figure FDA0003829934820000036
to the interval [1−ε, 1+ε], where τ denotes a hyper-parameter that determines the magnitude of the soft update, argmax(·) returns the variable value that maximizes the objective function, and
Figure FDA0003829934820000037
denotes the advantage value of a state-action pair;
substep 2.5: update the value function using the following equation:
Figure FDA0003829934820000041
in the formula, φ_{k+1} denotes the value function parameters at time k+1, and V_φ(S_t) denotes the value function of the state S_t;
and finally, once the anti-collision decision model has been trained, the motion state information of the ego vehicle and the relative motion state information between the ego vehicle and surrounding traffic participants are input into the model, which outputs driving suggestions such as accelerating, decelerating and changing lanes, thereby realizing effective and reliable lateral anti-collision driving decisions for large commercial vehicles.
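The training loop of step two (substeps 3.1 to 3.5) follows the general pattern of dataset aggregation, which can be sketched on a toy problem. Everything concrete here is an assumption made for illustration: the tabular majority-vote learner, the halving schedule for β_i, the per-step mixing of expert and learner actions, and the names `fit_policy` and `dagger`. The update equation of substep 3.1 itself is an equation image and is not reproduced.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

State = int
Action = int


def fit_policy(dataset: List[Tuple[State, Action]]) -> Dict[State, Action]:
    """Train a tabular strategy: most frequent expert action per state."""
    votes: Dict[State, Counter] = defaultdict(Counter)
    for state, action in dataset:
        votes[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}


def dagger(expert: Callable[[State], Action],
           step: Callable[[State, Action], State],
           initial_state: State,
           iterations: int,
           horizon: int,
           default_action: Action = 0,
           seed: int = 0) -> Dict[State, Action]:
    rng = random.Random(seed)
    dataset: List[Tuple[State, Action]] = []  # aggregated database D
    learned: Dict[State, Action] = {}
    for i in range(iterations):
        beta = 0.5 ** i  # assumed schedule; beta = 1 in the first iteration
        state = initial_state
        visited: List[State] = []
        for _ in range(horizon):
            visited.append(state)
            # Mixed strategy: expert with probability beta, learner otherwise.
            if rng.random() < beta:
                action = expert(state)
            else:
                action = learned.get(state, default_action)
            state = step(state, action)
        # D_i: visited states labelled with expert actions; D <- D U D_i.
        dataset.extend((s, expert(s)) for s in visited)
        learned = fit_policy(dataset)
    return learned
```

The key design point is that states are collected while the (partly) learned strategy is driving, but labels always come from the expert, so the database grows to cover the states the learner actually reaches.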
CN202211070522.XA 2022-09-02 2022-09-02 Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment Pending CN115257789A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211070522.XA CN115257789A (en) 2022-09-02 2022-09-02 Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211070522.XA CN115257789A (en) 2022-09-02 2022-09-02 Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment

Publications (1)

Publication Number Publication Date
CN115257789A true CN115257789A (en) 2022-11-01

Family

ID=83755043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211070522.XA Pending CN115257789A (en) 2022-09-02 2022-09-02 Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment

Country Status (1)

Country Link
CN (1) CN115257789A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116822659A (en) * 2023-08-31 2023-09-29 浪潮(北京)电子信息产业有限公司 Automatic driving motor skill learning method, system, equipment and computer medium
CN116822659B (en) * 2023-08-31 2024-01-23 浪潮(北京)电子信息产业有限公司 Automatic driving motor skill learning method, system, equipment and computer medium
CN116959260A (en) * 2023-09-20 2023-10-27 东南大学 Multi-vehicle driving behavior prediction method based on graph neural network
CN116959260B (en) * 2023-09-20 2023-12-05 东南大学 Multi-vehicle driving behavior prediction method based on graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination