CN115257789A - Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment - Google Patents
- Publication number: CN115257789A
- Application number: CN202211070522.XA
- Authority: CN (China)
- Prior art keywords: driving, collision, substep, strategy, vehicle
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
- B60W30/00—Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units, or advanced driver assistance systems for ensuring comfort, stability and safety or drive control systems for propelling or retarding the vehicle
- B60W30/08—Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
- B60W2050/0001—Details of the control system
- B60W2050/0019—Control system elements or transfer functions
- B60W2050/0028—Mathematical models, e.g. for simulation
- B60W2050/0029—Mathematical model of the driver
- B60W2300/00—Indexing codes relating to the type of vehicle
- B60W2300/12—Trucks; Load vehicles
Abstract
The invention discloses a decision-making method for lateral anti-collision driving of commercial vehicles in an urban low-speed environment. First, an urban traffic scene is constructed on a hardware-in-the-loop driving simulation platform, and safe driving behaviors under different driving conditions and working conditions are simulated and collected. Second, the safe driving behavior of human drivers is imitated with a dataset aggregation algorithm in an imitation-learning manner. Finally, the lateral anti-collision strategy is further learned with a proximal policy optimization algorithm in an unsupervised-learning manner, realizing high-level decision output of lateral anti-collision driving behaviors of the commercial vehicle. The method imitates the safe driving behavior of human drivers, considers the influence of factors such as visual blind areas and traffic-participant types on driving safety, provides a more reasonable and effective anti-collision driving strategy for large commercial vehicles, and realizes lateral anti-collision driving decisions for commercial vehicles in the urban low-speed environment.
Description
Technical Field
The invention relates to a driving decision-making method for commercial vehicles, and in particular to a decision-making method for lateral anti-collision driving of commercial vehicles in an urban low-speed environment; it belongs to the technical field of automobile safety.
Background
Among commercial-vehicle traffic accidents in urban environments, the proportion caused by visual blind areas is the highest. Owing to factors such as the long vehicle body, high driving position, large inner-outer wheel path difference and large right-turn radius of a commercial vehicle, a crescent-shaped dynamic visual blind area forms when the vehicle turns, especially to the right, and pedestrians and non-motor vehicles inside this blind area are easily struck or even run over. The right side of a commercial vehicle is therefore one of the most dangerous of all visual blind areas and the main region where serious safety accidents such as side collision and run-over occur. In an urban traffic environment with many types and dense numbers of traffic participants, especially when the vehicle runs at low speed (starting, turning right, etc.), avoiding lateral collisions caused by the visual blind areas of commercial vehicles has become a core problem for ensuring road traffic and transportation safety.
If the driver can be warned before a collision or run-over accident occurs, and reminded to decelerate, steer, or take similar action, the frequency of traffic accidents caused by visual blind areas can be greatly reduced, or the damage they cause mitigated. Researching an effective and reliable lateral anti-collision driving decision method for commercial vehicles in an urban low-speed environment with mixed motorized and non-motorized traffic therefore plays an important role in reducing the frequency of lateral vehicle collisions and improving road traffic safety.
Generally, although existing methods can provide a degree of early warning, they still have shortcomings in the effectiveness and reliability of lateral collision avoidance, and they do not address lateral anti-collision driving strategies that give concrete driving suggestions such as driving speed and steering. In particular, an effective and reliable decision-making method for lateral anti-collision driving of commercial vehicles in an urban low-speed environment is lacking.
Disclosure of Invention
Purpose of the invention: aiming at large commercial vehicles such as buses, trucks and urban logistics vehicles, the invention provides a lateral anti-collision driving decision method for commercial vehicles in an urban low-speed environment, so as to realize lateral anti-collision driving decisions and ensure driving safety. The method imitates the safe driving behavior of human drivers, considers the influence of driving conditions, visual blind areas, traffic-participant types and other factors on driving safety, and can provide a more reasonable and effective anti-collision driving strategy for large commercial vehicles, further guaranteeing their driving safety. Moreover, the method requires no complex vehicle-dynamics equations or body parameters, its calculation is simple and clear, it can output the lateral anti-collision decision strategy of a large commercial vehicle in real time, and the sensors used are inexpensive, which facilitates large-scale deployment.
Technical scheme: to realize the purpose of the invention, the adopted technical scheme is a decision-making method for lateral anti-collision driving of commercial vehicles in an urban low-speed environment. First, an urban traffic scene is constructed on a hardware-in-the-loop driving simulation platform, and safe driving behaviors under different driving conditions and working conditions are simulated and collected. Second, the safe driving behavior of human drivers is imitated with the dataset aggregation algorithm in an imitation-learning manner. Finally, the lateral anti-collision strategy is further learned with the proximal policy optimization algorithm in an unsupervised-learning manner, realizing high-level decision output of lateral anti-collision driving behaviors of the commercial vehicle. The method considers the influence of visual blind areas and traffic-participant types on driving safety and provides a more reasonable and effective anti-collision driving strategy for large commercial vehicles. It specifically comprises the following three steps:
the method comprises the following steps: urban traffic scene construction by using driving simulation platform
To reduce the frequency of commercial-vehicle side-collision accidents caused by factors such as visual blind areas and to improve driving safety, the invention provides a decision-making method for lateral anti-collision driving of commercial vehicles in an urban low-speed environment, applicable to the following scene: the commercial vehicle runs in an urban low-speed environment, other traffic participants (motor vehicles, non-motor vehicles or pedestrians) are present on its left or right side, and an effective and reliable lateral anti-collision driving strategy is provided to the driver in order to avoid a lateral collision accident.
According to this scene, first, an urban traffic scene covering straight roads, curves and intersections is constructed on the driving simulation platform, and highly randomized traffic flows and traffic participants are set. Second, multiple drivers control the commercial vehicle with a driving simulator (steering wheel, accelerator and brake pedal), and safe driving behaviors are collected under 8 driving conditions: lane change, lane keeping, car following, left turn, right turn, acceleration, deceleration and constant speed. Finally, a safe-driving-behavior database D is constructed from the collected safe driving behaviors.
Step two: simulation of safe driving behavior of driver by using imitation learning method
Dataset Aggregation (DAgger) is an advanced behavior-cloning method: it actively selects strategies from the safe-driving-behavior database, matches the safe driving behavior of human drivers more easily in subsequent training, and has stronger imitation-learning capability. The invention therefore uses the DAgger algorithm to imitate the safe driving behavior of human drivers. The safe-driving-behavior database D continuously aggregates a new dataset D_i at each time step i. The specific training process is as follows:
substep 1: initialize the parameter φ;
substep 2: initialize the policy π;
substep 3: run a loop of N time steps, each iteration comprising substeps 3.1 to 3.5, specifically:
substep 3.1: update the policy using:
π_i = β_i·π* + (1 − β_i)·π̂_i (1)
where π_i denotes the policy at the ith iteration, π* the expert policy, β_i the soft-update parameter at the ith iteration, and π̂_i the learned policy at the ith iteration;
substep 3.2: sample expert trajectories with π_i;
substep 3.3: build the dataset D_i = {(S_t, π*(S_t))} composed of the states visited by π_i and the actions given by the expert, where S_t denotes the state space at time t;
substep 3.4: aggregate the datasets: D ← D ∪ D_i;
substep 3.5: train the policy π̂_{i+1} on dataset D, where π̂_{i+1} denotes the learned policy at iteration i + 1;
Step three: further learning collision avoidance strategies using unsupervised learning methods
In actual driving decision tasks, driving decisions based on imitation learning lack sufficient generalization ability and can hardly handle, effectively and accurately, driving conditions not covered by the safe-driving-behavior database. To further improve the effectiveness and reliability of the lateral collision-avoidance decision, the decision network must be trained further. Deep reinforcement learning, used here as an unsupervised learning method, acquires an understanding of the traffic environment through continuous exploration and trial and error, and the reward fed back by the environment guides the improvement of the policy network so as to maximize the return. The proximal policy optimization algorithm draws on the trust-region policy optimization algorithm and, by using first-order optimization, strikes a new balance among sampling efficiency, algorithm performance, and the complexity of implementation and debugging. The invention therefore constructs the anti-collision decision model with the proximal policy optimization algorithm and trains it on the basis of step two.
First, the lateral collision-avoidance decision problem of the commercial vehicle is converted into a Markov decision process under a certain reward function, described as (S, A, P, R), where S is the state space, A the driving action, P the state-transition probability arising from the uncertainty of target-vehicle motion, and R the reward function. Second, the basic parameters of the Markov decision process are defined, specifically:
(1) Establishing a state space
First, a state space is constructed from the motion-state information of the ego vehicle and its relative motion with respect to the surrounding traffic participants:
S_t = [p_x, p_y, v_x, v_y, a_x, a_y, θ_s, Δd_1, Δv_1, …, Δd_6, Δv_6] (2)
where p_x, p_y denote the lateral and longitudinal position of the ego vehicle, in meters; v_x, v_y its lateral and longitudinal velocity, in meters per second; a_x, a_y its lateral and longitudinal acceleration, in meters per second squared; and θ_s its heading angle, in degrees. Δd_j, Δv_j denote the relative distance (meters) and relative velocity (meters per second) between the ego vehicle and the jth surrounding traffic participant, where j = 1, 2, 3, 4, 5, 6 indexes the traffic participants around the vehicle (front, rear, left-front, left-rear, right-front and right-rear). Since the number of traffic participants in a real traffic scene is not fixed, when the sensors observe only i (i < 6) traffic participants, the last 6 − i rows of the state space are zero-filled.
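A minimal sketch of assembling this state vector with zero-filled rows for unobserved participants; the function name `build_state` and the tuple layout are illustrative assumptions, not part of the patent.

```python
import numpy as np

def build_state(ego, participants, max_participants=6):
    """Assemble the state space of Eq. (2): ego motion plus relative
    distance/velocity rows for up to six surrounding traffic participants.
    Rows for unobserved participants are zero-filled.

    ego:          (p_x, p_y, v_x, v_y, a_x, a_y, theta_s)
    participants: list of (rel_distance_m, rel_velocity_mps) tuples
    """
    rel = np.zeros((max_participants, 2))      # zero-fill by default
    for j, (d, v) in enumerate(participants[:max_participants]):
        rel[j] = (d, v)
    return np.concatenate([np.asarray(ego, dtype=float), rel.ravel()])
```

With two observed participants, the remaining four rows stay zero, matching the fixed-size input a decision network expects.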
(2) Establishing an action space
To output high-level driving decisions, the invention defines the action space as discrete lateral and longitudinal actions:
A_t = [a_1, a_2, a_3, a_4, a_5, a_6] (3)
where A_t denotes the action space at time t; a_1, a_2, a_3 denote turning left, turning right and going straight, and a_4, a_5, a_6 denote accelerating, decelerating and keeping the speed constant.
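The discrete action space of Eq. (3) can be represented as an enumeration; the names and index assignment below are illustrative only.

```python
from enum import IntEnum

class Action(IntEnum):
    """Discrete action space A_t of Eq. (3); names are illustrative."""
    TURN_LEFT = 0    # a_1: lateral action, turn left
    TURN_RIGHT = 1   # a_2: lateral action, turn right
    GO_STRAIGHT = 2  # a_3: lateral action, keep straight
    ACCELERATE = 3   # a_4: longitudinal action, speed up
    DECELERATE = 4   # a_5: longitudinal action, slow down
    KEEP_SPEED = 5   # a_6: longitudinal action, hold speed constant
```

An `IntEnum` keeps the actions usable both as readable names and as integer indices into a policy network's output layer.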
(3) Establishing a reward function
To quantitatively evaluate the quality of an anti-collision strategy, the invention establishes an anti-collision reward function that considers the influence of traffic-participant type on driving safety:
where R_t denotes the reward function at time t, and x_min_1, x_min_2 denote lateral safety-distance thresholds in meters; in the invention, x_min_1 = 2 and x_min_2 = 2.5.
In addition, negative feedback is applied to decisions that cause a side collision: when the output decision strategy results in a side collision, 50 is subtracted from the reward obtained at the current moment.
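The reward shaping described above can be illustrated with the sketch below. Only the stated ingredients are grounded in the text: the two lateral safety-distance thresholds (2 m and 2.5 m) and the −50 side-collision penalty. The linear encroachment penalty and the rule that the larger threshold applies to vulnerable road users are assumptions for illustration; the patent's exact piecewise form of Eq. (4) is not reproduced here.

```python
def reward(lateral_distance, vulnerable, collided,
           x_min_1=2.0, x_min_2=2.5, collision_penalty=-50.0):
    """Illustrative anti-collision reward.

    lateral_distance: lateral gap to the nearest participant, in meters
    vulnerable:       True for pedestrians / non-motor vehicles (assumption:
                      the larger threshold x_min_2 applies to them)
    collided:         True if the decision led to a side collision
    """
    r = 0.0
    threshold = x_min_2 if vulnerable else x_min_1
    if lateral_distance < threshold:
        # assumed shaping: penalty grows linearly with encroachment depth
        r -= (threshold - lateral_distance)
    if collided:
        r += collision_penalty          # the -50 negative feedback
    return r
```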
Next, the constructed anti-collision decision model is trained, specifically comprising the following substeps:
substep 1: initialize the policy parameters θ_0 and the value-function parameters φ_0;
substep 2: run a loop of T time steps, each iteration comprising substeps 2.1 to 2.5, specifically:
substep 2.1: run the policy π_k = π(θ_k) in the environment;
substep 2.4: update the policy using:
θ_{k+1} = argmax_θ Ê_t[ min( r_t(θ)·Â^{π_k}(S_t, a_t), clip(r_t(θ), 1 − ε, 1 + ε)·Â^{π_k}(S_t, a_t) ) ], with r_t(θ) = π_θ(a_t | S_t) / π_{θ_k}(a_t | S_t)
where θ_{k+1} denotes the policy-network parameters at step k + 1; ε is a hyperparameter; π_θ denotes the policy network with parameters θ; clip(·) is a truncation function that truncates the probability ratio r_t(θ) to [1 − ε, 1 + ε]; τ denotes a hyperparameter determining the magnitude of the soft update; argmax(·) denotes the variable maximizing the objective; and Â^{π_k}(S_t, a_t) denotes the advantage value of the state-action pair.
substep 2.5: update the value function using:
φ_{k+1} = argmin_φ Ê_t[ (V_φ(S_t) − R̂_t)² ]
where φ_{k+1} denotes the value-function parameters at step k + 1, V_φ(S_t) denotes the value function in state S_t, and R̂_t denotes the return.
Finally, once the anti-collision decision model has been trained, inputting the motion-state information of the ego vehicle and its relative motion with respect to the surrounding traffic participants into the model outputs driving suggestions such as acceleration, deceleration and lane change, realizing effective and reliable lateral anti-collision driving decisions for large commercial vehicles.
Beneficial effects: compared with general driving decision methods, the method of the invention is more effective and reliable, specifically:
(1) The method provided by the invention can simulate the safe driving behavior of a human driver, provides a more reasonable and safe lateral anti-collision decision strategy for large-scale commercial vehicles, realizes the lateral anti-collision driving decision of the commercial vehicles in the low-speed environment of the city, and can ensure the running safety of the commercial vehicles.
(2) The method provided by the invention comprehensively considers the influence of factors such as driving conditions, visual blind areas, traffic participant types and the like on driving safety, sets a refined reward function aiming at different traffic participant types, realizes the lateral anti-collision driving decision under different driving conditions, and further improves the effectiveness and reliability of the decision.
(3) The method provided by the invention does not need to consider complex vehicle dynamics equations and vehicle body parameters, the calculation method is simple and clear, the lateral anti-collision strategy of the large-scale commercial vehicle can be output in real time, and the used sensor has low cost and is convenient for large-scale popularization.
Drawings
FIG. 1 is a technical roadmap for the present invention.
Detailed Description
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The invention provides a decision-making method for lateral anti-collision driving in an urban low-speed environment, aimed at large commercial vehicles such as buses, trucks and urban logistics vehicles. First, an urban traffic scene is constructed on a hardware-in-the-loop driving simulation platform, and safe driving behaviors under different driving conditions and working conditions are simulated and collected. Second, the safe driving behavior of human drivers is imitated with a dataset aggregation algorithm in an imitation-learning manner. Finally, the lateral anti-collision strategy is further learned with a proximal policy optimization algorithm in an unsupervised-learning manner, realizing high-level decision output of lateral anti-collision driving behaviors of the commercial vehicle. The method imitates the safe driving behavior of human drivers, considers the influence of factors such as visual blind areas and traffic-participant types on driving safety, and provides a more reasonable and effective anti-collision driving strategy for large commercial vehicles. The technical route of the invention is shown in FIG. 1, and the specific steps are as follows:
the method comprises the following steps: urban traffic scene construction by using driving simulation platform
To reduce the frequency of commercial-vehicle side-collision accidents caused by factors such as visual blind areas and to improve driving safety, the invention provides a decision-making method for lateral anti-collision driving of commercial vehicles in an urban low-speed environment, applicable to the following scene: the commercial vehicle runs in an urban low-speed environment, other traffic participants (motor vehicles, non-motor vehicles or pedestrians) are present on its left or right side, and an effective and reliable lateral anti-collision driving strategy is provided to the driver in order to avoid a lateral collision accident.
According to this scene, first, an urban traffic scene covering straight roads, curves and intersections is constructed on the driving simulation platform, and highly randomized traffic flows and traffic participants are set. Second, multiple drivers control the commercial vehicle with a driving simulator (steering wheel, accelerator and brake pedal), and safe driving behaviors are collected under 8 driving conditions: lane change, lane keeping, car following, left turn, right turn, acceleration, deceleration and constant speed. Finally, a safe-driving-behavior database D is constructed from the collected safe driving behaviors.
Step two: simulation of safe driving behavior of driver by using imitation learning method
Dataset Aggregation (DAgger) is an advanced behavior-cloning method: it actively selects strategies from the safe-driving-behavior database, matches the safe driving behavior of human drivers more easily in subsequent training, and has stronger imitation-learning capability. The invention therefore uses the DAgger algorithm to imitate the safe driving behavior of human drivers. The safe-driving-behavior database D continuously aggregates a new dataset D_i at each time step i. The specific training process is as follows:
substep 1: initialize the parameter φ;
substep 2: initialize the policy π;
substep 3: run a loop of N time steps, each iteration comprising substeps 3.1 to 3.5, specifically:
substep 3.1: update the policy using:
π_i = β_i·π* + (1 − β_i)·π̂_i (1)
where π_i denotes the policy at the ith iteration, π* the expert policy, β_i the soft-update parameter at the ith iteration, and π̂_i the learned policy at the ith iteration;
substep 3.2: sample expert trajectories with π_i;
substep 3.3: build the dataset D_i = {(S_t, π*(S_t))} composed of the states visited by π_i and the actions given by the expert, where S_t denotes the state space at time t;
substep 3.4: aggregate the datasets: D ← D ∪ D_i;
substep 3.5: train the policy π̂_{i+1} on dataset D, where π̂_{i+1} denotes the learned policy at iteration i + 1;
substep 4: finally, return the learned policy π̂_{N+1} at iteration N + 1.
Step three: further learning of collision avoidance strategies using unsupervised learning methods
In actual driving decision tasks, driving decisions based on imitation learning lack sufficient generalization ability and can hardly handle, effectively and accurately, driving conditions not covered by the safe-driving-behavior database. To further improve the effectiveness and reliability of the lateral collision-avoidance decision, the decision network must be trained further. Deep reinforcement learning, used here as an unsupervised learning method, acquires an understanding of the traffic environment through continuous exploration and trial and error, and the reward fed back by the environment guides the improvement of the policy network so as to maximize the return. The proximal policy optimization algorithm draws on the trust-region policy optimization algorithm and, by using first-order optimization, strikes a new balance among sampling efficiency, algorithm performance, and the complexity of implementation and debugging. The invention therefore constructs the anti-collision decision model with the proximal policy optimization algorithm and trains it on the basis of step two.
First, the lateral collision-avoidance decision problem of the commercial vehicle is converted into a Markov decision process under a certain reward function, described as (S, A, P, R), where S is the state space, A the driving action, P the state-transition probability arising from the uncertainty of target-vehicle motion, and R the reward function. Second, the basic parameters of the Markov decision process are defined, specifically:
(1) Establishing a state space
First, a state space is constructed from the motion-state information of the ego vehicle and its relative motion with respect to the surrounding traffic participants:
S_t = [p_x, p_y, v_x, v_y, a_x, a_y, θ_s, Δd_1, Δv_1, …, Δd_6, Δv_6] (2)
where p_x, p_y denote the lateral and longitudinal position of the ego vehicle, in meters; v_x, v_y its lateral and longitudinal velocity, in meters per second; a_x, a_y its lateral and longitudinal acceleration, in meters per second squared; and θ_s its heading angle, in degrees. Δd_j, Δv_j denote the relative distance (meters) and relative velocity (meters per second) between the ego vehicle and the jth surrounding traffic participant, where j = 1, 2, 3, 4, 5, 6 indexes the traffic participants around the vehicle (front, rear, left-front, left-rear, right-front and right-rear). Since the number of traffic participants in a real traffic scene is not fixed, when the sensors observe only i (i < 6) traffic participants, the last 6 − i rows of the state space are zero-filled.
(2) Establishing an action space
To output high-level driving decisions, the invention defines the action space as discrete lateral and longitudinal actions:
A_t = [a_1, a_2, a_3, a_4, a_5, a_6] (3)
where A_t denotes the action space at time t; a_1, a_2, a_3 denote turning left, turning right and going straight, and a_4, a_5, a_6 denote accelerating, decelerating and keeping the speed constant.
(3) Establishing a reward function
To quantitatively evaluate the quality of an anti-collision strategy, the invention establishes an anti-collision reward function that considers the influence of traffic-participant type on driving safety:
where R_t denotes the reward function at time t, and x_min_1, x_min_2 denote lateral safety-distance thresholds in meters; in the invention, x_min_1 = 2 and x_min_2 = 2.5.
In addition, negative feedback is applied to decisions that cause a side collision: when the output decision strategy results in a side collision, 50 is subtracted from the reward obtained at the current moment.
Next, the constructed anti-collision decision model is trained, specifically comprising the following substeps:
substep 1: initialize the policy parameters θ_0 and the value-function parameters φ_0;
substep 2: run a loop of T time steps, each iteration comprising substeps 2.1 to 2.5, specifically:
substep 2.1: run the policy π_k = π(θ_k) in the environment;
substep 2.4: update the policy using:
θ_{k+1} = argmax_θ Ê_t[ min( r_t(θ)·Â^{π_k}(S_t, a_t), clip(r_t(θ), 1 − ε, 1 + ε)·Â^{π_k}(S_t, a_t) ) ], with r_t(θ) = π_θ(a_t | S_t) / π_{θ_k}(a_t | S_t)
where θ_{k+1} denotes the policy-network parameters at step k + 1; ε is a hyperparameter; π_θ denotes the policy network with parameters θ; clip(·) is a truncation function that truncates the probability ratio r_t(θ) to [1 − ε, 1 + ε]; τ denotes a hyperparameter determining the magnitude of the soft update; argmax(·) denotes the variable maximizing the objective; and Â^{π_k}(S_t, a_t) denotes the advantage value of the state-action pair.
substep 2.5: update the value function using:
φ_{k+1} = argmin_φ Ê_t[ (V_φ(S_t) − R̂_t)² ]
where φ_{k+1} denotes the value-function parameters at step k + 1, V_φ(S_t) denotes the value function in state S_t, and R̂_t denotes the return.
Finally, once the anti-collision decision model has been trained, inputting the motion-state information of the ego vehicle and its relative motion with respect to the surrounding traffic participants into the model outputs driving suggestions such as acceleration, deceleration and lane change, realizing effective and reliable lateral anti-collision driving decisions for large commercial vehicles.
Claims (1)
1. A decision-making method for lateral anti-collision driving of commercial vehicles in an urban low-speed environment, comprising: first, constructing an urban traffic scene on a hardware-in-the-loop driving simulation platform, and simulating and collecting safe driving behaviors under different driving conditions and working conditions; second, imitating the safe driving behavior of the driver with a dataset aggregation algorithm in an imitation-learning manner; finally, further learning a lateral anti-collision strategy with a proximal policy optimization algorithm in an unsupervised-learning manner, providing an anti-collision driving strategy for large commercial vehicles and realizing lateral anti-collision driving decisions for commercial vehicles in an urban low-speed environment; the method being characterized in that:
the method comprises the following steps: urban traffic scene construction by using driving simulation platform
The commercial vehicle travels in an urban low-speed environment, with other traffic participants, including motor vehicles, non-motor vehicles or pedestrians, present on its left or right side;
according to the scenes described above, first, a driving simulation platform is used to construct urban traffic scenes covering straight roads, curved roads and intersections, with highly randomized traffic flows and traffic participants; second, multiple drivers control the commercial vehicle through a driving simulator equipped with a steering wheel, an accelerator pedal and a brake pedal, and safe driving behaviors are collected under 8 driving conditions: lane changing, lane keeping, car following, left turning, right turning, acceleration, deceleration and constant speed; finally, a safe-driving-behavior database D is constructed from the collected behaviors;
step two: simulating safe driving behavior of driver by using imitation learning method
The safe driving behaviors of human drivers are imitated using a dataset-aggregation algorithm; the safe-driving-behavior database D continuously aggregates a new dataset D_i at each iteration i; the specific training process is as follows:
substep 1: initializing a parameter phi;
substep 2: initializing a strategy pi;
substep 3: performing a loop of N time steps, each loop comprising sub-steps 3.1 to 3.5, in particular:
substep 3.1: the strategy is updated using the following equation:
where π_i denotes the policy at iteration i, π* denotes the expert policy, β_i denotes the soft-update parameter of the policy at iteration i, and π̂_i denotes the trained policy at iteration i;
substep 3.2: sample expert trajectories using π_i;
substep 3.3: collect the dataset D_i = {(S_t, π*(S_t))} of states visited by π_i and the corresponding actions given by the expert, where S_t denotes the state space at time t;
substep 3.4: aggregate the datasets: D ← D ∪ D_i;
Substep 3.5: train the policy π̂_{i+1} on the aggregated dataset D, where π̂_{i+1} denotes the trained policy at iteration i+1;
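Substeps 1 through 3.5 can be sketched as a small DAgger-style loop. This is only an illustrative toy, not the patent's code: the expert, the training routine and the 1-D "state" are assumptions standing in for the driving simulator and the real policy network.

```python
import random

# Toy expert: brake (action 4) when the gap is small, else keep speed (action 5).
def expert_policy(state):
    return 4 if state < 2.0 else 5

# Stand-in for supervised training of pi_hat_{i+1} on the aggregated dataset D.
def train_on(dataset):
    acts = [a for _, a in dataset]
    majority = max(set(acts), key=acts.count)
    return lambda s: majority

def dagger(n_iters=3, beta0=1.0):
    D = []                                   # safe-driving-behavior database
    learner = lambda s: 5                    # Substep 2: initial policy
    for i in range(n_iters):
        beta = beta0 * (0.5 ** i)            # Substep 3.1: mix expert and learner
        for _ in range(10):                  # Substep 3.2: sample trajectories
            s = random.uniform(0.0, 5.0)
            acting = expert_policy(s) if random.random() < beta else learner(s)
            # (acting would drive the simulator; here we only log the state)
            D.append((s, expert_policy(s)))  # Substep 3.3: expert labels states
        learner = train_on(D)                # Substeps 3.4-3.5: aggregate, retrain
    return D, learner
```

The key DAgger idea is visible in the loop: states are gathered under the learner's own distribution, but labels always come from the expert, so the aggregated dataset corrects the learner's compounding errors.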
Step three: further learning of collision avoidance strategies using unsupervised learning methods
A collision-avoidance decision model is constructed using a proximal policy optimization algorithm and trained on the basis of Step two; first, the lateral collision-avoidance decision problem of the commercial vehicle is converted into a Markov decision process under a given reward function, described as (S, A, P, R), where S is the state space, A is the driving action, P denotes the state-transition probability caused by the uncertainty of the target vehicle's motion, and R is the reward function; second, the basic elements of the Markov decision process are defined, specifically:
(1) Establishing a state space
First, the state space is constructed from the motion-state information of the ego vehicle and its relative motion with respect to the surrounding traffic participants:
where S_t denotes the state space at time t; p_x and p_y denote the lateral and longitudinal positions of the ego vehicle, in meters; v_x and v_y denote its lateral and longitudinal velocities, in meters per second; a_x and a_y denote its lateral and longitudinal accelerations, in meters per second squared; θ_s denotes the heading angle of the ego vehicle, in degrees; the relative distance and relative velocity between the ego vehicle and the j-th surrounding traffic participant are given in meters and meters per second respectively, where j = 1, 2, 3, 4, 5, 6 indexes the surrounding traffic participants ahead, at left front, at left rear, behind, at right rear and at right front, respectively; considering that the number of traffic participants in an actual traffic scene is not fixed, when the sensors observe only i (i < j) traffic participants, the last j − i rows of the state space are zero-filled;
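The state construction with zero-padding can be sketched as follows. Function and variable names are illustrative assumptions; the real state layout in the patent is given only as an image:

```python
# Hedged sketch: ego-vehicle states plus (relative distance, relative velocity)
# rows for up to 6 surrounding participants; when only i < 6 are observed,
# the last 6 - i rows are zero-filled, keeping the state dimension fixed.

def build_state(ego, observed, j_max=6):
    """ego: [p_x, p_y, v_x, v_y, a_x, a_y, theta_s];
    observed: list of (delta_d, delta_v) pairs, len(observed) <= j_max."""
    rows = list(observed) + [(0.0, 0.0)] * (j_max - len(observed))
    # flatten into a single fixed-length feature vector: 7 + 2 * j_max entries
    return ego + [x for pair in rows for x in pair]
```

A fixed-length vector is what lets a single policy network handle scenes with a varying number of traffic participants.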
(2) Establishing an action space
Defining an action space as discrete lateral and longitudinal actions for outputting advanced driving decisions;
A_t = [a_1, a_2, a_3, a_4, a_5, a_6] (3)
where A_t denotes the action space at time t; a_1, a_2 and a_3 denote turning left, turning right and going straight, respectively; a_4, a_5 and a_6 denote accelerating, decelerating and keeping the speed unchanged, respectively;
(3) Establishing a reward function
In order to quantitatively evaluate the merits of the collision-avoidance strategy, a collision-avoidance reward function is established that considers the influence of the traffic-participant type on driving safety:
in the formula, R t Reward function, x, indicating time t min_1 ,x min_2 Representing the lateral safety distance threshold in meters, in the present invention, x min_1 =2,x min_2 =2.5;
In addition, negative feedback is applied to decisions that cause a side collision: when the output decision strategy causes a side collision, 50 is subtracted from the reward value obtained at the current moment;
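The stated reward elements can be sketched as below. The exact reward shape in the patent is given only as an image, so this is an assumed illustrative form that uses only the quantities the text defines: the thresholds x_min_1 = 2 m and x_min_2 = 2.5 m, the participant-type dependence, and the −50 side-collision penalty.

```python
# Hedged sketch of a type-aware lateral collision-avoidance reward.
# The threshold choice per participant type and the shape below the threshold
# are assumptions, not the patent's exact formula.

def collision_reward(lateral_gap, participant_is_vulnerable, collided,
                     x_min_1=2.0, x_min_2=2.5):
    # a larger safety margin is assumed for vulnerable road users
    threshold = x_min_2 if participant_is_vulnerable else x_min_1
    # positive feedback above the threshold, penalty growing with the intrusion
    reward = 1.0 if lateral_gap >= threshold else -(threshold - lateral_gap)
    if collided:
        reward -= 50.0   # negative feedback for decisions causing a side collision
    return reward
```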
secondly, training the constructed collision avoidance decision model, and specifically comprising the following substeps:
substep 1: initialize the policy parameters θ_0 and the value-function parameters φ_0;
Substep 2: a loop of T time steps is performed, each loop comprising sub-step 2.1 to sub-step 2.5, in particular:
substep 2.1: run the policy π_k = π(θ_k) in the environment, where θ_k denotes the policy-network parameters at time k;
Substep 2.4: policy updates are made using the following equation:
where θ_{k+1} denotes the policy-network parameters at time k+1, ε denotes a hyper-parameter, π_θ denotes the policy network with parameters θ, clip(·) denotes a truncation function that clips its argument to [1−ε, 1+ε], τ denotes a hyper-parameter determining the magnitude of the soft update, argmax(·) denotes the variable that maximizes the objective function, and Â(S_t, A_t) denotes the advantage value of the state-action pair;
substep 2.5: the value function update is performed using:
where φ_{k+1} denotes the value-function parameters at time k+1, and V_φ(S_t) denotes the value function of the state space S_t;
and finally, once the collision-avoidance decision model has been trained, the motion-state information of the ego vehicle and its relative motion with respect to the surrounding traffic participants are input into the model, which outputs driving recommendations such as acceleration, deceleration and lane changing, realizing effective and reliable lateral collision-avoidance driving decisions for the large commercial vehicle.
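The value-function update of Substep 2.5 can be sketched as mean-squared-error regression of V_φ(S_t) toward observed returns. A toy linear value function is assumed here; names and the learning rate are illustrative, not from the patent:

```python
# Hedged sketch: one gradient step of MSE regression for a linear value
# function V_phi(s) = phi . s, fitted toward return targets.

def value_update(phi, states, returns, lr=0.01):
    """phi: weight list; states: list of feature lists; returns: targets."""
    grad = [0.0] * len(phi)
    for s, r in zip(states, returns):
        v = sum(w * x for w, x in zip(phi, s))   # V_phi(S_t)
        err = v - r                              # regression residual
        for k, x in enumerate(s):                # gradient of 0.5 * err^2
            grad[k] += err * x
    n = len(states)
    return [w - lr * g / n for w, g in zip(phi, grad)]
```

Repeating this step drives V_φ toward the empirical returns, giving the advantage estimates the policy update of Substep 2.4 relies on.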
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211070522.XA CN115257789A (en) | 2022-09-02 | 2022-09-02 | Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211070522.XA CN115257789A (en) | 2022-09-02 | 2022-09-02 | Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115257789A true CN115257789A (en) | 2022-11-01 |
Family
ID=83755043
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211070522.XA Pending CN115257789A (en) | 2022-09-02 | 2022-09-02 | Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115257789A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116822659A (en) * | 2023-08-31 | 2023-09-29 | 浪潮(北京)电子信息产业有限公司 | Automatic driving motor skill learning method, system, equipment and computer medium |
CN116822659B (en) * | 2023-08-31 | 2024-01-23 | 浪潮(北京)电子信息产业有限公司 | Automatic driving motor skill learning method, system, equipment and computer medium |
CN116959260A (en) * | 2023-09-20 | 2023-10-27 | 东南大学 | Multi-vehicle driving behavior prediction method based on graph neural network |
CN116959260B (en) * | 2023-09-20 | 2023-12-05 | 东南大学 | Multi-vehicle driving behavior prediction method based on graph neural network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110969848B (en) | Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes | |
CN110297494B (en) | Decision-making method and system for lane change of automatic driving vehicle based on rolling game | |
CN106740846B (en) | A kind of electric car self-adapting cruise control method of double mode switching | |
CN102109821B (en) | System and method for controlling adaptive cruise of vehicles | |
CN113291308B (en) | Vehicle self-learning lane-changing decision-making system and method considering driving behavior characteristics | |
CN115257789A (en) | Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment | |
CN106874597A (en) | A kind of highway passing behavior decision-making technique for being applied to automatic driving vehicle | |
CN112622886B (en) | Anti-collision early warning method for heavy operation vehicle comprehensively considering front and rear obstacles | |
CN107813820A (en) | A kind of unmanned vehicle lane-change paths planning method for imitating outstanding driver | |
CN113253739B (en) | Driving behavior decision method for expressway | |
CN112249008B (en) | Unmanned automobile early warning method aiming at complex dynamic environment | |
Zhu et al. | Safe model-based off-policy reinforcement learning for eco-driving in connected and automated hybrid electric vehicles | |
CN110956851A (en) | Intelligent networking automobile cooperative scheduling lane changing method | |
CN110320916A (en) | Consider the autonomous driving vehicle method for planning track and system of occupant's impression | |
CN114580302A (en) | Decision planning method for automatic driving automobile based on maximum entropy reinforcement learning | |
CN115257819A (en) | Decision-making method for safe driving of large-scale commercial vehicle in urban low-speed environment | |
Yeom | Model predictive control and deep reinforcement learning based energy efficient eco-driving for battery electric vehicles | |
CN113901718A (en) | Deep reinforcement learning-based driving collision avoidance optimization method in following state | |
CN112201070A (en) | Deep learning-based automatic driving expressway bottleneck section behavior decision method | |
CN113120003B (en) | Unmanned vehicle motion behavior decision method | |
Li et al. | Deep reinforcement learning-based eco-driving control for connected electric vehicles at signalized intersections considering traffic uncertainties | |
CN114802306A (en) | Intelligent vehicle integrated decision-making system based on man-machine co-driving concept | |
Zhang et al. | Enhancement of Driving Strategy of Electric Vehicle by Consideration of Individual Driver Intention | |
Zhang et al. | Lane Change Decision Algorithm Based on Deep Q Network for Autonomous Vehicles | |
Pathare et al. | Improved Tactical Decision Making and Control Architecture for Autonomous Truck in SUMO Using Reinforcement Learning |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |