CN114013443A - Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning - Google Patents


Info

Publication number
CN114013443A
CN114013443A (Application CN202111339265.0A; granted as CN114013443B)
Authority
CN
China
Prior art keywords
lane
vehicle
target
automatic driving
speed
Prior art date
Legal status
Granted
Application number
CN202111339265.0A
Other languages
Chinese (zh)
Other versions
CN114013443B (en)
Inventor
崔建勋
慈玉生
要甲
姜慧夫
曲明成
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN202111339265.0A priority Critical patent/CN114013443B/en
Publication of CN114013443A publication Critical patent/CN114013443A/en
Application granted granted Critical
Publication of CN114013443B publication Critical patent/CN114013443B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/18Propelling the vehicle
    • B60W30/18009Propelling the vehicle related to particular drive situations
    • B60W30/18163Lane change; Overtaking manoeuvres
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/10Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to vehicle motion
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • B60W60/0015Planning or execution of driving tasks specially adapted for safety

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Human Computer Interaction (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)

Abstract

A lane change decision control method for an automatic driving vehicle based on hierarchical reinforcement learning, belonging to the technical field of automatic driving control. The method solves the problems of poor safety and low efficiency in the existing automatic driving process. A decision neural network with 3 hidden layers is established using the speed of the automatic driving vehicle in the actual driving scene and its relative position and relative speed with respect to the vehicles in the surrounding environment, and the network is trained with a lane change safety reward function to fit a Q-value function and obtain the action with the maximum Q value. An acceleration decision model based on deep Q learning is then established using the speed of the automatic driving vehicle in the actual driving scene, the relative position information of the vehicles in the surrounding environment, and the reward function corresponding to the following or lane changing action, to obtain the lane changing or following acceleration; when changing lanes, a reference lane change trajectory is generated with a 5th-degree polynomial curve. The method is suitable for automatic driving lane change decision and control.

Description

Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
Technical Field
The invention belongs to the technical field of automatic driving control.
Background
In general, an automatic driving strategy is organized as modular components, roughly divided into 4 levels: (1) strategic planning layer: generally responsible for global path-level planning from the starting point to the end point; this part involves the shortest path, the weighted shortest path, GIS and other related knowledge, and current research and implementation methods are relatively mature; (2) tactical decision layer: generally responsible for behavior decisions within a local range during actual driving, such as following, lane changing, overtaking, accelerating and decelerating; (3) local planning layer: responsible for generating a safe trajectory that complies with traffic regulations according to the action intention of the tactical decision layer; (4) vehicle control layer: according to the generated trajectory, this layer mainly adopts an optimal control method to realize minimum-deviation tracking of the generated trajectory by controlling the throttle, brake and steering wheel of the vehicle.
Lane change decision and lane change trajectory generation are key contents of the automatic driving tactical decision layer and local planning layer, respectively, and are basic decision behaviors in many driving scenarios; their performance largely determines the safety, efficiency and quality of automatic driving decision, planning and control. Traditional methods mainly include: (1) realizing the lane change decision in a rule-based manner (such as a finite state machine) and generating the lane change trajectory with optimal control theory; (2) binding the lane change decision and its execution together and learning in an end-to-end manner, directly outputting the lane change vehicle control action from the state input. Mode (1), because it is rule-based, is difficult to generalize to undefined driving scenarios, and the rule set for complex scenarios is difficult or even impossible to define. Mode (2), although very efficient in decision making and able to generalize well to undefined scenarios, cannot fully guarantee the safety of lane changing because it is purely learning-based. In addition, the automatic driving strategy is hierarchical in nature: the driving intention is generated first, then the trajectory is generated and the vehicle is controlled according to the intention; if decision and control are directly tied together, it is difficult to establish an efficient decision and control method.
Disclosure of Invention
The invention aims to solve the problems of poor safety and low efficiency in the existing automatic driving process, and provides a hierarchical reinforcement learning-based automatic driving vehicle lane change decision control method.
The invention relates to a hierarchical reinforcement learning-based automatic driving vehicle lane change decision control method, which comprises the following steps:
Step one, establishing a decision neural network with 3 hidden layers by using the speed of the automatic driving vehicle in the actual driving scene and its relative position and relative speed information with respect to the vehicles in the surrounding environment, and training the decision neural network with a lane change safety reward function to fit the Q-value function and obtain the action with the maximum Q value;
Step two, when the action with the maximum Q value is the lane changing action, executing step three; when the action with the maximum Q value is the continue-following action, establishing an acceleration decision model based on deep Q learning by using the speed of the automatic driving vehicle in the actual driving scene, the relative position information of the vehicles in the surrounding environment and the reward function corresponding to the following action, obtaining the following acceleration, and completing one automatic driving decision and control cycle;
Step three, establishing an acceleration decision model based on deep Q learning by using the speed of the automatic driving vehicle in the actual driving scene, the relative position information of the vehicles in the surrounding environment and the reward function corresponding to the lane changing action, and obtaining the acceleration information of the lane changing action;
Step four, generating a reference lane change trajectory from the acceleration information of the lane changing action by using a 5th-degree polynomial curve;
Step five, controlling the automatic driving vehicle to execute the lane changing action by a pure tracking (pure pursuit) control method, completing one automatic driving lane change decision and control.
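The following is a minimal Python sketch of the hierarchical flow of steps one to five; it is illustrative only, and every name in it (drive_one_step, best_action, best_acceleration, plan_quintic_trajectory, pure_pursuit_control) is a hypothetical placeholder rather than part of the disclosed implementation.

```python
# Illustrative sketch of the hierarchical decision flow of steps one to five.
# All names below are hypothetical; the real models are described in the text.

def drive_one_step(state, lane_change_q_net, accel_model_follow, accel_model_lane_change,
                   plan_quintic_trajectory, pure_pursuit_control):
    """state = (dx_leader, dx_target, dx_follow, dv_ego, dv_target, v_ego, a_ego)."""
    # Step one: upper-level decision - take the action with the maximum Q value
    action = lane_change_q_net.best_action(state)            # "follow" or "change_lane"

    if action == "follow":
        # Step two: lower-level acceleration decision for car following
        a = accel_model_follow.best_acceleration(state)
        return {"action": "follow", "acceleration": a}

    # Steps three to five: lane change branch
    a = accel_model_lane_change.best_acceleration(state)      # step three
    trajectory = plan_quintic_trajectory(state, a)             # step four: 5th-degree polynomial
    steering = pure_pursuit_control(trajectory, state)         # step five: pure tracking control
    return {"action": "change_lane", "acceleration": a, "steering": steering}
```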
Further, in the present invention, the speed in the actual driving scene of the autonomous vehicle and the relative position and relative speed information with the vehicle in the surrounding environment in the first step, the second step, and the third step are as follows:
The relative position of the target automatic driving vehicle and the vehicle ahead in the current lane: Δx_leader = |x_ego − x_leader|; where x_ego is the position coordinate of the target automatic driving vehicle along the lane direction, and x_leader is the position coordinate of the vehicle ahead of the target automatic driving vehicle in the current lane along the lane direction;
The relative position of the target automatic driving vehicle and the vehicle ahead in the target lane: Δx_target = |x_ego − x_target|; where x_target is the position coordinate of the vehicle ahead in the target lane along the lane direction;
The relative position of the target automatic driving vehicle and the vehicle behind in the target lane: Δx_follow = |x_ego − x_follow|; where x_follow is the position coordinate of the vehicle behind in the target lane along the lane direction;
The relative speed of the target automatic driving vehicle and the vehicle ahead in the current lane: Δv_ego = |v_ego − v_leader|; where v_ego is the speed of the target automatic driving vehicle, and v_leader is the speed of the vehicle ahead of the target automatic driving vehicle in the current lane;
The relative speed of the target automatic driving vehicle and the vehicle ahead in the target lane: Δv_target = |v_ego − v_target|; where v_target is the speed of the vehicle ahead in the target lane along the lane direction;
The speed of the target automatic driving vehicle: v_ego;
The acceleration of the target automatic driving vehicle: a_ego.
Further, in the present invention, in the first step, the lane change security reward function is:
[Formula one: lane change safety reward function — rendered as an image in the original; it combines the four weighted terms defined below.]
wherein w1, w2, w3, w4 are respectively the weight coefficient of the relative position of the target automatic driving vehicle and the vehicle ahead in the current lane, the weight coefficient of the relative speed of the target automatic driving vehicle and the vehicle ahead in the current lane, the weight coefficient of the relative position of the target automatic driving vehicle and the vehicle ahead in the target lane, and the weight coefficient of the relative speed of the target automatic driving vehicle and the vehicle ahead in the target lane.
Further, in the present invention, in step one, in the decision neural network with 3 hidden layers, each hidden layer includes 100 neurons.
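As a minimal sketch only, assuming PyTorch and assuming the 7-dimensional environment state defined above is fed directly to the network, the 3-hidden-layer decision network with 100 neurons per layer could be organized as follows; the two outputs are taken here to be the Q values of the "continue following" and "change lane" actions, which is an assumed encoding not stated in the text.

```python
import torch
import torch.nn as nn

class LaneChangeDecisionNet(nn.Module):
    """Hypothetical Q-network: 3 hidden layers x 100 neurons, 7-D state in, 2 action Q-values out."""
    def __init__(self, state_dim=7, n_actions=2, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s):
        return self.net(s)

# Action with the maximum Q value for one state vector (state values are invented for illustration)
q_net = LaneChangeDecisionNet()
s = torch.tensor([[12.0, 25.0, 18.0, 3.0, 1.5, 20.0, 0.2]])
best_action = q_net(s).argmax(dim=1)  # 0: continue following, 1: change lane
```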
Further, in the present invention, in the second step, a specific method for establishing an acceleration decision model for deep Q learning is as follows:
taking the environment state as an input, the final Q estimation value of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C respectively:
environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
Wherein a represents the longitudinal acceleration needed to be decided;
following reward function:
R_dis = −w_dis·|x_leader − x_ego|    (Formula two)
R_v = −w_v·|v_leader − v_ego|    (Formula three)
R_c = R_dis + R_v    (Formula four)
wherein R_dis and R_v respectively represent the distance-related reward function and the speed-related reward function of the following state; w_dis and w_v are respectively the weights corresponding to the distance reward and the speed reward of the following state; R_c is the comprehensive reward of the following state related to distance and speed;
final Q estimate of the acceleration decision model:
Q(s, a) = A(s)·(B(s) − R_c|a)² + C(s)    (Formula five)
wherein R_c|a represents the comprehensive reward obtained in the following state when the acceleration takes the value a; A(s), B(s), C(s) are respectively the outputs of the 3 sub fully-connected neural networks in the current state s.
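As a numerical illustration only (the sub-network outputs and weights below are invented for the example, not taken from the patent), the reward of formulas two to four and the Q estimate of formula five can be evaluated like this:

```python
def following_reward(x_leader, x_ego, v_leader, v_ego, w_dis=0.5, w_v=0.5):
    """Formulas two to four: distance and speed penalties of the car-following state.
    The weight values here are placeholders."""
    r_dis = -w_dis * abs(x_leader - x_ego)
    r_v = -w_v * abs(v_leader - v_ego)
    return r_dis + r_v  # R_c

def q_estimate(a_s, b_s, c_s, r_c_given_a):
    """Formula five: Q(s, a) = A(s) * (B(s) - R_c|a)^2 + C(s)."""
    return a_s * (b_s - r_c_given_a) ** 2 + c_s

# Example with invented numbers for A(s), B(s), C(s) and the vehicle states
r_c = following_reward(x_leader=55.0, x_ego=40.0, v_leader=18.0, v_ego=22.0)
q = q_estimate(a_s=-0.8, b_s=-5.0, c_s=10.0, r_c_given_a=r_c)
```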
Further, in the third step, the specific method for establishing the acceleration decision model based on deep Q learning by using the speed of the automatic driving vehicle in the actual driving scene, the relative position information of the vehicles in the surrounding environment and the reward function corresponding to the lane changing action is as follows:
taking the environment state as an input, the final Q estimation value of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C respectively:
environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
Wherein a represents the longitudinal acceleration needed to be decided;
lane change reward function:
r_dis = −w_dis·|min(Δx_leader, Δx_target) − Δx_follow|    (Formula six)
r_v = −w_v·|min(v_leader, v_target) − v_ego|    (Formula seven)
R_A = r_dis + r_v    (Formula eight)
wherein r_dis and r_v respectively represent the rewards related to distance and speed in the lane changing state; w_dis and w_v are respectively the weights corresponding to the distance reward and the speed reward in the lane changing state; R_A represents the comprehensive reward related to distance and speed in the lane changing state;
final Q value of the acceleration decision model:
Q(s, a) = A(s)·(B(s) − R_A|a)² + C(s)    (Formula nine)
wherein R_A|a represents the immediate reward obtained in the lane changing state when the acceleration takes the value a; A(s), B(s), C(s) are respectively the outputs of the 3 sub fully-connected neural networks in the current state s.
Further, in the fourth step of the present invention, a reference lane change trajectory generated by using the acceleration information of the lane change action and adopting a polynomial curve of degree 5 is:
x(t) = a5·t⁵ + a4·t⁴ + a3·t³ + a2·t² + a1·t + a0    (Formula ten)
y(t) = b5·t⁵ + b4·t⁴ + b3·t³ + b2·t² + b1·t + b0    (Formula eleven)
Wherein x (t) is the position coordinate of the track point along the transverse direction of the road at the moment t, y (t) is the position coordinate of the track point along the longitudinal direction of the road at the moment t, t is time, and the parameter a1,...,a5,b1,...,b5By the expectation function:
Figure BDA0003351846860000051
is determined by changing a1,...,a5,b1,...,b5The value of the target value (A) optimizes an expectation function, so that the distance and risk of the acceleration a of the expectation function corresponding to a reference track at the time T under the constraint of a track planning boundary and the constraint of a traffic speed limit are minimized, and the comfort is maximized, wherein T is a time window of reference track changing planning,
Figure BDA0003351846860000052
travel distance term, w, representing a reference lane change trajectorydP (dangerous | a, t) represents a security risk term for the reference lane change trajectory, wcP (comfort | a, t) denotes a comfort term referring to the lane change trajectory, wd,wcWeight of risk term and weight of comfort, w, respectively, of the reference trajectorycP (dangerous | a, t) is the probability of safety risk in the objective function, and P (comfort | a, t) is the degree of comfort in the objective function.
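The sketch below evaluates one candidate quintic trajectory under the assumption that the objective simply combines a travel distance term, a weighted risk term and a weighted comfort term over the planning window (the exact expectation function appears only as an image in the original); the risk and comfort models p_danger and p_comfort are placeholder callables.

```python
import numpy as np

def quintic(coeffs, t):
    """Evaluate a 5th-degree polynomial (formulas ten and eleven) at times t; coeffs = (c0..c5)."""
    c0, c1, c2, c3, c4, c5 = coeffs
    return c0 + c1*t + c2*t**2 + c3*t**3 + c4*t**4 + c5*t**5

def trajectory_objective(ax, by, a, T, w_d, w_c, p_danger, p_comfort, n=50):
    """Assumed scalarized objective: travel distance - w_d * P(dangerous|a,t) + w_c * P(comfort|a,t),
    accumulated over the planning window T."""
    t = np.linspace(0.0, T, n)
    x, y = quintic(ax, t), quintic(by, t)
    distance = np.hypot(np.diff(x), np.diff(y)).sum()      # travel distance term
    risk = np.mean([p_danger(a, ti) for ti in t])            # P(dangerous | a, t)
    comfort = np.mean([p_comfort(a, ti) for ti in t])        # P(comfort | a, t)
    return distance - w_d * risk + w_c * comfort
```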
Further, in the present invention, the trajectory planning boundary constraint is specifically: keeping the reference trajectory within the lane lines:
x_min ≤ x(t) ≤ x_max,  y_min ≤ y(t) ≤ y_max,  for all t in [0, T]
wherein x_min, y_min, x_max and y_max respectively represent the boundary coordinates of the lane lines corresponding to the current vehicle.
Further, in the present invention, the traffic speed limit constraint is specifically: the instantaneous speed at any point of the reference trajectory must not exceed the traffic speed limit:
v_x,min ≤ dx(t)/dt ≤ v_x,max,  v_y,min ≤ dy(t)/dt ≤ v_y,max,  for all t in [0, T]
wherein v_x,min, v_x,max, v_y,min and v_y,max respectively represent the range of allowable speeds of the automatic driving vehicle in the x and y directions.
Further, in the invention, in the fifth step, the specific method of controlling the automatic driving vehicle to execute the lane changing action with the pure tracking (pure pursuit) control method is as follows:
According to the generated reference lane change trajectory, a pure tracking control algorithm is used to control the steering wheel angle during the lane change of the automatic driving vehicle:
[Steering control law rendered as two equation images in the original.]
wherein δ(t) is the steering wheel angle calculated by the pure tracking control algorithm at time t; α(t) is the actual steering wheel angle; l_d is the forward look-ahead distance; and L is the wheelbase of the vehicle.
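Because the steering law itself appears only as images in the original, the sketch below uses the textbook pure pursuit relationship δ = arctan(2·L·sin(α)/l_d) as an assumed stand-in, with α taken as the angle from the vehicle heading to a look-ahead point on the reference trajectory; this is the generic algorithm, not necessarily the exact formula of the invention.

```python
import math

def pure_pursuit_steering(x_ego, y_ego, heading, lookahead_point, wheelbase):
    """Generic pure pursuit steering (assumed form, see lead-in).
    lookahead_point = (x_ref, y_ref): a point on the reference lane change trajectory."""
    dx = lookahead_point[0] - x_ego
    dy = lookahead_point[1] - y_ego
    l_d = math.hypot(dx, dy)                     # look-ahead distance
    if l_d < 1e-6:                               # degenerate case: already at the point
        return 0.0
    alpha = math.atan2(dy, dx) - heading         # angle to the look-ahead point
    return math.atan2(2.0 * wheelbase * math.sin(alpha), l_d)  # steering angle delta
```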
The method of the invention combines the generalization advantage of learning-based approaches with the advantages of optimal control. Because the lane change decision and the acceleration decision are processed hierarchically by two models, and both the lane change decision model and the acceleration decision model use Q-value estimation neural networks, the processing is more efficient and more accurate; the method is essentially closer to the human lane change behavior of "lane change intention generation → lane change trajectory generation → lane change action execution", and can therefore produce safer, more robust and more efficient decision and control outputs.
Drawings
FIG. 1 is a schematic diagram of the automatic driving lane change decision and control method of the present invention;
FIG. 2 is a schematic view of lane change scene parameters; in the figure, ego is the target autonomous vehicle, leader is the vehicle ahead of the target autonomous vehicle in the current lane, target is the vehicle ahead of the target autonomous vehicle in the target lane, and follow is the vehicle behind the target autonomous vehicle in the target lane;
FIG. 3 is a network architecture diagram of an acceleration decision model.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The first embodiment is as follows: the present embodiment is described below with reference to fig. 1, and the method for controlling a lane change decision of an autonomous vehicle based on hierarchical reinforcement learning in the present embodiment includes:
Step one, establishing a decision neural network with 3 hidden layers by using the speed of the automatic driving vehicle in the actual driving scene and its relative position and relative speed information with respect to the vehicles in the surrounding environment, and training the decision neural network with a lane change safety reward function to fit the Q-value function and obtain the action with the maximum Q value;
Step two, when the action with the maximum Q value is the lane changing action, executing step three; when the action with the maximum Q value is the continue-following action, establishing an acceleration decision model based on deep Q learning by using the speed of the automatic driving vehicle in the actual driving scene, the relative position information of the vehicles in the surrounding environment and the reward function corresponding to the following action, obtaining the following acceleration, and completing one automatic driving decision and control cycle;
Step three, establishing an acceleration decision model based on deep Q learning by using the speed of the automatic driving vehicle in the actual driving scene, the relative position information of the vehicles in the surrounding environment and the reward function corresponding to the lane changing action, and obtaining the acceleration information of the lane changing action;
Step four, generating a reference lane change trajectory from the acceleration information of the lane changing action by using a 5th-degree polynomial curve;
Step five, controlling the automatic driving vehicle to execute the lane changing action by a pure tracking control method, completing one automatic driving lane change decision and control.
In the present embodiment, the input is a lane change request/instruction and environment state information. The need to change lanes may come from a higher level of behavioral decision, for example an overtaking scenario: because the vehicle ahead in the lane of the target autonomous vehicle is traveling too slowly, a lane change request or instruction is triggered so that the autonomous vehicle can gain a higher driving efficiency benefit. Meanwhile, environment information around the target autonomous vehicle (mainly information such as the relative positions and speeds of surrounding vehicles) must be input synchronously; this environment information is the basis of the lane change decision of the autonomous vehicle.
The method adopts a framework of two decision models: a lane change decision model and an acceleration decision model. After receiving the lane change requirement and the environment state information, the lane change decision model determines whether to change lanes, the acceleration decision model adjusts (decides) the longitudinal acceleration of the automatic driving vehicle, and the following or lane changing behavior is then executed.
Further, the present embodiment is described with reference to fig. 2, and in the present embodiment, the speed in the actual driving scene of the autonomous vehicle and the relative position and relative speed information with respect to the vehicle in the surrounding environment in the first step, the second step, and the third step are:
The relative position of the target automatic driving vehicle and the vehicle ahead in the current lane: Δx_leader = |x_ego − x_leader|; where x_ego is the position coordinate of the target automatic driving vehicle along the lane direction, and x_leader is the position coordinate of the vehicle ahead of the target automatic driving vehicle in the current lane along the lane direction;
The relative position of the target automatic driving vehicle and the vehicle ahead in the target lane: Δx_target = |x_ego − x_target|; where x_target is the position coordinate of the vehicle ahead in the target lane along the lane direction;
The relative position of the target automatic driving vehicle and the vehicle behind in the target lane: Δx_follow = |x_ego − x_follow|; where x_follow is the position coordinate of the vehicle behind in the target lane along the lane direction;
The relative speed of the target automatic driving vehicle and the vehicle ahead in the current lane: Δv_ego = |v_ego − v_leader|; where v_ego is the speed of the target automatic driving vehicle, and v_leader is the speed of the vehicle ahead of the target automatic driving vehicle in the current lane;
The relative speed of the target automatic driving vehicle and the vehicle ahead in the target lane: Δv_target = |v_ego − v_target|; where v_target is the speed of the vehicle ahead in the target lane along the lane direction;
The speed of the target automatic driving vehicle: v_ego;
The acceleration of the target automatic driving vehicle: a_ego.
In this embodiment, a schematic diagram of the lane change environment state definition is shown in fig. 2, where Ego is the autonomous vehicle and the other vehicles are background vehicles. Each vehicle has its own state, including 4 pieces of information: position abscissa, position ordinate, velocity and acceleration. The environment state is s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego).
Further, in the present embodiment, in the step one, the lane change security reward function is:
[Formula one: lane change safety reward function — rendered as an image in the original; it combines the four weighted terms defined below.]
wherein w1, w2, w3, w4 are respectively the weight coefficient of the relative position of the target automatic driving vehicle and the vehicle ahead in the current lane, the weight coefficient of the relative speed of the target automatic driving vehicle and the vehicle ahead in the current lane, the weight coefficient of the relative position of the target automatic driving vehicle and the vehicle ahead in the target lane, and the weight coefficient of the relative speed of the target automatic driving vehicle and the vehicle ahead in the target lane;
in this embodiment, w1=0.4,w2=0.6,w3=0.4,w4=0.6。
Further, as described with reference to fig. 2, in the first step of the present embodiment, in the decision neural network with 3 hidden layers, each hidden layer includes 100 neurons.
Further, in the present embodiment, in the second step, a specific method for establishing the acceleration decision model for deep Q learning is as follows:
taking the environment state as an input, the final Q estimation value of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C respectively:
environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
Wherein a represents the longitudinal acceleration needed to be decided;
following reward function:
R_dis = −w_dis·|x_leader − x_ego|    (Formula two)
R_v = −w_v·|v_leader − v_ego|    (Formula three)
R_c = R_dis + R_v    (Formula four)
wherein R_dis and R_v respectively represent the distance-related reward function and the speed-related reward function of the following state; w_dis and w_v are respectively the weights corresponding to the distance reward and the speed reward of the following state; R_c is the comprehensive reward of the following state related to distance and speed;
final Q estimate of the acceleration decision model:
Q(s, a) = A(s)·(B(s) − R_c|a)² + C(s)    (Formula five)
wherein R_c|a represents the comprehensive reward obtained in the following state when the acceleration takes the value a; A(s), B(s), C(s) are respectively the outputs of the 3 sub fully-connected neural networks in the current state s.
Further, in the present embodiment, in step three, the specific method of establishing the acceleration decision model based on deep Q learning by using the speed of the automatic driving vehicle in the actual driving scene, the relative position information of the vehicles in the surrounding environment and the reward function corresponding to the lane changing action is as follows:
taking the environment state as an input, the final Q estimation value of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C respectively:
environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
Wherein a represents the longitudinal acceleration needed to be decided;
lane change reward function:
r_dis = −w_dis·|min(Δx_leader, Δx_target) − Δx_follow|    (Formula six)
r_v = −w_v·|min(v_leader, v_target) − v_ego|    (Formula seven)
R_A = r_dis + r_v    (Formula eight)
wherein r_dis and r_v respectively represent the rewards related to distance and speed in the lane changing state; w_dis and w_v are respectively the weights corresponding to the distance reward and the speed reward in the lane changing state; R_A represents the comprehensive reward related to distance and speed in the lane changing state.
final Q value of the acceleration decision model:
Q(s, a) = A(s)·(B(s) − R_A|a)² + C(s)    (Formula nine)
wherein R_A|a represents the immediate reward obtained in the lane changing state when the acceleration takes the value a; A(s), B(s), C(s) are respectively the outputs of the 3 sub fully-connected neural networks in the current state s.
In this embodiment, the acceleration decision model receives the decision output of the lane change decision model, i.e., whether to change lanes. If the lane is not changed, the following behavior is triggered; if the lane is changed, the lane changing behavior is triggered. As shown in fig. 1, the acceleration decision model is responsible for deciding a longitudinal acceleration (a continuous value along the road direction); a safe trajectory is then generated and the vehicle is controlled to track the generated trajectory. In this embodiment, the speed of the automatic driving vehicle in the actual driving scene and the relative position information of the vehicles in the surrounding environment constitute the environment state; the acceleration decision model includes three sub fully-connected neural networks, and each sub fully-connected neural network includes 200 neurons.
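A minimal PyTorch sketch of the three sub fully-connected networks A, B, C with 200 neurons each, combined according to formula five/nine, is given below; the depth of each sub-network, the reward placeholder and the acceleration discretization are assumptions not stated in the text.

```python
import torch
import torch.nn as nn

class AccelerationQModel(nn.Module):
    """Three sub fully-connected networks A(s), B(s), C(s), 200 neurons each (formulas five / nine)."""
    def __init__(self, state_dim=7, hidden=200):
        super().__init__()
        def sub_net():
            return nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.A, self.B, self.C = sub_net(), sub_net(), sub_net()

    def q_value(self, s, reward_given_a):
        # Q(s, a) = A(s) * (B(s) - R|a)^2 + C(s)
        return self.A(s) * (self.B(s) - reward_given_a) ** 2 + self.C(s)

# Choosing the acceleration with the maximum Q value over a discrete candidate set (assumed)
model = AccelerationQModel()
s = torch.tensor([[12.0, 25.0, 18.0, 3.0, 1.5, 20.0, 0.2]])     # illustrative state
candidates = torch.linspace(-3.0, 2.0, steps=11)                 # candidate accelerations, m/s^2
rewards = [torch.tensor([[-0.1 * abs(float(a))]]) for a in candidates]  # placeholder R|a values
q_values = torch.stack([model.q_value(s, r) for r in rewards]).squeeze()
best_a = candidates[q_values.argmax()]
```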
Further, in the fourth step of the present embodiment, the acceleration information of the lane change operation is utilized, and a 5 th-order polynomial curve is adopted to generate a reference lane change track as follows:
x(t) = a5·t⁵ + a4·t⁴ + a3·t³ + a2·t² + a1·t + a0    (Formula ten)
y(t) = b5·t⁵ + b4·t⁴ + b3·t³ + b2·t² + b1·t + b0    (Formula eleven)
wherein x(t) is the position coordinate of the trajectory point along the longitudinal direction of the road at time t, y(t) is the position coordinate of the trajectory point along the transverse direction of the road at time t, t is time, and the parameters a1, ..., a5, b1, ..., b5 are determined by the expectation (objective) function:
[Objective function rendered as an image in the original: it combines a travel distance term, a weighted safety risk term w_d·P(dangerous|a, t) and a weighted comfort term w_c·P(comfort|a, t) over the planning window T.]
The values of a1, ..., a5, b1, ..., b5 are chosen to optimize this expectation function, so that, under the trajectory planning boundary constraint and the traffic speed limit constraint, the distance and the risk of the reference trajectory corresponding to the acceleration a over the time window T are minimized and the comfort is maximized, where T is the time window of the reference lane change planning; the term rendered as an image in the original represents the travel distance term of the reference lane change trajectory, w_d·P(dangerous|a, t) represents the safety risk term of the reference lane change trajectory, w_c·P(comfort|a, t) represents the comfort term of the reference lane change trajectory, w_d and w_c are respectively the weight of the risk term and the weight of the comfort term of the reference trajectory, P(dangerous|a, t) is the probability of safety risk in the objective function, and P(comfort|a, t) is the degree of comfort in the objective function.
In the present embodiment, the trajectory of the autonomous vehicle for following or changing lanes is planned next based on the acceleration a output by the acceleration decision model. The planning of the trajectory is based on two indicators: safety and comfort. First, a reference lane change trajectory is generated using a 5th-degree polynomial curve, and safety is embodied by the distance and the risk of the reference trajectory.
Further, in this embodiment, the trajectory planning boundary constraint is specifically: keeping the reference trajectory within the lane lines:
x_min ≤ x(t) ≤ x_max,  y_min ≤ y(t) ≤ y_max,  for all t in [0, T]
wherein x_min, y_min, x_max and y_max respectively represent the boundary coordinates of the lane lines corresponding to the current vehicle.
Further, in this embodiment, the traffic speed limit constraint is specifically: the instantaneous speed at any point of the reference trajectory must not exceed the traffic speed limit:
v_x,min ≤ dx(t)/dt ≤ v_x,max,  v_y,min ≤ dy(t)/dt ≤ v_y,max,  for all t in [0, T]
wherein v_x,min, v_x,max, v_y,min and v_y,max respectively represent the range of allowable speeds of the automatic driving vehicle in the longitudinal x and lateral y directions of the road.
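A small sketch, assuming the constraint images correspond to the simple box constraints on the trajectory coordinates and on the finite-difference speed components reconstructed above, checking a sampled reference trajectory:

```python
import numpy as np

def trajectory_feasible(x, y, t, x_min, x_max, y_min, y_max,
                        vx_min, vx_max, vy_min, vy_max):
    """Check the trajectory planning boundary constraint and the traffic speed limit constraint
    on a sampled reference trajectory (arrays x, y evaluated at times t)."""
    vx = np.gradient(x, t)   # longitudinal speed along the road
    vy = np.gradient(y, t)   # lateral speed
    inside_lane = (x >= x_min).all() and (x <= x_max).all() and \
                  (y >= y_min).all() and (y <= y_max).all()
    within_speed = (vx >= vx_min).all() and (vx <= vx_max).all() and \
                   (vy >= vy_min).all() and (vy <= vy_max).all()
    return inside_lane and within_speed
```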
Further, in the present embodiment, in the fifth step, the specific method of controlling the automatic driving vehicle to execute the lane changing action with the pure tracking control method is as follows:
According to the generated reference lane change trajectory, a pure tracking control algorithm is used to control the steering wheel angle during the lane change of the automatic driving vehicle:
[Steering control law rendered as two equation images in the original.]
wherein δ(t) is the steering wheel angle calculated by the pure tracking control algorithm at time t; α(t) is the actual steering wheel angle; l_d is the forward look-ahead distance; and L is the wheelbase of the vehicle.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that features described in different dependent claims and herein may be combined in ways different from those described in the original claims. It is also to be understood that features described in connection with individual embodiments may be used in other described embodiments.

Claims (10)

1. A decision control method for lane change of an automatic driving vehicle based on layered reinforcement learning is characterized by comprising the following steps:
step one, establishing a decision neural network with 3 hidden layers by utilizing the speed in the actual driving scene of the automatic driving vehicle, the relative position of the automatic driving vehicle in the surrounding environment and the relative speed information, and training and fitting a Q valuation function to the decision neural network by utilizing a lane changing safety reward function to obtain the action of the maximum Q valuation;
step two, when the movement with the maximum Q valuation is taken as the lane changing movement, executing step three, when the movement with the maximum Q valuation is taken as the continuous following movement, establishing an acceleration decision model for deep Q learning by utilizing the speed in the actual driving scene of the automatic driving vehicle and the relative position information of the vehicle in the surrounding environment and a reward function corresponding to the following movement, obtaining the following acceleration, and finishing one-time automatic driving decision and control;
thirdly, establishing an acceleration decision model for deep Q learning by using the speed in the actual driving scene of the automatic driving vehicle and the reward function corresponding to the relative position information of the vehicles in the surrounding environment and the lane changing action; acquiring acceleration information of a lane changing action;
step four, generating a reference lane changing track by using the acceleration information of the lane changing action and adopting a 5 th-order polynomial curve;
and step five, controlling the automatic driving vehicle to execute a lane changing action by adopting a pure tracking control method, and finishing one-time automatic driving lane changing decision and control.
2. The method as claimed in claim 1, wherein the speed in the actual driving scene of the autonomous vehicle and the relative position and relative speed information with the vehicle in the surrounding environment in the first step, the second step and the third step are:
The relative position of the target automatic driving vehicle and the vehicle ahead in the current lane: Δx_leader = |x_ego − x_leader|; where x_ego is the position coordinate of the target automatic driving vehicle along the lane direction, and x_leader is the position coordinate of the vehicle ahead of the target automatic driving vehicle in the current lane along the lane direction;
The relative position of the target automatic driving vehicle and the vehicle ahead in the target lane: Δx_target = |x_ego − x_target|; where x_target is the position coordinate of the vehicle ahead in the target lane along the lane direction;
The relative position of the target automatic driving vehicle and the vehicle behind in the target lane: Δx_follow = |x_ego − x_follow|; where x_follow is the position coordinate of the vehicle behind in the target lane along the lane direction;
The relative speed of the target automatic driving vehicle and the vehicle ahead in the current lane: Δv_ego = |v_ego − v_leader|; where v_ego is the speed of the target automatic driving vehicle, and v_leader is the speed of the vehicle ahead of the target automatic driving vehicle in the current lane;
The relative speed of the target automatic driving vehicle and the vehicle ahead in the target lane: Δv_target = |v_ego − v_target|; where v_target is the speed of the vehicle ahead in the target lane along the lane direction;
The speed of the target automatic driving vehicle: v_ego;
The acceleration of the target automatic driving vehicle: a_ego.
3. The method for controlling the lane change decision of the automatic driving vehicle based on the hierarchical reinforcement learning as claimed in claim 2, wherein in the first step, the lane change safety reward function is as follows:
[Formula one: lane change safety reward function — rendered as an image in the original; it combines the four weighted terms defined below.]
wherein w1, w2, w3, w4 are respectively the weight coefficient of the relative position of the target automatic driving vehicle and the vehicle ahead in the current lane, the weight coefficient of the relative speed of the target automatic driving vehicle and the vehicle ahead in the current lane, the weight coefficient of the relative position of the target automatic driving vehicle and the vehicle ahead in the target lane, and the weight coefficient of the relative speed of the target automatic driving vehicle and the vehicle ahead in the target lane.
4. The method as claimed in claim 2 or 3, wherein in step one, in the decision neural network with 3 hidden layers, each hidden layer comprises 100 neurons.
5. The automatic driving vehicle lane change decision control method based on the hierarchical reinforcement learning according to claim 2 or 3, characterized in that in the second step, a specific method for establishing an acceleration decision model of the deep Q learning is as follows:
taking the environment state as an input, the final Q estimation value of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C respectively:
environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
Wherein a represents the longitudinal acceleration needed to be decided;
following reward function:
R_dis = −w_dis·|x_leader − x_ego|    (Formula two)
R_v = −w_v·|v_leader − v_ego|    (Formula three)
R_c = R_dis + R_v    (Formula four)
wherein R_dis and R_v respectively represent the distance-related reward function and the speed-related reward function of the following state; w_dis and w_v are respectively the weights corresponding to the distance reward and the speed reward of the following state; R_c is the comprehensive reward of the following state related to distance and speed;
final Q estimate of the acceleration decision model:
Q(s, a) = A(s)·(B(s) − R_c|a)² + C(s)    (Formula five)
wherein R_c|a represents the comprehensive reward obtained in the following state when the acceleration takes the value a; A(s), B(s), C(s) are respectively the outputs of the 3 sub fully-connected neural networks in the current state s.
6. The method for controlling the lane change decision of the automatic driving vehicle based on the hierarchical reinforcement learning as claimed in claim 2 or 3, wherein in the third step, a concrete method for establishing an acceleration decision model of deep Q learning by using the actual driving scene information, speed of the automatic driving vehicle and the reward function corresponding to the lane change action of the relative position information of the vehicle in the surrounding environment is as follows:
taking the environment state as an input, the final Q estimation value of the acceleration decision model is obtained through 3 sub fully-connected neural networks A, B, C respectively:
environment state: s = (Δx_leader, Δx_target, Δx_follow, Δv_ego, Δv_target, v_ego, a_ego)
Wherein a represents the longitudinal acceleration needed to be decided;
lane change reward function:
r_dis = −w_dis·|min(Δx_leader, Δx_target) − Δx_follow|    (Formula six)
r_v = −w_v·|min(v_leader, v_target) − v_ego|    (Formula seven)
R_A = r_dis + r_v    (Formula eight)
wherein r_dis and r_v respectively represent the rewards related to distance and speed in the lane changing state; w_dis and w_v are respectively the weights corresponding to the distance reward and the speed reward in the lane changing state; R_A represents the comprehensive reward related to distance and speed in the lane changing state;
final Q value of the acceleration decision model:
Q(s, a) = A(s)·(B(s) − R_A|a)² + C(s)    (Formula nine)
wherein R_A|a represents the immediate reward obtained in the lane changing state when the acceleration takes the value a; A(s), B(s), C(s) are respectively the outputs of the 3 sub fully-connected neural networks in the current state s.
7. The automatic driving vehicle lane change decision control method based on the hierarchical reinforcement learning as claimed in claim 2 or 6, characterized in that in step four, the acceleration information of the lane change action is utilized, and a 5 th-order polynomial curve is adopted to generate a reference lane change track as follows:
x(t) = a5·t⁵ + a4·t⁴ + a3·t³ + a2·t² + a1·t + a0    (Formula ten)
y(t) = b5·t⁵ + b4·t⁴ + b3·t³ + b2·t² + b1·t + b0    (Formula eleven)
wherein x(t) is the position coordinate of the trajectory point along the longitudinal direction of the road at time t, y(t) is the position coordinate of the trajectory point along the transverse direction of the road at time t, t is time, and the parameters a1, ..., a5, b1, ..., b5 are determined by the expectation (objective) function:
[Objective function rendered as an image in the original: it combines a travel distance term, a weighted safety risk term w_d·P(dangerous|a, t) and a weighted comfort term w_c·P(comfort|a, t) over the planning window T.]
The values of a1, ..., a5, b1, ..., b5 are chosen to optimize this expectation function, so that, under the trajectory planning boundary constraint and the traffic speed limit constraint, the distance and the risk of the reference trajectory corresponding to the acceleration a over the time window T are minimized and the comfort is maximized, where T is the time window of the reference lane change planning; the term rendered as an image in the original represents the travel distance term of the reference lane change trajectory, w_d·P(dangerous|a, t) represents the safety risk term of the reference lane change trajectory, w_c·P(comfort|a, t) represents the comfort term of the reference lane change trajectory, w_d and w_c are respectively the weight of the risk term and the weight of the comfort term of the reference trajectory, P(dangerous|a, t) is the probability of safety risk in the objective function, and P(comfort|a, t) is the degree of comfort in the objective function.
8. The method for controlling the lane change decision of the automatic driving vehicle based on the hierarchical reinforcement learning as claimed in claim 6, wherein the trajectory planning boundary constraint is specifically: keeping the reference trajectory within the lane lines:
x_min ≤ x(t) ≤ x_max,  y_min ≤ y(t) ≤ y_max,  for all t in [0, T]
wherein x_min, y_min, x_max and y_max respectively represent the boundary coordinates of the lane lines corresponding to the current vehicle.
9. The automatic driving vehicle lane change decision control method based on the hierarchical reinforcement learning as claimed in claim 6, wherein the traffic speed limit constraint is specifically: the instantaneous speed at any point of the reference trajectory must not exceed the traffic speed limit:
v_x,min ≤ dx(t)/dt ≤ v_x,max,  v_y,min ≤ dy(t)/dt ≤ v_y,max,  for all t in [0, T]
wherein v_x,min, v_x,max, v_y,min and v_y,max respectively represent the range of allowable speeds of the automatic driving vehicle in the lateral y and longitudinal x directions of the road.
10. The decision-making method for lane change of the automatic driving vehicle based on the hierarchical reinforcement learning as claimed in claim 2, wherein in the fifth step, a pure tracking control method is adopted to control the automatic driving vehicle to execute a specific method of lane change action:
controlling the steering wheel angle in the lane changing process of the automatic driving vehicle by adopting a pure tracking control algorithm according to the reference lane changing track:
[Steering control law rendered as two equation images in the original.]
wherein δ(t) is the steering wheel angle calculated by the pure tracking control algorithm at time t; α(t) is the actual steering wheel angle of the automatic driving vehicle at time t; l_d is the forward look-ahead distance; and L is the wheelbase of the vehicle.
CN202111339265.0A 2021-11-12 2021-11-12 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning Active CN114013443B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111339265.0A CN114013443B (en) 2021-11-12 2021-11-12 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111339265.0A CN114013443B (en) 2021-11-12 2021-11-12 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning

Publications (2)

Publication Number Publication Date
CN114013443A true CN114013443A (en) 2022-02-08
CN114013443B CN114013443B (en) 2022-09-23

Family

ID=80063836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111339265.0A Active CN114013443B (en) 2021-11-12 2021-11-12 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning

Country Status (1)

Country Link
CN (1) CN114013443B (en)



Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200062262A1 (en) * 2018-08-24 2020-02-27 Ford Global Technologies, Llc Vehicle action control
US20200070844A1 (en) * 2018-08-30 2020-03-05 Honda Motor Co., Ltd. Learning device, learning method, and storage medium
US20200189590A1 (en) * 2018-12-18 2020-06-18 Beijing DIDI Infinity Technology and Development Co., Ltd Systems and methods for determining driving action in autonomous driving
CN109901574A (en) * 2019-01-28 2019-06-18 华为技术有限公司 Automatic Pilot method and device
WO2020159247A1 (en) * 2019-01-31 2020-08-06 엘지전자 주식회사 Image output device
CN110532846A (en) * 2019-05-21 2019-12-03 华为技术有限公司 Automatic lane-change method, apparatus and storage medium
CN110716562A (en) * 2019-09-25 2020-01-21 南京航空航天大学 Decision-making method for multi-lane driving of unmanned vehicle based on reinforcement learning
CN110673602A (en) * 2019-10-24 2020-01-10 驭势科技(北京)有限公司 Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment
CN111273668A (en) * 2020-02-18 2020-06-12 福州大学 Unmanned vehicle motion track planning system and method for structured road
CN112498354A (en) * 2020-12-25 2021-03-16 郑州轻工业大学 Multi-time scale self-learning lane changing method considering personalized driving experience

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023141940A1 (en) * 2022-01-28 2023-08-03 华为技术有限公司 Intelligent driving method and device, and vehicle
CN114880938A (en) * 2022-05-16 2022-08-09 重庆大学 Method for realizing decision of automatically driving automobile behavior
CN114802307A (en) * 2022-05-23 2022-07-29 哈尔滨工业大学 Intelligent vehicle transverse control method under automatic and manual hybrid driving scene
CN114802307B (en) * 2022-05-23 2023-05-05 哈尔滨工业大学 Intelligent vehicle transverse control method under automatic and manual mixed driving scene
CN115116249A (en) * 2022-06-06 2022-09-27 苏州科技大学 Method for estimating different permeability and road traffic capacity of automatic driving vehicle
CN115116249B (en) * 2022-06-06 2023-08-01 苏州科技大学 Method for estimating different permeability and road traffic capacity of automatic driving vehicle
CN115082900A (en) * 2022-07-19 2022-09-20 湖南大学无锡智能控制研究院 Intelligent vehicle driving decision system and method in parking lot scene
CN115082900B (en) * 2022-07-19 2023-06-16 湖南大学无锡智能控制研究院 Intelligent vehicle driving decision system and method in parking lot scene
CN117275240A (en) * 2023-11-21 2023-12-22 之江实验室 Traffic signal reinforcement learning control method and device considering multiple types of driving styles
CN117275240B (en) * 2023-11-21 2024-02-20 之江实验室 Traffic signal reinforcement learning control method and device considering multiple types of driving styles

Also Published As

Publication number Publication date
CN114013443B (en) 2022-09-23

Similar Documents

Publication Publication Date Title
CN114013443B (en) Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN110969848B (en) Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes
Sun et al. A fast integrated planning and control framework for autonomous driving via imitation learning
Chen et al. Conditional DQN-based motion planning with fuzzy logic for autonomous driving
Zhang et al. Adaptive decision-making for automated vehicles under roundabout scenarios using optimization embedded reinforcement learning
CN114407931B (en) Safe driving decision method for automatic driving operation vehicle of high class person
CN110187639A (en) A kind of trajectory planning control method based on Parameter Decision Making frame
Huang et al. Use of neural fuzzy networks with mixed genetic/gradient algorithm in automated vehicle control
CN111679660B (en) Unmanned deep reinforcement learning method integrating human-like driving behaviors
CN110304074A (en) A kind of hybrid type driving method based on stratification state machine
CN114564016A (en) Navigation obstacle avoidance control method, system and model combining path planning and reinforcement learning
CN113581182B (en) Automatic driving vehicle lane change track planning method and system based on reinforcement learning
Xu et al. A nash Q-learning based motion decision algorithm with considering interaction to traffic participants
CN113255998B (en) Expressway unmanned vehicle formation method based on multi-agent reinforcement learning
CN112389436A (en) Safety automatic driving track-changing planning method based on improved LSTM neural network
CN113264043A (en) Unmanned driving layered motion decision control method based on deep reinforcement learning
Yan et al. A multi-vehicle game-theoretic framework for decision making and planning of autonomous vehicles in mixed traffic
CN113511222A (en) Scene self-adaptive vehicle interactive behavior decision and prediction method and device
Zhang et al. Structured road-oriented motion planning and tracking framework for active collision avoidance of autonomous vehicles
Xu et al. An actor-critic based learning method for decision-making and planning of autonomous vehicles
Ruan et al. Longitudinal planning and control method for autonomous vehicles based on a new potential field model
CN113353102B (en) Unprotected left-turn driving control method based on deep reinforcement learning
Cardamone et al. Advanced overtaking behaviors for blocking opponents in racing games using a fuzzy architecture
Bellingard et al. Adaptive and Reliable Multi-Risk Assessment and Management Control Strategy for Autonomous Navigation in Dense Roundabouts
Gutiérrez-Moreno et al. Hybrid decision making for autonomous driving in complex urban scenarios

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant