CN112498354A - Multi-time scale self-learning lane changing method considering personalized driving experience - Google Patents
Multi-time scale self-learning lane changing method considering personalized driving experience
- Publication number
- CN112498354A (application CN202011561553.6A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- data
- time scale
- learning
- driver
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W30/00—Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units, or advanced driver assistance systems for ensuring comfort, stability and safety or drive control systems for propelling or retarding the vehicle
- B60W30/18—Propelling the vehicle
- B60W30/18009—Propelling the vehicle related to particular drive situations
- B60W30/18163—Lane change; Overtaking manoeuvres
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
- B60W2050/0001—Details of the control system
- B60W2050/0019—Control system elements or transfer functions
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
- B60W2050/0001—Details of the control system
- B60W2050/0019—Control system elements or transfer functions
- B60W2050/0028—Mathematical models, e.g. for simulation
- B60W2050/0031—Mathematical model of the vehicle
Landscapes
- Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Human Computer Interaction (AREA)
- Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention discloses a multi-time scale self-learning lane changing method considering personalized driving experience, comprising the following steps: a first preparation step; a second off-line learning step; and a third on-line operation step. The electric control device controls the host vehicle to drive automatically at level L4 through the multi-time scale self-learning algorithm while learning the driver's driving habits on line; it updates the personalized driving experience data set, the multi-time scale neural network, the multi-time scale self-learning algorithm, the lane change model and the reward function according to those habits, and introduces transition probabilities through the Markov-decision lane change model to capture variation between and within individuals, so that the automatic lane-change control output of the electric control device gradually approaches the driving habits of the host vehicle's driver and the driving experience improves. The invention adopts a learning structure combining an off-line strategy with an on-line strategy, accounting for both generality and particularity, which fits well with the characteristics of L4-level intelligent driving.
Description
Technical Field
The invention belongs to the field of intelligent driving, and particularly relates to a multi-time scale self-learning lane changing method considering personalized driving experience.
Background
In the field of intelligent driving, as vehicles become more intelligent, the intelligent control unit increasingly shares the vehicle's low-level control authority with the driver. An intelligent vehicle can hardly avoid taking authority away from the driver, or may interfere with the driver at critical moments while executing a control strategy meant to serve the driver's interests, creating potential safety hazards. Intelligent vehicles therefore cannot neglect understanding and perceiving the vehicle's highest decision maker, namely the driver.
Advanced driver assistance systems at the present stage already provide a preliminary driving-behavior monitoring function through detection of the driver's state, the vehicle and the surrounding environment.
However, from the viewpoint of human-machine co-driving of intelligent vehicles, the personalized differences between drivers and the different driving states of the same driver are not considered, so it remains difficult to meet the requirements of intelligent vehicle driving.
For example, a lane change system uses radar to detect the distance and relative speed between the host vehicle and the vehicles ahead and around it, and performs a lane change only if these exceed set thresholds (below which a collision risk may arise); otherwise no lane change is performed.
This requires the vehicle manufacturer to conduct extensive experiments to find thresholds that reflect average human driving behaviour and reaction times.
Different drivers, however, observe the relative distance and vehicle speed and choose to slow down or change lanes according to personal, dynamic preferences. An intelligent lane-changing method therefore not only needs to "infer" and "learn" a human's personalized dynamic preferences, but must also continuously "adapt" and "take action", so as to realize a self-learning lane-changing method that considers the personalized driving experience.
Reinforcement learning is a machine learning method that does not depend on an environment model or prior knowledge and can continuously optimize a control strategy through trial and error and delayed returns, providing a feasible approach for developing a personalized self-learning lane change method.
However, reinforcement learning also faces the following challenges in developing a specific application of the personalized self-learning lane-changing method:
first, in general reinforcement learning, performing the same action in the same environmental state yields the same reward. In practice the reward changes dynamically with the driver's individual preference: facing a lane change decision, the driver may take a corresponding action (such as braking and decelerating) or may accelerate to change lanes.
Thus, the rewards earned by reinforcement learning do not always reflect the algorithm's own behavior, but are influenced by the driver's personalized driving decisions.
Furthermore, general reinforcement learning assumes that the reward earned in each step is due to the action performed in that step. In practice the driver's response (on the order of seconds) is not always instantaneous; it depends on personalized dynamic preferences and varies from moment to moment. Evaluating the reward may therefore require several steps until the effect of the performed action is observed and the corresponding reward value is received.
Disclosure of Invention
The invention aims to solve the problem of poor driving experience caused by the lack of consideration of personalized differences in conventional active lane changing, and provides a multi-time scale self-learning lane change method considering the personalized driving experience.
In order to achieve the purpose, the multi-time scale self-learning lane-changing method considering the personalized driving experience is carried out according to the following steps:
the first step is preparation;
establishing an individualized driving experience data set, a multi-time scale neural network, a multi-time scale self-learning algorithm, a Markov decision-based lane change model and a dynamic time-varying reward function considering driver preference in an electric control device of a host vehicle;
the personalized driving experience data set comprises environmental vehicle data, control data and a driver preference metric matrix; the environmental vehicle data and the control data are derived from public data;
the second step is off-line learning;
before a host vehicle is started for the first time, enabling a multi-time scale neural network to read environmental vehicle data and control data in an individualized driving experience data set, and establishing a mapping relation from the environmental vehicle data to the control data;
the third step is on-line operation; the electric control device controls the host vehicle to drive automatically at level L4 through the multi-time scale self-learning algorithm and learns the driver's driving habits on line, and it updates the personalized driving experience data set, the multi-time scale neural network, the multi-time scale self-learning algorithm, the lane change model and the reward function according to those habits, so that the automatic lane-change control output of the electric control device gradually approaches the driving habits of the host vehicle's driver, improving the driving experience.
The environmental vehicle data in the first step includes x_t, y_t, φ_t, Δx_t, Δy_t and Δv_t;
where x_t, y_t denote the longitudinal and lateral positions of the host vehicle, φ_t denotes the yaw rate, Δx_t the distance between the host vehicle and a surrounding vehicle along the x-axis, Δy_t the distance along the y-axis, and Δv_t the speed difference between the host vehicle and the surrounding vehicle;
the control data includes vehicle steering wheel target angle data and vehicle target speed data;
the driver preference metric matrix is given by formula one:
M = [a_x^+, a_x^-, |a_y|, |z_x|, |z_y|] (formula one)
where a_x^+ is the driver discomfort threshold for longitudinal acceleration, a_x^- is the longitudinal deceleration threshold, |a_y| is the lateral acceleration threshold, and |z_x| and |z_y| are the maximum allowed longitudinal and transverse impulse, respectively;
in the second step, the multi-time scale neural network is represented by formula two:
f(x, u | w), with time-scale factors τ_i, τ_σ ~ N(τ_0, σ_0) (formula two)
where f(x, u | w) is the nonlinear function of the system output, x and u are the system state and input respectively, w is the weight of the neural network, and τ_i is a time-scale factor that can vary adaptively; τ_σ obeys the Gaussian normal distribution N, and τ_0, σ_0 are the mean and variance vectors of the Gaussian normal distribution N;
in the second step, the electric control device performs off-line learning according to the individualized driving experience data set through the multi-time scale neural network, and obtains a mapping relation from environmental vehicle data to control data in an off-line state;
in the third step, the multi-time scale self-learning algorithm is as follows:
3.1, initializing parameters; the electric control device initializes the discount parameter γ, the learning step length α, the exploration parameter ε, the multi-time scale parameters t_s ≤ t_a ≤ t_l, and the events Q(s, a) = 1 (lane change) and Q(s, a) = 0 (no lane change);
t_s is the sampling period of the vehicle-mounted sensor and also the period at which the lane change model acquires information; the vehicle-mounted sensor acquires the environmental vehicle data; t_a is the time interval from information acquisition to the output of control data by the electric control device through the multi-time scale self-learning algorithm, the control data comprising vehicle steering-wheel target angle data and vehicle target speed data; t_l is the learning and update period of the multi-time scale self-learning algorithm;
3.2, observing the vehicle state; the lane change model in the electric control device acquires the current environmental vehicle data through the vehicle-mounted sensor connected to the electric control device, obtaining the current environmental state s; the multi-time scale self-learning algorithm in the electric control device acquires the current environmental state s through the lane change model;
3.3, executing a control action; every t_a, the multi-time scale self-learning algorithm in the electric control device selects and outputs control data a according to the environmental state s using a greedy algorithm; if the driver does not intervene, the host vehicle is controlled to change lanes or keep its lane according to the vehicle steering-wheel target angle data and the vehicle target speed data in control data a; if the driver intervenes, the host vehicle is controlled to change lanes or keep its lane according to the driver's operation.
During the third step, a learning and updating operation is performed once every t_l;
τ is the current time, s' is the current environmental state, and s is the environmental state at the time the control data a was applied (i.e., at τ − t_l); for all times t_i between τ − t_l and τ, the driver preference metric matrix M is evaluated to obtain the reward function R(s_{t_i}, a_{t_i}) at each time t_i; t_l is the learning update period of the multi-time scale self-learning algorithm, t_l > t_a,
The electric control device correspondingly updates the individualized driving experience data set on line through a sixth formula according to the data, wherein the sixth formula is as follows:
in the formula six, R is a formula five, namely a reward function; wherein s is the environmental state of the host vehicle expressed by formula three; a is the control data that actually occurs; alpha represents the learning step length, and gamma is a discount factor;
each time learning and updating are performed (every t_l), the control data a that maximizes the Q value is selected with probability 1 − ε, or an action learned off-line is selected at random with probability ε as the newly learned action; wherein ε represents the transition probability, 0 < ε < 0.5;
formula three is stored in the electric control device: s_t = [x_t, y_t, φ_t, Δx_t, Δy_t, Δv_t]; in formula three, s_t denotes the environmental state of the host vehicle, with the road width direction as the y-axis (the lateral direction) and the road length direction as the x-axis (the longitudinal direction); x_t, y_t denote the longitudinal and lateral positions of the host vehicle, φ_t the yaw rate, Δx_t the distance between the host vehicle and a surrounding vehicle along the x-axis, Δy_t the distance along the y-axis, and Δv_t the speed difference between the host vehicle and the surrounding vehicle;
if there are several surrounding vehicles around the host vehicle, Δx_t, Δy_t, Δv_t are column vectors whose elements correspond to the different surrounding vehicles; formula three constitutes the lane change model;
the dynamic time-varying reward function considering driver preference in the first step is defined as follows:
the expression for performing action a is defined as formula four:
a_t = [δ_t, v_t] (formula four); in formula four, δ_t is the steering wheel angle and v_t is the speed;
the electric control device stores the reward function expressed by formula five;
in formula five, M* is the safety boundary constructed from each parameter range in formula one, and s* and a* are the reference state and execution action corresponding to formula one;
and finally, the electric control device trains the multi-time scale neural network in the second step by using the new strategy data learned by the multi-time scale self-learning algorithm, and updates the off-line strategy.
The invention provides a multi-time scale self-learning lane changing method considering personalized driving experience, offering intelligent vehicle users a personalized and comfortable driving experience.
The invention has the following advantages:
(1) the invention adopts a learning structure combining an off-line strategy with an on-line strategy: the off-line strategy learned from historical data reflects the general lane-changing behavior of personalized driving experience, while the two-dimensional steering-and-speed actions generated at each step of on-line self-learning account for the particularity of the actual lane-change conditions; this learning architecture covers both generality and particularity and fits well with the characteristics of L4-level intelligent driving;
(2) the invention defines a driver preference metric matrix M (formula one) consisting of the user's preferred lateral and longitudinal accelerations and maximum allowable impulse region, representing the combined result of driver preference and the perceived risk level of the dynamic motion in a given environment, and providing an acceptable comfort standard for the personalized driving experience;
(3) the Markov-decision lane change model introduces transition probabilities to capture variation between individuals and within the same individual;
(4) a dynamic time-varying reward function considering driver preference is given and applied, so that the self-learning lane-change method can be evaluated in real time against actual operating conditions to obtain the self-learning strategy; meanwhile, the multi-time scale self-learning lane-change method runs state acquisition, action evaluation and action execution on separate time scales, which better matches the actual lane-change decision behavior of a human driver.
As the third step is repeatedly executed and the models are continuously updated, the lane changing method of the invention comes ever closer to the driver's driving habits through continued use, bringing the driver a better automated driving experience.
Drawings
FIG. 1 is a functional block diagram of a multi-time scale self-learning lane-change method of the present invention that considers a personalized driving experience;
fig. 2 is a schematic lane change of the present invention.
Detailed Description
As shown in fig. 1 and 2, the multi-time scale self-learning lane-changing method considering the personalized driving experience of the present invention is performed according to the following steps:
the first step is preparation;
establishing an individualized driving experience data set, a multi-time scale neural network, a multi-time scale self-learning algorithm, a Markov decision-based lane change model and a dynamic time-varying reward function considering driver preference in an electric control device of a host vehicle; the electric control device of the host vehicle is an in-vehicle ECU of the host vehicle.
The personalized driving experience data set comprises environmental vehicle data, control data and a driver preference metric matrix; the environmental vehicle data and the control data are derived from public data;
the second step is off-line learning;
before a host vehicle is started for the first time, enabling a multi-time scale neural network to read environmental vehicle data and control data in an individualized driving experience data set, and establishing a mapping relation from the environmental vehicle data to the control data;
the third step is on-line operation; the electric control device controls the host vehicle to drive automatically at level L4 through the multi-time scale self-learning algorithm and learns the driver's driving habits on line (i.e., it learns the control data the driver outputs under specific environmental vehicle data), and it updates the personalized driving experience data set, the multi-time scale neural network, the multi-time scale self-learning algorithm, the lane change model and the reward function according to those habits, so that the automatic lane-change control output of the electric control device gradually approaches the driving habits of the host vehicle's driver, improving the driving experience.
The environmental vehicle data in the first step includes x_t, y_t, φ_t, Δx_t, Δy_t and Δv_t;
where x_t, y_t denote the longitudinal and lateral positions of the host vehicle, φ_t denotes the yaw rate, Δx_t the distance between the host vehicle and a surrounding vehicle along the x-axis (i.e., in the road length direction), Δy_t the distance along the y-axis (i.e., in the road width direction), and Δv_t the speed difference between the host vehicle and the surrounding vehicle;
the control data includes vehicle steering wheel target angle data and vehicle target speed data;
the driver preference metric matrix is given by formula one:
M = [a_x^+, a_x^-, |a_y|, |z_x|, |z_y|] (formula one)
where a_x^+ is the driver discomfort threshold for longitudinal acceleration, a_x^- is the longitudinal deceleration threshold, |a_y| is the lateral acceleration threshold, and |z_x| and |z_y| are the maximum allowed longitudinal and lateral impulse (i.e., rate of change of acceleration), respectively;
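By way of illustration only (the patent specifies no code, and all names and threshold values below are hypothetical assumptions), a minimal Python sketch of the driver preference metric matrix of formula one and its use as a comfort test might look like this:

```python
# Hypothetical driver preference metric matrix M (formula one):
# comfort thresholds for longitudinal acceleration/deceleration,
# lateral acceleration, and longitudinal/lateral impulse (jerk).
M = {
    "ax_acc_max": 2.0,   # m/s^2, longitudinal acceleration discomfort threshold
    "ax_dec_max": 3.0,   # m/s^2, longitudinal deceleration threshold
    "ay_max": 1.5,       # m/s^2, lateral acceleration threshold
    "zx_max": 0.9,       # m/s^3, maximum allowed longitudinal jerk
    "zy_max": 0.6,       # m/s^3, maximum allowed lateral jerk
}

def within_comfort(ax, ay, zx, zy, m=M):
    """True if the measured dynamics stay inside the driver's comfort
    boundary defined by the preference metric matrix."""
    return (-m["ax_dec_max"] <= ax <= m["ax_acc_max"]
            and abs(ay) <= m["ay_max"]
            and abs(zx) <= m["zx_max"]
            and abs(zy) <= m["zy_max"])
```

A manoeuvre whose measured dynamics fail this test would fall outside the safety boundary M* used later by the reward function of formula five.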
in the second step, the multi-time scale neural network is represented by formula two:
f(x, u | w), with time-scale factors τ_i, τ_σ ~ N(τ_0, σ_0) (formula two)
where f(x, u | w) is the nonlinear function of the system output, x and u are the system state and input respectively, w is the weight of the neural network, and τ_i is a time-scale factor that can vary adaptively; τ_σ obeys the Gaussian normal distribution N, and τ_0, σ_0 are the mean and variance vectors of the Gaussian normal distribution N;
in the second step, the electric control device performs off-line learning according to the individualized driving experience data set through the multi-time scale neural network, and obtains a mapping relation from environmental vehicle data to control data in an off-line state;
in formula two, all neurons process information according to the newly arriving connection inputs and their previous internal states; the time-scale factor τ_i determines the weights for retaining previous information versus processing newly arriving information.
The invention introduces learnable parameters into the time-scale factors, providing adaptively variable time scales. With the initial mean and variance vectors of the Gaussian normal distribution N set to τ_0 = 0 and σ_0 = 1, time sequences are drawn from N so as to map the actual multidimensional state signal at different time scales; off-line strategy learning is then performed with the individualized driving experience data set, obtaining the mapping from environmental vehicle data to control data in the off-line state.
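As an illustrative sketch only, one common reading of such a multi-time scale network is a leaky-integrator recurrent network whose per-neuron time constants are drawn from the Gaussian N(τ_0, σ_0); the patent does not give the exact dynamics, so the structure, dimensions and sampling below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
tau0, sigma0 = 0.0, 1.0                    # initial mean/variance (formula two)
n_hidden = 32
# Time-scale factors sampled from N(tau0, sigma0); exponentiated and shifted
# so every time constant exceeds 1 (an assumption, not from the patent).
tau = np.exp(rng.normal(tau0, sigma0, n_hidden)) + 1.0

W_in  = rng.normal(0, 0.1, (n_hidden, 6))        # 6-dim state s_t (formula three)
W_rec = rng.normal(0, 0.1, (n_hidden, n_hidden))
W_out = rng.normal(0, 0.1, (2, n_hidden))        # 2-dim action a_t = [delta_t, v_t]

def step(u, x):
    """One leaky-integrator update: neurons with large tau retain their
    previous state (slow time scale), small tau track new input (fast)."""
    pre = W_in @ x + W_rec @ np.tanh(u)
    u_new = (1.0 - 1.0 / tau) * u + (1.0 / tau) * pre
    return u_new, W_out @ np.tanh(u_new)         # control output [steer, speed]

u = np.zeros(n_hidden)
u, a = step(u, np.zeros(6))                      # one forward pass of off-line training
```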
In the third step:
standard online reinforcement learning algorithms require immediate evaluation of the performed actions made before the next iteration is performed, whereas the driving preferences of each driver are different for lane-change behavior; the control data output by different drivers is different in the face of the same environmental vehicle data.
Factors such as the degree of attention, the reaction time, and the surrounding environment determine that the control data (the vehicle steering wheel target angle data and the vehicle target speed data) of the control action to be taken based on the same environmental vehicle data are different every lane change even for the same driver. Thus, the evaluation of the execution action may require several recursive steps (looping several times) until the effect of the operation is observed and a corresponding reward is obtained. Therefore, the invention provides a multi-time scale self-learning lane-changing algorithm as follows:
the multi-time scale self-learning algorithm comprises the following steps:
3.1, initializing parameters; the electric control device initializes the discount parameter γ, the learning step length α, the exploration parameter ε, the multi-time scale parameters t_s ≤ t_a ≤ t_l, and the events Q(s, a) = 1 (lane change) and Q(s, a) = 0 (no lane change);
t_s is the sampling period of the vehicle-mounted sensor and also the period at which the lane change model acquires information; the vehicle-mounted sensor acquires the environmental vehicle data; t_a is the time interval from information acquisition to the output of control data by the electric control device through the multi-time scale self-learning algorithm, the control data comprising vehicle steering-wheel target angle data and vehicle target speed data; t_l is the learning and update period of the multi-time scale self-learning algorithm;
various sensors on the vehicle, such as a speed sensor, a distance sensor, an angle sensor and the like, are all in the prior art, and various environmental vehicle data, including environmental data and state data of the vehicle itself, can be provided for the vehicle-mounted ECU for realizing automatic driving, and details are not described.
3.2, observing the vehicle state; the lane change model in the electric control device acquires the current environmental vehicle data through the vehicle-mounted sensor connected to the electric control device, obtaining the current environmental state s, which comprises the current environmental vehicle data; the multi-time scale self-learning algorithm in the electric control device acquires the current environmental state s through the lane change model;
3.3, executing a control action; every t_a, the multi-time scale self-learning algorithm in the electric control device selects and outputs control data a according to the environmental state s using a greedy algorithm; if the driver does not intervene, the host vehicle is controlled to change lanes or keep its lane according to the vehicle steering-wheel target angle data and the vehicle target speed data in control data a; if the driver intervenes, the host vehicle is controlled to change lanes or keep its lane according to the driver's operation.
The greedy algorithm is a conventional algorithm and will not be described in detail.
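Purely as a sketch of steps 3.1 to 3.3 (the period values, the Q-table representation and the sense()/actuate() helpers are hypothetical, not from the patent), the on-line loop with ε-greedy selection over the two events a = 1 (lane change) and a = 0 (no lane change) might be organized as:

```python
import random
from collections import defaultdict

t_s, t_a, t_l = 0.02, 0.1, 0.3      # hypothetical periods with t_s <= t_a <= t_l
gamma, alpha, eps = 0.9, 0.1, 0.1   # discount, learning step, exploration (3.1)
Q = defaultdict(float)              # Q(s, a); state discretization not shown

def select_action(s):
    """Epsilon-greedy over a = 1 (lane change) and a = 0 (no lane change)."""
    if random.random() < eps:
        return random.choice([0, 1])                 # explore / off-line action
    return max((0, 1), key=lambda a: Q[(s, a)])      # exploit: greedy on Q

def control_cycle(sense, actuate, driver_action=None):
    s = sense()                      # 3.2: observe state via vehicle-mounted sensor
    a = select_action(s)             # 3.3: every t_a, output control data a
    if driver_action is not None:    # driver intervention always takes priority
        a = driver_action
    actuate(a)                       # steering-wheel target angle + target speed
    return s, a
```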
During the third step, a learning and updating operation is performed once every t_l;
τ is the current time, s' is the current environmental state, and s is the environmental state at the time the control data a was applied (i.e., at τ − t_l); for all times t_i between τ − t_l and τ, the driver preference metric matrix M is evaluated to obtain the reward function R(s_{t_i}, a_{t_i}) at each time t_i; t_l is the learning update period of the multi-time scale self-learning algorithm, with t_l > t_a; t_l is determined by the actual values of t_s and t_a, and is about three times t_a;
the electric control device correspondingly updates the individualized driving experience data set on line through formula six:
Q(s, a) ← Q(s, a) + α[R(s, a) + γ max_{a'} Q(s', a') − Q(s, a)] (formula six)
in formula six, R is the reward function of formula five; s is the environmental state of the host vehicle expressed by formula three; a is the control data actually applied; α represents the learning step length and γ is the discount factor. As the third step is repeatedly executed and the models are continuously updated, the lane changing method of the invention comes ever closer to the driver's driving habits through continued use, bringing the driver a better automated driving experience.
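For illustration (a hedged sketch under the assumption that formula six is the standard Q-learning update suggested by the α, γ and max-Q terms above), the learning update performed every t_l could replay the transitions recorded since the last update:

```python
def learn_update(Q, transitions, reward_fn, alpha=0.1, gamma=0.9):
    """Every t_l: apply the formula-six style update to each recorded
    (s, a, s') tuple, using the time-varying reward of formula five."""
    for s, a, s_next in transitions:
        r = reward_fn(s, a)                                   # R(s_ti, a_ti)
        td_target = r + gamma * max(Q[(s_next, 0)], Q[(s_next, 1)])
        Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    transitions.clear()                                       # start a new window
```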
When modeling the driver, the host vehicle and the environment, the differences between individuals and between different states within an individual must be considered in order to accurately capture the relevant real-time state information, take corresponding actions, evaluate the executed actions and update the state. For this purpose, a Markov-based lane change model that takes transition probabilities into account is proposed.
Each time a learning and updating operation is performed (every t_l), the actually applied control data a is selected for each state s using an ε-greedy algorithm;
specifically, the control data a that maximizes the Q value is selected with probability 1 − ε, or an action learned off-line is selected at random with probability ε as the newly learned action; wherein ε represents the transition probability, 0 < ε < 0.5;
ε expresses the trade-off between exploiting what has been learned (probability 1 − ε) and exploring what has not (probability ε); for conservative reasons ε is generally chosen small (0 < ε < 0.5).
Formula three is stored in the electric control device: s_t = [x_t, y_t, φ_t, Δx_t, Δy_t, Δv_t]; in formula three, s_t denotes the environmental state of the host vehicle, with the road width direction as the y-axis (the lateral direction) and the road length direction as the x-axis (the longitudinal direction); x_t, y_t denote the longitudinal and lateral positions of the host vehicle, φ_t the yaw rate, Δx_t the distance between the host vehicle and a surrounding vehicle along the x-axis, Δy_t the distance along the y-axis, and Δv_t the speed difference between the host vehicle and the surrounding vehicle;
if there are several surrounding vehicles around the host vehicle, Δx_t, Δy_t, Δv_t are column vectors whose elements correspond to the different surrounding vehicles; formula three constitutes the lane change model;
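For illustration (helper names assumed), the formula-three state for one or several surrounding vehicles might be assembled as follows, with Δx_t, Δy_t, Δv_t becoming column vectors when several vehicles are present:

```python
import numpy as np

def build_state(x, y, phi, surrounding):
    """Formula three: s_t = [x_t, y_t, phi_t, dx_t, dy_t, dv_t].
    `surrounding` is a list of (dx, dy, dv) tuples, one per nearby vehicle."""
    dx = np.array([v[0] for v in surrounding])
    dy = np.array([v[1] for v in surrounding])
    dv = np.array([v[2] for v in surrounding])
    return np.concatenate(([x, y, phi], dx, dy, dv))

# Host at x=120 m, y=3.5 m, yaw rate 0.01; two surrounding vehicles.
s = build_state(120.0, 3.5, 0.01, [(25.0, -3.5, -2.0), (-15.0, 0.0, 1.5)])
```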
the markov decision state transition depends on the driver preference metric matrix M and the environmental state of the host vehicle, with state changes between the two different states being linked by transition probabilities. The transition probabilities are used for capturing variation among the varying individuals and in the same individual, the unknown transition probabilities are updated by adopting a multi-time scale self-learning algorithm, and corresponding actions are taken (lane change is 1 or lane change is not 1);
the dynamic time-varying reward function considering driver preference in the first step is defined as follows:
the goal of reinforcement learning is to continually generate strategies that guide the system from a "bad" state to a "good" state; "bad" and "good" are evaluated by assigning a reward value to every executed action in every state. The expression for performing action a is defined as formula four:
a_t = [δ_t, v_t] (formula four); in formula four, δ_t is the steering wheel angle (in degrees) and v_t is the speed (in km/h);
in the usual reinforcement learning setting the reward function is invariant (static); in the present invention, however, the reward function varies over time with the driver's personalized driving preference (to decide lane change or no lane change). For this reason, the electric control device stores the reward function expressed by formula five;
in formula five, M* is the safety boundary constructed from each parameter range of the driver preference metric matrix defined by formula one, and s* and a* are the reference state and execution action corresponding to formula one;
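Since formula five itself is not reproduced here, the following is only a hedged sketch of one plausible shape for such a reward: a hard penalty outside the safety boundary M* (checked with the comfort test sketched earlier) and otherwise a reward that grows as the executed state and action approach the driver's reference s* and a*; the penalty value and distance measure are assumptions:

```python
import numpy as np

def reward(s, a, s_ref, a_ref, dynamics, within_comfort):
    """Hypothetical time-varying reward in the spirit of formula five.
    `dynamics` = (ax, ay, zx, zy) measured over the evaluation window;
    s_ref, a_ref are the driver's reference state and action (formula one)."""
    if not within_comfort(*dynamics):          # outside safety boundary M*
        return -10.0                           # hard penalty (assumed value)
    err = (np.linalg.norm(np.asarray(s) - np.asarray(s_ref))
           + np.linalg.norm(np.asarray(a) - np.asarray(a_ref)))
    return -float(err)                         # closer to the reference, higher
```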
and finally, the electric control device trains the multi-time scale neural network in the second step by using the new strategy data learned by the multi-time scale self-learning algorithm, and updates the off-line strategy, so that the off-line strategy is closer to the driving habit of the driver.
Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art will understand that modifications and equivalents may be made without departing from the spirit and scope of the invention, which is defined by the appended claims.
Claims (3)
1. The multi-time scale self-learning lane changing method considering the personalized driving experience is characterized by comprising the following steps of:
the first step is preparation;
establishing an individualized driving experience data set, a multi-time scale neural network, a multi-time scale self-learning algorithm, a Markov decision-based lane change model and a dynamic time-varying reward function considering driver preference in an electric control device of a host vehicle;
the personalized driving experience data set comprises environmental vehicle data, control data and a driver preference metric matrix; the environmental vehicle data and the control data are derived from public data;
the second step is off-line learning;
before a host vehicle is started for the first time, enabling a multi-time scale neural network to read environmental vehicle data and control data in an individualized driving experience data set, and establishing a mapping relation from the environmental vehicle data to the control data;
the third step is on-line operation; the electric control device controls the host vehicle to automatically drive at the level of L4 through the multi-time scale self-learning algorithm and learns the driving habits of the driver on line, and updates the personalized driving experience data set, the multi-time scale neural network, the multi-time scale self-learning algorithm, the lane change model and the reward function according to the driving habits of the driver, so that the automatic control output of the electric control device to the lane change is gradually close to the driving habits of the driver of the host vehicle, and the driving experience of the driver is improved.
2. The multi-time scale self-learning lane-changing method taking personalized driving experience into account of claim 1, characterized in that:
the environmental vehicle data in the first step includes xt,yt,φt,Δxt,ΔytAnd Δ vt;
Wherein x ist,ytIndicates the longitudinal and lateral positions, phi, of the host vehicletRepresenting yaw rate, Δ xtRepresenting the distance between the host vehicle and the surrounding vehicle along the x-axis, aytRepresenting the distance between the host vehicle and the surrounding vehicle along the y-axis, avtRepresenting a speed difference between the host vehicle and the surrounding vehicle;
the control data includes vehicle steering wheel target angle data and vehicle target speed data;
the driver preference metric matrix is given by formula one:
M = [a_x^+, a_x^-, |a_y|, |z_x|, |z_y|] (formula one)
where a_x^+ is the driver discomfort threshold for longitudinal acceleration, a_x^- is the longitudinal deceleration threshold, |a_y| is the lateral acceleration threshold, and |z_x| and |z_y| are the maximum allowed longitudinal and transverse impulse, respectively;
in the second step, the multi-time scale neural network is represented by formula two:
f(x, u | w), with time-scale factors τ_i, τ_σ ~ N(τ_0, σ_0) (formula two)
where f(x, u | w) is the nonlinear function of the system output, x and u are the system state and input respectively, w is the weight of the neural network, and τ_i is a time-scale factor that can vary adaptively; τ_σ obeys the Gaussian normal distribution N, and τ_0, σ_0 are the mean and variance vectors of the Gaussian normal distribution N;
in the second step, the electric control device performs off-line learning according to the individualized driving experience data set through the multi-time scale neural network, and obtains a mapping relation from environmental vehicle data to control data in an off-line state;
in the third step, the multi-time scale self-learning algorithm comprises the following steps:
3.1, initializing parameters; the electric control device initializes the discount parameter γ, the learning step length α, the exploration parameter ε, the multi-time scale parameters t_s ≤ t_a ≤ t_l, and the events Q(s, a) = 1 (lane change) and Q(s, a) = 0 (no lane change);
t_s is the sampling period of the vehicle-mounted sensor and also the period at which the lane change model acquires information; the vehicle-mounted sensor acquires the environmental vehicle data; t_a is the time interval from information acquisition to the output of control data by the electric control device through the multi-time scale self-learning algorithm, the control data comprising vehicle steering-wheel target angle data and vehicle target speed data; t_l is the learning and update period of the multi-time scale self-learning algorithm;
3.2, observing the vehicle state; the lane change model in the electric control device acquires the current environmental vehicle data through the vehicle-mounted sensor connected to the electric control device, obtaining the current environmental state s; the multi-time scale self-learning algorithm in the electric control device acquires the current environmental state s through the lane change model;
3.3, executing a control action; every t_a, the multi-time scale self-learning algorithm in the electric control device selects and outputs control data a according to the environmental state s using a greedy algorithm; if the driver does not intervene, the host vehicle is controlled to change lanes or keep its lane according to the vehicle steering-wheel target angle data and the vehicle target speed data in control data a; if the driver intervenes, the host vehicle is controlled to change lanes or keep its lane according to the driver's operation.
3. The multi-time scale self-learning lane-changing method taking personalized driving experience into account of claim 2, characterized in that: during the third step, a learning and updating operation is performed once every t_l;
τ is the current time, s' is the current environmental state, and s is the environmental state at the time the control data a was applied (i.e., at τ − t_l); for all times t_i between τ − t_l and τ, the driver preference metric matrix M is evaluated to obtain the reward function R(s_{t_i}, a_{t_i}) at each time t_i; t_l is the learning update period of the multi-time scale self-learning algorithm, t_l > t_a,
The electric control device correspondingly updates the individualized driving experience data set on line through formula six:
Q(s, a) ← Q(s, a) + α[R(s, a) + γ max_{a'} Q(s', a') − Q(s, a)] (formula six)
in formula six, R is the reward function of formula five; s is the environmental state of the host vehicle expressed by formula three; a is the control data actually applied; α represents the learning step length and γ is the discount factor;
each time learning and updating are performed (every t_l), the control data a that maximizes the Q value is selected with probability 1 − ε, or an action learned off-line is selected at random with probability ε as the newly learned action; wherein ε represents the transition probability, 0 < ε < 0.5;
formula three is stored in the electric control device: s_t = [x_t, y_t, φ_t, Δx_t, Δy_t, Δv_t]; in formula three, s_t denotes the environmental state of the host vehicle, with the road width direction as the y-axis (the lateral direction) and the road length direction as the x-axis (the longitudinal direction); x_t, y_t denote the longitudinal and lateral positions of the host vehicle, φ_t the yaw rate, Δx_t the distance between the host vehicle and a surrounding vehicle along the x-axis, Δy_t the distance along the y-axis, and Δv_t the speed difference between the host vehicle and the surrounding vehicle;
if there are several surrounding vehicles around the host vehicle, Δx_t, Δy_t, Δv_t are column vectors whose elements correspond to the different surrounding vehicles; formula three constitutes the lane change model;
the dynamic time-varying reward function considering driver preference in the first step is defined as follows:
the expression for performing action a is defined as formula four:
a_t = [δ_t, v_t] (formula four); in formula four, δ_t is the steering wheel angle and v_t is the speed;
the electric control device stores the reward function expressed by formula five;
in formula five, M* is the safety boundary constructed from each parameter range in formula one, and s* and a* are the reference state and execution action corresponding to formula one;
and finally, the electric control device trains the multi-time scale neural network in the second step by using the new strategy data learned by the multi-time scale self-learning algorithm, and updates the off-line strategy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011561553.6A CN112498354B (en) | 2020-12-25 | 2020-12-25 | Multi-time scale self-learning lane changing method considering personalized driving experience |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011561553.6A CN112498354B (en) | 2020-12-25 | 2020-12-25 | Multi-time scale self-learning lane changing method considering personalized driving experience |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112498354A true CN112498354A (en) | 2021-03-16 |
CN112498354B CN112498354B (en) | 2021-11-12 |
Family
ID=74922076
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011561553.6A Active CN112498354B (en) | 2020-12-25 | 2020-12-25 | Multi-time scale self-learning lane changing method considering personalized driving experience |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112498354B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113978470A (en) * | 2021-12-13 | 2022-01-28 | 郑州轻工业大学 | On-line rapid estimation method for friction force between tire and road surface |
CN114013443A (en) * | 2021-11-12 | 2022-02-08 | 哈尔滨工业大学 | Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning |
CN114023108A (en) * | 2021-11-02 | 2022-02-08 | 河北工业大学 | Mixed traffic flow lane change model and lane change simulation method |
CN115018016A (en) * | 2022-08-03 | 2022-09-06 | 苏州大学 | Method and system for identifying lane changing intention of manually-driven vehicle |
CN115195757A (en) * | 2022-09-07 | 2022-10-18 | 郑州轻工业大学 | Electric bus starting driving behavior modeling and recognition training method |
CN115512540A (en) * | 2022-09-20 | 2022-12-23 | 中国第一汽车股份有限公司 | Information processing method and device for vehicle, storage medium and processor |
FR3137642A1 (en) * | 2022-07-05 | 2024-01-12 | Psa Automobiles Sa | Method and device for controlling a system for semi-automatically changing the lane of a vehicle as a function of a maximum value of a dynamic parameter |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109624986A (en) * | 2019-03-01 | 2019-04-16 | 吉林大学 | A kind of the study cruise control system and method for the driving style based on pattern switching |
CN110733506A (en) * | 2019-10-17 | 2020-01-31 | 上海舵敏智能科技有限公司 | Lane changing method and apparatus for unmanned vehicle |
EP3650297A1 (en) * | 2018-11-08 | 2020-05-13 | Bayerische Motoren Werke Aktiengesellschaft | Method and apparatus for determining information related to a lane change of a target vehicle, method and apparatus for determining a vehicle comfort metric for a prediction of a driving maneuver of a target vehicle and computer program |
US20200189597A1 (en) * | 2018-12-12 | 2020-06-18 | Visteon Global Technologies, Inc. | Reinforcement learning based approach for sae level-4 automated lane change |
CN111483468A (en) * | 2020-04-24 | 2020-08-04 | 广州大学 | Unmanned vehicle lane change decision-making method and system based on confrontation and imitation learning |
- 2020-12-25: application CN202011561553.6A filed in China; granted as patent CN112498354B (active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3650297A1 (en) * | 2018-11-08 | 2020-05-13 | Bayerische Motoren Werke Aktiengesellschaft | Method and apparatus for determining information related to a lane change of a target vehicle, method and apparatus for determining a vehicle comfort metric for a prediction of a driving maneuver of a target vehicle and computer program |
US20200189597A1 (en) * | 2018-12-12 | 2020-06-18 | Visteon Global Technologies, Inc. | Reinforcement learning based approach for sae level-4 automated lane change |
CN109624986A (en) * | 2019-03-01 | 2019-04-16 | 吉林大学 | A kind of the study cruise control system and method for the driving style based on pattern switching |
CN110733506A (en) * | 2019-10-17 | 2020-01-31 | 上海舵敏智能科技有限公司 | Lane changing method and apparatus for unmanned vehicle |
CN111483468A (en) * | 2020-04-24 | 2020-08-04 | 广州大学 | Unmanned vehicle lane change decision-making method and system based on confrontation and imitation learning |
Non-Patent Citations (3)
Title |
---|
HEYE HUANG等: "A probabilistic risk assessment framework considering lane-changing behavior interaction", 《SCIENCE CHINA(INFORMATION SCIENCES)》 * |
YANG Wei et al.: "Vehicle active collision-avoidance warning system incorporating recognition of the preceding vehicle's driving intention", China Sciencepaper *
WANG Qidong et al.: "Bounded control for lane departure prevention with driver intention recognition", China Journal of Highway and Transport *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114023108A (en) * | 2021-11-02 | 2022-02-08 | 河北工业大学 | Mixed traffic flow lane change model and lane change simulation method |
CN114013443A (en) * | 2021-11-12 | 2022-02-08 | 哈尔滨工业大学 | Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning |
CN113978470A (en) * | 2021-12-13 | 2022-01-28 | 郑州轻工业大学 | On-line rapid estimation method for friction force between tire and road surface |
CN113978470B (en) * | 2021-12-13 | 2024-01-12 | 郑州轻工业大学 | On-line quick estimation method for friction force between tire and road surface |
FR3137642A1 (en) * | 2022-07-05 | 2024-01-12 | Psa Automobiles Sa | Method and device for controlling a system for semi-automatically changing the lane of a vehicle as a function of a maximum value of a dynamic parameter |
CN115018016A (en) * | 2022-08-03 | 2022-09-06 | 苏州大学 | Method and system for identifying lane changing intention of manually-driven vehicle |
CN115195757A (en) * | 2022-09-07 | 2022-10-18 | 郑州轻工业大学 | Electric bus starting driving behavior modeling and recognition training method |
CN115195757B (en) * | 2022-09-07 | 2023-08-04 | 郑州轻工业大学 | Electric bus starting driving behavior modeling and recognition training method |
CN115512540A (en) * | 2022-09-20 | 2022-12-23 | 中国第一汽车股份有限公司 | Information processing method and device for vehicle, storage medium and processor |
Also Published As
Publication number | Publication date |
---|---|
CN112498354B (en) | 2021-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112498354B (en) | Multi-time scale self-learning lane changing method considering personalized driving experience | |
EP2065842B1 (en) | Adaptive driver assistance system with robust estimation of object properties | |
US6230111B1 (en) | Control system for controlling object using pseudo-emotions and pseudo-personality generated in the object | |
CN109927725A (en) | A kind of self-adaption cruise system and implementation method with driving style learning ability | |
EP3750765A1 (en) | Methods, apparatuses and computer programs for generating a machine-learning model and for generating a control signal for operating a vehicle | |
CN112347567A (en) | Vehicle intention and track prediction method | |
CN111332362B (en) | Intelligent steer-by-wire control method integrating individual character of driver | |
CN112677982B (en) | Vehicle longitudinal speed planning method based on driver characteristics | |
CN112109708A (en) | Adaptive cruise control system considering driving behaviors and control method thereof | |
US20210331663A1 (en) | Electric vehicle control system | |
JP7415471B2 (en) | Driving evaluation device, driving evaluation system, in-vehicle device, external evaluation device, and driving evaluation program | |
CN113655794A (en) | Multi-vehicle cooperative control method based on robust model predictive control | |
CN110103960B (en) | Vehicle self-adaptive cruise control method and system and vehicle | |
Selvaraj et al. | An ML-aided reinforcement learning approach for challenging vehicle maneuvers | |
CN113184040B (en) | Unmanned vehicle line-controlled steering control method and system based on steering intention of driver | |
US20210213977A1 (en) | Nearby Driver Intent Determining Autonomous Driving System | |
CN110271557B (en) | Vehicle user feature recognition system | |
CN114503133A (en) | Information processing apparatus, information processing method, and program | |
CN114269632A (en) | Method and device for estimating a mechanically fed steering wheel torque on a steering wheel of a steering system of a motor vehicle | |
CN115848369A (en) | Personalized self-adaptive cruise system based on deep reinforcement learning and control method thereof | |
US11738804B2 (en) | Training a vehicle to accommodate a driver | |
CN115649197A (en) | Automatic driving control method based on driver characteristics and storage medium | |
CN111413974B (en) | Automobile automatic driving motion planning method and system based on learning sampling type | |
CN112835362B (en) | Automatic lane change planning method and device, electronic equipment and storage medium | |
Guo et al. | Optimal design of a driver assistance controller based on surrounding vehicle’s social behavior game model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |