CN112498354A - Multi-time scale self-learning lane changing method considering personalized driving experience - Google Patents

Multi-time scale self-learning lane changing method considering personalized driving experience

Info

Publication number
CN112498354A
CN112498354A (application CN202011561553.6A; granted publication CN112498354B)
Authority
CN
China
Prior art keywords
vehicle
data
time scale
learning
driver
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011561553.6A
Other languages
Chinese (zh)
Other versions
CN112498354B (en)
Inventor
付志军
郭耀华
殷玉明
肖艳秋
侯俊剑
周放
刘晓丽
姚雷
王辉
王良文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou University of Light Industry filed Critical Zhengzhou University of Light Industry
Priority to CN202011561553.6A priority Critical patent/CN112498354B/en
Publication of CN112498354A publication Critical patent/CN112498354A/en
Application granted granted Critical
Publication of CN112498354B publication Critical patent/CN112498354B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00 - Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units, or advanced driver assistance systems for ensuring comfort, stability and safety or drive control systems for propelling or retarding the vehicle
    • B60W30/18 - Propelling the vehicle
    • B60W30/18009 - Propelling the vehicle related to particular drive situations
    • B60W30/18163 - Lane change; Overtaking manoeuvres
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 - Details of the control system
    • B60W2050/0019 - Control system elements or transfer functions
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60W - CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 - Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 - Details of the control system
    • B60W2050/0019 - Control system elements or transfer functions
    • B60W2050/0028 - Mathematical models, e.g. for simulation
    • B60W2050/0031 - Mathematical model of the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Human Computer Interaction (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a multi-time scale self-learning lane-changing method that takes personalized driving experience into account, comprising the following steps: the first step is preparation; the second step is off-line learning; the third step is on-line operation. In the third step, the electric control device controls the host vehicle to drive automatically at level L4 through the multi-time scale self-learning algorithm and learns the driver's driving habits on line; it updates the personalized driving experience data set, the multi-time scale neural network, the multi-time scale self-learning algorithm, the lane change model and the reward function according to those habits, and introduces transition probabilities through the Markov-decision lane change model to capture variation between and within individuals, so that the automatic lane-change control output of the electric control device gradually approaches the driving habits of the host vehicle's driver and improves the driving experience. The invention adopts a learning structure combining an off-line strategy with an on-line strategy, considering both generality and particularity, which fits the characteristics of L4-level intelligent driving well.

Description

Multi-time scale self-learning lane changing method considering personalized driving experience
Technical Field
The invention belongs to the field of intelligent driving, and particularly relates to a multi-time scale self-learning lane changing method considering personalized driving experience.
Background
In the field of intelligent driving, as vehicles become more intelligent, the intelligent control unit and the driver increasingly share the low-level control authority of the vehicle. The intelligent vehicle can hardly avoid taking over authority from the driver, or it may interfere with the driver at a critical moment while executing a control strategy intended to serve the driver's interests, thereby creating potential safety hazards. A smart car therefore cannot ignore the understanding and perception of the vehicle's highest decision maker, namely the driver.
Advanced driver assistance systems at the present stage already provide a preliminary ability to monitor driving behavior by detecting the driver's state, the vehicle and the surrounding environment.
However, from the viewpoint of human-machine co-driving of the intelligent vehicle, the personalized differences between drivers and the different driving states of the same driver are not considered, so such systems still fall short of the requirements of intelligent vehicle driving.
For example, a lane-change system uses radar to detect the distance and relative speed of the vehicle ahead and the surrounding vehicles, and performs a lane change if these values exceed set thresholds (indicating that a collision risk may arise); otherwise it does not change lanes.
This requires the vehicle manufacturer to determine, through extensive experimentation, thresholds that reflect average human driving behavior and reaction time.
However, different drivers observe the relative distance and vehicle speed and then choose to slow down or to change lanes according to their personal, dynamic preferences. An intelligent lane-changing method therefore not only needs to "infer" and "learn" a human's personalized dynamic preferences, but also needs to continuously "adapt" and "take action", so as to realize a self-learning lane-change method that accounts for the personalized driving experience.
Reinforcement learning is a machine learning method that does not depend on an environment model or prior knowledge; it can continuously optimize a control strategy through trial and error and a delayed-return mechanism, which provides a feasible route to developing a personalized self-learning lane-change method.
However, reinforcement learning also faces the following challenges in developing a specific application of the personalized self-learning lane-changing method:
First, in general reinforcement learning, performing the same action in the same environmental state yields the same reward. In practice, the reward changes dynamically with the driver's individual preference: when facing a lane-change decision, the driver may take a corresponding action (such as braking and decelerating) or may accelerate to change lanes.
Thus, the rewards earned by reinforcement learning do not always reflect its own behavior; they are also influenced by the driver's personalized driving decisions.
Furthermore, general reinforcement learning assumes that the reward earned at each step is caused by the action performed at that step. In practice the driver's response (on the order of seconds) is not always instantaneous; it depends on his personalized dynamic preferences and varies from moment to moment. Evaluating an action may therefore take several steps before its effect is observed and the corresponding reward value is received.
Disclosure of Invention
The invention aims to solve the problem of poor driving experience caused by lack of consideration of personalized differences in the conventional active lane change, and provides a multi-time scale self-learning lane change method considering the personalized driving experience.
In order to achieve the purpose, the multi-time scale self-learning lane-changing method considering the personalized driving experience is carried out according to the following steps:
the first step is preparation;
establishing an individualized driving experience data set, a multi-time scale neural network, a multi-time scale self-learning algorithm, a Markov decision-based lane change model and a dynamic time-varying reward function considering driver preference in an electric control device of a host vehicle;
the personalized driving experience data set comprises environmental vehicle data, control data and a driver preference metric matrix; the environmental vehicle data and the control data are derived from public data;
the second step is off-line learning;
before a host vehicle is started for the first time, enabling a multi-time scale neural network to read environmental vehicle data and control data in an individualized driving experience data set, and establishing a mapping relation from the environmental vehicle data to the control data;
the third step is on-line operation; the electric control device controls the host vehicle to automatically drive at the level of L4 through the multi-time scale self-learning algorithm and learns the driving habits of the driver on line, and updates the personalized driving experience data set, the multi-time scale neural network, the multi-time scale self-learning algorithm, the lane change model and the reward function according to the driving habits of the driver, so that the automatic control output of the electric control device to the lane change is gradually close to the driving habits of the driver of the host vehicle, and the driving experience of the driver is improved.
The environmental vehicle data in the first step include x_t, y_t, φ_t, Δx_t, Δy_t and Δv_t;
wherein x_t, y_t denote the longitudinal and lateral positions of the host vehicle, φ_t denotes the yaw rate, Δx_t denotes the distance between the host vehicle and a surrounding vehicle along the x-axis, Δy_t denotes the distance between the host vehicle and a surrounding vehicle along the y-axis, and Δv_t denotes the speed difference between the host vehicle and the surrounding vehicle;
the control data includes vehicle steering wheel target angle data and vehicle target speed data;
the driver preference metric matrix is given by formula one:
M = [ a_x^+, a_x^-, |a_y|, |z_x|, |z_y| ]   (formula one)
wherein a_x^+ is the driver discomfort threshold for longitudinal acceleration, a_x^- is the longitudinal deceleration threshold, |a_y| is the lateral acceleration threshold, and |z_x| and |z_y| are the maximum allowed longitudinal and lateral impulse, respectively;
in the second step, the multi-time scale neural network is represented by formula two:
f(x, u | w; τ_σ)   (formula two)
wherein f(x, u | w) is a nonlinear function giving the system output, x and u are respectively the system state and the input, w is the weight of the neural network, and τ_σ is an adaptively variable time-scale factor; τ_σ obeys a Gaussian normal distribution N whose mean and variance vectors are τ_0 and σ_0;
in the second step, the electric control device performs off-line learning according to the individualized driving experience data set through the multi-time scale neural network, and obtains a mapping relation from environmental vehicle data to control data in an off-line state;
in the third step, the multi-time scale self-learning algorithm is as follows:
3.1, initializing parameters: the electric control device initializes the discount parameter γ, the learning step size α, the exploration parameter ε, the multi-time-scale parameters t_s ≤ t_a ≤ t_l, and the action values Q(s, a) for the lane-change event (a = 1) and the no-lane-change event (a = 0);
t_s is the sampling period of the vehicle-mounted sensors and is also the period with which the lane change model acquires information; the vehicle-mounted sensors are used to acquire the environmental vehicle data; t_a is the time interval from acquiring this information to the electric control device outputting control data through the multi-time scale self-learning algorithm, the control data comprising vehicle steering wheel target angle data and vehicle target speed data; t_l is the learning and updating period of the multi-time scale self-learning algorithm;
3.2, observing the vehicle state: the lane change model in the electric control device acquires the current environmental vehicle data through the vehicle-mounted sensors connected to the electric control device and obtains the current environmental state s; the multi-time scale self-learning algorithm in the electric control device then obtains the current environmental state s from the lane change model;
3.3, executing a control action: every t_a, the multi-time scale self-learning algorithm in the electric control device selects and outputs control data a for the environmental state s using a greedy algorithm; if the driver does not intervene, the host vehicle is controlled to change lanes or keep its lane according to the vehicle steering wheel target angle data and vehicle target speed data in the control data a; if the driver intervenes, the host vehicle changes lanes or keeps its lane according to the driver's operation.
During the third step, a learning and updating operation is performed once every t_l;
τ is the current time; s is the environmental state in which the control data a were applied at time τ - t_l, and s' is the current environmental state that results; for all times t_i between τ - t_l and τ, the driver preference metric matrix M is evaluated to obtain the reward function R(s_ti, a_ti) at time t_i; t_l is the learning update period of the multi-time scale self-learning algorithm, t_l > t_a;
The electric control device correspondingly updates the individualized driving experience data set on line through a sixth formula according to the data, wherein the sixth formula is as follows:
Q(s, a) ← Q(s, a) + α [ R + γ max_{a'} Q(s', a') - Q(s, a) ]   (formula six)
in formula six, R is the reward function given by formula five; s is the environmental state of the host vehicle expressed by formula three; a is the control data that was actually applied; α denotes the learning step size and γ is the discount factor;
each time a learning and updating operation is performed (every t_l), the control data a that maximizes the Q value is selected with probability 1 - ε, or the off-line-learned action is selected at random with probability ε as the newly learned action; ε represents the transition probability, 0 < ε < 0.5;
the third formula is stored in the electric control device, and the third formula is as follows: st=[xt,ytt,Δxt,Δyt,Δvt](ii) a In the third formula, stThe environmental state of the host vehicle is represented by the direction of the road width, which is the y-axis direction, i.e., the lateral direction, and the direction of the road length, which is the x-axis direction, i.e., the longitudinal directiont,ytIndicates the longitudinal and lateral positions, phi, of the host vehicletRepresenting yaw rate, Δ xtRepresenting the distance between the host vehicle and the surrounding vehicle along the x-axis, aytRepresenting the distance between the host vehicle and the surrounding vehicle along the y-axis, avtRepresenting a speed difference between the host vehicle and the surrounding vehicle;
if there are a plurality of surrounding vehicles around the host vehicle, Δ xt,Δyt,ΔvtRespectively representing column vectors corresponding to different surrounding vehicles; a formula III forms a lane change model;
the dynamic time-varying reward function that takes into account driver preferences in the first step is:
defining the expression for performing action a as formula four:
a_t = [δ_t, v_t]   (formula four); in formula four, δ_t is the steering wheel angle and v_t is the speed;
the electric control device stores a reward function expressed by a formula five, wherein the formula five is as follows:
R(s_t, a_t)   (formula five)
in formula five, M* is a safety boundary constructed from each parameter range in formula one, and the reference state and the reference execution action corresponding to formula one also appear in the expression;
and finally, the electric control device trains the multi-time scale neural network in the second step by using the new strategy data learned by the multi-time scale self-learning algorithm, and updates the off-line strategy.
The invention provides a multi-time scale self-learning lane-changing method that considers personalized driving experience and offers intelligent-vehicle users a personalized, comfortable driving experience. The invention has the following advantages:
(1) The invention adopts a learning structure that combines an off-line strategy and an on-line strategy: the off-line strategy learned from historical data reflects the general lane-changing behavior of the personalized driving experience, while the two-dimensional steering and speed actions generated each time by on-line self-learning account for the particularities of the actual lane-change conditions. This learning architecture therefore considers both generality and particularity, which fits the characteristics of L4-level intelligent driving well;
(2) The invention defines a driver preference metric matrix M (formula one) consisting of the user's optimal lateral and longitudinal accelerations and maximum allowable impulse region; it represents the combined result of driver preference and the perceived risk level of dynamic motion in a given environment, and provides an acceptable comfort standard for the personalized driving experience;
(3) The Markov-decision lane change model introduces transition probabilities to capture variation between individuals and within the same individual;
(4) A dynamic time-varying reward function that considers driver preference is given and applied, so that the self-learning lane-change method can be evaluated in real time under the actual operating conditions to obtain a self-learning strategy. Meanwhile, the multi-time-scale design separates state acquisition, action evaluation and action execution in time, which better matches a driver's actual decision-making behavior when changing lanes.
As the third step is carried out and updated continuously, the lane-change method of the invention comes ever closer to the driver's driving habits in continued use, giving the driver a better automatic driving experience.
Drawings
FIG. 1 is a functional block diagram of a multi-time scale self-learning lane-change method of the present invention that considers a personalized driving experience;
fig. 2 is a schematic lane change of the present invention.
Detailed Description
As shown in fig. 1 and 2, the multi-time scale self-learning lane-changing method considering the personalized driving experience of the present invention is performed according to the following steps:
the first step is preparation;
establishing an individualized driving experience data set, a multi-time scale neural network, a multi-time scale self-learning algorithm, a Markov decision-based lane change model and a dynamic time-varying reward function considering driver preference in an electric control device of a host vehicle; the electric control device of the host vehicle is an in-vehicle ECU of the host vehicle.
The personalized driving experience data set comprises environmental vehicle data, control data and a driver preference metric matrix; the environmental vehicle data and the control data are derived from public data;
the second step is off-line learning;
before a host vehicle is started for the first time, enabling a multi-time scale neural network to read environmental vehicle data and control data in an individualized driving experience data set, and establishing a mapping relation from the environmental vehicle data to the control data;
the third step is on-line operation; the electric control device controls the host vehicle to automatically drive at the level of L4 through the multi-time scale self-learning algorithm, learns the driving habits of the driver on line (namely learns the control data output by the driver under the specific environmental vehicle data), and updates the personalized driving experience data set, the multi-time scale neural network, the multi-time scale self-learning algorithm, the lane change model and the reward function according to the driving habits of the driver, so that the automatic control output of the electric control device to lane change gradually approaches the driving habits of the driver of the host vehicle, and the driving experience of the driver is improved.
The environmental vehicle data in the first step include x_t, y_t, φ_t, Δx_t, Δy_t and Δv_t;
wherein x_t, y_t denote the longitudinal and lateral positions of the host vehicle, φ_t denotes the yaw rate, Δx_t denotes the distance between the host vehicle and a surrounding vehicle along the x-axis (i.e., in the road length direction), Δy_t denotes the distance between the host vehicle and a surrounding vehicle along the y-axis (i.e., in the road width direction), and Δv_t denotes the speed difference between the host vehicle and the surrounding vehicle;
the control data includes vehicle steering wheel target angle data and vehicle target speed data;
the driver preference metric matrix is given by formula one:
M = [ a_x^+, a_x^-, |a_y|, |z_x|, |z_y| ]   (formula one)
wherein a_x^+ is the driver discomfort threshold for longitudinal acceleration, a_x^- is the longitudinal deceleration threshold, |a_y| is the lateral acceleration threshold, and |z_x| and |z_y| are the maximum allowed longitudinal and lateral impulse (i.e., the rate of change of acceleration), respectively;
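For illustration only, the driver preference metric matrix of formula one can be represented in software roughly as follows; the field names and the threshold values are hypothetical placeholders, not values taken from the invention, and in the method they would be derived from the driver's personalized driving experience data.

```python
from dataclasses import dataclass

@dataclass
class DriverPreferenceMatrix:
    """Sketch of the driver preference metric matrix M of formula one.

    All numeric thresholds below are hypothetical examples.
    """
    ax_acc: float = 2.0    # discomfort threshold for longitudinal acceleration [m/s^2]
    ax_dec: float = -3.0   # longitudinal deceleration threshold [m/s^2]
    ay_max: float = 1.5    # lateral acceleration threshold [m/s^2]
    zx_max: float = 2.5    # maximum allowed longitudinal impulse (jerk) [m/s^3]
    zy_max: float = 2.0    # maximum allowed lateral impulse (jerk) [m/s^3]

    def within_comfort(self, ax: float, ay: float, zx: float, zy: float) -> bool:
        """Check whether a measured motion sample stays inside the comfort region."""
        return (self.ax_dec <= ax <= self.ax_acc
                and abs(ay) <= self.ay_max
                and abs(zx) <= self.zx_max
                and abs(zy) <= self.zy_max)
```

A comfort check such as `DriverPreferenceMatrix().within_comfort(1.2, 0.8, 1.0, 0.5)` would then indicate whether a given lane-change motion stays within the driver's acceptable comfort standard.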
in the second step, the multi-time scale neural network is represented by formula two:
f(x, u | w; τ_σ)   (formula two)
wherein f(x, u | w) is a nonlinear function giving the system output, x and u are respectively the system state and the input, w is the weight of the neural network, and τ_σ is an adaptively variable time-scale factor; τ_σ obeys a Gaussian normal distribution N whose mean and variance vectors are τ_0 and σ_0;
in the second step, the electric control device performs off-line learning according to the individualized driving experience data set through the multi-time scale neural network, and obtains a mapping relation from environmental vehicle data to control data in an off-line state;
In formula two, every neuron processes information according to the newly arriving connection information and its previous internal state; the time-scale factor τ_i determines the weights with which previous information is retained and new information is processed.
The invention introduces learning parameters into the time-scale factors and provides an adaptively variable time scale τ_σ. The initial values of the mean and variance vectors of the standard Gaussian normal distribution N are τ_0 = 0 and σ_0 = 1; time sequences are drawn from this Gaussian normal distribution so as to map the actual multidimensional state signal onto different time scales. Off-line strategy learning is then performed with the personalized driving experience data set, yielding the mapping from environmental vehicle data to control data in the off-line state.
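A minimal sketch of this off-line learning step is given below, assuming the multi-time scale network is approximated by a small leaky recurrent unit whose per-neuron time constants are drawn from a Gaussian with τ_0 = 0 and σ_0 = 1 (the initial values stated above). The exponential mapping to positive time constants, the layer sizes, the random placeholder data and the read-out-only training step are all illustrative assumptions, not the exact structure of formula two.

```python
import numpy as np

rng = np.random.default_rng(0)

# One adaptive time-scale factor per hidden neuron, drawn from N(tau0 = 0, sigma0 = 1)
# and mapped to time constants >= 1 (this mapping is an assumption of the sketch).
tau0, sigma0, n_hidden = 0.0, 1.0, 32
tau = 1.0 + np.exp(rng.normal(tau0, sigma0, size=n_hidden))

def mtnn_forward(x_seq, W_in, W_rec, W_out):
    """Leaky recurrent pass: each neuron keeps its previous internal state with
    weight (1 - 1/tau) and integrates new information with weight 1/tau."""
    h = np.zeros(W_rec.shape[0])
    for x in x_seq:                                    # x: environmental vehicle data at one sample
        h = (1.0 - 1.0 / tau) * h + (1.0 / tau) * np.tanh(W_in @ x + W_rec @ h)
    return W_out @ h, h                                # output [steering angle, speed], final hidden state

# Off-line learning: fit the mapping from environmental vehicle data to control data
# on the personalized driving experience data set (random placeholders used here).
state_dim, ctrl_dim, lr = 6, 2, 0.01
W_in  = rng.normal(0.0, 0.1, (n_hidden, state_dim))
W_rec = rng.normal(0.0, 0.1, (n_hidden, n_hidden))
W_out = rng.normal(0.0, 0.1, (ctrl_dim, n_hidden))

env_sequences = rng.normal(size=(100, 10, state_dim))  # placeholder environmental vehicle data
driver_ctrl   = rng.normal(size=(100, ctrl_dim))       # placeholder driver control data

for xs, a_ref in zip(env_sequences, driver_ctrl):
    a_hat, h = mtnn_forward(xs, W_in, W_rec, W_out)
    W_out -= lr * np.outer(a_hat - a_ref, h)           # gradient step on the linear read-out only
```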
In the third step:
Standard online reinforcement learning algorithms require an immediate evaluation of each performed action before the next iteration, whereas every driver has different preferences for lane-change behavior: faced with the same environmental vehicle data, different drivers output different control data.
Even for the same driver, factors such as attention level, reaction time and the surrounding environment mean that the control data (vehicle steering wheel target angle data and vehicle target speed data) taken for the same environmental vehicle data differ from one lane change to the next. The evaluation of an executed action may therefore require several recursive steps (several loop iterations) until the effect of the operation is observed and a corresponding reward is obtained. For this reason, the invention provides the following multi-time scale self-learning lane-change algorithm:
the multi-time scale self-learning algorithm comprises the following steps:
3.1, initializing parameters: the electric control device initializes the discount parameter γ, the learning step size α, the exploration parameter ε, the multi-time-scale parameters t_s ≤ t_a ≤ t_l, and the action values Q(s, a) for the lane-change event (a = 1) and the no-lane-change event (a = 0);
t_s is the sampling period of the vehicle-mounted sensors and is also the period with which the lane change model acquires information; the vehicle-mounted sensors are used to acquire the environmental vehicle data. t_a is the time interval from acquiring this information to the electric control device outputting control data through the multi-time scale self-learning algorithm; the control data comprise vehicle steering wheel target angle data and vehicle target speed data. t_l is the learning and updating period of the multi-time scale self-learning algorithm;
The various sensors on the vehicle, such as speed sensors, distance sensors and angle sensors, are all prior art; they provide the vehicle-mounted ECU with the environmental vehicle data needed for automatic driving, including environmental data and the vehicle's own state data, and are not described in detail.
3.2, observing the vehicle state: the lane change model in the electric control device acquires the current environmental vehicle data through the vehicle-mounted sensors connected to the electric control device and obtains the current environmental state s, which comprises the current environmental vehicle data; the multi-time scale self-learning algorithm in the electric control device then obtains the current environmental state s from the lane change model;
3.3, executing a control action: every t_a, the multi-time scale self-learning algorithm in the electric control device selects and outputs control data a for the environmental state s using a greedy algorithm; if the driver does not intervene, the host vehicle is controlled to change lanes or keep its lane according to the vehicle steering wheel target angle data and vehicle target speed data in the control data a; if the driver intervenes, the host vehicle changes lanes or keeps its lane according to the driver's operation.
The greedy algorithm is a conventional algorithm and will not be described in detail.
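As a rough illustration of how the three time scales t_s ≤ t_a ≤ t_l of steps 3.1 to 3.3 could interleave sensor sampling, action output and periodic learning, consider the sketch below. The concrete time values and all callables (`read_sensors`, `select_action`, `apply_control`, `driver_override`, `learn_update`) are assumptions made for the illustration, not interfaces defined by the invention.

```python
import time

t_s, t_a, t_l = 0.02, 0.1, 0.3            # illustrative sampling / action / learning periods [s]

def control_loop(read_sensors, select_action, apply_control, driver_override, learn_update):
    """Interleave the three time scales: sample every t_s, act every t_a, learn every t_l."""
    last_action_t = last_learn_t = time.monotonic()
    history = []                            # (state, action) pairs collected since the last update
    while True:
        now = time.monotonic()
        s = read_sensors()                  # 3.2: observe the vehicle state every t_s
        if now - last_action_t >= t_a:      # 3.3: execute a control action every t_a
            a = driver_override()           # driver intervention takes priority
            if a is None:
                a = select_action(s)        # greedy choice from the environmental state s
            apply_control(a)
            history.append((s, a))
            last_action_t = now
        if now - last_learn_t >= t_l:       # learning and updating once every t_l
            learn_update(history)
            history.clear()
            last_learn_t = now
        time.sleep(t_s)
```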
During the third step, a learning and updating operation is performed once every t_l;
τ is the current time; s is the environmental state in which the control data a were applied at time τ - t_l (the time of acting according to control data a), and s' is the current environmental state that results; for all times t_i between τ - t_l and τ, the driver preference metric matrix M is evaluated to obtain the reward function R(s_ti, a_ti) at time t_i. t_l is the learning update period of the multi-time scale self-learning algorithm, with t_l > t_a; it is determined from the actual values of t_s and t_a and is roughly three times t_a;
the electric control device correspondingly updates the individualized driving experience data set on line through a sixth formula according to the data, wherein the sixth formula is as follows:
Q(s, a) ← Q(s, a) + α [ R + γ max_{a'} Q(s', a') - Q(s, a) ]   (formula six)
in formula six, R is the reward function given by formula five; s is the environmental state of the host vehicle expressed by formula three; a is the control data that was actually applied; α denotes the learning step size and γ is the discount factor. As the third step is carried out and updated continuously, the lane-change method of the invention comes ever closer to the driver's driving habits in continued use, giving the driver a better automatic driving experience.
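A minimal sketch of this learning-and-updating step follows, reading formula six as a tabular Q-learning update over the two discrete events (lane change a = 1, no lane change a = 0). The tabular representation, the state-discretisation function and the parameter values are simplifying assumptions for the illustration.

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning step size, discount factor, exploration parameter
Q = defaultdict(float)                  # Q[(state_key, action)]; actions: 1 = lane change, 0 = keep lane

def update_q(history, reward_fn, discretize, s_prime):
    """Apply formula six to every (state, action) pair collected over the last t_l."""
    for s, a in history:
        r = reward_fn(s, a)             # time-varying reward from formula five
        best_next = max(Q[(discretize(s_prime), a2)] for a2 in (0, 1))
        key = (discretize(s), a)
        Q[key] += alpha * (r + gamma * best_next - Q[key])

def select_action(s, discretize, offline_policy):
    """Epsilon-greedy choice: exploit the Q table with probability 1 - epsilon,
    otherwise fall back to the off-line learned policy."""
    if random.random() < epsilon:
        return offline_policy(s)
    return max((0, 1), key=lambda a: Q[(discretize(s), a)])
```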
When modeling the driver, the host vehicle and the environment, the differences between individuals and between different states of the same individual must be considered so that relevant real-time state information can be captured accurately, corresponding actions taken, the executed actions evaluated and the state updated. For this purpose, a Markov-based lane change model that takes transition probabilities into account is proposed.
Each time a learning and updating operation is performed (every t_l), the actually applied control data a is selected for each state s using an ε-greedy algorithm;
specifically, the control data a that maximizes the Q value is selected with probability 1 - ε, or the off-line-learned action is selected at random with probability ε as the newly learned action; ε represents the transition probability, 0 < ε < 0.5;
ε expresses the trade-off between exploiting what has been learned (probability 1 - ε) and exploring what has not (probability ε); for conservative operation ε is chosen small (0 < ε < 0.5).
The third formula is stored in the electric control device, and the third formula is as follows: st=[xt,ytt,Δxt,Δyt,Δvt](ii) a In the third formula, stThe environmental state of the host vehicle is represented by the direction of the road width, which is the y-axis direction, i.e., the lateral direction, and the direction of the road length, which is the x-axis direction, i.e., the longitudinal directiont,ytIndicates the longitudinal and lateral positions, phi, of the host vehicletRepresenting yaw rate, Δ xtRepresenting the distance between the host vehicle and the surrounding vehicle along the x-axis, aytIndicating the distance between the host vehicle and the surrounding vehicle along the y-axisFrom, Δ vtRepresenting a speed difference between the host vehicle and the surrounding vehicle;
if there are a plurality of surrounding vehicles around the host vehicle, Δ xt,Δyt,ΔvtRespectively representing column vectors corresponding to different surrounding vehicles; a formula III forms a lane change model;
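For illustration, the environmental state of formula three can be assembled as in the sketch below; the tuple layouts of the host and surrounding vehicles and the example values are hypothetical.

```python
import numpy as np

def build_state(host, surrounding):
    """Assemble s_t = [x_t, y_t, phi_t, dx_t, dy_t, dv_t] from host and surrounding vehicles.

    `host` is assumed to be (x, y, phi, v); each entry of `surrounding` is (x, y, v).
    With several surrounding vehicles, dx, dy, dv become column vectors
    (one entry per surrounding vehicle), as stated above.
    """
    x, y, phi, v = host
    dx = np.array([sx - x for sx, _, _ in surrounding])
    dy = np.array([sy - y for _, sy, _ in surrounding])
    dv = np.array([sv - v for _, _, sv in surrounding])
    return {"x": x, "y": y, "phi": phi, "dx": dx, "dy": dy, "dv": dv}

# Example: host at (50 m, 3.5 m), heading 0 rad, 20 m/s; one vehicle ahead, one alongside.
s_t = build_state((50.0, 3.5, 0.0, 20.0), [(80.0, 3.5, 18.0), (45.0, 0.0, 22.0)])
```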
The Markov-decision state transition depends on the driver preference metric matrix M and on the environmental state of the host vehicle; state changes between two different states are linked by transition probabilities. The transition probabilities capture variation between individuals and within the same individual; the unknown transition probabilities are updated by the multi-time scale self-learning algorithm and the corresponding action is taken (lane change, a = 1, or no lane change, a = 0);
the dynamic time-varying reward function that takes into account driver preferences in the first step is:
The goal of reinforcement learning is to continually generate strategies that guide the system from a "bad" state to a "good" state. The evaluation of "bad" and "good" captures each executed action in every state by assigning it a reward value. The expression for performing action a is defined as formula four:
a_t = [δ_t, v_t]   (formula four); in formula four, δ_t is the steering wheel angle (in degrees) and v_t is the speed (in km/h);
in the general reinforcement learning criteria definition, the reward function is invariant (static); however, in the present invention, the reward function varies with time depending on the driver's personalized driving preference (to decide lane change or no lane change), and for this reason, the electric control device stores the reward function expressed by the formula five:
R(s_t, a_t)   (formula five)
in formula five, M* is a safety boundary constructed from each parameter range of the driver preference metric matrix defined in formula one, and the reference state and the reference execution action corresponding to formula one also appear in the expression;
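The source gives formula five only as an image, so the sketch below merely illustrates one plausible reading that is consistent with the surrounding text: the reward stays positive while the motion remains inside the safety boundary M* derived from formula one, and it is penalised by the deviation from the reference state and reference execution action. The threshold keys, the quadratic penalty and the constants are assumptions, not the patented expression.

```python
def reward(state, action, pref, ref_state, ref_action):
    """Sketch of a dynamic, time-varying reward built from the driver preference
    metric matrix; `pref` is a dict of comfort thresholds, `ref_state`/`ref_action`
    play the role of the reference state and reference execution action of formula five."""
    inside = (pref["ax_dec"] <= state["ax"] <= pref["ax_acc"]   # longitudinal acceleration band
              and abs(state["ay"]) <= pref["ay_max"]            # lateral acceleration threshold
              and abs(state["zx"]) <= pref["zx_max"]            # longitudinal impulse limit
              and abs(state["zy"]) <= pref["zy_max"])           # lateral impulse limit
    if not inside:
        return -1.0                                             # outside the safety boundary M*
    dev_s = sum((state[k] - ref_state[k]) ** 2 for k in ref_state)
    dev_a = sum((action[k] - ref_action[k]) ** 2 for k in ref_action)
    return 1.0 - 0.1 * (dev_s + dev_a)                          # closer to the preference, larger reward
```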
Finally, the electric control device trains the multi-time scale neural network of the second step with the new strategy data learned by the multi-time scale self-learning algorithm and updates the off-line strategy, bringing the off-line strategy closer to the driver's driving habits.
Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that modifications and equivalents may be made without departing from the spirit and scope of the invention, which is intended to be covered by the appended claims.

Claims (3)

1. The multi-time scale self-learning lane changing method considering the personalized driving experience is characterized by comprising the following steps of:
the first step is preparation;
establishing an individualized driving experience data set, a multi-time scale neural network, a multi-time scale self-learning algorithm, a Markov decision-based lane change model and a dynamic time-varying reward function considering driver preference in an electric control device of a host vehicle;
the personalized driving experience data set comprises environmental vehicle data, control data and a driver preference metric matrix; the environmental vehicle data and the control data are derived from public data;
the second step is off-line learning;
before a host vehicle is started for the first time, enabling a multi-time scale neural network to read environmental vehicle data and control data in an individualized driving experience data set, and establishing a mapping relation from the environmental vehicle data to the control data;
the third step is on-line operation; the electric control device controls the host vehicle to automatically drive at the level of L4 through the multi-time scale self-learning algorithm and learns the driving habits of the driver on line, and updates the personalized driving experience data set, the multi-time scale neural network, the multi-time scale self-learning algorithm, the lane change model and the reward function according to the driving habits of the driver, so that the automatic control output of the electric control device to the lane change is gradually close to the driving habits of the driver of the host vehicle, and the driving experience of the driver is improved.
2. The multi-time scale self-learning lane-changing method taking personalized driving experience into account of claim 1, characterized in that:
the environmental vehicle data in the first step include x_t, y_t, φ_t, Δx_t, Δy_t and Δv_t;
wherein x_t, y_t denote the longitudinal and lateral positions of the host vehicle, φ_t denotes the yaw rate, Δx_t denotes the distance between the host vehicle and a surrounding vehicle along the x-axis, Δy_t denotes the distance between the host vehicle and a surrounding vehicle along the y-axis, and Δv_t denotes the speed difference between the host vehicle and the surrounding vehicle;
the control data includes vehicle steering wheel target angle data and vehicle target speed data;
the driver preference metric matrix is given by formula one:
M = [ a_x^+, a_x^-, |a_y|, |z_x|, |z_y| ]   (formula one)
wherein a_x^+ is the driver discomfort threshold for longitudinal acceleration, a_x^- is the longitudinal deceleration threshold, |a_y| is the lateral acceleration threshold, and |z_x| and |z_y| are the maximum allowed longitudinal and lateral impulse, respectively;
in the second step, the multi-time scale neural network is represented by formula two:
f(x, u | w; τ_σ)   (formula two)
wherein f(x, u | w) is a nonlinear function giving the system output, x and u are respectively the system state and the input, w is the weight of the neural network, and τ_σ is an adaptively variable time-scale factor; τ_σ obeys a Gaussian normal distribution N whose mean and variance vectors are τ_0 and σ_0;
in the second step, the electric control device performs off-line learning according to the individualized driving experience data set through the multi-time scale neural network, and obtains a mapping relation from environmental vehicle data to control data in an off-line state;
in the third step, the multi-time scale self-learning algorithm comprises the following steps:
3.1, initializing parameters: the electric control device initializes the discount parameter γ, the learning step size α, the exploration parameter ε, the multi-time-scale parameters t_s ≤ t_a ≤ t_l, and the action values Q(s, a) for the lane-change event (a = 1) and the no-lane-change event (a = 0);
t_s is the sampling period of the vehicle-mounted sensors and is also the period with which the lane change model acquires information; the vehicle-mounted sensors are used to acquire the environmental vehicle data; t_a is the time interval from acquiring this information to the electric control device outputting control data through the multi-time scale self-learning algorithm, the control data comprising vehicle steering wheel target angle data and vehicle target speed data; t_l is the learning and updating period of the multi-time scale self-learning algorithm;
3.2, observing the vehicle state: the lane change model in the electric control device acquires the current environmental vehicle data through the vehicle-mounted sensors connected to the electric control device and obtains the current environmental state s; the multi-time scale self-learning algorithm in the electric control device then obtains the current environmental state s from the lane change model;
3.3, executing a control action: every t_a, the multi-time scale self-learning algorithm in the electric control device selects and outputs control data a for the environmental state s using a greedy algorithm; if the driver does not intervene, the host vehicle is controlled to change lanes or keep its lane according to the vehicle steering wheel target angle data and vehicle target speed data in the control data a; if the driver intervenes, the host vehicle changes lanes or keeps its lane according to the driver's operation.
3. The multi-time scale self-learning lane-changing method taking personalized driving experience into account of claim 2, characterized in that: during the third step, a learning and updating operation is performed once every t_l;
τ is the current time; s is the environmental state in which the control data a were applied at time τ - t_l, and s' is the current environmental state that results; for all times t_i between τ - t_l and τ, the driver preference metric matrix M is evaluated to obtain the reward function R(s_ti, a_ti) at time t_i; t_l is the learning update period of the multi-time scale self-learning algorithm, t_l > t_a;
The electric control device correspondingly updates the individualized driving experience data set on line through a sixth formula according to the data, wherein the sixth formula is as follows:
Q(s, a) ← Q(s, a) + α [ R + γ max_{a'} Q(s', a') - Q(s, a) ]   (formula six)
in formula six, R is the reward function given by formula five; s is the environmental state of the host vehicle expressed by formula three; a is the control data that was actually applied; α denotes the learning step size and γ is the discount factor;
each time a learning and updating operation is performed (every t_l), the control data a that maximizes the Q value is selected with probability 1 - ε, or the off-line-learned action is selected at random with probability ε as the newly learned action; ε represents the transition probability, 0 < ε < 0.5;
formula three is stored in the electric control device: s_t = [x_t, y_t, φ_t, Δx_t, Δy_t, Δv_t]   (formula three); in formula three, s_t denotes the environmental state of the host vehicle; the road width direction is the y-axis direction, i.e., the lateral direction, and the road length direction is the x-axis direction, i.e., the longitudinal direction; x_t, y_t denote the longitudinal and lateral positions of the host vehicle, φ_t denotes the yaw rate, Δx_t denotes the distance between the host vehicle and a surrounding vehicle along the x-axis, Δy_t denotes the distance between the host vehicle and a surrounding vehicle along the y-axis, and Δv_t denotes the speed difference between the host vehicle and the surrounding vehicle;
if there are a plurality of surrounding vehicles around the host vehicle, Δx_t, Δy_t, Δv_t are column vectors with one entry per surrounding vehicle; formula three constitutes the lane change model;
the dynamic time-varying reward function that takes into account driver preferences in the first step is:
defining the expression for performing action a as formula four:
a_t = [δ_t, v_t]   (formula four); in formula four, δ_t is the steering wheel angle and v_t is the speed;
the electric control device stores a reward function expressed by a formula five, wherein the formula five is as follows:
R(s_t, a_t)   (formula five)
in formula five, M* is a safety boundary constructed from each parameter range in formula one, and the reference state and the reference execution action corresponding to formula one also appear in the expression;
and finally, the electric control device trains the multi-time scale neural network in the second step by using the new strategy data learned by the multi-time scale self-learning algorithm, and updates the off-line strategy.
CN202011561553.6A 2020-12-25 2020-12-25 Multi-time scale self-learning lane changing method considering personalized driving experience Active CN112498354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011561553.6A CN112498354B (en) 2020-12-25 2020-12-25 Multi-time scale self-learning lane changing method considering personalized driving experience

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011561553.6A CN112498354B (en) 2020-12-25 2020-12-25 Multi-time scale self-learning lane changing method considering personalized driving experience

Publications (2)

Publication Number Publication Date
CN112498354A true CN112498354A (en) 2021-03-16
CN112498354B CN112498354B (en) 2021-11-12

Family

ID=74922076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011561553.6A Active CN112498354B (en) 2020-12-25 2020-12-25 Multi-time scale self-learning lane changing method considering personalized driving experience

Country Status (1)

Country Link
CN (1) CN112498354B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113978470A (en) * 2021-12-13 2022-01-28 郑州轻工业大学 On-line rapid estimation method for friction force between tire and road surface
CN114013443A (en) * 2021-11-12 2022-02-08 哈尔滨工业大学 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN114023108A (en) * 2021-11-02 2022-02-08 河北工业大学 Mixed traffic flow lane change model and lane change simulation method
CN115018016A (en) * 2022-08-03 2022-09-06 苏州大学 Method and system for identifying lane changing intention of manually-driven vehicle
CN115195757A (en) * 2022-09-07 2022-10-18 郑州轻工业大学 Electric bus starting driving behavior modeling and recognition training method
CN115512540A (en) * 2022-09-20 2022-12-23 中国第一汽车股份有限公司 Information processing method and device for vehicle, storage medium and processor
FR3137642A1 (en) * 2022-07-05 2024-01-12 Psa Automobiles Sa Method and device for controlling a system for semi-automatically changing the lane of a vehicle as a function of a maximum value of a dynamic parameter

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109624986A (en) * 2019-03-01 2019-04-16 吉林大学 A kind of the study cruise control system and method for the driving style based on pattern switching
CN110733506A (en) * 2019-10-17 2020-01-31 上海舵敏智能科技有限公司 Lane changing method and apparatus for unmanned vehicle
EP3650297A1 (en) * 2018-11-08 2020-05-13 Bayerische Motoren Werke Aktiengesellschaft Method and apparatus for determining information related to a lane change of a target vehicle, method and apparatus for determining a vehicle comfort metric for a prediction of a driving maneuver of a target vehicle and computer program
US20200189597A1 (en) * 2018-12-12 2020-06-18 Visteon Global Technologies, Inc. Reinforcement learning based approach for sae level-4 automated lane change
CN111483468A (en) * 2020-04-24 2020-08-04 广州大学 Unmanned vehicle lane change decision-making method and system based on confrontation and imitation learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3650297A1 (en) * 2018-11-08 2020-05-13 Bayerische Motoren Werke Aktiengesellschaft Method and apparatus for determining information related to a lane change of a target vehicle, method and apparatus for determining a vehicle comfort metric for a prediction of a driving maneuver of a target vehicle and computer program
US20200189597A1 (en) * 2018-12-12 2020-06-18 Visteon Global Technologies, Inc. Reinforcement learning based approach for sae level-4 automated lane change
CN109624986A (en) * 2019-03-01 2019-04-16 吉林大学 A kind of the study cruise control system and method for the driving style based on pattern switching
CN110733506A (en) * 2019-10-17 2020-01-31 上海舵敏智能科技有限公司 Lane changing method and apparatus for unmanned vehicle
CN111483468A (en) * 2020-04-24 2020-08-04 广州大学 Unmanned vehicle lane change decision-making method and system based on confrontation and imitation learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HEYE HUANG et al.: "A probabilistic risk assessment framework considering lane-changing behavior interaction", Science China (Information Sciences) *
杨炜 (Yang Wei) et al.: "Automotive active collision-avoidance warning system combining recognition of the preceding vehicle's driving intention", 中国科技论文 (China Sciencepaper) *
王其东 (Wang Qidong) et al.: "Bounded control for lane-departure prevention that recognizes driver intention", 中国公路学报 (China Journal of Highway and Transport) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114023108A (en) * 2021-11-02 2022-02-08 河北工业大学 Mixed traffic flow lane change model and lane change simulation method
CN114013443A (en) * 2021-11-12 2022-02-08 哈尔滨工业大学 Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN113978470A (en) * 2021-12-13 2022-01-28 郑州轻工业大学 On-line rapid estimation method for friction force between tire and road surface
CN113978470B (en) * 2021-12-13 2024-01-12 郑州轻工业大学 On-line quick estimation method for friction force between tire and road surface
FR3137642A1 (en) * 2022-07-05 2024-01-12 Psa Automobiles Sa Method and device for controlling a system for semi-automatically changing the lane of a vehicle as a function of a maximum value of a dynamic parameter
CN115018016A (en) * 2022-08-03 2022-09-06 苏州大学 Method and system for identifying lane changing intention of manually-driven vehicle
CN115195757A (en) * 2022-09-07 2022-10-18 郑州轻工业大学 Electric bus starting driving behavior modeling and recognition training method
CN115195757B (en) * 2022-09-07 2023-08-04 郑州轻工业大学 Electric bus starting driving behavior modeling and recognition training method
CN115512540A (en) * 2022-09-20 2022-12-23 中国第一汽车股份有限公司 Information processing method and device for vehicle, storage medium and processor

Also Published As

Publication number Publication date
CN112498354B (en) 2021-11-12

Similar Documents

Publication Publication Date Title
CN112498354B (en) Multi-time scale self-learning lane changing method considering personalized driving experience
EP2065842B1 (en) Adaptive driver assistance system with robust estimation of object properties
US6230111B1 (en) Control system for controlling object using pseudo-emotions and pseudo-personality generated in the object
CN109927725A (en) A kind of self-adaption cruise system and implementation method with driving style learning ability
EP3750765A1 (en) Methods, apparatuses and computer programs for generating a machine-learning model and for generating a control signal for operating a vehicle
CN112347567A (en) Vehicle intention and track prediction method
CN111332362B (en) Intelligent steer-by-wire control method integrating individual character of driver
CN112677982B (en) Vehicle longitudinal speed planning method based on driver characteristics
CN112109708A (en) Adaptive cruise control system considering driving behaviors and control method thereof
US20210331663A1 (en) Electric vehicle control system
JP7415471B2 (en) Driving evaluation device, driving evaluation system, in-vehicle device, external evaluation device, and driving evaluation program
CN113655794A (en) Multi-vehicle cooperative control method based on robust model predictive control
CN110103960B (en) Vehicle self-adaptive cruise control method and system and vehicle
Selvaraj et al. An ML-aided reinforcement learning approach for challenging vehicle maneuvers
CN113184040B (en) Unmanned vehicle line-controlled steering control method and system based on steering intention of driver
US20210213977A1 (en) Nearby Driver Intent Determining Autonomous Driving System
CN110271557B (en) Vehicle user feature recognition system
CN114503133A (en) Information processing apparatus, information processing method, and program
CN114269632A (en) Method and device for estimating a mechanically fed steering wheel torque on a steering wheel of a steering system of a motor vehicle
CN115848369A (en) Personalized self-adaptive cruise system based on deep reinforcement learning and control method thereof
US11738804B2 (en) Training a vehicle to accommodate a driver
CN115649197A (en) Automatic driving control method based on driver characteristics and storage medium
CN111413974B (en) Automobile automatic driving motion planning method and system based on learning sampling type
CN112835362B (en) Automatic lane change planning method and device, electronic equipment and storage medium
Guo et al. Optimal design of a driver assistance controller based on surrounding vehicle’s social behavior game model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant