CN109927725B - Self-adaptive cruise system with driving style learning capability and implementation method - Google Patents

Self-adaptive cruise system with driving style learning capability and implementation method

Info

Publication number
CN109927725B
CN109927725B (application CN201910077516.9A)
Authority
CN
China
Prior art keywords
vehicle
following
driving style
learning
acceleration
Prior art date
Legal status
Active
Application number
CN201910077516.9A
Other languages
Chinese (zh)
Other versions
CN109927725A (en
Inventor
褚洪庆
张羽翔
高炳钊
闫勇军
陈虹
Current Assignee
Jilin University
Original Assignee
Jilin University
Priority date
Filing date
Publication date
Application filed by Jilin University filed Critical Jilin University
Priority to CN201910077516.9A priority Critical patent/CN109927725B/en
Publication of CN109927725A publication Critical patent/CN109927725A/en
Application granted granted Critical
Publication of CN109927725B publication Critical patent/CN109927725B/en

Landscapes

  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)

Abstract

The invention belongs to the technical field of intelligent driving assistance and automobile safety, and in particular relates to an adaptive cruise system with driving style learning capability and an implementation method thereof, in which the system adapts its adaptive-cruise following behavior by learning the driver's style behavior online. Addressing the problem of automobile adaptive cruise control, the invention designs an adaptive cruise control system with driving style self-learning capability and an implementation method thereof. The system is intended for L2-level automated vehicles and aims to follow the preceding vehicle adaptively in real driving scenes through environment perception and information fusion, while its driving strategy takes the driving style of the specific driver into account, so that the system achieves consistent satisfaction and acceptance across different groups of users. To achieve this goal, the invention combines linear quadratic control with an online reinforcement learning method to better optimize system performance.

Description

Self-adaptive cruise system with driving style learning capability and implementation method
Technical Field
The invention belongs to the technical field of intelligent driving assistance and automobile safety, and in particular relates to an adaptive cruise system with driving style learning capability and an implementation method thereof, in which the system adapts its adaptive-cruise following behavior by learning the driver's style behavior online.
Background
Intelligent driving assistance technology is developing steadily, has become a main direction of vehicle technology research and development, and has drawn wide attention worldwide. Cruise systems are evolving in step with the automation and intelligence of vehicle systems. In currently developed adaptive cruise control systems, besides traditional methods such as PID and fuzzy control, model-based advanced control algorithms are also used; the patents with application numbers 201810313067.9 and 201710826862.3, for example, improve following performance with model predictive control. However, these methods still mainly consider vehicle following performance, such as safety, comfort and economy.
As vehicle technology matures, higher goals are set for adaptive cruise control. To further improve system performance, the cruise controller must not only guarantee indexes such as safety within the traffic flow but also account for the driver's subjective intention. Driving styles differ greatly among human drivers, yet the control system is designed before the vehicle is actually used. Therefore, for the controller to adapt well to the styles of different specific drivers, the system additionally requires a self-learning capability. Such methods are currently immature. For example, the patent with application number 2010106159140 classifies and learns drivers by offline data-driven analysis, and improves driver acceptance of the system by changing the parameters of the adaptive cruise controller through vehicle buttons or similar means. The patent with application number 201710812719.9 likewise learns driving habits offline and adjusts controller parameters by key press. Unlike the patent with application number 201710812719.9, the present application can learn the driver's driving style autonomously.
Disclosure of Invention
In order to solve the above problems, the invention designs an adaptive cruise system with driving style learning capability and an implementation method thereof, addressing the problem of automobile adaptive cruise control. The system is intended for L2-level automated vehicles and aims to follow the preceding vehicle adaptively in real driving scenes through environment perception and information fusion, while its driving strategy takes the driving style of the specific driver into account, so that the system achieves consistent satisfaction and acceptance across different groups of users. To achieve this goal, the invention combines linear quadratic control with an online reinforcement learning method to better optimize system performance.
The technical scheme of the invention is described below with reference to the attached drawings:
a method for realizing an adaptive cruise system with driving style learning capability, the system comprising a perception fusion module A, a following control module B, a driving style self-learning module C and an underlying vehicle execution control module D;
the perception fusion module A is used for obtaining the running state information of the front vehicle, including the speed information and the relative distance of the front vehicle;
the following control module B comprises a vehicle following control module B-a and a preceding vehicle acceleration prediction module B-B, wherein the vehicle following control module B-a is used for establishing a vehicle following model, establishing a control problem and determining an optimized target to obtain a following controller; the front vehicle acceleration predicting module B-B is used for predicting the front vehicle acceleration according to the confirmed front vehicle and the running state information of the front vehicle and the current vehicle;
the driving style self-learning module C is used for learning the driving style of a specific driver based on a reinforcement learning method aiming at the characteristic that the driving style of the specific driver is different, so as to adjust the control parameter of the optimal cruise control problem and achieve the function of a self-learning system;
the vehicle execution control module D is used for performing tracking control and finally outputting a vehicle power driving system and a braking system to control driving of the vehicle;
the perception fusion module A is unidirectionally connected with the following vehicle control module B and the driving style self-learning module C; the following vehicle control module B is connected with the vehicle execution control module D in a one-way mode; the driving style self-learning module C is connected with the following vehicle control module B in a one-way mode; the vehicle execution control module D is connected with the vehicle in a one-way mode.
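The one-way connections above define a simple dataflow: A feeds B and C, C tunes B, and B commands D. The following minimal Python sketch illustrates the interfaces this implies; all class and field names are hypothetical, introduced only for illustration, and the controller body is elided:

```python
# Minimal sketch of the A -> (C) -> B -> D dataflow; names are hypothetical.
from dataclasses import dataclass

@dataclass
class PerceptionOutput:          # module A -> modules B and C
    front_vehicle_speed: float   # v_t [m/s]
    relative_distance: float     # d [m]
    host_speed: float            # v_h [m/s], read from the CAN bus

@dataclass
class StyleParameters:           # module C -> module B
    headway: float               # expected headway tau [s]
    control_weight: float        # weight r in the LQ cost of module B

@dataclass
class FollowingCommand:          # module B -> module D
    desired_acceleration: float  # a_ref [m/s^2]

def following_step(perception: PerceptionOutput,
                   style: StyleParameters) -> FollowingCommand:
    """One pass of module B; the optimal control law (20) goes here."""
    a_ref = 0.0  # placeholder for the computed control input
    return FollowingCommand(desired_acceleration=a_ref)
```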
The method comprises the following steps:
Step one, confirming the preceding vehicle to be followed through the perception fusion module A's own fusion recognition algorithm, and obtaining the running state information of the preceding vehicle, including its speed and the relative distance; information on the current vehicle, such as speed, engine torque and braking deceleration, is obtained from the vehicle bus, i.e. the CAN network;
Step two, establishing a vehicle following model through the vehicle following control module B-a in the following control module B, formulating the control problem, determining the optimization objective, and obtaining the following controller; the preceding vehicle acceleration prediction module B-b in the following control module B estimates the preceding vehicle acceleration from the preceding vehicle speed through two first-order low-pass filters and limits abnormal changes of the acceleration with a curvature limiter;
Step three, learning the driving style of the specific driver through the driving style self-learning module C based on a reinforcement learning method, addressing the fact that driving styles differ between drivers, and thereby adjusting the control parameters of the optimal cruise control problem to realize the self-learning function of the system;
Step four, performing tracking control through the underlying vehicle execution control module D, finally outputting commands to the vehicle powertrain and braking system to control the driving of the vehicle.
The specific method of the second step is as follows:
2.1) vehicle following model establishment: a vehicle following model is established according to the requirements of the adaptive following problem and the longitudinal kinematics and dynamics characteristics of the vehicle;
In order to describe the longitudinal dynamics of the vehicle in a following scene, the system introduces two state variables, $x_1$ and $x_2$, respectively:

$$x_1 = d - d_{ref} \tag{1a}$$

$$x_2 = v_t - v_h \tag{1b}$$

where $d$ is the relative distance between the host vehicle and the front vehicle, $d_{ref}$ is the reference relative distance, $d_{ref} = d_0 + \tau v_h$, $d_0$ is the safe parking distance, $\tau$ is the expected headway, $v_h$ is the host vehicle speed, and $v_t$ is the front vehicle speed; the differential equations of the longitudinal dynamics of the vehicle in the following scene can therefore be written in these state variables as:

$$\dot{x}_1 = x_2 - \tau a_h, \qquad \dot{x}_2 = a_t - a_h \tag{2}$$

where $a_h$ is the longitudinal acceleration of the host vehicle, $a_t$ is the longitudinal acceleration of the front vehicle, and $\tau$ is the desired headway; it is assumed here that the inner-loop acceleration tracking dynamics of the host vehicle can be approximated by the first-order model:

$$\dot{a}_h = \frac{a_{ref} - a_h}{\tau_i} \tag{3}$$

where $\tau_i$ is the inner-loop dynamic time constant, $a_h$ is the longitudinal acceleration of the host vehicle, and $a_{ref}$ is the desired longitudinal acceleration;
in a continuous system, the state quantities are selected as
Figure GDA0002589058060000035
The controlled variable is u ═ arefModeling the acceleration of the front vehicle as system disturbance, namely disturbance quantity d ═ ah(ii) a Thus, the continuous state space equation for a vehicle following system pair can be expressed as:
Figure GDA0002589058060000036
wherein,
Figure GDA0002589058060000037
ahis the longitudinal acceleration of the vehicle, arefIs the desired longitudinal acceleration, τiIs the inner loop dynamic time constant, τ is the expected headway;
In the controller design, a discrete-time expression of the state-space equation of the vehicle following system is obtained with the zero-order hold (ZOH) method; defining the state at time $k$ as $x(k) = [x_1(k), x_2(k), a_h(k)]^T$ and the control input as $u(k) = a_{ref}(k)$, at time $k$ one obtains:

$$x(k+1) = A\,x(k) + B_u u(k) + B_d d(k) \tag{5}$$

where $A$, $B_u$ and $B_d$ are the ZOH discretizations of $A_c$, $B_{c,u}$ and $B_{c,d}$ over one sampling interval, $d(k) = a_t(k)$, $T_s$ is the sampling time interval, $\tau$ is the desired headway, and $\tau_i$ is the inner-loop dynamic time constant;
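As a concrete illustration of equations (4) and (5), the model can be discretized numerically as below. This is a sketch under assumed parameter values: the headway, inner-loop time constant and sampling interval shown are illustrative, not values prescribed by the patent:

```python
# Build the continuous model (4) and its ZOH discretization (5).
import numpy as np
from scipy.signal import cont2discrete

tau, tau_i, Ts = 1.5, 0.9, 0.05   # headway, inner-loop constant, sample time (illustrative)

Ac  = np.array([[0.0, 1.0, -tau],
                [0.0, 0.0, -1.0],
                [0.0, 0.0, -1.0 / tau_i]])
Bcu = np.array([[0.0], [0.0], [1.0 / tau_i]])   # input channel:       u = a_ref
Bcd = np.array([[0.0], [1.0], [0.0]])           # disturbance channel: d = a_t

# Discretize both channels together, then split the columns again.
A, B, _, _, _ = cont2discrete((Ac, np.hstack([Bcu, Bcd]),
                               np.eye(3), np.zeros((3, 2))), Ts, method='zoh')
Bu, Bd = B[:, :1], B[:, 1:]
```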
2.2) establishing the optimal cruise control problem as a linear quadratic optimization problem, wherein the specific method comprises the following steps:
2.2.1) establishing an optimal quadratic control problem;
The performance indicator function may be expressed as:

$$J = \frac{1}{2}\sum_{k=0}^{\infty}\left[x^T(k)\,Q\,x(k) + u^T(k)\,R\,u(k)\right] \tag{6}$$

where $Q$ is a semi-positive-definite symmetric matrix, $R$ is a positive-definite symmetric matrix, $x(k)$ is the state quantity at time $k$, and $u(k)$ is the control quantity at time $k$; the parameter matrices $Q$ and $R$ express the controller's requirements on driving style, comfort and economy, and are defined here as:

$$Q = \mathrm{diag}(q_{11},\, q_{22},\, q_{33}), \qquad R = r \tag{7}$$

Substituting this into equation (6), the performance indicator function can also be expressed as:

$$J = \frac{1}{2}\sum_{k=0}^{\infty}\left[q_{11}\,x_1^2(k) + q_{22}\,x_2^2(k) + q_{33}\,a_h^2(k) + r\,a_{ref}^2(k)\right] \tag{8}$$

Here it is clear that the penalty term $q_{11}x_1^2 + q_{22}x_2^2$ characterizes the following performance, i.e. that at steady state the vehicle keeps the expected following distance from the front vehicle at the expected longitudinal speed; the penalty term $q_{33}a_h^2$ expresses the desire to minimize fuel consumption; and the penalty term $r\,a_{ref}^2$ expresses the desire to improve the comfort of the vehicle and reduce frequent acceleration and deceleration;
2.2.2) solving the optimal quadratic control problem;
A discrete Hamiltonian is defined to solve the optimization problem; for optimization problem (6) it is defined as:

$$H(k) = \frac{1}{2}\left[x^T(k)\,Q\,x(k) + u^T(k)\,R\,u(k)\right] + \lambda^T(k+1)\left[A\,x(k) + B_u u(k) + B_d d(k)\right] \tag{9}$$

where $\lambda(k)$ is the Lagrange multiplier; $x(k)$, $u(k)$, $d(k)$, $Q$, $R$, $A$, $B_u$, $B_d$ are as defined above and unchanged throughout the following derivation. By the minimum principle, the optimal solution minimizing the Hamiltonian $H(k)$ must satisfy:

$$R\,u(k) + B_u^T\lambda(k+1) = 0 \tag{10a}$$

$$Q\,x(k) + A^T\lambda(k+1) - \lambda(k) = 0 \tag{10b}$$

Since $R$ is invertible, its inverse $R^{-1}$ exists and (10a) yields:

$$u(k) = -R^{-1}B_u^T\lambda(k+1) \tag{11}$$
The following form is selected here for $\lambda(k)$:

$$\lambda(k) = P\,x(k) + h\,d(k) \tag{12}$$

where $P$ is a symmetric matrix and $h$ is a vector, which together play the role of the Hamiltonian (Riccati) solution. Substituting equations (11) and (12) into the state equation (5) gives:

$$x(k+1) = M\left[A\,x(k) + B_d d(k) - B_u R^{-1}B_u^T h\,d(k+1)\right], \qquad M = \left(I + B_u R^{-1}B_u^T P\right)^{-1} \tag{13}$$

while substituting (12) into (10b) gives:

$$(Q - P)\,x(k) + A^T P\,x(k+1) + A^T h\,d(k+1) - h\,d(k) = 0 \tag{14}$$

Substituting (13) into (14) gives:

$$(Q - P)\,x(k) + A^T P M\left[A\,x(k) + B_d d(k) - B_u R^{-1}B_u^T h\,d(k+1)\right] + A^T h\,d(k+1) - h\,d(k) = 0 \tag{15}$$

Since equation (15) must hold for all $x(k)$, the coefficient of $x(k)$ yields the Riccati equation:

$$P = Q + A^T P M A \tag{16}$$

Because $P$ satisfies (16), the remaining terms of (15) reduce to:

$$A^T P M B_d\,d(k) + \left(A^T - A^T P M B_u R^{-1}B_u^T\right)h\,d(k+1) - h\,d(k) = 0 \tag{17}$$

Since $d(k+1)$ is unknown at the current time $k$, it is assumed without loss of generality that $d(k+1) = d(k)$, so that equation (17) becomes:

$$\left[A^T P M B_d + \left(A^T - A^T P M B_u R^{-1}B_u^T - I\right)h\right]d(k) = 0 \tag{18}$$

Thus an explicit solution for $h$ is:

$$h = \left(I - A^T + A^T P M B_u R^{-1}B_u^T\right)^{-1} A^T P M B_d \tag{19}$$

where $M = \left(I + B_u R^{-1}B_u^T P\right)^{-1}$ as in (13). Finally, combining equations (10b), (11) and (12), the optimal control law is:

$$u(k) = -K_x\,x(k) - K_d\,d(k) \tag{20}$$

where $K_x$ and $K_d$, formed from $P$ and $h$, constitute the controller gain. From this analysis it follows that driving styles can be differentiated by changing the controller gain.
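As a numerical sketch of the derivation above (not the patent's implementation), the code below iterates the Riccati equation (16) to a fixed point, forms the feedforward vector $h$ of (19), and computes gains in the standard equivalent form $K = (R + B_u^T P B_u)^{-1} B_u^T P\,[A \;\; B_d]$, which matches equation (25) later in the text; the function name and the fixed-point scheme are assumptions:

```python
# Fixed-point solution of (16) and (19); R is the scalar weight r here.
import numpy as np

def solve_following_lq(A, Bu, Bd, Q, r, iters=500):
    n = A.shape[0]
    P = Q.copy()
    for _ in range(iters):
        M = np.linalg.inv(np.eye(n) + Bu @ Bu.T @ P / r)   # M = (I + Bu R^-1 Bu^T P)^-1
        P = Q + A.T @ P @ M @ A                            # equation (16)
    M = np.linalg.inv(np.eye(n) + Bu @ Bu.T @ P / r)
    h = np.linalg.solve(np.eye(n) - A.T + A.T @ P @ M @ Bu @ Bu.T / r,
                        A.T @ P @ M @ Bd)                  # equation (19)
    s_uu = r + (Bu.T @ P @ Bu).item()
    Kx = (Bu.T @ P @ A) / s_uu                             # state-feedback gain
    Kd = (Bu.T @ P @ Bd) / s_uu                            # disturbance feedforward gain
    return P, h, Kx, Kd
```

With the matrices from the discretization sketch, `solve_following_lq(A, Bu, Bd, np.diag([0.15, 0.73, 0.2]), r=0.1)` would yield the gain set of one driving style (the value of `r` here is illustrative).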
2.2.3) preceding vehicle acceleration estimation considering uncertainty;
The acceleration of the front vehicle is estimated from its measured speed by two first-order low-pass filters, and a curvature limiter finally limits abnormal changes of the estimated acceleration;
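A minimal sketch of this estimator follows, with the two first-order low-pass filters in discrete form and the curvature limiter interpreted as a per-step rate limit on the estimated acceleration; the time constants and the limit value are placeholders, not values from the patent:

```python
# Front-vehicle acceleration estimation: speed low-pass -> difference ->
# acceleration low-pass -> rate ("curvature") limit. Parameters illustrative.
class FrontAccelerationEstimator:
    def __init__(self, Ts, tau1=0.4, tau2=0.2, max_rate=5.0):
        self.Ts = Ts
        self.k1 = Ts / (tau1 + Ts)      # coefficient of the first filter
        self.k2 = Ts / (tau2 + Ts)      # coefficient of the second filter
        self.max_step = max_rate * Ts   # largest allowed change per sample
        self.v_f = None                 # filtered front-vehicle speed
        self.a_f = 0.0                  # filtered acceleration estimate

    def update(self, v_t):
        if self.v_f is None:            # first sample: initialize the filter
            self.v_f = v_t
            return 0.0
        v_prev = self.v_f
        self.v_f += self.k1 * (v_t - self.v_f)           # low-pass on speed
        a_raw = (self.v_f - v_prev) / self.Ts            # finite difference
        a_new = self.a_f + self.k2 * (a_raw - self.a_f)  # low-pass on acceleration
        step = max(-self.max_step, min(self.max_step, a_new - self.a_f))
        self.a_f += step                                 # limit abnormal changes
        return self.a_f
```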
2.3) self-learning module of following driving style
To address the fact that driving styles differ between specific drivers, the driving style of the specific driver is learned based on a reinforcement learning method, so as to adjust the control parameters of the optimal cruise control problem and realize the self-learning function of the system; the specific method is as follows:
2.3.1) specific driver driving style definition;
Based on statistical analyses of large amounts of driver data, drivers are divided into three categories (aggressive, moderate and conservative) using a common driving style classification method; aggressive drivers tend to keep a short distance to the vehicle ahead and accelerate and decelerate frequently; conservative drivers tend to keep a larger distance and accelerate and decelerate less; the system characterizes this behavior style with the variable headway commonly used in adaptive cruise systems;
2.3.2) reinforcement learning method establishment
Because the actual driver style information is unknown at system design time, the system learns the driving style of a specific driver with a reinforcement learning method; once the Riccati equation has been solved, the reinforcement learning is complete and the system has learned the driver's style habits. In the linear quadratic problem of a linear discrete system, the Q function is the state-action value function of reinforcement learning; it is a quadratic form in the state and control quantities and can be expressed as:

$$Q\big(x(k), d(k), u(k)\big) = r(k) + \tfrac{1}{2}\,x^T(k+1)\,P\,x(k+1) \tag{21}$$

where $r(k) = \tfrac{1}{2}\left[x^T(k)\,Q\,x(k) + u^T(k)\,R\,u(k)\right]$ is the return in reinforcement learning and $P$ is the Riccati solution; substituting the system model (5) into (21) gives:

$$Q\big(z(k)\big) = \tfrac{1}{2}\,z^T(k)\,S\,z(k) \tag{22}$$

where $z(k) = [x^T(k), d^T(k), u^T(k)]^T$. For convenience in the subsequent differentiation, the kernel matrix is defined as:

$$S = \begin{bmatrix} S_{xx} & S_{xd} & S_{xu} \\ S_{dx} & S_{dd} & S_{du} \\ S_{ux} & S_{ud} & S_{uu} \end{bmatrix} = \begin{bmatrix} Q + A^T P A & A^T P B_d & A^T P B_u \\ B_d^T P A & B_d^T P B_d & B_d^T P B_u \\ B_u^T P A & B_u^T P B_d & R + B_u^T P B_u \end{bmatrix} \tag{23}$$

According to the Bellman optimality principle, the optimal solution is the control quantity $u(k)$ that minimizes the value of the Q function, i.e.:

$$u^*(k) = \arg\min_{u(k)} Q\big(z(k)\big) \tag{24}$$

Therefore, differentiating (22) with respect to $u(k)$ and setting the derivative to zero yields:

$$u(k) = -S_{uu}^{-1}\big(S_{ux}\,x(k) + S_{ud}\,d(k)\big) \tag{25}$$

Equation (23) gives the explicit expression of the kernel matrix $S$ in terms of the system parameters $A$, $B_d$, $B_u$ and the weights $Q$ and $R$; instead of computing $S$ from these quantities, the learning method identifies it with least squares within function approximation.

The Q function is approximated with a linear kernel function, which can be written as:

$$\hat{Q}\big(z(k)\big) = W^T\phi\big(z(k)\big) \tag{26}$$

where $W = [w_1, w_2, \ldots, w_{15}]^T$ is the weight vector and $\phi(z)$ is the basis-function vector formed from the quadratic monomials of $z$ (15 terms, matching the 15 distinct entries of the symmetric kernel matrix $S$). Using the Q function in this parameterized approximate form, and taking $\gamma = 1$ for the infinite horizon, the Bellman equation of reinforcement learning can be written as:

$$W^T\phi\big(z(k)\big) = r(k) + \gamma\,W^T\phi\big(z(k+1)\big) \tag{27}$$

Through incremental learning, the approximation is gradually updated toward the optimum based on driver data, and the 15 unknown parameters of the unknown optimal weight vector are solved approximately; the solution of (27) is converted into a least squares (LS) problem. With a batch of $N$ training samples at time $k$ in the $j$-th iteration ($m$ an integer constant), the regression matrix $\Phi_j$, whose rows are $\big[\phi(z(i)) - \phi(z(i+1))\big]^T$, and the return vector $\mathbf{r}_j$, with entries $r(i)$, are computed sequentially over the batch (28), and the least squares estimate of $W_{j+1}$ is obtained as:

$$W_{j+1} = \big(\Phi_j^T\Phi_j\big)^{-1}\Phi_j^T\,\mathbf{r}_j \tag{29}$$

After the LS algorithm converges, the system compares the computed controller gain vector, in 2-norm, with the gain parameter vectors of the three driver types given in the driver parameter table for the different following behavior styles, and selects the most suitable driving style type to update the controller parameters; otherwise the original controller parameters are kept and learning continues at the next step.
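The least-squares step can be sketched as follows: $\phi(z)$ stacks the 15 quadratic monomials of the five-dimensional vector $z = [x_1, x_2, a_h, d, u]^T$, and each batch solves one linear LS problem for $W$. The batch layout is an assumption consistent with (27)-(29); the patent's exact sample indexing in (28) is not reproduced:

```python
# Quadratic basis and batch LS update for the weights W of (26)-(29).
import numpy as np
from itertools import combinations_with_replacement

def phi(z):
    """The 15 quadratic monomials z_i * z_j (i <= j) of the 5-vector z."""
    return np.array([z[i] * z[j]
                     for i, j in combinations_with_replacement(range(5), 2)])

def ls_update(Z, Z_next, rewards):
    """Solve W^T [phi(z_k) - phi(z_{k+1})] = r_k over one batch (gamma = 1)."""
    Phi = np.stack([phi(z) - phi(zn) for z, zn in zip(Z, Z_next)])
    W, *_ = np.linalg.lstsq(Phi, np.asarray(rewards), rcond=None)
    return W
```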
The specific algorithm flow is as follows:
Step (1): initialize the weight vector W, the batch size N and the integer constant m from the controller gains in the driver parameter table for the different following behavior styles, together with the characteristic parameters $\tau_A$, $\tau_M$, $\tau_C$ and $r_A$, $r_M$ and $r_C$ of each driving style;
Step (2): update and compute the current batch of training data according to equation (28);
Step (3): apply the LS algorithm: update the weight vector W using equation (29) to obtain $W_{j+1}$;
Step (4): if the LS algorithm has converged to $W_{j+1}$, update the kernel matrix: recover the parameters of the matrix S from the weight vector $W_{j+1}$ and obtain the controller gain estimate:

$$\hat{K} = \hat{S}_{uu}^{-1}\begin{bmatrix}\hat{S}_{ux} & \hat{S}_{ud}\end{bmatrix} \tag{30}$$

Compare the estimated parameters in 2-norm with the three driving style parameter vectors, select the most suitable set of parameters from the driver parameter table of the different driving styles, update the controller parameters, set $j = 1$, and end the calculation, returning to car following with the learned driving style;
Step (5): otherwise set $j = j + 1$ and continue with steps (2)-(4).
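Steps (1) through (5) combine into a small learning loop, sketched below with the helpers from the previous sketch; the scaling convention used to recover $S$ from $W$, the convergence test, and the style gain vectors are all assumptions made for illustration:

```python
# Learning loop for steps (1)-(5); uses phi/ls_update from the sketch above.
import numpy as np

def unpack_S(W):
    """Rebuild the symmetric 5x5 kernel S, assuming W^T phi(z) = z^T S z
    (i.e. the 1/2 of equation (22) is absorbed into W)."""
    S, idx = np.zeros((5, 5)), 0
    for i in range(5):
        for j in range(i, 5):
            S[i, j] = S[j, i] = W[idx] if i == j else 0.5 * W[idx]
            idx += 1
    return S

def learn_style(batches, style_gains, tol=1e-3):
    W = np.zeros(15)
    for Z, Z_next, rewards in batches:                # steps (2)-(3)
        W_new = ls_update(Z, Z_next, rewards)
        if np.linalg.norm(W_new - W) < tol:           # step (4): converged
            S = unpack_S(W_new)
            K = np.linalg.solve(S[4:5, 4:5], S[4:5, :4])   # equation (30)
            best = min(style_gains,                   # nearest style in 2-norm
                       key=lambda s: np.linalg.norm(K.ravel() - style_gains[s]))
            return best, K
        W = W_new                                     # step (5): keep learning
    return None, W
```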
The invention has the following beneficial effects:
1. an optimal control problem framework is established for the cruise system problem;
2. the acceleration of the preceding vehicle is predicted and considered in the optimization problem, avoiding unnecessary acceleration and braking and improving comfort;
3. the control problem comprehensively considers the driving style of the driver, and a self-learning method is used to learn it, so the system can perform driving style self-learning for each specific driver;
4. the acceleration of the preceding vehicle is estimated with two first-order low-pass filters, reducing prediction complexity.
Drawings
FIG. 1 is a block diagram of the system architecture of the present invention;
FIG. 2 is a schematic general flow diagram of the driving style self-learning cruise control system of the present invention;
FIG. 3 is a schematic diagram of a forward vehicle acceleration estimation module;
FIG. 4 is a graph of the following behavior of the system in the embodiment without considering the driving style of the driver;
FIG. 5 is a graph of the following behavior of the system in the embodiment after learning with the driving style of the driver taken into account.
Detailed Description
To meet the higher demands that the development of automotive technology places on intelligent driving systems, the system provides self-learning of driving style while preserving the performance of the adaptive car-following system. The invention provides an adaptive cruise control system with driving style self-learning capability that comprehensively considers both vehicle following performance and the driver's driving style. First, a second-order vehicle following model is established for the adaptive following problem and an optimization problem is formulated with a quadratic-optimal control method; the driving style appears as unknown controller parameters in the following controller. To give the system self-learning capability for the style of a specific driver, a function approximation method is combined with least squares to solve for the unknown parameters, effectively improving how well the system serves the driving-performance demands of drivers with different styles.
An adaptive cruise system with driving style learning capability, whose structural block diagram is shown in FIG. 1, mainly comprises: a perception fusion module A, a following control module B, a following driving style self-learning module C, and a vehicle execution control module D.

The perception fusion module A adopts a radar-camera fusion scheme in the environment perception part; it confirms the preceding vehicle to be followed through its own fusion recognition algorithm and obtains the running state information of the preceding vehicle, including its speed and the relative distance. Information on the current vehicle, such as speed, engine torque and braking deceleration, is obtained from the vehicle bus (CAN) network. This information is used by the vehicle following control module B-a and the following driving style self-learning module C.

In the following control module B, a vehicle following model is established according to the confirmed preceding vehicle and the running state information of the preceding vehicle and the current vehicle; at the same time the control problem is formulated and the optimization objective determined. The preceding vehicle acceleration prediction module (module B-b in FIG. 1) predicts the preceding vehicle acceleration from the confirmed running state information of the preceding vehicle and the current vehicle; the following controller (module B-a in FIG. 1) designs the vehicle following control model and the optimization solving algorithm from the states and the prediction information to realize the adaptive following task.

The following driving style self-learning module (module C in FIG. 1) is designed for the fact that driving styles differ between specific drivers. The driving style definition and evaluation part (module C-a in FIG. 1) defines the controller parameters associated with the styles of different drivers and evaluates the style attributes of a specific driver using the mean square deviation. Since the optimal cruise controller parameters for a specific driver are unknown, the driving style learning part (module C-b in FIG. 1) learns the system model parameters with a reinforcement learning method: based on data from the specific driver, the reinforcement learning process is completed by solving the Riccati equation of the linear quadratic problem, after which the control parameters of the optimal cruise control problem are adjusted, realizing the self-learning function.

FIG. 2 shows the overall flow of the technical solution, which is implemented as follows: in the vehicle perception fusion module (module A in FIG. 1), the environment perception part adopts the radar-camera fusion scheme, confirms the vehicle to be followed through its own fusion recognition algorithm, and obtains the running state information of the preceding vehicle, including speed and relative distance; the current vehicle speed, engine torque, braking deceleration and similar information is obtained through the vehicle bus (CAN) network.
When the system is in control, the preceding vehicle acceleration prediction submodule (module B-b in FIG. 1) estimates the preceding vehicle acceleration; the following control module (module B-a in FIG. 1) computes the optimized control input from the confirmed running state information of the preceding vehicle and the current vehicle together with the established vehicle following model and control problem, applies it to the underlying vehicle execution control module (module D in FIG. 1) for tracking control, and finally outputs commands to the vehicle powertrain and braking system to control the driving of the vehicle. When the system is not in control, the driver driving style definition and evaluation module judges whether the driver is consistent with the current driving style parameters (module C-a in FIG. 1), the following driving style self-learning module (module C-b in FIG. 1) learns the driving style with the learning method, and after the learning has converged, the system control parameters are updated in the following control module.
The design and the specific working process of each module of the self-adaptive cruise control system with the driving style self-learning capability are as follows:
1) perception fusion module A
The perception fusion module A adopts a radar-camera fusion scheme in the environment perception part; it confirms the preceding vehicle to be followed through its own fusion recognition algorithm and obtains the running state information of the preceding vehicle, including its speed and the relative distance. Information on the current vehicle, such as speed, engine torque and braking deceleration, is obtained through the vehicle bus (CAN) network. This information is used by the following module and the driving style self-learning module. A feasible radar-camera installation scheme is to mount a production-grade camera on the inner wall of the front windshield; through its internal image processing algorithm, the camera outputs lane line information and information on obstacles ahead (vehicles, pedestrians, bicycles and the like). A production-grade millimeter-wave radar, for example a 77 GHz forward long-range radar, is mounted behind the vehicle grille; the radar provides at least 40 raw target tracks and, after processing, outputs a queue of 6 dangerous targets;
2) following control module B
This module comprises the following submodules: the vehicle following control module (module B-a in FIG. 1) and the preceding vehicle acceleration prediction module (module B-b in FIG. 1). The vehicle following control module establishes the vehicle following model, formulates the control problem, determines the optimization objective and yields the following controller; the preceding vehicle acceleration prediction module predicts the preceding vehicle acceleration based on the confirmed preceding vehicle and the running state information of the preceding vehicle and the current vehicle. The working process is as follows:
2.1) vehicle following model establishment (module B-a in FIG. 1): in this module, a vehicle following model is established according to the requirements of the adaptive following problem and the longitudinal kinematics and dynamics characteristics of the vehicle;
In order to describe the longitudinal dynamics of the vehicle in a following scene, the system introduces two state variables, $x_1$ and $x_2$, respectively:

$$x_1 = d - d_{ref} \tag{1a}$$

$$x_2 = v_t - v_h \tag{1b}$$

where $d$ is the relative distance between the host vehicle and the front vehicle, $d_{ref}$ is the reference relative distance, $d_{ref} = d_0 + \tau v_h$, $d_0$ is the safe parking distance, $\tau$ is the expected headway, $v_h$ is the host vehicle speed, and $v_t$ is the front vehicle speed. The differential equations of the longitudinal dynamics of the vehicle in the following scene can therefore be written in these state variables as:

$$\dot{x}_1 = x_2 - \tau a_h, \qquad \dot{x}_2 = a_t - a_h \tag{2}$$

where $a_h$ is the longitudinal acceleration of the host vehicle, $a_t$ is the longitudinal acceleration of the front vehicle, and $\tau$ is the desired headway. It is assumed here that the inner-loop acceleration tracking dynamics of the host vehicle can be approximated by the first-order model:

$$\dot{a}_h = \frac{a_{ref} - a_h}{\tau_i} \tag{3}$$

where $\tau_i$ is the inner-loop dynamic time constant, $a_h$ is the longitudinal acceleration of the host vehicle, and $a_{ref}$ is the desired longitudinal acceleration.
In the continuous-time system, the state vector is selected as $x = [x_1, x_2, a_h]^T$, the control input is $u = a_{ref}$, and the acceleration of the front vehicle is modeled as a system disturbance, i.e. the disturbance input is $d = a_t$. The continuous state-space equation of the vehicle following system can then be expressed as:

$$\dot{x} = A_c x + B_{c,u} u + B_{c,d} d \tag{4}$$

where

$$A_c = \begin{bmatrix} 0 & 1 & -\tau \\ 0 & 0 & -1 \\ 0 & 0 & -1/\tau_i \end{bmatrix}, \qquad B_{c,u} = \begin{bmatrix} 0 \\ 0 \\ 1/\tau_i \end{bmatrix}, \qquad B_{c,d} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$

$a_h$ is the longitudinal acceleration of the host vehicle, $a_{ref}$ is the desired longitudinal acceleration, $\tau_i$ is the inner-loop dynamic time constant, and $\tau$ is the desired headway.
Further, in the controller design, a discrete-time expression of the state-space equation of the vehicle following system can be obtained with the zero-order hold (ZOH) method. Defining the state at time $k$ as $x(k) = [x_1(k), x_2(k), a_h(k)]^T$ and the control input as $u(k) = a_{ref}(k)$, at time $k$ one obtains:

$$x(k+1) = A\,x(k) + B_u u(k) + B_d d(k) \tag{5}$$

where $A$, $B_u$ and $B_d$ are the ZOH discretizations of $A_c$, $B_{c,u}$ and $B_{c,d}$ over one sampling interval, $d(k) = a_t(k)$, $T_s$ is the sampling time interval, $\tau$ is the desired headway, and $\tau_i$ is the inner-loop dynamic time constant.
2.2) establishment of optimal cruise control problem
The safety, comfort and economy in the following performance of the vehicle and the driving style of a driver are taken into comprehensive consideration as multi-objective optimization indexes, and the optimal cruise control problem is established as a linear quadratic optimization problem.
2.2.1) optimal quadratic control problem establishment
In summary of the above, the performance indicator function can be expressed as:

$$J = \frac{1}{2}\sum_{k=0}^{\infty}\left[x^T(k)\,Q\,x(k) + u^T(k)\,R\,u(k)\right] \tag{6}$$

where $Q$ is a semi-positive-definite symmetric matrix and $R$ is a positive-definite symmetric matrix; $x(k)$ is the state quantity at time $k$ and $u(k)$ is the control quantity at time $k$. The parameter matrices $Q$ and $R$ express the controller's requirements on driving style, comfort and economy, and are defined here as:

$$Q = \mathrm{diag}(q_{11},\, q_{22},\, q_{33}), \qquad R = r \tag{7}$$

Substituting this into equation (6), the performance indicator function can also be expressed as:

$$J = \frac{1}{2}\sum_{k=0}^{\infty}\left[q_{11}\,x_1^2(k) + q_{22}\,x_2^2(k) + q_{33}\,a_h^2(k) + r\,a_{ref}^2(k)\right] \tag{8}$$

Here it is clear that the penalty term $q_{11}x_1^2 + q_{22}x_2^2$ characterizes the following performance, namely that at steady state the vehicle keeps the desired following distance from the front vehicle at the desired longitudinal speed. The penalty term $q_{33}a_h^2$ expresses the desire to minimize fuel consumption, and the penalty term $r\,a_{ref}^2$ expresses the desire to improve the comfort of the vehicle and reduce frequent acceleration and deceleration.
2.2.2) solving of optimal quadratic control problem
Defining the discrete hamiltonian equation to solve the optimization problem, for solving the optimization problem equation (6), the discrete hamiltonian equation can be defined as:
Figure GDA0002589058060000123
wherein,
Figure GDA0002589058060000124
is a Lagrange multiplier, x (k), u (k), d (k), Q, R, A, Bu,BdAs previously mentioned and unchanged in the derivation of the following formula, the optimal solution that minimizes the hamiltonian h (k) according to the principle of minima needs to be satisfied:
Figure GDA0002589058060000125
Qx(k)+ATλ(k+1)-λ(k)=0 (10b)
due to the inverse matrix R of R-1Is represented by the formula (10 a):
Figure GDA0002589058060000126
the following form is selected here for λ (k):
λ(k)=Px(k)+hd(x) (12)
wherein,
Figure GDA0002589058060000127
and
Figure GDA0002589058060000128
is a hamiltonian matrix. Substituting equations (11), (12) into equation of state (5) yields:
Figure GDA0002589058060000129
while substituting formula (12) for formula (10b) to obtain:
(Q-R)x(k)+ATPx(k+1)+AThd(k+1)-hd(k)=0 (14)
substituting formula (13) for formula (14) to obtain:
Figure GDA00025890580600001210
since equation (15) holds for all x (k), the ricati equation is obtained:
Figure GDA00025890580600001211
due to the fact that
Figure GDA0002589058060000131
Ricatt equation (16) translates to:
Figure GDA0002589058060000132
since d (k +1) is unknown at the current time k without loss of generality, assuming that d (k +1) ═ d (k), equation (17) becomes
Figure GDA0002589058060000133
Thus, an explicit solution for h is;
Figure GDA0002589058060000134
wherein,
Figure GDA0002589058060000135
thus, by integrating the formulas (10b), (11), and (12), the optimal control law is:
Figure GDA0002589058060000136
wherein,
Figure GDA0002589058060000137
is the controller gain. Thus analyzed to obtainThus, the driving style can be differentiated by changing the controller gain.
2.2.3) preceding vehicle acceleration estimation taking uncertainty into account
In the optimal following problem, the acceleration of the preceding vehicle is an important factor, so it generally has to be estimated from speed information by differencing. Several approaches exist: the direct difference method, where noise degrades estimation accuracy; Kalman filtering; and vehicle-to-vehicle communication, which suffers from communication delay and information security problems. To simplify the problem, as shown in FIG. 3, the preceding vehicle speed is used and its acceleration is estimated by two first-order low-pass filters, with a curvature limiter finally limiting abnormal changes of the acceleration.
2.3) self-learning module of following driving style
To address the fact that driving styles differ between specific drivers, the driving style of the specific driver is learned based on a reinforcement learning method, so as to adjust the control parameters of the optimal cruise control problem and realize the self-learning function of the system.
2.3.1) driver-specific Driving Style definition
Based on statistical analyses of large amounts of driver data, drivers are classified into three categories (aggressive, moderate and conservative) using a common driving style classification method. Aggressive drivers tend to keep a short distance to the vehicle ahead and accelerate and decelerate frequently, while conservative drivers tend to keep a larger distance and accelerate and decelerate less. The system characterizes this behavior style with the variable headway commonly used in adaptive cruise systems. The controller-related parameters of the system are set accordingly, as shown in Table 1:
TABLE 1 Driver parameters for different following behavior styles
[The table itself appears only as an image in the original document; it lists, for the aggressive, moderate and conservative styles, the characteristic headway parameters τ_A, τ_M, τ_C and the control weights r_A, r_M, r_C.]
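Since the numeric entries of Table 1 survive only as images in this copy, the dictionary below merely shows how such a parameter table could be encoded; every number in it is a hypothetical placeholder, not a value from the patent:

```python
# Hypothetical stand-in for Table 1 (tau_A, tau_M, tau_C and r_A, r_M, r_C).
STYLE_PARAMETERS = {
    "aggressive":   {"tau": 1.0, "r": 0.05},   # placeholder values
    "moderate":     {"tau": 1.5, "r": 0.10},
    "conservative": {"tau": 2.2, "r": 0.20},
}
```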
2.3.2) reinforcement learning method establishment
Since the actual driver style information is not known at system design time, the system learns the driving style of a specific driver with a reinforcement learning method; once the Riccati equation has been solved, the reinforcement learning is complete and the system has learned the driver's style habits. In the linear quadratic problem of a linear discrete system, the Q function is the state-action value function of reinforcement learning; it is a quadratic form in the state and control quantities and can be expressed as:

$$Q\big(x(k), d(k), u(k)\big) = r(k) + \tfrac{1}{2}\,x^T(k+1)\,P\,x(k+1) \tag{21}$$

where $r(k) = \tfrac{1}{2}\left[x^T(k)\,Q\,x(k) + u^T(k)\,R\,u(k)\right]$ is the return in reinforcement learning and $P$ is the Riccati solution. For $x(k)$, $u(k)$, $d(k)$, $Q$, $R$, $A$, $B_u$, $B_d$, the same symbols in this subsection retain their original physical and numerical meanings. Substituting the system model (5) into (21) gives:

$$Q\big(z(k)\big) = \tfrac{1}{2}\,z^T(k)\,S\,z(k) \tag{22}$$

where $z(k) = [x^T(k), d^T(k), u^T(k)]^T$. For convenience in the subsequent differentiation, the kernel matrix is defined as:

$$S = \begin{bmatrix} S_{xx} & S_{xd} & S_{xu} \\ S_{dx} & S_{dd} & S_{du} \\ S_{ux} & S_{ud} & S_{uu} \end{bmatrix} = \begin{bmatrix} Q + A^T P A & A^T P B_d & A^T P B_u \\ B_d^T P A & B_d^T P B_d & B_d^T P B_u \\ B_u^T P A & B_u^T P B_d & R + B_u^T P B_u \end{bmatrix} \tag{23}$$

According to the Bellman optimality principle, the optimal solution is the control quantity $u(k)$ that minimizes the value of the Q function, i.e.:

$$u^*(k) = \arg\min_{u(k)} Q\big(z(k)\big) \tag{24}$$

Therefore, differentiating (22) with respect to $u(k)$ and setting the derivative to zero yields:

$$u(k) = -S_{uu}^{-1}\big(S_{ux}\,x(k) + S_{ud}\,d(k)\big) \tag{25}$$

Equation (23) gives the explicit expression of the kernel matrix $S$ in terms of the system parameters $A$, $B_d$, $B_u$ and the weights $Q$ and $R$; instead of computing $S$ from these quantities, the learning method identifies it with least squares within function approximation.

The Q function is approximated with a linear kernel function, which can be written as:

$$\hat{Q}\big(z(k)\big) = W^T\phi\big(z(k)\big) \tag{26}$$

where $W = [w_1, w_2, \ldots, w_{15}]^T$ is the weight vector and $\phi(z)$ is the basis-function vector formed from the quadratic monomials of $z$ (15 terms, matching the 15 distinct entries of the symmetric kernel matrix $S$). Using the Q function in this parameterized approximate form, and taking $\gamma = 1$ for the infinite horizon, the Bellman equation of reinforcement learning can be written as:

$$W^T\phi\big(z(k)\big) = r(k) + \gamma\,W^T\phi\big(z(k+1)\big) \tag{27}$$

Through incremental learning, the approximation is gradually updated toward the optimum based on driver data, and the 15 unknown parameters of the unknown optimal weight vector are solved approximately. The solution of (27) is converted into a least squares (LS) problem. With a batch of $N$ training samples at time $k$ in the $j$-th iteration ($m$ an integer constant), the regression matrix $\Phi_j$, whose rows are $\big[\phi(z(i)) - \phi(z(i+1))\big]^T$, and the return vector $\mathbf{r}_j$, with entries $r(i)$, are computed sequentially over the batch (28), and the least squares estimate of $W_{j+1}$ is obtained as:

$$W_{j+1} = \big(\Phi_j^T\Phi_j\big)^{-1}\Phi_j^T\,\mathbf{r}_j \tag{29}$$

After the LS algorithm converges, the system compares the computed controller gain vector, in 2-norm, with the gain parameter vectors of the three driver styles given in Table 1, and selects the most suitable driving style type to update the controller parameters; otherwise the original controller parameters are kept and learning continues at the next step.
Therefore, the driving style learning algorithm based on reinforcement learning proceeds as follows:
Step (1): initialize the weight vector W, the batch size N and the integer constant m from the controller gains in Table 1, together with the characteristic parameters $\tau_A$, $\tau_M$, $\tau_C$ and $r_A$, $r_M$ and $r_C$ of each driving style.
Step (2): update and compute the current batch of training data according to equation (28).
Step (3): apply the LS algorithm: update the weight vector W using equation (29) to obtain $W_{j+1}$.
Step (4): if the LS algorithm has converged to $W_{j+1}$, update the kernel matrix: recover the parameters of the matrix S from the weight vector $W_{j+1}$ and obtain the controller gain estimate:

$$\hat{K} = \hat{S}_{uu}^{-1}\begin{bmatrix}\hat{S}_{ux} & \hat{S}_{ud}\end{bmatrix} \tag{30}$$

Compare the estimated parameters in 2-norm with the three driving style parameter vectors, select the most suitable set of parameters from Table 1, update the controller parameters, set $j = 1$, and end the calculation, returning to car following with the learned driving style.
Step (5): otherwise set $j = j + 1$ and continue with steps (2)-(4).
For the adaptive cruise control system with driving style self-learning capability and its implementation method, the effectiveness of the approach was verified with actual system parameters. The controller design parameters in equation (7) are $q_{11} = 0.15$, $q_{22} = 0.73$ and $q_{33} = 0.2$; the inner-loop dynamic time constant is $\tau_i = 0.9$; the safe parking distance is $d_0 = 3$; and the preceding vehicle acceleration estimation module parameters are $\tau_1 = 0.4$ and $\tau_2 = 0.2$. After car-following style learning on data collected from real drivers, the following performance was verified in test runs: FIG. 4 shows the following behavior of the system without considering the driver's driving style, and FIG. 5 shows the following behavior after learning with the driving style considered.
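Tying the stated embodiment parameters to the earlier sketches, a hypothetical instantiation could look as follows; `Ts`, the current state and the style weight `r` are illustrative, while `Q`, `tau_i`, `tau1` and `tau2` are the values given above:

```python
# Hypothetical assembly of the embodiment parameters with the earlier sketches
# (A, Bu, Bd, FrontAccelerationEstimator and solve_following_lq defined above).
import numpy as np

Q = np.diag([0.15, 0.73, 0.2])        # q11, q22, q33 of equation (7)
est = FrontAccelerationEstimator(Ts=0.05, tau1=0.4, tau2=0.2)
P, h, Kx, Kd = solve_following_lq(A, Bu, Bd, Q, r=0.1)

x = np.zeros(3)                       # current state [x1, x2, a_h]
a_t_hat = est.update(v_t=20.0)        # estimated front-vehicle acceleration
u = -(Kx @ x).item() - Kd.item() * a_t_hat   # control law in the form (20)
```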

Claims (1)

1. A method for realizing an adaptive cruise system with driving style learning capability comprises a perception fusion module (A), a following control module (B), a driving style self-learning module (C) and a vehicle execution control module (D);
the perception fusion module (A) is used for obtaining the running state information of the front vehicle, including the speed information and the relative distance of the front vehicle;
the following control module (B) comprises a vehicle following control module (B-a) and a preceding vehicle acceleration prediction module (B-b); the vehicle following control module (B-a) is used for establishing a vehicle following model, formulating the control problem, determining the optimization objective and obtaining the following controller; the preceding vehicle acceleration prediction module (B-b) is used for predicting the preceding vehicle acceleration on the basis of the confirmed preceding vehicle and the running state information of the preceding vehicle and the current vehicle;
the driving style self-learning module (C) is used for learning the driving style of a specific driver based on a reinforcement learning method, addressing the fact that driving styles differ between specific drivers, so as to adjust the control parameters of the optimal cruise control problem and realize the self-learning function of the system;
the vehicle execution control module (D) is used for performing tracking control and finally outputting commands to the vehicle powertrain and braking system to control the driving of the vehicle;
the perception fusion module (A) is unidirectionally connected with the car following control module (B) and the driving style self-learning module (C); the following control module (B) is unidirectionally connected with the vehicle execution control module (D); the driving style self-learning module (C) is unidirectionally connected with the following vehicle control module (B); the vehicle execution control module (D) is connected with the vehicle in a one-way mode; the method is characterized by comprising the following steps:
step one, confirming the preceding vehicle to be followed through the perception fusion module (A)'s own fusion recognition algorithm, and obtaining the running state information of the preceding vehicle, including its speed and the relative distance; information on the current vehicle, such as speed, engine torque and braking deceleration, is obtained from the vehicle bus, i.e. the CAN network;
step two, establishing a vehicle following model through the vehicle following control module (B-a) in the following control module (B), formulating the control problem, determining the optimization objective, and obtaining the following controller; the preceding vehicle acceleration prediction module (B-b) in the following control module (B) estimates the preceding vehicle acceleration from the preceding vehicle speed through two first-order low-pass filters and limits abnormal changes of the acceleration with a curvature limiter;
step three, learning the driving style of the specific driver through the driving style self-learning module (C) based on a reinforcement learning method, addressing the fact that driving styles differ between drivers, and thereby adjusting the control parameters of the optimal cruise control problem to realize the self-learning function of the system;
step four, performing tracking control through the underlying vehicle execution control module (D), finally outputting commands to the vehicle powertrain and braking system to control the driving of the vehicle;
the specific method of the second step is as follows:
2.1) vehicle following model establishment: a vehicle following model is established according to the requirements of the adaptive following problem and the longitudinal kinematics and dynamics characteristics of the vehicle;
in order to describe the longitudinal dynamics of the vehicle in a following scene, the system introduces two state variables, $x_1$ and $x_2$, respectively:

$$x_1 = d - d_{ref} \tag{1a}$$

$$x_2 = v_t - v_h \tag{1b}$$

where $d$ is the relative distance between the host vehicle and the front vehicle, $d_{ref}$ is the reference relative distance, $d_{ref} = d_0 + \tau v_h$, $d_0$ is the safe parking distance, $\tau$ is the expected headway, $v_h$ is the host vehicle speed, and $v_t$ is the front vehicle speed; therefore, the differential equations of the longitudinal dynamics of the vehicle in the following scene are written in these state variables as:

$$\dot{x}_1 = x_2 - \tau a_h, \qquad \dot{x}_2 = a_t - a_h \tag{2}$$

where $a_h$ is the longitudinal acceleration of the host vehicle, $a_t$ is the longitudinal acceleration of the front vehicle, and $\tau$ is the desired headway; it is assumed here that the inner-loop acceleration tracking dynamics of the host vehicle are developed with the first-order approximation:

$$\dot{a}_h = \frac{a_{ref} - a_h}{\tau_i} \tag{3}$$

where $\tau_i$ is the inner-loop dynamic time constant, $a_h$ is the longitudinal acceleration of the host vehicle, and $a_{ref}$ is the desired longitudinal acceleration;
in a continuous system, the state quantities are selected as
Figure FDA0002589058050000026
The controlled variable is u ═ arefModeling the acceleration of the front vehicle as system disturbance, namely disturbance quantity d ═ ah(ii) a Thus, the continuous state space equation for a vehicle following system pair is expressed as:
Figure FDA0002589058050000027
wherein,
Figure FDA0002589058050000028
ahis the longitudinal acceleration of the vehicle, arefIs the desired longitudinal acceleration, τiIs the inner loop dynamic time constant, τ is the expected headway;
in the controller design, a discrete-time expression of the state-space equation of the vehicle following system is obtained with the zero-order hold (ZOH) method; defining the state at time $k$ as $x(k) = [x_1(k), x_2(k), a_h(k)]^T$ and the control input as $u(k) = a_{ref}(k)$, at time $k$ one obtains:

$$x(k+1) = A\,x(k) + B_u u(k) + B_d d(k) \tag{5}$$

where $A$, $B_u$ and $B_d$ are the ZOH discretizations of $A_c$, $B_{c,u}$ and $B_{c,d}$ over one sampling interval, $d(k) = a_t(k)$, $T_s$ is the sampling time interval, $\tau$ is the desired headway, and $\tau_i$ is the inner-loop dynamic time constant;
2.2) establishing the optimal cruise control problem as a linear quadratic optimization problem, wherein the specific method comprises the following steps:
2.2.1) establishing an optimal quadratic control problem;
the performance index function is expressed as:

$$J = \frac{1}{2}\sum_{k=0}^{\infty}\left[x^T(k)\,Q\,x(k) + u^T(k)\,R\,u(k)\right] \tag{6}$$

where $Q$ is a semi-positive-definite symmetric matrix, $R$ is a positive-definite symmetric matrix, $x(k)$ is the state quantity at time $k$, and $u(k)$ is the control quantity at time $k$; the parameter matrices $Q$ and $R$ express the controller's requirements on driving style, comfort and economy, and are defined here as:

$$Q = \mathrm{diag}(q_{11},\, q_{22},\, q_{33}), \qquad R = r \tag{7}$$

substituting this into equation (6), the performance index function is expressed as:

$$J = \frac{1}{2}\sum_{k=0}^{\infty}\left[q_{11}\,x_1^2(k) + q_{22}\,x_2^2(k) + q_{33}\,a_h^2(k) + r\,a_{ref}^2(k)\right] \tag{8}$$

here it is clear that the penalty term $q_{11}x_1^2 + q_{22}x_2^2$ characterizes the following performance, namely that when the vehicle reaches the steady state it keeps the expected following distance from the front vehicle at the expected longitudinal speed; the penalty term $q_{33}a_h^2$ expresses the desire to minimize fuel consumption; and the penalty term $r\,a_{ref}^2$ expresses the desire to improve the comfort of the vehicle and reduce frequent acceleration and deceleration;
2.2.2) solving the optimal quadratic control problem;
a discrete Hamiltonian is defined to solve the optimization problem; for optimization problem (6) it is defined as:

$$H(k) = \frac{1}{2}\left[x^T(k)\,Q\,x(k) + u^T(k)\,R\,u(k)\right] + \lambda^T(k+1)\left[A\,x(k) + B_u u(k) + B_d d(k)\right] \tag{9}$$

where $\lambda(k)$ is the Lagrange multiplier; $x(k)$, $u(k)$, $d(k)$, $Q$, $R$, $A$, $B_u$, $B_d$ are as defined above and unchanged throughout the following derivation; by the minimum principle, the optimal solution minimizing the Hamiltonian $H(k)$ must satisfy:

$$R\,u(k) + B_u^T\lambda(k+1) = 0 \tag{10a}$$

$$Q\,x(k) + A^T\lambda(k+1) - \lambda(k) = 0 \tag{10b}$$

since $R$ is invertible, its inverse $R^{-1}$ exists and from (10a) there can be obtained:

$$u(k) = -R^{-1}B_u^T\lambda(k+1) \tag{11}$$
the following form is selected here for $\lambda(k)$:

$$\lambda(k) = P\,x(k) + h\,d(k) \tag{12}$$

wherein $P$ is a symmetric matrix and $h$ is a vector, which together play the role of the Hamiltonian (Riccati) solution; substituting equations (11) and (12) into the state equation (5) yields:

$$x(k+1) = M\left[A\,x(k) + B_d d(k) - B_u R^{-1}B_u^T h\,d(k+1)\right], \qquad M = \left(I + B_u R^{-1}B_u^T P\right)^{-1} \tag{13}$$

while substituting equation (12) into equation (10b) yields:

$$(Q - P)\,x(k) + A^T P\,x(k+1) + A^T h\,d(k+1) - h\,d(k) = 0 \tag{14}$$

substituting equation (13) into equation (14) yields:

$$(Q - P)\,x(k) + A^T P M\left[A\,x(k) + B_d d(k) - B_u R^{-1}B_u^T h\,d(k+1)\right] + A^T h\,d(k+1) - h\,d(k) = 0 \tag{15}$$

since equation (15) holds for all $x(k)$, the coefficient of $x(k)$ yields the Riccati equation:

$$P = Q + A^T P M A \tag{16}$$

because $P$ satisfies equation (16), the remaining terms of equation (15) reduce to:

$$A^T P M B_d\,d(k) + \left(A^T - A^T P M B_u R^{-1}B_u^T\right)h\,d(k+1) - h\,d(k) = 0 \tag{17}$$

since $d(k+1)$ is unknown at the current time $k$, it is assumed without loss of generality that $d(k+1) = d(k)$, so that equation (17) becomes:

$$\left[A^T P M B_d + \left(A^T - A^T P M B_u R^{-1}B_u^T - I\right)h\right]d(k) = 0 \tag{18}$$

thus an explicit solution for $h$ is:

$$h = \left(I - A^T + A^T P M B_u R^{-1}B_u^T\right)^{-1} A^T P M B_d \tag{19}$$

wherein $M = \left(I + B_u R^{-1}B_u^T P\right)^{-1}$ as in equation (13); thus, combining equations (10b), (11) and (12), the optimal control law is:

$$u(k) = -K_x\,x(k) - K_d\,d(k) \tag{20}$$

wherein $K_x$ and $K_d$, formed from $P$ and $h$, constitute the controller gain; the driving style is thus distinguished by changing the value of the controller gain;
2.2.3) preceding vehicle acceleration estimation considering uncertainty;
the acceleration of the front vehicle is estimated from its measured speed by two first-order low-pass filters, and a curvature limiter finally limits abnormal changes of the estimated acceleration;
2.3) self-learning module of following driving style
to address the fact that driving styles differ between specific drivers, the driving style of the specific driver is learned based on a reinforcement learning method, so as to adjust the control parameters of the optimal cruise control problem and realize the self-learning function of the system; the specific method is as follows:
2.3.1) specific driver driving style definition;
based on statistical analyses of large amounts of driver data, drivers are divided into three categories (aggressive, moderate and conservative) using a common driving style classification method; aggressive drivers tend to keep a short distance to the vehicle ahead and accelerate and decelerate frequently; conservative drivers tend to keep a larger distance and accelerate and decelerate less; the system characterizes this behavior style with the variable headway commonly used in adaptive cruise systems;
2.3.2) reinforcement learning method establishment
Because the actual driver style information is unknown during system design, the system learns the driving style of a specific driver by using a reinforcement learning method; when the Riccati equation is solved, the reinforcement learning is completed, and the system learns the driving style habit of the driver; in the linear quadratic problem in the linear discrete system, the Q function is a state action value function in reinforcement learning, is a quadratic form with respect to a state quantity and a control quantity, and is expressed as:
Figure FDA0002589058050000052
wherein,
Figure FDA0002589058050000053
is the return in reinforcement learning, and P is the Riccatin solution; the system model formula (5) is substituted into the formula (21) to obtain
Figure FDA0002589058050000054
Wherein,
Figure FDA0002589058050000055
z(k)=[xT(k),dT(k),uT(k)]Tfor subsequent differentiation convenience, a kernel matrix is defined:
Figure FDA0002589058050000056
according to the berman optimality principle, the optimal solution is the control quantity u (k) which enables the function value of the Q function to be minimum, namely:
Figure FDA0002589058050000061
therefore, the derivation of the formula (22)
Figure FDA0002589058050000062
Obtaining:
Figure FDA0002589058050000063
from the formula (22), system parameters A, B are obtainedd,BuFor the display expression mode of the kernel matrix S, Q and R are solved by using a least square method in function approximation for the kernel matrix S through a learning method;
the Q function is approximated by a linear kernel function, which can be written as:
\hat{Q}(z(k)) = W^T \phi(z(k))
wherein W = [w_1, w_2, \ldots, w_{15}]^T is the weight vector and \phi(z) is the basis function, i.e. the vector of quadratic monomials of z; using this parameterized Q function, and taking \gamma = 1 for the infinite time horizon, the Bellman equation of reinforcement learning is written as:
W^T \phi(z(k)) = r(k) + \gamma\,W^T \phi(z(k+1))   (27)
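The 15 weights are consistent with a 5-dimensional z, e.g. a 3-dimensional following-error state plus scalar disturbance and control, since a quadratic form on R^5 has 5·6/2 = 15 unique monomials; a sketch of such a basis:

```python
import numpy as np

def phi(z):
    """Quadratic basis: upper-triangular monomials z_i * z_j, i <= j.
    For len(z) == 5 this yields the 15 features weighted by W."""
    z = np.asarray(z).ravel()
    return np.array([z[i] * z[j] for i in range(len(z))
                     for j in range(i, len(z))])
```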
the approximately optimal value is updated step by step from driver data through incremental learning, and the 15 unknown parameters of the unknown optimal weight vector are solved approximately; the solution of formula (27) is converted into a least-squares (LS) problem; the number N of training samples per batch is determined, and the j-th iteration is carried out on the samples up to time k, with m an integer constant, computing in turn:
\tilde{\phi}_i = \phi(z(i)) - \gamma\,\phi(z(i+1)), \qquad y_i = r(i), \qquad i = k-N+1, \ldots, k   (28)
to obtain the least-squares estimate of W_{j+1}:
W_{j+1} = (\tilde{\Phi}^T \tilde{\Phi})^{-1} \tilde{\Phi}^T y, \qquad \tilde{\Phi} = [\tilde{\phi}_{k-N+1}, \ldots, \tilde{\phi}_k]^T   (29)
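A sketch of one batch least-squares update under these assumptions (γ = 1; each sample is a tuple (z(k), r(k), z(k+1)), and `phi` is the basis sketched above):

```python
import numpy as np

def ls_update(samples, phi):
    """Solve W from the Bellman residuals of formula (27):
    W^T phi(z_k) = r_k + W^T phi(z_{k+1})  =>  Phi @ W = y (least squares)."""
    Phi = np.array([phi(z) - phi(z_next) for z, r, z_next in samples])
    y = np.array([r for z, r, z_next in samples])
    W, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return W
```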
after the LS algorithm converges, the system compares, in the 2-norm, the controller gain vector computed from the learned kernel matrix with the gain parameter vectors of the three driver types given in the parameter table of car-following behavior styles, and selects the most suitable driving style type to update the controller parameters; otherwise, the original controller parameters are kept and learning continues at the next step;
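An illustrative version of this matching step; the style gain vectors below are invented placeholders for the calibrated parameter table, which the patent does not reproduce:

```python
import numpy as np

# Hypothetical gain vectors for the three car-following styles.
STYLE_GAINS = {
    "aggressive":   np.array([2.1, 1.4, 0.6, 0.8]),
    "steady":       np.array([1.5, 1.0, 0.4, 0.5]),
    "conservative": np.array([1.0, 0.7, 0.3, 0.3]),
}

def match_style(k_est, max_dist=1.0):
    """Return the style whose gain vector is nearest in 2-norm,
    or None if no style is close enough (keep current parameters)."""
    style, dist = min(((s, np.linalg.norm(k_est - k))
                       for s, k in STYLE_GAINS.items()),
                      key=lambda p: p[1])
    return style if dist < max_dist else None
```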
the specific algorithm flow is as follows:
step (1): initialize the controller gains from the parameter table of the different car-following behavior styles, the weight vector W, the constants N and m, and the characteristic parameters of each driving style: the headway times \tau_A, \tau_M, \tau_C and the weights r_A, r_M and r_C;
step (2): update and compute the current data set \{\tilde{\phi}_i, y_i\}, i = k-N+1, \ldots, k;
step (3): run the LS algorithm: update the weight vector W by formula (29) to obtain W_{j+1};
step (4): if the LS algorithm has converged and W_{j+1} is obtained, update the kernel matrix: recover each parameter of the matrix S from the weight vector W_{j+1} by inverse solution, and obtain the controller gain estimate
\hat{K} = \hat{S}_{uu}^{-1}\,[\hat{S}_{ux},\ \hat{S}_{ud}]
compare the estimated parameters, in the 2-norm, with the three driving-style parameter vectors, select the most suitable group of parameters from the parameter table of the different driving styles, update the controller parameters, set j = 1, and end the calculation, returning to the car-following driving-style module;
step (5): otherwise, set j = j + 1 and continue with steps (2)-(4).

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910077516.9A CN109927725B (en) 2019-01-28 2019-01-28 Self-adaptive cruise system with driving style learning capability and implementation method


Publications (2)

Publication Number Publication Date
CN109927725A CN109927725A (en) 2019-06-25
CN109927725B true CN109927725B (en) 2020-11-03

Family

ID=66985173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910077516.9A Active CN109927725B (en) 2019-01-28 2019-01-28 Self-adaptive cruise system with driving style learning capability and implementation method

Country Status (1)

Country Link
CN (1) CN109927725B (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110341710B (en) * 2019-07-02 2021-07-23 浙江吉利控股集团有限公司 Self-adaptive cruise control method, device and equipment
DE112019007598B4 (en) * 2019-09-05 2024-05-08 Mitsubishi Electric Corporation INFERENCE DEVICE, DEVICE CONTROL SYSTEM AND LEARNING DEVICE
CN110568758B (en) * 2019-09-12 2022-07-01 中汽研(天津)汽车工程研究院有限公司 Parameter self-adaptive transverse motion LQR control method for automatically driving automobile
CN110949386B (en) * 2019-11-28 2021-07-20 江苏大学 Vehicle adaptive cruise control system and method capable of recognizing driving tendency
CN111267847B (en) * 2020-02-11 2021-08-17 吉林大学 Personalized self-adaptive cruise control system
CN111452630A (en) * 2020-04-14 2020-07-28 江西精骏电控技术有限公司 New energy automobile motor controller and control method thereof
CN111474934A (en) * 2020-04-26 2020-07-31 东风汽车集团有限公司 Personalized cruise control system based on high-precision map
CN111738294B (en) * 2020-05-21 2024-05-14 深圳海普参数科技有限公司 AI model training method, AI model using method, computer device, and storage medium
CN111547064B (en) * 2020-05-26 2022-07-12 吉林大学 Driving style recognition and classification method for automobile adaptive cruise system
CN111897213A (en) * 2020-06-18 2020-11-06 中山大学 Automobile self-adaptive cruise control method under model uncertainty
CN112033429B (en) * 2020-09-14 2022-07-19 吉林大学 Target-level multi-sensor fusion method for intelligent automobile
CN112158200B (en) * 2020-09-25 2022-04-01 厦门大学 Intelligent electric vehicle following control system and method based on driver characteristics
CN112172813B (en) * 2020-10-14 2022-03-04 长安大学 Car following system and method for simulating driving style based on deep inverse reinforcement learning
CN112109708B (en) * 2020-10-26 2023-07-14 吉林大学 Self-adaptive cruise control system considering driving behavior and control method thereof
CN112466119B (en) * 2020-11-26 2021-11-16 清华大学 Method and system for predicting vehicle following speed of vehicle by using vehicle-road cooperative data
CN112677982B (en) * 2020-12-25 2022-06-14 重庆大学 Vehicle longitudinal speed planning method based on driver characteristics
CN112721949B (en) * 2021-01-12 2022-07-12 重庆大学 Method for evaluating longitudinal driving personification degree of automatic driving vehicle
CN112896161B (en) * 2021-02-08 2022-06-21 杭州电子科技大学 Electric automobile ecological self-adaptation cruise control system based on reinforcement learning
CN112937592B (en) * 2021-03-23 2022-04-15 武汉理工大学 Method and system for identifying driving style based on headway
CN113406955B (en) * 2021-05-10 2022-06-21 江苏大学 Complex network-based automatic driving automobile complex environment model, cognitive system and cognitive method
CN113060146B (en) * 2021-05-12 2023-04-07 中国第一汽车股份有限公司 Longitudinal tracking control method, device, equipment and storage medium
CN113232652B (en) * 2021-06-16 2022-10-25 武汉光庭信息技术股份有限公司 Vehicle cruise control method and system based on kinematics model
CN113428164B (en) * 2021-07-21 2023-01-03 上汽通用五菱汽车股份有限公司 Driving habit learning method and device and computer readable storage medium
CN113401125B (en) * 2021-07-29 2022-10-11 中国第一汽车股份有限公司 Longitudinal car following control method and device, electronic equipment and storage medium
CN114167406B (en) * 2021-11-24 2024-06-18 英博超算(南京)科技有限公司 System for reducing false alarm of radar through fusion
CN114132333B (en) * 2021-12-14 2024-08-16 阿维塔科技(重庆)有限公司 Intelligent driving system optimization method and device and computer readable storage medium
CN114435379B (en) * 2022-01-07 2023-06-20 所托(杭州)汽车智能设备有限公司 Economical driving control method and device for vehicle
CN114030472B (en) * 2022-01-10 2022-05-20 智道网联科技(北京)有限公司 Control method, device and equipment for adaptive cruise and readable storage medium
CN114488799B (en) * 2022-01-11 2024-02-23 吉林大学 Parameter optimization method for controller of automobile self-adaptive cruise system
CN114312777A (en) * 2022-01-17 2022-04-12 北京格睿能源科技有限公司 Fuel cell heavy-truck predictive cruise control method and system
CN114492043B (en) * 2022-01-27 2023-12-19 吉林大学 Personalized driver following modeling method considering perception limited characteristics
CN114771520B (en) * 2022-03-31 2024-08-02 中南大学 Electric automobile economical self-adaptive cruise control method and system based on reinforcement learning
CN114879644B (en) * 2022-05-26 2024-08-16 吉林大学 Method for rapidly calibrating parameters of controller of automobile self-adaptive cruise system
CN116052412B (en) * 2022-11-23 2023-08-18 兰州大学 Automatic driving vehicle control method integrating physical information and deep reinforcement learning
DE102023201074A1 (en) 2023-02-09 2024-08-14 Volkswagen Aktiengesellschaft Method for assisting a driver in controlling a motor vehicle and motor vehicle
CN117037524B (en) * 2023-09-26 2023-12-22 苏州易百特信息科技有限公司 Lane following optimization method and system under intelligent parking scene
CN117644870B (en) * 2024-01-30 2024-03-26 吉林大学 Driving anxiety detection and vehicle control method and system based on context awareness

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7774121B2 (en) * 2007-07-31 2010-08-10 Gm Global Technology Operations, Inc. Curve speed control system with adaptive map preview time and driving mode selection
US8260515B2 (en) * 2008-07-24 2012-09-04 GM Global Technology Operations LLC Adaptive vehicle control system with driving style recognition
DE102010028140A1 (en) * 2010-04-23 2013-03-21 Zf Lenksysteme Gmbh Driver assistance method of motor vehicle, involves performing automatic course correction such that vehicle departure from currently traveled lane is prevented, while compensating movement of steering handle of steering system
CN104925057A (en) * 2015-06-26 2015-09-23 武汉理工大学 Automotive self-adaptive cruising system with multi-mode switching system and control method thereof
DE102016205260A1 (en) * 2016-03-31 2017-10-05 Bayerische Motoren Werke Aktiengesellschaft Method for automatically adjusting the speed of a vehicle
CN107499262A (en) * 2017-10-17 2017-12-22 芜湖伯特利汽车安全系统股份有限公司 ACC/AEB systems and vehicle based on machine learning
CN108944930B (en) * 2018-07-05 2020-04-21 合肥工业大学 Automatic car following method and system for simulating driver characteristics based on LSTM

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107544518A (en) * 2017-10-17 2018-01-05 芜湖伯特利汽车安全系统股份有限公司 The ACC/AEB systems and vehicle driven based on personification
CN107832517A (en) * 2017-11-01 2018-03-23 合肥创宇新能源科技有限公司 ACC lengthwise movement modeling methods based on relative motion relation

Also Published As

Publication number Publication date
CN109927725A (en) 2019-06-25

Similar Documents

Publication Publication Date Title
CN109927725B (en) Self-adaptive cruise system with driving style learning capability and implementation method
CN109624986B (en) Driving style learning cruise control system and method based on mode switching
CN110568760B (en) Parameterized learning decision control system and method suitable for lane changing and lane keeping
CN107808027B (en) Self-adaptive car following method based on improved model predictive control
CN112162555B (en) Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet
CN111284489B (en) Intelligent networked automobile random prediction cruise control system
CN109910909B (en) Automobile track internet interactive prediction method for multi-automobile motion state
JP2023504223A (en) Adaptive control of automated or semi-autonomous vehicles
McKinnon et al. Learn fast, forget slow: Safe predictive learning control for systems with unknown and changing dynamics performing repetitive tasks
US11579574B2 (en) Control customization system, control customization method, and control customization program
Chu et al. Self-learning optimal cruise control based on individual car-following style
Kuutti et al. End-to-end reinforcement learning for autonomous longitudinal control using advantage actor critic with temporal context
CN113874865A (en) Method and device for determining model parameters of a control strategy of a technical system by means of a Bayesian optimization method
CN113791615A (en) Hybrid vehicle queue distributed model prediction control method
Shi et al. Offline reinforcement learning for autonomous driving with safety and exploration enhancement
Ure et al. Enhancing situational awareness and performance of adaptive cruise control through model predictive control and deep reinforcement learning
CN115598983B (en) Unmanned vehicle transverse and longitudinal cooperative control method and device considering time-varying time delay
CN111391831A (en) Automobile following speed control method and system based on preceding automobile speed prediction
Lim et al. Automatic weight determination in model predictive control for personalized car-following control
Gharib et al. Multi-objective optimization of a path-following MPC for vehicle guidance: A Bayesian optimization approach
CN110456790B (en) Intelligent networking electric automobile queue optimization control method based on adaptive weight
Li et al. Reinforcement learning based lane change decision-making with imaginary sampling
CN115352443B (en) Self-adaptive cruise control method and device based on by-pass vehicle cut-in recognition
CN114347998B (en) Vehicle auxiliary driving control method, system, equipment and medium
CN113635900B (en) Channel switching decision control method based on energy management in predicted cruising process

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant