CN112622886B - Anti-collision early warning method for heavy operation vehicle comprehensively considering front and rear obstacles - Google Patents
- Publication number
- CN112622886B, CN202011512720.8A, CN202011512720A
- Authority
- CN
- China
- Prior art keywords
- driving
- vehicle
- collision
- network
- function
- Prior art date
- Legal status
- Active
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W30/00—Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
- B60W30/08—Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
- B60W40/00—Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
- B60W40/02—related to ambient conditions
- B60W40/06—Road conditions
- B60W40/08—related to drivers or passengers
- B60W40/09—Driving style or behaviour
- B60W40/10—related to vehicle motion
Abstract
The invention discloses an anti-collision early warning method for heavy operation vehicles that comprehensively considers front and rear obstacles. First, a driving simulation platform is built for the Chinese road driving environment, and typical driving behaviors of excellent drivers under various driving conditions are collected. Second, a maximum-entropy inverse reinforcement learning algorithm is introduced to learn the driving behavior of excellent drivers. Finally, the anti-collision early warning problem is described as a Markov decision process, and an anti-collision driving decision model based on forward reinforcement learning is established to obtain an accurate, reliable and adaptive early warning strategy. The method comprehensively considers the influence of forward and backward obstacles on vehicle collision, provides the driver with accurate, quantified driving suggestions such as throttle opening and steering wheel angle control, adapts to different driving conditions and driver operations, and addresses the lack of accuracy and adaptability in existing anti-collision early warning methods for heavy commercial vehicles.
Description
Technical Field
The invention relates to vehicle anti-collision early warning strategies, in particular to an anti-collision early warning method for heavy operation vehicles that comprehensively considers front and rear obstacles, and belongs to the technical field of automobile safety.
Background
As the main carrier of road transportation, the safety of commercial vehicles directly influences the safety of road transport. Unlike small passenger vehicles, most operational transport vehicles are large or medium-sized, characterized by a high center of mass, large external dimensions and large total mass, together with high operating intensity, long operating hours and complex operating environments. When a traffic accident occurs during transportation, serious consequences such as mass casualties, cargo spillage, combustion and explosion easily follow, causing property loss, environmental pollution and ecological damage.
Research shows that collision is the most common accident type in road transportation, with forward collisions accounting for the largest share; collision accidents on expressways in particular are mostly forward collisions. Although rearward collisions occur less frequently, for heavy operation vehicles typified by dangerous-goods tank trucks, a rearward collision can easily damage the tank, causing leakage or even combustion and explosion of the dangerous goods inside and producing secondary damage far exceeding that of the accident itself; such vehicles are therefore particularly hazardous. Statistics from the U.S. National Highway Traffic Safety Administration indicate that vehicle collisions can be reduced by about 30% to 60% if an early warning prompt gives the driver an additional 0.5 seconds of reaction time before the collision occurs. Researching accurate and reliable front-and-rear anti-collision early warning strategies for heavy operation vehicles therefore plays an important role in improving the safety of dangerous-goods transportation and of road traffic in general.
At present, many patents and papers study vehicle anti-collision early warning strategies, but most target small passenger vehicles. Compared with passenger vehicles, heavy commercial vehicles have a higher center of mass and a larger load, so their braking distance is longer and their roll stability poorer; during emergency braking or lane changing, sloshing liquid in the tank or shifting cargo on the trailer further increases instability, making rollover very likely. Anti-collision early warning strategies designed for passenger vehicles are therefore difficult to apply to heavy commercial vehicles.
In existing research on anti-collision early warning for heavy operation vehicles, graded warnings are issued only for the collision risk in a single direction, front or rear, without considering the influence of driver operation and driving conditions on collision. Although such methods provide some warning capability, they adapt poorly to different driving conditions and warn inaccurately, and thus struggle with complex, changing traffic environments and fluctuating vehicle operating conditions. In addition, existing methods mainly issue warnings through sound and light, without studying early warning strategies that provide specific driving suggestions such as driving speed and trajectory, and so lack accuracy and reliability.
In general, current research on anti-collision early warning strategies for heavy operation vehicles still falls short in accuracy and adaptability, and lacks strategies that are accurate, reliable and adaptive to driver operation and driving conditions.
Disclosure of Invention
Purpose of the invention: the invention discloses an anti-collision early warning method for heavy commercial vehicles that comprehensively considers front and rear obstacles, aiming to solve the lack of accuracy and adaptability of existing methods. The method provides the driver with accurate, quantified driving suggestions such as throttle opening and steering wheel angle control, adapts to different driving conditions and driver operations, and improves the accuracy and adaptability of anti-collision early warning for heavy-duty commercial vehicles.
Technical scheme: for heavy operation vehicles such as semi-trailer tank trucks and semi-trailer trains, the invention provides an anti-collision early warning strategy that comprehensively considers front and rear obstacles. First, a driving simulation platform is built for the Chinese road driving environment, and typical driving behaviors of excellent drivers under various driving conditions are collected. Second, a maximum-entropy inverse reinforcement learning algorithm is introduced to learn the driving behavior of excellent drivers. Finally, the anti-collision early warning problem is described as a Markov decision process, and an anti-collision driving decision model based on forward reinforcement learning is established to obtain an accurate, reliable and adaptive early warning method. The method comprises the following steps:
Step one: building a driving simulation platform
In order to reduce the frequency of traffic accidents caused by vehicle collision and improve the safety of heavy commercial vehicles, the invention provides an anti-collision early warning strategy applicable to the following scenario: while a heavy operation vehicle is driving with obstacles both in front of and behind it, decision strategies such as acceleration, deceleration and steering are provided to the driver effectively and in time so as to prevent collision with surrounding vehicles.
According to the scene described above, a driving simulation platform is built, and the driving behavior of an excellent driver in a real driving environment is collected. The method specifically comprises the following steps:
First, a PreScan-based driving simulation platform is built, a virtual town environment model including straight and curved roads is constructed according to the Chinese road driving environment, and the driver controls the heavy operation vehicle through a driving simulator.
Second, a centimeter-level high-precision differential GPS, an inertial measurement unit and a millimeter-wave radar are installed on the heavy operation vehicle to obtain accurate motion-state and relative-motion-state information, specifically position, speed, yaw angle, acceleration, relative speed and relative distance. Meanwhile, the driver's control inputs are obtained from the vehicle CAN bus, including brake pedal pressure, steering wheel angle and throttle opening.
Finally, six driving conditions are designed: lane changing, lane keeping, car following, constant speed, acceleration and deceleration. Thirty excellent drivers of different ages and driving styles are selected for data-acquisition tests, the typical driving behaviors of these drivers are collected in a spatio-temporally unified global coordinate system, and an excellent-driver driving database is constructed.
In the present invention, the front vehicle means the vehicle ahead of the heavy operation vehicle on its road, within the same lane markings and traveling in the same direction; the rear vehicle means the vehicle behind the heavy operation vehicle, within the same lane markings and traveling in the same direction.
Step two: learning the driving behavior of excellent human drivers
In order to improve the adaptability of the anti-collision early warning strategy, the invention introduces a maximum-entropy inverse reinforcement learning algorithm to learn the driving behaviors, collected in step one, of excellent drivers under different driving conditions.
In real traffic, the driving behavior of an excellent driver is hard to express explicitly, but the driving trajectories produced by that behavior are relatively easy to collect. Since an excellent driver's trajectory has the maximum reward among all possible trajectories, the driving behavior is represented by a reward function.
First, the reward function of an excellent driver's driving trajectory is established:

rθ(ξi) = Σj rθ(Sj, Aj)   (1)

In equation (1), ξi represents the driving trajectory of the ith excellent driver, with ξi = {(S1, A1), (S2, A2), ..., (Sm, Am)}; m represents the number of collected excellent-driver trajectories; rθ(ξi) is the feature-based reward of the ith trajectory, i.e. the reward function of this trajectory; rθ(Sj, Aj) is the reward of the jth "state-action" pair in the trajectory, where Sj denotes the state at time j and Aj the action at time j.
Considering that an excellent driver often makes driving decisions according to variables such as running speed, yaw angle, distance from a lane line, distance from front and rear obstacles, and the like, the present invention linearly fits a reward value using longitudinal speed, lateral speed, yaw angle, and distance from front and rear obstacles.
rθ(Si,Ai)=rθ(φ1,φ2,φ3,φ4)=θrT·φ (2)
In equation (2), the feature values are φ1 = vsx·cos ψs, φ2 = vsy·sin ψs, φ3 = dsf − d0 and φ4 = dsr − d0, where vsx, vsy are respectively the lateral and longitudinal speeds of the heavy commercial vehicle in meters per second, ψs is the yaw angle in degrees, dsf, dsr are the relative distances from the heavy operation vehicle to the front and rear vehicles in meters, d0 is the safe-distance threshold, θrT is the coefficient matrix, and φ is the fitted feature vector.
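The feature construction and linear reward of equation (2) can be sketched as follows; the numeric values, the safe-distance threshold d0 = 10 m, and the degree-to-radian handling of the yaw angle are illustrative assumptions, not values from the patent.

```python
import math

def feature_vector(v_sx, v_sy, psi_s, d_sf, d_sr, d0=10.0):
    """Features phi of equation (2); d0 and the input values are assumptions."""
    psi = math.radians(psi_s)      # yaw angle psi_s is given in degrees
    return [v_sx * math.cos(psi),  # phi1
            v_sy * math.sin(psi),  # phi2
            d_sf - d0,             # phi3: clearance to the front vehicle
            d_sr - d0]             # phi4: clearance to the rear vehicle

def reward(theta, phi):
    """Linear reward r_theta(S, A) = theta_r^T . phi."""
    return sum(t * p for t, p in zip(theta, phi))

phi = feature_vector(v_sx=20.0, v_sy=0.5, psi_s=2.0, d_sf=35.0, d_sr=25.0)
r = reward([0.1, -0.5, 0.05, 0.05], phi)
```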
The probability of a trajectory under the maximum-entropy model can be expressed as:

p(ξi | θ) = exp(rθ(ξi)) / Z(θ),  Z(θ) = Σj=1..n exp(rθ(ξj))   (3)

In equation (3), p(ξi | θ) is the maximum-entropy probability of the trajectory, Z(θ) is the partition function, and the ξj are the n trajectories sampled under the current policy πt−1.
Second, a probability model of the excellent driver's trajectories is established, and the trajectory distribution with maximum entropy is solved using the principle of maximum information entropy, as shown in equation (4):

θ* = arg maxθ Σi=1..N log p(ξ̃i | θ)   (4)

In equation (4), ξ̃i represents the ith collected excellent-driver trajectory and N the number of collected trajectories.
Using the Lagrange multiplier method, equation (4) is converted into the minimization of a loss function:

J(θ) = −(1/N) Σi=1..N log p(ξ̃i | θ)   (5)

In equation (5), J(θ) is the loss function.
Since the larger the probability of the excellent driver's trajectories, the better the reward function expresses the excellent driving behavior, equation (5) expands to:

J(θ) = log Z(θ) − (1/N) Σi=1..N rθ(ξ̃i)   (6)

The loss function is minimized by gradient descent to obtain the globally optimal solution of the reward function, using the gradient:

∇θ J(θ) = Σj p(ξj | θ)·φ(ξj) − (1/N) Σi φ(ξ̃i)   (7)
and finally, optimizing the parameters of the reward function by using a gradient descent algorithm, and further learning the global optimal solution of the reward function. According to the optimized parameter thetarThe current reward function r can be outputθ(Si,Ai) I.e. a function characterizing the excellent driver driving behavior.
Step three: establishing an anti-collision driving decision model
The invention adopts the DDPG (Deep Deterministic Policy Gradient) algorithm and establishes an anti-collision driving decision model based on the driving behaviors collected in step one and the reward function learned in step two, so as to study anti-collision early warning strategies under different driver operations and driving conditions. This comprises the following four sub-steps:
substep 1: defining basic parameters for an anti-collision driving decision model
Since the future motion state of a heavy operation vehicle is influenced by both its current motion state and the current action, the anti-collision driving decision problem is modeled as a Markov decision process, and the model's basic parameters are defined: the state St at time t, the state St+1 at time t+1, the action At at time t, and the return value Rt corresponding to action At. Specifically:
(1) Defining a state space
The running safety of a heavy-duty vehicle is related to not only the motion state of the vehicle itself but also the relative motion state of the front and rear obstacles. Therefore, using the motion state information obtained in step one, a state space is defined:
St=(vsx,vsy,vsf,vsr,asx,asy,dsf,dsr,ωs,θs,δbr,δthr) (8)
In equation (8), vsf, vsr respectively represent the relative speed of the heavy commercial vehicle with respect to the front and rear vehicles, in meters per second; asx, asy respectively represent its lateral and longitudinal accelerations, in meters per second squared; ωs is the yaw rate, in radians per second; θs is the steering wheel angle, in degrees; and δbr, δthr respectively represent the brake pedal opening and throttle opening, in percent.
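Packing and normalizing the 12 signals of equation (8) might look like the following sketch; the normalization bounds are illustrative assumptions, since the text only states that the state is normalized.

```python
def make_state(v_sx, v_sy, v_sf, v_sr, a_sx, a_sy,
               d_sf, d_sr, omega_s, theta_s, delta_br, delta_thr):
    """Pack the 12 signals of equation (8) into S_t, in the equation's order."""
    return [v_sx, v_sy, v_sf, v_sr, a_sx, a_sy,
            d_sf, d_sr, omega_s, theta_s, delta_br, delta_thr]

def normalize(state, lo, hi):
    """Min-max normalize each channel to [0, 1]; lo/hi bounds are assumptions."""
    return [(s - l) / (h - l) if h > l else 0.0
            for s, l, h in zip(state, lo, hi)]

S_t = make_state(20.0, 0.3, -1.2, 0.8, 0.1, -0.2,
                 35.0, 28.0, 0.01, 3.5, 0.0, 22.0)
```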
(2) Defining action decisions
In order to establish a more accurate and reliable anti-collision early warning strategy, the invention considers both the lateral and longitudinal motion of the vehicle. Since the throttle and brake pedal control quantities cannot be applied simultaneously, the steering wheel angle and a single acceleration/braking normalized quantity are taken as the control quantities, and the early warning strategy output by the decision model, i.e. the action decision, is defined as At = [θstr_out, δs_out].

Here At is the action decision at time t; θstr_out is the normalized steering wheel angle control quantity, in the range [−1, 1]; δs_out is the acceleration/braking normalized quantity, in the range [−1, 1]. When δs_out = 0, the vehicle moves at constant speed; when δs_out = −1, the vehicle brakes at maximum deceleration; when δs_out = 1, the vehicle accelerates at maximum acceleration.
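A hypothetical mapping from the normalized action At back to physical commands could look like the sketch below; the maximum steering angle, acceleration and deceleration are assumed values, and the code merely illustrates that throttle and brake commands are never issued simultaneously.

```python
def interpret_action(theta_str_out, delta_s_out,
                     max_steer_deg=540.0, max_acc=2.0, max_dec=6.0):
    """Map A_t = [theta_str_out, delta_s_out] to (steer_deg, accel, decel).
    The maxima are illustrative assumptions, not values from the patent."""
    assert -1.0 <= theta_str_out <= 1.0 and -1.0 <= delta_s_out <= 1.0
    steer = theta_str_out * max_steer_deg       # steering wheel angle, degrees
    if delta_s_out >= 0.0:                      # accelerate via the throttle
        return steer, delta_s_out * max_acc, 0.0
    return steer, 0.0, -delta_s_out * max_dec   # brake; throttle stays at zero
```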
(3) Defining a reward function
Defining the reward function as:
Rt=r1+r2+r3 (9)
in the formula (9), RtFor a reward function at time t, r1For a safety distance reward function, r2As a comfort reward function, r3Is a penalty function.
First, to prevent vehicle collision, a safe-distance reward function r1 is designed as in equation (10), where d0 is the safe-distance threshold.
Second, to ensure driving comfort, excessive jerk should be avoided as much as possible, so a comfort reward function r2 = |asy(t+1) − asy(t)| is designed.
Finally, to penalize erroneous vehicle behavior, a penalty function r3 is designed:

r3 = Spen if the vehicle collides or rolls over, and r3 = 0 otherwise   (11)

In equation (11), Spen is the penalty value; in the present invention Spen = −100, so the decision model receives a penalty of −100 when the vehicle crashes or rolls over.
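The composite reward of equation (9) can be sketched as below. Since the body of equation (10) is not reproduced in the text, the clipped-clearance form of r1 is an assumption; r2 is implemented here as a penalty on the change in lateral acceleration (the sign convention is likewise an assumption), and r3 follows equation (11) with Spen = −100.

```python
def reward_t(d_sf, d_sr, a_sy_next, a_sy, crashed, d0=10.0, s_pen=-100.0):
    """R_t = r1 + r2 + r3 from equation (9); r1's exact form is assumed."""
    r1 = min(d_sf - d0, 0.0) + min(d_sr - d0, 0.0)  # assumed form of eq. (10)
    r2 = -abs(a_sy_next - a_sy)                      # comfort: penalize jerk
    r3 = s_pen if crashed else 0.0                   # penalty, eq. (11)
    return r1 + r2 + r3
```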
Substep 2: network architecture for building anti-collision decision model
An anti-collision driving decision network is constructed using the policy-evaluation (actor-critic) framework, comprising a policy network and a value-function network. The policy network regresses features of the state St and outputs a continuous action At; the value-function network receives the state St and action At and evaluates the value of the current "state-action" pair. Specifically:
(1) designing a policy network
The policy network is established using a neural network with fully connected layers: the normalized state space St is connected in turn to fully connected layers F1, F2 and F3, yielding the output O1, i.e. the action decision At.
Since the dimension of the state space is 12, the number of neurons in the state input layer is set to 12. The activation function of each fully connected layer is the Rectified Linear Unit (ReLU), f(x) = max(0, x), and the numbers of neurons in F1, F2 and F3 are 20, 20 and 10, respectively.
(2) Design value function network
The value-function network is likewise established with fully connected layers: the normalized state St and action At are connected in turn to fully connected layers F4, F5 and F6, yielding the output O2, i.e. the Q value. The activation function of each fully connected layer is ReLU, and the numbers of neurons in F4, F5 and F6 are 20, 20 and 10, respectively.
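A minimal forward pass through the described fully connected stacks might be sketched in plain Python as follows; the output sizes (2 for the action head, matching At, and 1 for the Q value) and the linear output layers are assumptions consistent with the text, and the random initialization is illustrative.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(x, W, b):
    # one fully connected layer: W is (n_out x n_in), b has length n_out
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def init_layer(n_in, n_out, rng):
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def mlp_forward(x, layers):
    """ReLU on hidden layers F1-F3 (or F4-F6), linear output (assumption)."""
    for i, (W, b) in enumerate(layers):
        x = dense(x, W, b)
        if i < len(layers) - 1:
            x = relu(x)
    return x

rng = random.Random(0)
policy = [init_layer(12, 20, rng), init_layer(20, 20, rng),
          init_layer(20, 10, rng), init_layer(10, 2, rng)]
a = mlp_forward([0.5] * 12, policy)        # action head: A_t
critic = [init_layer(14, 20, rng), init_layer(20, 20, rng),
          init_layer(20, 10, rng), init_layer(10, 1, rng)]
q = mlp_forward([0.5] * 12 + a, critic)    # Q(S_t, A_t): scalar value
```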
Substep 3: training strategy network and value function network
The strategy network and the value function network have respective network parameters, and the network parameters of the two parts are updated during training iteration, so that the network converges to obtain a better result. The specific training updating step comprises the following steps:
Substep 3.2: establish the reward function using equation (2), and initialize the value-function network parameters θQ, the policy network parameters θμ, and the reward parameter θr;
Substep 3.3: taking equation (9) as the initial policy optimization target, perform policy optimization with the DDPG algorithm to obtain the initial policy π0;
Substep 3.4: performing iterative solution, each iteration comprising substep 3.41 to substep 3.45, in particular:
Substep 3.43: optimize the reward-function parameters by stochastic gradient descent using the gradient of equation (7);
Substep 3.44: taking the optimized reward function rθ(Si, Ai) as the optimization target, perform policy optimization with the DDPG algorithm and update the value-function network parameters θQ and the policy network parameters θμ;
Substep 3.45: calculate the update amplitude of the reward function; when it is smaller than a given threshold, the current reward function is the optimal reward function.
Substep 3.5: iterate and update according to substep 3.4 so that the policy network and value-function network gradually converge. During training, if the vehicle collides or rolls over, the current episode is terminated and a new episode is started. The iteration ends when the heavy operation vehicle stably and effectively avoids collision using the decision strategy output by the model.
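The stopping rule of substeps 3.4-3.5 can be sketched as an outer loop that repeats the reward and policy updates until the reward-function update amplitude falls below a threshold; here update_theta stands in for the inner substeps 3.41-3.45, and eps and max_iters are illustrative assumptions.

```python
def train_until_converged(update_theta, theta0, eps=1e-3, max_iters=500):
    """Iterate until the reward-function update amplitude drops below eps
    (substep 3.45); update_theta abstracts one pass of substeps 3.41-3.45."""
    theta = theta0
    for it in range(max_iters):
        new_theta = update_theta(theta)
        amplitude = max(abs(n - o) for n, o in zip(new_theta, theta))
        theta = new_theta
        if amplitude < eps:       # reward function considered optimal
            return theta, it + 1
    return theta, max_iters
```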
Substep 4: outputting an anti-collision early warning strategy by using an anti-collision driving decision model
The information collected by the centimeter-level high-precision differential GPS, inertial measurement unit, millimeter-wave radar and other sensors is input into the trained anti-collision driving decision network, which outputs reasonable steering wheel angle and throttle opening commands in real time. This provides the driver with accurate, quantified and reliable driving suggestions, realizing anti-collision early warning for heavy operation vehicles that is accurate, reliable and adaptive to driver operation and driving conditions.
Beneficial effects: compared with general vehicle anti-collision early warning strategies, the method provided by the invention is more accurate, reliable and adaptive, specifically:
(1) the method provided by the invention comprehensively considers the influence of forward and backward obstacles on vehicle collision, accurately quantifies driving strategies such as driving speed, steering of a steering wheel and the like in a numerical form, and realizes accurate and reliable anti-collision early warning decision of heavy commercial vehicles.
(2) The method provided by the invention can adapt to different driver operations and driving conditions, the output driving strategy can be adaptively adjusted according to the driver operations and the driving condition changes, and the problem that the existing anti-collision early warning strategy for heavy-duty operation vehicles is lack of accuracy and adaptability is solved.
(3) The method provided by the invention does not need complex vehicle dynamics modeling, and the calculation method is simple and clear.
Drawings
FIG. 1 is a schematic diagram of a technical route of the present invention;
fig. 2 is a schematic diagram of a network architecture of an anti-collision driving decision model established by the present invention.
Detailed Description
The technical scheme of the invention is further explained below with reference to the accompanying drawings.
In order to establish an anti-collision early warning strategy that is accurate, reliable and adaptive to driver operation and driving conditions, the invention provides, for heavy operation vehicles such as semi-trailer trains and semi-trailer tank trucks, an anti-collision early warning strategy that comprehensively considers front and rear obstacles. First, a driving simulation platform is built for the Chinese road driving environment, and typical driving behaviors of excellent drivers under various driving conditions are collected. Second, a maximum-entropy inverse reinforcement learning algorithm is introduced to learn the driving behavior of excellent drivers. Finally, the anti-collision early warning problem is described as a Markov decision process, and an anti-collision driving decision model based on forward reinforcement learning is established to obtain an accurate, reliable and adaptive early warning strategy. The technical route of the invention is shown in figure 1, and the specific steps are as follows:
Step one: building a driving simulation platform
In order to reduce the occurrence frequency of traffic accidents caused by vehicle collision and improve the safety of heavy commercial vehicles, the invention provides an anti-collision early warning strategy, which is applicable to the following scenes: in the process of running of a heavy-duty operation vehicle, obstacles exist in front of and behind the vehicle, and in order to prevent collision with surrounding vehicles, decision strategies such as acceleration, deceleration, steering and the like are effectively and timely provided for a driver so as to avoid collision accidents.
According to the scene described above, a driving simulation platform is built, and the driving behavior of an excellent driver in a real driving environment is collected. The method specifically comprises the following steps:
firstly, a Prescan-based driving simulation platform is built, a town virtual environment model comprising a straight road and a curve road is built according to the Chinese road driving environment, and a driver controls a heavy operation vehicle to move through a driving simulator.
Secondly, a centimeter-level high-precision differential GPS, an inertia measurement unit and a millimeter wave radar are installed on the heavy operation vehicle to obtain accurate motion state information and relative motion state information of the vehicle, wherein the accurate motion state information and the relative motion state information specifically comprise position, speed, yaw angle, acceleration, relative speed and relative distance. Meanwhile, the control information of the driver is obtained by utilizing a vehicle body CAN bus, and the control information comprises the pressure of a brake pedal, the steering wheel angle and the opening degree of a throttle valve.
And finally, 6 driving conditions of lane changing, lane keeping, vehicle following, constant speed, acceleration and deceleration are designed, 30 excellent drivers with different ages and driving styles are selected to perform a data acquisition test, data acquisition of various typical driving behaviors of the excellent drivers is realized under a space-time global unified coordinate system, and a driving database of the excellent drivers is constructed.
In the present invention, the front vehicle means a vehicle located in front of the road on which the heavy-duty vehicle travels, located within the same lane line, and having the same traveling direction. The rear vehicle is a vehicle which is positioned behind the driving road of the heavy operation vehicle, is positioned in the same lane line and has the same driving direction.
Step two: learning driving behavior of human excellent driver
In order to improve the adaptability of the anti-collision early warning strategy, the invention introduces a reverse reinforcement learning algorithm based on the maximum entropy to learn the driving behaviors of the excellent driver collected in the step one under different driving conditions.
In an actual traffic scene, the driving behavior of an excellent driver is not easy to express explicitly, but it is relatively easy to acquire a driving track generated by the excellent driving behavior. Considering that the driving track of the excellent driver has the maximum reward value in all possible tracks, the driving behavior of the excellent driver is represented by the reward function.
First, a reward function for the excellent driver's driving trajectory is established:
in the expression (1), xi i represents the traveling locus of the i-th excellent driver, and xii={(S1,A1),(S2,A2),...,(Sm,Am) M represents the number of driving tracks of excellent drivers collected, rθ(ξi) Feature vector representing the ith excellent driver's driving track, i.e. reward function for this driving track, rθ(Si,Ai) Reward value, S, representing the ith "state-action" in this trackiIndicating the state at time i, AiIndicating the operation at time i.
Considering that an excellent driver often makes driving decisions according to variables such as running speed, yaw angle, distance from a lane line, distance from front and rear obstacles, and the like, the present invention linearly fits a reward value using longitudinal speed, lateral speed, yaw angle, and distance from front and rear obstacles.
rθ(Si,Ai)=rθ(φ1,φ2,φ3,φ4)=θrT·φ (2)
In formula (2), the characteristic values are φ1 = vsx·cosψs, φ2 = vsy·sinψs, φ3 = dsf − d0, and φ4 = dsr − d0, where vsx and vsy are respectively the lateral and longitudinal speeds of the heavy commercial vehicle in meters per second, ψs is the yaw angle in degrees, dsf and dsr respectively represent the relative distances from the heavy commercial vehicle to the front and rear vehicles in meters, d0 is a safe distance threshold, θrT is the coefficient matrix, and φ represents the fitted feature vector.
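As an illustration, the feature vector φ and the linear reward of formula (2) can be sketched in Python. The feature definitions follow the text; the weight values and the threshold d0 below are assumed for the example and are not values from the invention.

```python
import math

# Illustrative sketch of formula (2): r_theta(S, A) = theta_r^T . phi.
# theta_r and d0 are assumed values, not taken from the invention.
def features(v_sx, v_sy, psi_s, d_sf, d_sr, d0=10.0):
    """v_sx, v_sy: lateral/longitudinal speed (m/s); psi_s: yaw angle (deg);
    d_sf, d_sr: relative distance to the front/rear vehicle (m)."""
    psi = math.radians(psi_s)
    return [v_sx * math.cos(psi),  # phi_1
            v_sy * math.sin(psi),  # phi_2
            d_sf - d0,             # phi_3: front clearance margin
            d_sr - d0]             # phi_4: rear clearance margin

def reward(theta_r, phi):
    """Linear reward: dot product of the coefficient vector and the features."""
    return sum(t * p for t, p in zip(theta_r, phi))

phi = features(v_sx=1.0, v_sy=20.0, psi_s=0.0, d_sf=35.0, d_sr=25.0)
theta_r = [0.1, 0.5, 0.02, 0.02]  # assumed coefficients
r = reward(theta_r, phi)
```
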
The probability of a trajectory having the maximum entropy can be expressed as:

p(ξi|θ) = exp(rθ(ξi)) / Z(θ) (3)

In formula (3), p(ξi|θ) represents the probability of the trajectory having the maximum entropy, and Z(θ) is the partition function, estimated from the trajectories sampled under the previous policy πt−1; n represents the number of trajectories sampled under the current policy.
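The trajectory reward of formula (1) and the maximum-entropy trajectory probability of formula (3) amount to a softmax over trajectory rewards. A minimal sketch, with made-up per-step rewards:

```python
import math

# Sketch of formula (1), the trajectory reward as a sum of per-step rewards,
# and formula (3), p(xi|theta) = exp(r_theta(xi)) / Z(theta), a softmax over
# sampled trajectories. The per-step reward values are made-up numbers.
def trajectory_reward(step_rewards):
    return sum(step_rewards)  # formula (1)

def maxent_probabilities(traj_rewards):
    z = sum(math.exp(r) for r in traj_rewards)      # partition function Z(theta)
    return [math.exp(r) / z for r in traj_rewards]  # formula (3)

traj_rewards = [trajectory_reward(t)
                for t in ([1.0, 0.5], [0.2, 0.1], [0.4, 0.4])]
probs = maxent_probabilities(traj_rewards)
# higher-reward trajectories are exponentially more probable
```
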
Secondly, a probability model of the excellent driver's driving trajectory is established, and the driving trajectory with the maximum entropy is solved using the maximum information entropy principle, as shown in formula (4):

θ* = argmaxθ ∑ξ̃ log p(ξ̃|θ) (4)

In formula (4), ξ̃ represents a collected driving trajectory of the excellent driver.
Equation (4) is converted into an unconstrained loss minimization by the Lagrange multiplier method:

J(θ) = −∑ξ̃ log p(ξ̃|θ) (5)

In formula (5), J(θ) is the loss function.
Considering that the greater the probability of occurrence of the excellent driver's driving trajectory, the better the reward function expresses the excellent driver's driving behavior, equation (5) is written as:

J(θ) = −∑ξ̃ (rθ(ξ̃) − log Z(θ)) (6)
The loss function is minimized using the gradient descent method to obtain the global optimal solution of the reward function:

θr* = argminθ J(θ) (7)
and finally, optimizing the parameters of the reward function by using a gradient descent algorithm, and further learning the global optimal solution of the reward function. According to the optimized parameter thetarThe current reward function r can be outputθ(Si,Ai) I.e. a function characterizing the excellent driver driving behavior.
Step three: establishing an anti-collision driving decision model
Common anti-collision early warning strategies mainly comprise methods based on a physical system model and data-driven methods. An anti-collision early warning strategy based on a physical system model compares an actual value representing collision danger with a set alarm threshold and issues a collision warning when the actual value exceeds the threshold; however, during vehicle motion there is uncertainty in the vehicle motion parameters, road conditions and rear traffic state, so such methods lack accuracy and environmental adaptability. Among data-driven methods, deep reinforcement learning combines the perception capability of deep learning with the decision capability of reinforcement learning and adapts well to such uncertainty. Therefore, the anti-collision driving decision model of the heavy-duty operation vehicle is established using a deep reinforcement learning algorithm, comprehensively considering the influence of front and rear obstacles on vehicle collision.
Decision methods based on deep reinforcement learning mainly comprise methods based on a value function, on policy search, and on the Actor-Critic architecture. Value-based deep reinforcement learning algorithms cannot handle continuous outputs and therefore cannot meet the requirement of continuously outputting a driving strategy in anti-collision decision-making. Compared with methods based on policy search, decision methods based on the Actor-Critic architecture combine value function estimation with policy search and update faster; in particular, the Deep Deterministic Policy Gradient (DDPG) algorithm, which adopts the experience replay idea of the Deep Q Network (DQN), achieves good results in outputting continuous action spaces. Therefore, the anti-collision driving decision model is established using the DDPG algorithm, based on the driving behaviors of excellent drivers collected in step one and the excellent driving strategies obtained in step two, to study anti-collision early warning strategies under different driver operations and driving conditions. The method specifically comprises the following 4 sub-steps:
substep 1: defining basic parameters for an anti-collision driving decision model
Considering that the future motion state of the heavy-duty operation vehicle is influenced by both the current motion state and the current action, the anti-collision driving decision problem is modeled as a Markov decision process, and the basic parameters of the model are defined: the state St at time t, the state St+1 at time t+1, the action At at time t, and the return value Rt corresponding to action At. Specifically:
(1) defining a state space
The running safety of a heavy-duty vehicle is related to not only the motion state of the vehicle itself but also the relative motion state of the front and rear obstacles. Therefore, using the motion state information obtained in step one, a state space is defined:
St=(vsx,vsy,vsf,vsr,asx,asy,dsf,dsr,ωs,θs,δbr,δthr) (8)
In formula (8), vsf and vsr respectively represent the relative speeds of the heavy commercial vehicle with respect to the front and rear vehicles, in meters per second; asx and asy respectively represent the lateral and longitudinal accelerations of the heavy commercial vehicle, in meters per second squared; ωs is the yaw rate of the heavy commercial vehicle, in radians per second; θs is the steering wheel angle of the heavy commercial vehicle, in degrees; δbr and δthr respectively represent the brake pedal opening and throttle opening of the heavy commercial vehicle, in percent.
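For illustration, the 12-dimensional state vector of formula (8) can be assembled from the sensor quantities listed above; the numeric values below are placeholders.

```python
# Sketch assembling the 12-dimensional state vector S_t of formula (8)
# from the listed sensor quantities; all numbers are placeholders.
def make_state(v_sx, v_sy, v_sf, v_sr, a_sx, a_sy,
               d_sf, d_sr, omega_s, theta_s, delta_br, delta_thr):
    return [v_sx, v_sy, v_sf, v_sr, a_sx, a_sy,
            d_sf, d_sr, omega_s, theta_s, delta_br, delta_thr]

S_t = make_state(v_sx=1.0, v_sy=20.0,           # vehicle speeds (m/s)
                 v_sf=-2.0, v_sr=1.5,           # relative speeds (m/s)
                 a_sx=0.1, a_sy=0.3,            # accelerations (m/s^2)
                 d_sf=35.0, d_sr=25.0,          # relative distances (m)
                 omega_s=0.02,                  # yaw rate (rad/s)
                 theta_s=5.0,                   # steering wheel angle (deg)
                 delta_br=0.0, delta_thr=30.0)  # pedal openings (%)
```
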
(2) Defining action decisions
In order to establish a more accurate and reliable anti-collision early warning strategy, the invention considers both the lateral and longitudinal motion of the vehicle and, since the throttle and brake pedal control quantities cannot act simultaneously, takes the steering wheel angle and the acceleration/braking normalization quantity as control quantities, defining the early warning strategy output by the decision model, i.e. the action decision At = [θstr_out, δs_out].

Here At is the action decision at time t, θstr_out represents the normalized steering wheel angle control quantity, in the range [−1, 1], and δs_out represents the acceleration/braking normalization quantity, in the range [−1, 1]. When δs_out = 0, the vehicle moves at a constant speed; when δs_out = −1, the vehicle brakes at the maximum deceleration; and when δs_out = 1, the vehicle accelerates at the maximum acceleration.
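One way to realize the constraint that throttle and brake never act simultaneously is to let the sign of δs_out select the actuator. A sketch under assumed vehicle limits (a_max, b_max are not values from the invention):

```python
# Possible decoding of the normalized quantity delta_s_out in [-1, 1] so
# that throttle and brake never act at once: the sign selects the actuator.
# a_max and b_max are assumed acceleration/deceleration limits (m/s^2).
def decode_action(delta_s_out, a_max=2.5, b_max=6.0):
    delta = max(-1.0, min(1.0, delta_s_out))  # clamp to [-1, 1]
    if delta >= 0.0:
        return {"throttle": delta * a_max, "brake": 0.0}  # accelerate
    return {"throttle": 0.0, "brake": -delta * b_max}     # decelerate
```
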
(3) Defining a reward function
To quantitatively evaluate the quality of the action decision At, the evaluation is materialized and digitized by establishing a return function. If, after executing action At, the running state of the heavy commercial vehicle becomes safer, the return value is a reward; otherwise the return value is a penalty, so that the anti-collision driving decision model can, to a certain extent, judge the erroneous action last executed.
Different from passenger vehicles, heavy commercial vehicles have the characteristics of higher mass center position, larger load capacity and the like, and are easy to rollover in the processes of emergency braking, steering and lane changing. Therefore, when an anti-collision early warning strategy is established, the occurrence of vehicle collision and rollover needs to be considered at the same time. Defining the reward function as:
Rt=r1+r2+r3 (9)
In formula (9), Rt is the reward function at time t, r1 is the safe-distance reward function, r2 is the comfort reward function, and r3 is the penalty function.
First, in order to prevent a collision of a vehicle, a safe distance reward function r is designed1:
In formula (10), d0 is the safe distance threshold.
Secondly, in order to ensure driving comfort, excessive jerk should be avoided as much as possible, and a comfort reward function is designed as r2 = |asy(t+1) − asy(t)|.
Finally, in order to judge erroneous actions of the vehicle, a penalty function r3 is designed:

In formula (11), Spen is the penalty term; in the present invention, Spen = −100 is taken, i.e., when the vehicle crashes or rolls over, the decision model receives a penalty of −100.
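A sketch of the composite return of formulas (9) to (11). The penalty Spen = −100 follows the text; the exact piecewise form of the safe-distance term r1 in formula (10) is not reproduced above, so a simple clearance-margin form with threshold d0 is assumed, and the comfort term r2 is applied here with a negative sign on the assumption that larger jerk should lower the return.

```python
# Hedged sketch of R_t = r1 + r2 + r3, formulas (9)-(11).
# The form of r1 and the sign of r2 are assumptions; S_pen follows the text.
def safe_distance_reward(d_sf, d_sr, d0=10.0):
    return min(d_sf - d0, d_sr - d0)  # assumed clearance-margin form

def comfort_reward(a_sy_next, a_sy_now):
    return -abs(a_sy_next - a_sy_now)  # |a_sy(t+1) - a_sy(t)| as a cost

def penalty(collided_or_rolled_over, s_pen=-100.0):
    return s_pen if collided_or_rolled_over else 0.0  # formula (11)

def total_return(d_sf, d_sr, a_sy_next, a_sy_now, failed):
    return (safe_distance_reward(d_sf, d_sr)
            + comfort_reward(a_sy_next, a_sy_now)
            + penalty(failed))  # formula (9)
```
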
Substep 2: network architecture for building anti-collision decision model
An anti-collision driving decision network is constructed using a policy-evaluation network framework and comprises a policy network and a value function network. The policy network regresses features of the state St to output a continuous action At; the value function network receives the state St and the action At to evaluate the value of the current "state-action" pair. The network architecture is shown in fig. 2. Specifically:
(1) designing a policy network
A policy network is established using a multilayer neural network with fully connected layers; the normalized state space St is connected in turn with fully connected layers F1, F2 and F3 to obtain the output O1, i.e. the action decision At.
Considering that the dimension of the state space is 12, the number of neurons in the state input layer is set to 12. The activation function of each fully connected layer is the Rectified Linear Unit (ReLU), f(x) = max(0, x), and the numbers of neurons in F1, F2 and F3 are 20, 20 and 10, respectively.
(2) Design value function network
A value function network is established using a multilayer neural network with fully connected layers; the normalized state quantity St and action At are connected in turn with fully connected layers F4, F5 and F6 to obtain the output O2, i.e. the Q value. The activation function of each fully connected layer is ReLU, and the numbers of neurons in F4, F5 and F6 are 20, 20 and 10, respectively.
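The actor and critic forward passes of substep 2 can be sketched with plain fully connected layers using the stated widths (20, 20, 10) and ReLU activations. The random initialization and the tanh squashing of the two bounded action outputs are assumptions, not details given by the invention.

```python
import math
import random

# Sketch of the actor and critic forward passes: fully connected layers
# with ReLU and widths 20, 20, 10 as stated. Initialization and the tanh
# squashing of the bounded action outputs are assumptions.
random.seed(0)

def init_layer(n_in, n_out):
    w = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def linear(x, w, b):
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]

def build_mlp(sizes):
    return [init_layer(a, b) for a, b in zip(sizes[:-1], sizes[1:])]

def forward(x, layers, squash=False):
    *hidden, last = layers
    for w, b in hidden:
        x = [max(0.0, v) for v in linear(x, w, b)]  # ReLU
    out = linear(x, *last)
    return [math.tanh(v) for v in out] if squash else out

actor = build_mlp([12, 20, 20, 10, 2])       # state -> action (F1..F3 + output)
critic = build_mlp([12 + 2, 20, 20, 10, 1])  # (state, action) -> Q (F4..F6 + output)

state = [0.1] * 12
action = forward(state, actor, squash=True)  # both components in [-1, 1]
q_value = forward(state + action, critic)
```
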
Substep 3: training strategy network and value function network
The strategy network and the value function network have respective network parameters, and the network parameters of the two parts are updated during training iteration, so that the network converges to obtain a better result. The specific training updating step comprises the following steps:
Substep 3.2: establish the reward function using equation (2), and initialize the value function network parameters θQ, the policy network parameters θμ, and the reward function parameter θr;
Substep 3.3: taking formula (9) as the initial policy optimization target, perform policy optimization using the DDPG algorithm to obtain the initial policy π0;
Substep 3.4: performing iterative solution, each iteration comprising substep 3.41 to substep 3.45, in particular:
Substep 3.43: optimize the reward function parameters by minimizing equation (7) using the stochastic gradient descent algorithm;
Substep 3.44: taking the optimized reward function rθ(Si, Ai) as the optimization target, perform policy optimization using the DDPG algorithm, and update the value function network parameters θQ and the policy network parameters θμ;
Substep 3.45: and calculating the updating amplitude of the reward function, wherein when the updating amplitude of the reward function is smaller than a given threshold, the reward function at the moment is the optimal reward function.
Substep 3.5: iterate and update according to the method of substep 3.4 so that the policy network and the value function network gradually converge. During training, if the vehicle collides or rolls over, the current episode is stopped and a new episode is started. When the heavy-duty operation vehicle stably and effectively avoids collision using the decision strategy output by the model, the iteration is finished.
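The stopping rule of substeps 3.45 and 3.5 (stop when the reward-function update amplitude falls below a given threshold) can be sketched as follows; the shrinking "update" is a stand-in for the real IRL/DDPG iteration, and the threshold is an assumed value.

```python
# Sketch of the stopping rule: iteration ends when the reward-function
# update amplitude drops below a threshold. The shrinking "update" below
# is a stand-in for the real IRL/DDPG iteration.
def update_amplitude(theta_old, theta_new):
    return max(abs(a - b) for a, b in zip(theta_old, theta_new))

theta = [1.0, -1.0]
for _ in range(100):
    theta_new = [0.5 * t for t in theta]           # stand-in parameter update
    if update_amplitude(theta, theta_new) < 1e-3:  # given threshold (assumed)
        break
    theta = theta_new
```
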
Substep 4: outputting an anti-collision early warning strategy by using an anti-collision driving decision model
The information collected by sensors such as the centimeter-level high-precision differential GPS, the inertial measurement unit and the millimeter-wave radar is input into the trained anti-collision driving decision network, which outputs reasonable steering wheel angle and throttle opening commands in real time, providing the driver with accurate, quantitative and reliable driving suggestions and realizing anti-collision early warning strategy output for the heavy-duty operation vehicle that is accurate, reliable and adaptive to driver operations and driving conditions.
Claims (1)
1. An anti-collision early warning method for a heavy operation vehicle comprehensively considering front and rear obstacles, characterized in that the method comprises the following steps:
step one, building a driving simulation platform:
the method comprises the steps of constructing a driving simulation platform with obstacles in front and at the back of a heavy operation vehicle in the driving process of the heavy operation vehicle, and collecting driving behaviors of excellent drivers in a real driving environment; the method specifically comprises the following steps:
firstly, a driving simulation platform based on Prescan is built, a town virtual environment model comprising a straight road and a curve is built, and a driver controls a heavy operation vehicle to move through a driving simulator;
secondly, a centimeter-level high-precision differential GPS, an inertia measurement unit and a millimeter wave radar are installed on the heavy operation vehicle to obtain accurate motion state information and relative motion state information of the vehicle, wherein the accurate motion state information and the relative motion state information specifically comprise position, speed, yaw angle, acceleration, relative speed and relative distance; meanwhile, control information of a driver is obtained by utilizing a vehicle body CAN bus, wherein the control information comprises brake pedal pressure, steering wheel turning angle and throttle opening;
finally, 6 driving working conditions of lane changing, lane keeping, vehicle following, constant speed, acceleration and deceleration are designed, 30 excellent drivers with different ages and driving styles are selected to perform data acquisition tests, data acquisition of various typical driving behaviors of the excellent drivers is achieved under a space-time global unified coordinate system, and a driving database of the excellent drivers is constructed;
the definition of the front vehicle refers to a vehicle which is positioned in front of a running road of a heavy operation vehicle, positioned in the same lane line and has the same running direction; the rear vehicle is a vehicle which is positioned behind the driving road of the heavy operation vehicle, is positioned in the same lane line and has the same driving direction;
step two: learning driving behavior of human excellent driver
A reverse reinforcement learning algorithm based on the maximum entropy is introduced, and driving behaviors of the excellent driver collected in the step one under different driving conditions are learned;
representing the driving behavior of a human excellent driver by using a reward function;
first, a reward function for the excellent driver's driving trajectory is established:
rθ(ξi) = ∑j rθ(Sj, Aj), j = 1, 2, …, m (1)

in formula (1), ξi represents the driving trajectory of the i-th excellent driver, with ξi = {(S1, A1), (S2, A2), …, (Sm, Am)}; m represents the number of "state-action" pairs collected in the trajectory; rθ(ξi) is the feature vector of the i-th excellent driver's driving trajectory, i.e. the reward function of this trajectory; rθ(Sj, Aj) is the reward value of the j-th "state-action" pair in the trajectory, where Sj indicates the state at time j and Aj indicates the action at time j;
linear fitting is carried out on the reward value by utilizing the longitudinal speed, the transverse speed, the yaw angle and the distance between the front obstacle and the rear obstacle;
rθ(Si,Ai)=rθ(φ1,φ2,φ3,φ4)=θrT·φ (2)
in formula (2), the characteristic values are φ1 = vsx·cosψs, φ2 = vsy·sinψs, φ3 = dsf − d0, and φ4 = dsr − d0, where vsx and vsy are respectively the lateral and longitudinal speeds of a heavy commercial vehicle in meters per second, ψs is the yaw angle in degrees, dsf and dsr respectively represent the relative distances from the heavy operation vehicle to the front and rear vehicles in meters, d0 is a safe distance threshold, θrT is the coefficient matrix, and φ represents the fitted feature vector;
the probability of a trajectory having the maximum entropy can be expressed as:

p(ξi|θ) = exp(rθ(ξi)) / Z(θ) (3)

in formula (3), p(ξi|θ) represents the probability of the trajectory having the maximum entropy, and Z(θ) is the partition function, estimated from the trajectories sampled under the previous policy πt−1; n is the number of trajectories sampled under the current policy;
secondly, a probability model of the excellent driver's driving trajectory is established, and the driving trajectory with the maximum entropy is solved using the maximum information entropy principle, as shown in formula (4):

θ* = argmaxθ ∑ξ̃ log p(ξ̃|θ) (4)

in formula (4), ξ̃ represents a collected driving trajectory of the excellent driver;
equation (4) is converted into an unconstrained loss minimization by the Lagrange multiplier method:

J(θ) = −∑ξ̃ log p(ξ̃|θ) (5)

in formula (5), J(θ) is the loss function;
considering that the greater the probability of occurrence of the excellent driver's driving trajectory, the better the reward function expresses the excellent driver's driving behavior, equation (5) is written as:

J(θ) = −∑ξ̃ (rθ(ξ̃) − log Z(θ)) (6)
the loss function is minimized using the gradient descent method to obtain the global optimal solution of the reward function:

θr* = argminθ J(θ) (7)
finally, the parameters of the reward function are optimized using the gradient descent algorithm, thereby learning the global optimal solution of the reward function; with the optimized parameter θr, the current reward function rθ(Si, Ai), i.e. the function characterizing the excellent driver's driving behavior, can be output;
step three: establishing an anti-collision driving decision model
Establishing an anti-collision driving decision model by adopting a DDPG algorithm based on the driving behavior of the excellent driver collected in the step one and the excellent driving strategy obtained in the step two, and researching anti-collision early warning strategies under different driver operation and driving conditions; the method specifically comprises the following 4 sub-steps:
substep 1: defining basic parameters for an anti-collision driving decision model
modeling the anti-collision driving decision problem as a Markov decision process, and defining the basic parameters of the model: the state St at time t, the state St+1 at time t+1, the action At at time t, and the return value Rt corresponding to action At; specifically:
(1) defining a state space
The running safety of the heavy operation vehicle is not only related to the motion state of the vehicle, but also related to the relative motion state of front and rear obstacles; therefore, using the motion state information obtained in step one, a state space is defined:
St=(vsx,vsy,vsf,vsr,asx,asy,dsf,dsr,ωs,θs,δbr,δthr) (8)
in formula (8), vsf and vsr respectively represent the relative speeds of the heavy commercial vehicle with respect to the front and rear vehicles, in meters per second; asx and asy respectively represent the lateral and longitudinal accelerations of the heavy commercial vehicle, in meters per second squared; ωs is the yaw rate of the vehicle, in radians per second; θs is the steering wheel angle of the vehicle, in degrees; δbr and δthr respectively represent the brake pedal opening and throttle opening of the vehicle, in percent;
(2) defining action decisions
considering both the lateral and longitudinal motion of the vehicle and considering that the throttle and brake pedal control quantities cannot act simultaneously, the steering wheel angle and the acceleration/braking normalization quantity are taken as control quantities, defining the early warning strategy output by the decision model, i.e. the action decision At = [θstr_out, δs_out];

wherein At is the action decision at time t, θstr_out represents the normalized steering wheel angle control quantity, in the range [−1, 1], and δs_out represents the acceleration/braking normalization quantity, in the range [−1, 1]; when δs_out = 0, the vehicle moves at a constant speed; when δs_out = −1, the vehicle brakes at the maximum deceleration; and when δs_out = 1, the vehicle accelerates at the maximum acceleration;
(3) defining a reward function
Defining the reward function as:
Rt=r1+r2+r3 (9)
in formula (9), Rt is the reward function at time t, r1 is the safe-distance reward function, r2 is the comfort reward function, and r3 is the penalty function;
first, in order to prevent a collision of a vehicle, a safe distance reward function r is designed1:
in formula (10), d0 is the safe distance threshold;
secondly, in order to ensure driving comfort, excessive jerk should be avoided as much as possible, and a comfort reward function is designed as r2 = |asy(t+1) − asy(t)|;
Finally, in order to judge the error action of the vehicle, a penalty function r is designed3:
in formula (11), Spen is the penalty term;
substep 2: network architecture for building anti-collision decision model
constructing an anti-collision driving decision network using a policy-evaluation network framework, the network comprising a policy network and a value function network; the policy network regresses features of the state St to output a continuous action At; the value function network receives the state St and the action At to evaluate the value of the current "state-action" pair; specifically:
(1) designing a policy network
a policy network is established using a multilayer neural network with fully connected layers; the normalized state space St is connected in turn with fully connected layers F1, F2 and F3 to obtain the output O1, i.e. the action decision At;
setting the number of neurons in the state input layer to 12, considering that the dimension of the state space is 12; the activation function of each fully connected layer is the Rectified Linear Unit (ReLU), f(x) = max(0, x), and the numbers of neurons in F1, F2 and F3 are 20, 20 and 10, respectively;
(2) design value function network
a value function network is established using a multilayer neural network with fully connected layers; the normalized state quantity St and action At are connected in turn with fully connected layers F4, F5 and F6 to obtain the output O2, i.e. the Q value; the activation function of each fully connected layer is ReLU, and the numbers of neurons in F4, F5 and F6 are 20, 20 and 10, respectively;
substep 3: training strategy network and value function network
The strategy network and the value function network have respective network parameters, and the network parameters of the two parts are updated during training iteration, so that the network is converged to obtain a better result; the specific training updating step comprises the following steps:
Substep 3.2: establish the reward function using equation (2), and initialize the value function network parameters θQ, the policy network parameters θμ, and the reward function parameter θr;
Substep 3.3: taking formula (9) as the initial policy optimization target, perform policy optimization using the DDPG algorithm to obtain the initial policy π0;
Substep 3.4: performing iterative solution, each iteration comprising substep 3.41 to substep 3.45, in particular:
Substep 3.43: optimize the reward function parameters by minimizing equation (7) using the stochastic gradient descent algorithm;
Substep 3.44: taking the optimized reward function rθ(Si, Ai) as the optimization target, perform policy optimization using the DDPG algorithm, and update the value function network parameters θQ and the policy network parameters θμ;
Substep 3.45: calculating the updating amplitude of the reward function, wherein when the updating amplitude of the reward function is smaller than a given threshold value, the reward function at the moment is the optimal reward function;
substep 3.5: iterate and update according to the method of substep 3.4 so that the policy network and the value function network gradually converge; during training, if the vehicle collides or rolls over, the current episode is stopped and a new episode is started; when the heavy-duty operation vehicle stably and effectively avoids vehicle collision using the decision strategy output by the model, the iteration is completed;
substep 4: outputting an anti-collision early warning strategy by using an anti-collision driving decision model
The information collected by sensors such as the centimeter-level high-precision differential GPS, the inertial measurement unit and the millimeter-wave radar is input into the trained anti-collision driving decision network, which outputs reasonable steering wheel angle and throttle opening commands in real time, providing the driver with accurate, quantitative and reliable driving suggestions and realizing anti-collision early warning strategy output for the heavy-duty operation vehicle that is accurate, reliable and adaptive to driver operations and driving conditions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011512720.8A CN112622886B (en) | 2020-12-20 | 2020-12-20 | Anti-collision early warning method for heavy operation vehicle comprehensively considering front and rear obstacles |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112622886A CN112622886A (en) | 2021-04-09 |
CN112622886B true CN112622886B (en) | 2022-02-15 |
Family
ID=75317806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011512720.8A Active CN112622886B (en) | 2020-12-20 | 2020-12-20 | Anti-collision early warning method for heavy operation vehicle comprehensively considering front and rear obstacles |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112622886B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113184040B (en) * | 2021-06-03 | 2023-04-07 | 长安大学 | Unmanned vehicle line-controlled steering control method and system based on steering intention of driver |
CN113753034B (en) * | 2021-10-21 | 2022-08-02 | 东南大学 | Large-scale commercial vehicle anti-collision decision method considering road adhesion conditions |
CN113954837B (en) * | 2021-11-06 | 2023-03-14 | 交通运输部公路科学研究所 | Deep learning-based lane change decision-making method for large-scale commercial vehicle |
CN114407925B (en) * | 2022-01-20 | 2024-05-14 | 江苏大学 | Automatic driving track planning system and method based on space-time aerial view and strategy gradient algorithm |
CN114407931B (en) * | 2022-02-21 | 2024-05-03 | 东南大学 | Safe driving decision method for automatic driving operation vehicle of high class person |
CN114379540B (en) * | 2022-02-21 | 2024-04-30 | 东南大学 | Rollover-prevention driving decision method for large-sized operation vehicle considering influence of front obstacle |
CN114863708B (en) * | 2022-05-09 | 2023-04-18 | 东南大学 | Road confluence area roadside real-time accurate induction method for commercial vehicles |
CN116946162B (en) * | 2023-09-19 | 2023-12-15 | 东南大学 | Intelligent network combined commercial vehicle safe driving decision-making method considering road surface attachment condition |
CN116959260B (en) * | 2023-09-20 | 2023-12-05 | 东南大学 | Multi-vehicle driving behavior prediction method based on graph neural network |
CN117348415B (en) * | 2023-11-08 | 2024-06-04 | 重庆邮电大学 | Automatic driving decision method based on finite state machine |
CN117456753B (en) * | 2023-12-26 | 2024-03-08 | 山东高速信息集团有限公司 | Safety early warning method and system between vehicles |
CN117912259B (en) * | 2024-03-19 | 2024-06-21 | 中汽数据有限公司 | Traffic accident reproduction method and device based on automobile electronic data, electronic equipment and storage medium |
CN118153212B (en) * | 2024-05-11 | 2024-07-05 | 长春设备工艺研究所 | Digital heterogeneous model generation system and method based on multi-scale fusion |
CN118312750A (en) * | 2024-06-13 | 2024-07-09 | 鹰驾科技(深圳)有限公司 | Vehicle-mounted chip-based driving auxiliary decision-making method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110027553A (en) * | 2019-04-10 | 2019-07-19 | 湖南大学 | A kind of anti-collision control method based on deeply study |
WO2020109547A1 (en) * | 2018-11-29 | 2020-06-04 | Valeo Schalter Und Sensoren Gmbh | Advanced highway assist scenario |
CN111696387A (en) * | 2020-05-21 | 2020-09-22 | 东南大学 | Self-adaptive anti-collision grading early warning method based on forward obstacle identification |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1717778B1 (en) * | 2005-04-29 | 2007-07-11 | Ford Global Technologies, LLC | Method and system for forward collision avoidance in an automotive vehicle |
- 2020
  - 2020-12-20 CN CN202011512720.8A patent/CN112622886B/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020109547A1 (en) * | 2018-11-29 | 2020-06-04 | Valeo Schalter Und Sensoren Gmbh | Advanced highway assist scenario |
CN110027553A (en) * | 2019-04-10 | 2019-07-19 | 湖南大学 | A kind of anti-collision control method based on deeply study |
CN111696387A (en) * | 2020-05-21 | 2020-09-22 | 东南大学 | Self-adaptive anti-collision grading early warning method based on forward obstacle identification |
Non-Patent Citations (2)
Title |
---|
UAV environmental perception and autonomous obstacle avoidance: A deep learning and depth camera combined solution; Wei Li et al.; Computers and Electronics in Agriculture; 2020-08-31; full text *
Vehicle relative acceleration estimation based on two-stage filtering; Song Xiang et al.; Journal of Southeast University; 2015-01-31; full text *
Also Published As
Publication number | Publication date |
---|---|
CN112622886A (en) | 2021-04-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112622886B (en) | Anti-collision early warning method for heavy operation vehicle comprehensively considering front and rear obstacles | |
CN112633474B (en) | Backward collision avoidance driving decision method for heavy commercial vehicle | |
CN114407931B (en) | Safe driving decision method for automatic driving operation vehicle of high class person | |
CN112580148B (en) | Heavy-duty operation vehicle rollover prevention driving decision method based on deep reinforcement learning | |
CN114312830B (en) | Intelligent vehicle coupling decision model and method considering dangerous driving conditions | |
CN113954837B (en) | Deep learning-based lane change decision-making method for large-scale commercial vehicle | |
CN114564016A (en) | Navigation obstacle avoidance control method, system and model combining path planning and reinforcement learning | |
CN113753034B (en) | Large-scale commercial vehicle anti-collision decision method considering road adhesion conditions | |
CN114379540B (en) | Rollover-prevention driving decision method for large-sized operation vehicle considering influence of front obstacle | |
CN109283843A (en) | A kind of lane-change method for planning track merged based on multinomial with particle swarm algorithm | |
CN114580302A (en) | Decision planning method for automatic driving automobile based on maximum entropy reinforcement learning | |
CN114987461A (en) | Intelligent passenger car dynamic lane change trajectory planning method under multi-car complex traffic environment | |
CN115257789A (en) | Decision-making method for side anti-collision driving of commercial vehicle in urban low-speed environment | |
CN115593433A (en) | Remote take-over method for automatic driving vehicle | |
CN114889589A (en) | Intelligent automobile steering and braking cooperative collision avoidance control system and method | |
Zhao et al. | Adaptive drift control of autonomous electric vehicles after brake system failures | |
Zhan et al. | Risk-aware lane-change trajectory planning with rollover prevention for autonomous light trucks on curved roads | |
CN115454065B (en) | Transverse and longitudinal coupling vehicle formation prediction control method based on reinforcement learning algorithm | |
Zhang et al. | Minimum time lane changing problem of vehicle handling inverse dynamics considering the driver’s intention | |
Shadrin | Affordable and efficient autonomous driving in all weather conditions | |
CN115257820A (en) | Open interference scene-oriented forward collision avoidance driving decision method for commercial vehicle | |
Wang et al. | A double-layered nonlinear model predictive control based control algorithm for local trajectory planning for automated trucks under uncertain road adhesion coefficient conditions | |
CN115292671A (en) | Driver horizontal-vertical coupling behavior model | |
Narendran et al. | Autonomous lateral control of vehicles in an automated highway system | |
CN114925461A (en) | Emergency steering control strategy network model, training method, modeling method and simulation method for automatic driving commercial vehicle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||