CN113460090B - T-shaped emergency collision avoidance control method, system, medium and equipment for automatic driving vehicle


Info

Publication number
CN113460090B
Authority
CN
China
Prior art keywords
vehicle
control
setting condition
collision avoidance
control input
Prior art date
Legal status
Active
Application number
CN202110948176.XA
Other languages
Chinese (zh)
Other versions
CN113460090A (en
Inventor
侯晓慧
张俊智
何承坤
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110948176.XA
Publication of CN113460090A
Application granted
Publication of CN113460090B
Status: Active


Classifications

    • B60W 60/0016 Planning or execution of driving tasks specially adapted for safety of the vehicle or its occupants
    • B60W 30/08 Active safety systems predicting or avoiding probable or impending collision or attempting to minimise its consequences
    • B60W 40/00 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit, e.g. by using mathematical models
    • G06F 30/15 Vehicle, aircraft or watercraft design (Geometric CAD)
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F 2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]


Abstract

The application relates to a T-shaped emergency collision avoidance control method, system, medium and equipment for an autonomous vehicle, comprising the following steps: calculating the control input of a rule-based optimal control problem from a preset vehicle model, reward function and initial state; while a first setting condition is met, updating the reinforcement-learning network parameters based on this control input until a second setting condition is met; and once the second setting condition is met, updating the reinforcement-learning network parameters with the TD3 Actor-Critic framework until a third setting condition is met, then outputting the optimal control quantity. The application can exploit the collision avoidance potential of the autonomous vehicle to the greatest extent and improve its performance in high-speed emergency avoidance and extreme driving conditions. The application can be widely applied in the technical field of active safety control for autonomous vehicles.

Description

T-shaped emergency collision avoidance control method, system, medium and equipment for automatic driving vehicle
Technical Field
The application relates to the technical field of active safety control for autonomous vehicles, and in particular to a T-shaped emergency collision avoidance control method, system, medium and equipment for an autonomous vehicle based on deep reinforcement learning.
Background
With the rapid development of the automotive industry, vehicle active safety faces growing challenges. Manufacturers at home and abroad have developed and deployed a variety of active safety systems, including the Anti-lock Braking System (ABS), Acceleration Slip Regulation (ASR) and the Electronic Stability Program (ESP). These systems help the driver avoid "abnormal" driving situations caused by the nonlinear dynamics of the vehicle, such as skidding, oversteering and understeering, mainly by restricting the driving state of the vehicle to a linear, stable range. From the viewpoint of vehicle controllability, however, this stability-oriented approach is too conservative: it is mainly suited to routine conditions and cannot cope with sudden scenarios and extreme driving conditions such as a T-shaped (side-impact) collision. Moreover, these active safety systems do not consider how to control the vehicle so as to reduce collision losses when a collision is unavoidable.
A T-shaped collision refers to one vehicle striking the side of another vehicle. T-shaped collisions often occur when a vehicle enters an intersection against a red light or a stop sign and collides with another vehicle travelling perpendicular to it. Such collisions may result from mechanical failure (a stuck throttle or failed brakes), insufficient braking force (wet or icy roads), driver inattention, and so on. Because the side structure of a car lacks energy-absorbing devices, T-shaped collisions cause greater injury and loss in traffic accidents than other collision modes. Accident data indicate that drivers in T-shaped crashes usually only brake, and that this is not the optimal action for avoiding the collision or mitigating its losses. Under such an emergency condition it is necessary to exploit the adhesion capability of the tires fully and to extend the driving limit of the vehicle as far as possible in order to avoid the collision or reduce the collision loss. Conventional collision avoidance strategies generally adopt a layered path-planning/tracking architecture in which constraints based on vehicle dynamics are added during path planning; these constraints may prevent the vehicle from fully developing its dynamic potential, or the planned path may be impossible to track, leading to instability. In professional motor racing, by contrast, the driver often deliberately controls wheel locking or slipping to cut corners or avoid obstacles, an operation known as "drifting". The essence of drifting is a critically stable equilibrium state: through precise control the vehicle is kept in an oversteered state in which the rear wheels reach the adhesion limit. An expert driver can simultaneously and precisely control both the sideslip of the vehicle and its travelled path during a drift, even though the vehicle operates entirely outside its stability limits.
Under adhesion-limit conditions the vehicle is a highly nonlinear system, and the braking, driving and steering controls are strongly coupled, which makes a coordinated control algorithm considerably more complex.
Disclosure of Invention
In view of these problems, the purpose of the application is to provide a T-shaped emergency collision avoidance control method, system, medium and equipment for an autonomous vehicle based on deep reinforcement learning, which can exploit the collision avoidance potential of the autonomous vehicle to the greatest extent and improve its performance in high-speed emergency avoidance and extreme driving conditions.
To achieve the above purpose, the application adopts the following technical scheme: a T-shaped emergency collision avoidance control method for an autonomous vehicle, comprising: calculating the control input of a rule-based optimal control problem from a preset vehicle model, reward function and initial state; while a first setting condition is met, updating the reinforcement-learning network parameters based on this control input until a second setting condition is met; and once the second setting condition is met, updating the reinforcement-learning network parameters with the TD3 Actor-Critic framework until a third setting condition is met, then outputting the optimal control quantity.
Further, the method further comprises the following steps: presetting a state space and an action space in a Markov decision model for T-shaped collision avoidance of the autonomous vehicle;
the state space contains all the information required for T-shaped emergency collision avoidance of the autonomous vehicle, including ego-vehicle state information and surrounding environment information;
the action space comprises the steering angle of the front wheels of the ego vehicle and the longitudinal slip ratios of its left and right rear wheels.
Further, the setting of the reward function includes: superposing a first-type reward and a second-type reward to form the reward;
the first-type reward is an instant reward given after each decision during the collision avoidance process;
the second-type reward is a termination-state reward given, after each training round ends, according to the different state modes of the ego vehicle; the different state modes of the ego vehicle include collision and rollover during the collision avoidance process.
Further, the calculating of the control input of the rule-based optimal control problem includes:
in the rule-based optimal control problem, the vehicle first brakes with full force and, after the set time, steers with full force so that it yaws to the greatest possible extent;
the control input vector is composed of the lateral and longitudinal forces of the current tires;
the objective function of the rule-based optimal control problem is set to the termination-state reward.
Further, the first setting condition is: episode ≤ i_control;
the second setting condition is: episode > i_control;
the third setting condition is: episode = i_max;
where episode is the sequence number of the current training round, i_control is the number of rounds used for learning optimal control, and i_max is the set maximum number of training rounds.
Further, the updating of the reinforcement-learning network parameters based on the control input includes:
obtaining a new measurement and the current reward value based on the control input, assembling the original measurement, the control input, the new measurement and the current reward value into a state-transition quadruple, and storing it in the experience pool;
sampling randomly from the experience pool, calculating the target values of the two evaluation networks in the TD3 Actor-Critic framework, and taking the minimum;
updating the evaluation network parameters by minimizing the loss function;
updating the action network by minimizing the difference between the optimal control input and the control produced by the action network, and then updating the target evaluation networks and the target action network.
Further, the updating of the reinforcement-learning network parameters based on the TD3 Actor-Critic framework comprises the following steps:
selecting a control input, obtaining a new measurement and the current reward value from it, assembling the original measurement, the control input, the new measurement and the current reward value into a state-transition quadruple, and storing it in the experience pool;
sampling randomly from the experience pool, calculating the target values of the two evaluation networks in the TD3 Actor-Critic framework, and taking the minimum;
updating the evaluation network parameters by minimizing the loss function;
updating the action network by the policy gradient method, and then updating the target evaluation networks and the target action network.
A T-shaped emergency collision avoidance control system for an autonomous vehicle, comprising a calculation module, a first updating module and a second updating module; the calculation module calculates the control input of the rule-based optimal control problem according to a preset vehicle model, reward function and initial state; the first updating module updates the reinforcement-learning network parameters based on this control input while the first setting condition is met, until the second setting condition is met; and the second updating module updates the reinforcement-learning network parameters based on the TD3 Actor-Critic framework when the second setting condition is met, until the third setting condition is met, and outputs the optimal control quantity.
A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods described above.
A computing apparatus, comprising: one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods described above.
Due to the adoption of the technical scheme, the application has the following advantages:
1. The application uses deep reinforcement learning combined with prior knowledge to design the decision and control of T-shaped emergency collision avoidance for the autonomous vehicle in an integrated way. Compared with a layered path-planning/tracking control architecture, this control architecture can exploit the collision avoidance potential of the autonomous vehicle to the greatest extent and, even in extreme situations where a collision is unavoidable, produces a control plan that reduces the collision loss as much as possible, thereby improving the performance of the autonomous vehicle in high-speed emergency avoidance and extreme driving conditions.
2. The application combines a deep reinforcement learning algorithm incorporating prior knowledge, namely the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm combined with optimal control, in a T-shaped emergency collision avoidance control system designed for a distributed rear-wheel-drive autonomous vehicle, so that the vehicle can avoid the collision or reduce the collision loss to the greatest extent in a T-shaped emergency collision avoidance scenario.
Drawings
FIG. 1 is a schematic diagram of a T-shaped obstacle avoidance learning process of a vehicle based on a TD3 algorithm in an embodiment of the application;
FIG. 2 is a schematic representation of a vehicle dynamics model in an embodiment of the application;
FIG. 3 is a schematic view showing a combination of a collision position and a collision angle in an embodiment of the present application;
fig. 4 is a schematic diagram of a network structure of a TD3 action network according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a network architecture of a TD3 evaluation network in accordance with an embodiment of the application;
FIG. 6 is a schematic view of an initial state of T-shaped collision avoidance according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the round rewards of TD3 in an embodiment of the application;
FIG. 8 is a schematic view of a T-shaped collision avoidance path according to an embodiment of the present application;
fig. 9 is a schematic diagram of a computing device in accordance with an embodiment of the application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present application. It will be apparent that the described embodiments are some, but not all, embodiments of the application. All other embodiments, which are obtained by a person skilled in the art based on the described embodiments of the application, fall within the scope of protection of the application.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The active safety systems and collision avoidance strategies currently applied in vehicles cannot handle extreme T-shaped collision conditions. Under such emergency conditions it is necessary to borrow from the drifting maneuvers of professional racing drivers and to extend the driving limit of the vehicle as far as possible in order to avoid the collision or mitigate the collision loss. The application discloses a T-shaped emergency collision avoidance control system for an autonomous vehicle based on deep reinforcement learning. It combines the twin delayed deep deterministic policy gradient algorithm with optimal control and provides an integrated design of the T-shaped collision avoidance decision and control system for a distributed rear-wheel-drive vehicle, exploiting the collision avoidance potential of the autonomous vehicle to the greatest extent, producing a control plan that reduces the collision loss as much as possible even in extreme situations where a collision is unavoidable, and improving the performance of the autonomous vehicle in high-speed emergency avoidance and extreme driving conditions. Training and test results demonstrate the feasibility of the proposed scheme and provide a new approach to T-shaped emergency collision avoidance control for autonomous vehicles.
In one embodiment of the present application, as shown in FIG. 1, a T-shaped emergency collision avoidance control method for an autonomous vehicle based on deep reinforcement learning is provided. This embodiment uses six deep neural networks: one action network π(s | θ^π), one target action network π(s | θ^π′), two evaluation networks Q_1 and Q_2, and two target evaluation networks Q_1′ and Q_2′. Because the T-shaped emergency collision avoidance scenario is dangerous, the training of the control model is completed in the MATLAB/Simulink simulation environment. In this embodiment, the method comprises the following steps:
Step 1: calculate the control input of the rule-based optimal control problem according to the preset vehicle model, reward function and initial state;
Step 2: while the first setting condition is met, update the reinforcement-learning network parameters based on this control input, until the second setting condition is met;
Step 3: when the second setting condition is met, update the reinforcement-learning network parameters using the TD3 Actor-Critic framework, until the third setting condition is met, and output the optimal control quantity. A schematic sketch of this two-phase training schedule is given below.
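Purely for illustration, the two-phase schedule of Steps 1 to 3 can be written as a simple training loop; the helper callables and their names are assumptions introduced here, not part of the patent, and serve only to make the episode-count switching explicit.

```python
def train(i_control, i_max, run_imitation_episode, run_td3_episode):
    """Two-phase schedule: imitate the rule-based optimal control for the first
    i_control episodes (Step 2), then switch to plain TD3 updates (Step 3)."""
    for episode in range(1, i_max + 1):
        if episode <= i_control:        # first setting condition: episode <= i_control
            run_imitation_episode(episode)
        else:                           # second setting condition: episode > i_control
            run_td3_episode(episode)
    # loop ends when episode = i_max (third setting condition)
```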
The control method in this embodiment further includes a step of presetting the state space and the action space of the Markov decision model for T-shaped collision avoidance of the autonomous vehicle.
Specifically, the state space S, the action space A and the reward function R of the Markov decision model for T-shaped collision avoidance of the autonomous vehicle are constructed, where:
(1) State space S
The state space contains all the information required for T-shaped emergency collision avoidance of the autonomous vehicle, including ego-vehicle state information and surrounding environment information, as shown below:
S = [x_e, x_r]^T
x_e = [V_x, V_y, ω, X_e, Y_e, ψ, M]^T
x_r = [X_r, Y_r, c_eX, c_eY, c_rX, c_rY]^T
where x_e and x_r are the ego-vehicle state information and the surrounding environment information respectively. V_x, V_y and ω are the longitudinal speed, lateral speed and yaw rate of the ego vehicle in the vehicle coordinate system; X_e, Y_e and ψ are the centroid position and yaw angle of the ego vehicle in the geodetic coordinate system. M is the current vehicle state mode: 1 - no collision, 2 - collision, 3 - collision avoidance completed, 4 - rollover during collision avoidance. X_r and Y_r give the centroid position of the other vehicle in the geodetic coordinate system. (c_eX, c_eY) and (c_rX, c_rY) are the coordinates, in the geodetic coordinate system, of the points on the ego vehicle and on the other vehicle whose connecting line is the minimum distance between the two vehicles; they exist only in the non-collision state. In this embodiment, the T-shaped collision avoidance strategy is described taking a scenario with a stationary other vehicle as an example.
(2) Action space A
The action space contains the following three elements:
A = [δ, λ_3, λ_4]^T
where δ is the steering angle of the front wheels of the ego vehicle, and λ_3 and λ_4 are the longitudinal slip ratios of its left and right rear wheels respectively, with δ ∈ [-30°, 30°], λ_3 ∈ [-1, 1] and λ_4 ∈ [-1, 1].
In this embodiment, the T-shaped collision avoidance strategy is designed for a distributed rear-wheel-drive ego vehicle. So that the vehicle can sideslip more easily and thus avoid the collision, or reduce the collision loss, under limit conditions, the front/rear braking force distribution coefficient is set to 0:1, i.e. braking force is generated only at the rear wheels, imitating the way a professional driver uses the handbrake to complete a drift in a real driving environment. Based on the control quantity [δ, λ_3, λ_4]^T, and by combining the vehicle dynamics model and the tire model, the longitudinal and lateral forces of the corresponding tires and the current motion state of the vehicle can be obtained; the state and action spaces are illustrated in the sketch below.
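For illustration only, the 13-dimensional state and the bounded 3-dimensional action described above could be represented as follows; the variable names are assumptions introduced here, not notation from the patent.

```python
import numpy as np

# State: [Vx, Vy, yaw_rate, Xe, Ye, psi, M, Xr, Yr, c_eX, c_eY, c_rX, c_rY]  (13-dim)
STATE_DIM = 13

# Action: [front-wheel steering angle delta (rad), rear-left slip ratio, rear-right slip ratio]
ACTION_LOW = np.array([-np.deg2rad(30.0), -1.0, -1.0])
ACTION_HIGH = np.array([np.deg2rad(30.0), 1.0, 1.0])

def clip_action(a):
    """Keep a raw network output inside the admissible action space A."""
    return np.clip(np.asarray(a, dtype=float), ACTION_LOW, ACTION_HIGH)
```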
In this embodiment, a two-track, three-degree-of-freedom vehicle dynamics model is employed, as shown in FIG. 2.
The dynamics equations and their coefficient matrix B are not reproduced here. In these equations, ψ is the yaw angle of the vehicle and its second time derivative is the yaw acceleration; the longitudinal and lateral accelerations of the vehicle also appear; m is the vehicle mass; I_z is the yaw moment of inertia of the vehicle; L_a and L_b are the distances from the center of mass to the front and rear axles respectively; L_w is half the track width; F_xj and F_yj denote the tangential (longitudinal) and lateral tire-ground forces of wheel j, where j = 1, 2, 3, 4 denotes the front-left, front-right, rear-left and rear-right wheels respectively; and F_roll and F_air are the rolling resistance and the air resistance of the vehicle:
F_roll = f m g
where f is the rolling resistance coefficient and g is the gravitational acceleration; F_air is the aerodynamic drag, determined by the air density ρ, the air resistance coefficient C_d, the frontal cross-sectional area A of the vehicle and the vehicle speed.
The tire model uses a look-up table based on experimental data. The tire test data are collected under pure-slip-ratio or pure-sideslip conditions, whereas in reality the tire force is the resultant of the lateral force and the traction force, which influence each other. The model therefore uses the Pacejka tire model, which accounts for the longitudinal-lateral coupling characteristics, to combine the two force components of the experimental data on a friction ellipse and correct the look-up data. Finally, from the longitudinal slip ratio λ_i, the sideslip angle α_i and the vertical force F_zi of each tire, the longitudinal force F_xi and the lateral force F_yi of the tire (i = 1, 2, 3, 4) are obtained by table look-up, i.e.
F_xi = T_1(λ_i, α_i, F_zi)
F_yi = T_2(λ_i, α_i, F_zi)
where T_1 and T_2 denote the mapping from the longitudinal slip ratio λ_i, the sideslip angle α_i and the vertical force F_zi to the tire longitudinal force F_xi and lateral force F_yi respectively.
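A minimal sketch of such a table look-up is shown below, assuming the corrected test data have been gridded over (λ, α, F_z); the use of scipy interpolation, the grid ranges and the table names are illustrative assumptions rather than details taken from the patent.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical data grids: slip ratio, sideslip angle (rad), vertical load (N)
lam_axis = np.linspace(-1.0, 1.0, 41)
alpha_axis = np.deg2rad(np.linspace(-20.0, 20.0, 41))
fz_axis = np.linspace(1000.0, 8000.0, 15)
FX_TABLE = np.zeros((41, 41, 15))   # would hold the corrected F_x test data
FY_TABLE = np.zeros((41, 41, 15))   # would hold the corrected F_y test data

_T1 = RegularGridInterpolator((lam_axis, alpha_axis, fz_axis), FX_TABLE,
                              bounds_error=False, fill_value=None)
_T2 = RegularGridInterpolator((lam_axis, alpha_axis, fz_axis), FY_TABLE,
                              bounds_error=False, fill_value=None)

def tire_forces(lam, alpha, fz):
    """Return (F_x, F_y) for one tire by interpolating the look-up tables T_1, T_2."""
    p = np.array([[lam, alpha, fz]])
    return float(_T1(p)[0]), float(_T2(p)[0])
```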
The sideslip angle α_i of each wheel is computed from the vehicle motion states and geometry (the detailed expressions are not reproduced here), where the total speed of the vehicle is V = sqrt(V_x² + V_y²) and β = arctan(V_y / V_x) is the centroid sideslip angle of the vehicle.
The vertical load F_zi of each wheel is computed from the vehicle parameters and accelerations (the detailed expressions are not reproduced here), where h_g is the height of the center of mass of the vehicle.
(3) Reward function R
The setting of the reward function includes superposing a first-type reward and a second-type reward. The first type is the instant reward given after each decision during the collision avoidance process; the second type is the termination-state reward given, after each training round ends, according to the final state mode of the ego vehicle, where the state modes of the ego vehicle include collision and rollover during the collision avoidance process.
Specifically, under the TD3 framework the agent learns how to interact with the environment only according to the definition of the reward function, so as to maximize it; the design of the reward function therefore directly determines the control performance of the agent. The reward function must define rewards and penalties for the corresponding actions in different driving states; if the definition is unclear, the model easily fails to converge or converges to a locally optimal solution. There are two types of reward in the T-shaped emergency collision avoidance problem of the autonomous vehicle, denoted R_i and R_t respectively. The first type, R_i, is an instant reward given to the agent after each decision step during collision avoidance; its purpose is to overcome the sparsity of rewards in reinforcement learning and to accelerate the learning of the agent. The second type, R_t, is a termination-state reward given according to the final state mode of the vehicle after each training round ends. The three possible outcomes are a collision, completed collision avoidance, and rollover during the collision avoidance process. The definition of each reward term is described in detail below.
(31) Instant reward R_i
A well-designed instant reward helps the agent learn faster and converge more stably. The instant reward mainly considers the following aspects:
(311) Relative velocity term R_i1
The relative velocity term R_i1 encourages the relative speed of the ego vehicle with respect to the other vehicle to be as small as possible, thereby reducing the potential collision or collision loss. R_i1 is defined in terms of D, ΔV and k_1, where D is the minimum relative distance between the ego vehicle and the other vehicle, ΔV is the component of their relative velocity along the direction of D, and k_1 is a negative constant used to adjust the reward weight of the relative velocity term.
(312) Relative heading angle term R_i2
Accident studies report that when the two vehicle bodies are relatively parallel at the moment of collision, the remaining kinetic energy is distributed over a larger contact area, which lessens the impact of the collision. R_i2 is therefore defined in terms of the yaw angles of the two vehicles, where k is any integer, k_2 is a negative constant used to adjust the reward weight of the relative heading angle term, and ψ is the yaw angle of the ego vehicle; in this example the other vehicle is stationary and its yaw angle is the constant π/2.
(313) Input magnitude and rate-of-change term R_i3
The inputs of the agent are the three elements of the action space:
A = [δ, λ_3, λ_4]^T
where δ is the steering angle of the front wheels of the ego vehicle and λ_3, λ_4 are the longitudinal slip ratios of its left and right rear wheels, with δ ∈ [-30°, 30°], λ_3 ∈ [-1, 1] and λ_4 ∈ [-1, 1]. The magnitudes of the inputs and of their rates of change are negatively correlated with the reward: the smaller the inputs and their rates of change, the more easily the vehicle stays within the linearly stable region and the less prone it is to instability. R_i3 is defined in terms of these input magnitudes and rates of change, where k_3 and k_4 are negative constants used to adjust the corresponding reward weights.
(32) Termination-state reward R_t
When the T-shaped emergency collision avoidance reaches a termination state, the training round ends and a termination-state reward is given according to the state mode of the vehicle. The termination state has three possible outcomes: collision avoidance is completed, a collision occurs, or the vehicle rolls over during the collision avoidance process.
In the definition of R_t, k_5 is a positive constant, and a large reward is given when the vehicle completes the T-shaped collision avoidance without collision or rollover; k_6 is a negative constant, and a large penalty is given when the vehicle rolls over during collision avoidance; R_tc is the reward given when the ego vehicle finally collides with the other vehicle, and its magnitude reflects the severity of the collision, depending on a combination of factors including the collision speed and the collision position and angle. R_tc is expressed as
R_tc = k_7 + R_tc1 + R_tc2
where k_7 is a negative constant representing a basic penalty for any collision, R_tc1 is the collision velocity term and R_tc2 is the collision position and angle term. The definition of R_tc is detailed below.
(321) Collision velocity term R_tc1
In this embodiment the other vehicle is assumed to be stationary, so the greater the speed of the ego vehicle before the collision, the greater the kinetic energy it carries and the greater the collision loss. R_tc1 is therefore defined in terms of the collision speed, where k_8 is a negative constant used to adjust the reward weight of the relative collision velocity term.
(322) Collision position and angle term R_tc2
The collision position and angle, i.e. the area and direction of the interaction forces between the colliding vehicles, directly affect how the collision energy is transferred and are therefore important factors in the severity of the collision.
The collision position is usually the most severely damaged area of the vehicle body; because different parts of the vehicle differ in structure, material and collision deformation, the collision position has a large influence on the collision loss. Based on a statistical analysis of vehicle collision accidents, the collision position I_p can be divided into several regions.
The collision angle is the angle between the longitudinal axes of the two vehicles at the moment of collision. According to the statistical analysis of vehicle collision accidents, the collision angle I_a is divided into six regions from 0° to 180°: 0°±5° (180°±5°), 20°±15°, 50°±15°, 90°±25°, 130°±15° and 160°±15°. These six regions are then merged according to their effect.
The collision position and the collision angle are coupled, and different combinations of collision states correspond to different collision severities. The combinations of collision position and collision angle are shown in FIG. 3, and the reward value R_tc2 corresponding to each collision state is expressed in terms of k_9 and β_i, where k_9 is a negative constant used to adjust the reward weight of the collision position and angle term, and β_i are the coefficients corresponding to the different combinations of collision position and collision angle in FIG. 3.
Combining all of the above factors, the final reward function R of the agent is
R = R_i + R_t
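Purely as an illustration of how these reward terms could be assembled in code: the specific functional forms of R_i1, R_i2, R_i3, R_tc1 and R_tc2 are defined by the patent's formulas and FIG. 3 and are only stubbed here, and all constants and names are assumptions.

```python
import numpy as np

K1 = K2 = K3 = K4 = -0.1            # negative weights of the instant-reward terms (illustrative)
K5, K6, K7 = 100.0, -100.0, -50.0   # termination reward/penalties (illustrative)

def instant_reward(delta_v_along_d, heading_term, action, action_rate):
    """R_i = R_i1 + R_i2 + R_i3 with stubbed (assumed) linear forms."""
    r_i1 = K1 * delta_v_along_d                      # relative velocity term
    r_i2 = K2 * heading_term                         # relative heading angle term
    r_i3 = K3 * np.abs(action).sum() + K4 * np.abs(action_rate).sum()  # input magnitude / rate
    return r_i1 + r_i2 + r_i3

def terminal_reward(mode, r_tc1=0.0, r_tc2=0.0):
    """R_t chosen by the final state mode M of the ego vehicle."""
    if mode == 3:                    # collision avoidance completed
        return K5
    if mode == 4:                    # rollover during collision avoidance
        return K6
    if mode == 2:                    # collision occurred: R_tc = K7 + R_tc1 + R_tc2
        return K7 + r_tc1 + r_tc2
    return 0.0                       # round not yet terminated
```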
In the above embodiment, the network parameters of TD3 are initialized before the reinforcement-learning network parameters are updated. Specifically:
the parameters θ^π of the action network and the parameters of the two evaluation networks are randomly initialized; the parameters of the target action network and of the target evaluation networks are initialized by assigning them the corresponding values; and the experience pool D is constructed.
The network structure of the action network is shown in FIG. 4 and consists of an input layer, two hidden layers and an output layer. The input state is 13-dimensional, the first hidden layer consists of 400 neurons, the second hidden layer consists of 300 neurons, and the control output layer is 3-dimensional. The activation function of each hidden layer is the rectified linear unit (ReLU), and the activation function of the control output layer is the hyperbolic tangent (Tanh), which limits the magnitude of the control quantity.
The network structure of the evaluation network is shown in FIG. 5 and consists of two input layers, three hidden layers and an output layer. The state input is 13-dimensional and the control input is 3-dimensional; the first hidden layer consists of 400 neurons, the second hidden layer of 300 neurons, and the output is the 1-dimensional action-value function. The state input layer and the control input layer skip the first hidden layer and connect directly to the second hidden layer. The activation function of each hidden layer is the rectified linear unit (ReLU), and the activation function of the output layer is the identity.
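A minimal PyTorch sketch consistent with the stated dimensions (13-dimensional state, 3-dimensional action, hidden layers of 400 and 300 neurons) is given below for illustration; the exact wiring of the skip connection in FIG. 5 is not fully specified by the text, so the common layout in which the action joins the network at the second hidden layer is assumed, and only the hidden layers whose sizes are stated are modelled.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """pi(s | theta_pi): 13-d state -> 3-d control, Tanh output bounds the control quantity."""
    def __init__(self, state_dim=13, action_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, action_dim), nn.Tanh(),
        )

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Q(s, u): the state passes a 400-unit layer, the action joins at the 300-unit layer."""
    def __init__(self, state_dim=13, action_dim=3):
        super().__init__()
        self.fc_s = nn.Linear(state_dim, 400)
        self.fc_h = nn.Linear(400 + action_dim, 300)
        self.out = nn.Linear(300, 1)

    def forward(self, s, u):
        h = torch.relu(self.fc_s(s))
        h = torch.relu(self.fc_h(torch.cat([h, u], dim=-1)))
        return self.out(h)              # identity activation on the output layer
```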
In the above embodiment, the first setting condition is: episode ≤ i_control; the second setting condition is: episode > i_control; and the third setting condition is: episode = i_max, where episode is the sequence number of the current training round, i_control is the number of rounds used for learning optimal control, and i_max is the set maximum number of training rounds.
In the above embodiment, the preset initial state is shown in FIG. 6.
In this embodiment, the initial state measurement s_0 is set accordingly, and the initial action is
[δ, λ_3, λ_4]^T = [0, 0, 0]^T.
The overall length and width of the ego vehicle and of the other vehicle are set respectively as
[L_e, W_e, L_r, W_r]^T = [3.5 m, 1.66 m, 8 m, 3 m]^T.
In the above embodiment, in Step 1, the rule-based optimal control problem is that the vehicle first brakes with full force and, after a set time, steers with full force so that it yaws to the greatest possible extent; the control input vector is composed of the lateral and longitudinal forces of the current tires; and the objective function of the rule-based optimal control problem is set to the termination-state reward.
In this embodiment, to convert the T-shaped emergency collision avoidance problem into a rule-based optimal control problem, a rule-based collision avoidance behavior policy is set according to the operating experience of drivers performing emergency collision avoidance. During T-shaped collision avoidance the ego vehicle first brakes with full force and, after a set time t_0, steers with full force so that the vehicle yaws to the greatest possible extent, enabling it to avoid the collision or to reduce the collision loss as much as possible in the T-shaped emergency collision avoidance scenario. The control optimization model is described as follows:
When t ≤ t_0, the rear axle of the vehicle brakes with full force (it is assumed that braking/driving force is provided only by the rear wheels). According to the vehicle model adopted in this embodiment, the control input vector u_control is then
u_control = [F_y1, F_y2, F_y3, F_y4, F_x3, F_x4]^T = [0, 0, 0, 0, μF_z3, μF_z4]^T
where μ is the road adhesion coefficient, F_zi (i = 1, 2, 3, 4) can be derived from the tire vertical-force formula of the vehicle model, and μF_zi is the maximum tire force that can be provided under the adhesion constraint.
When t>t 0 As can be seen from the initial state and the reward function corresponding to the collision position and angle items shown in fig. 6, the vehicle should take a left turn and the final Y-axis displacement is as large as possible, so as to avoid collision or reduce collision loss to the greatest extent. At this time:
δ=δ max =30°
the tire slip angle formula described by the vehicle model can be used for obtaining the slip angle alpha of the front axle two wheels 1 And alpha 2 Then, the side force of the front axle two wheels is obtained by a table look-up method (the longitudinal slip rate of the front axle two wheels is assumed to be 0):
the two wheels of the rear axle respectively provide maximum longitudinal forces in opposite directions, so that the vehicle can perform yaw movement to the greatest extent under the moment and steering action. At this time, the input vector u is controlled control The method comprises the following steps:
u control =[F y1 ,F y2 ,F y3 ,F y4 ,F x3 ,F x4 ] T =[T 2 (0,α 1 ,F z1 ),T 2 (0,α 2 ,F z2 ),0,0,-μF z3 ,μF z4 ] T
The objective function J is set to the termination-state reward R_t:
J = R_t
The only variable in this optimization problem is t_0; once t_0 is determined, the real-time control input u_control and the motion state of the vehicle throughout the whole collision avoidance process are also determined. Therefore, the t_0 that maximizes the objective function J can be found by iteration in the MATLAB/Simulink simulation software, as sketched below.
In the above embodiment, in Step 2, while the first setting condition episode ≤ i_control is satisfied, the reinforcement-learning network parameters are updated based on the optimal control input. This specifically comprises the following steps:
Step 21: obtain a new measurement and the current reward value based on the control input, assemble the original measurement, the control input, the new measurement and the current reward value into a state-transition quadruple, and store it in the experience pool.
Specifically: the control input u_t of the rule-based optimal control problem is calculated by combining the vehicle model, the reward function and the initial state. During reinforcement-learning training, the control quantity u_t is executed to obtain the new measurement s_{t+1} and the current reward value r_t, and the state-transition quadruple (s_t, u_t, r_t, s_{t+1}) is stored in the experience pool D.
Step 22, randomly sampling in an experience pool, calculating target values of two evaluation networks in an Actor-Critic framework of TD3, and taking a minimum value;
the method comprises the following steps: randomly sampling N groups of data in the experience pool D, calculating target values of two evaluation networks, and taking the minimum value:
step 23, updating the evaluation network parameters by minimizing the loss function:
Step 24: update the action network by minimizing the difference between the optimal control input and the control produced by the action network, and then update the target evaluation networks and the target action network.
Specifically: every d rounds, the action network is updated by minimizing the difference between the optimal control input and the control produced by the action network, where f(·) is the mapping function from the output of the current action network, π(s_t | θ^π) = [δ, λ_3, λ_4]^T, to the control input determined by the optimal control problem; this mapping can be determined from the vehicle dynamics equations and the table look-up method.
The target evaluation networks and the target action network are then updated:
θ^π′ ← τ θ^π + (1 - τ) θ^π′
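A hedged PyTorch-style sketch of this imitation phase (Steps 21 to 24) is given below, using the Actor and Critic classes sketched earlier; the mapping f(·) is passed in as a callable and is assumed differentiable here, the delayed (every d rounds) actor update is omitted, and all hyperparameter values are assumptions rather than values from the patent.

```python
import torch
import torch.nn.functional as F

def imitation_update(batch, actor, critic1, critic2, target_actor,
                     target_critic1, target_critic2, opt_actor, opt_critics,
                     f_map, u_optimal, gamma=0.99, tau=0.005):
    s, u, r, s_next = batch                      # tensors sampled from the experience pool D

    # Twin target values; take the element-wise minimum (TD3)
    with torch.no_grad():
        u_next = target_actor(s_next)
        q_next = torch.min(target_critic1(s_next, u_next),
                           target_critic2(s_next, u_next))
        y = r + gamma * q_next

    # Update both evaluation networks by minimizing the TD loss
    critic_loss = F.mse_loss(critic1(s, u), y) + F.mse_loss(critic2(s, u), y)
    opt_critics.zero_grad(); critic_loss.backward(); opt_critics.step()

    # Update the action network by minimizing the difference between f(actor output)
    # and the optimal control input (imitation of the rule-based solution)
    actor_loss = F.mse_loss(f_map(actor(s)), u_optimal)
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

    # Soft update of the target networks
    for net, tgt in [(actor, target_actor), (critic1, target_critic1), (critic2, target_critic2)]:
        for p, p_t in zip(net.parameters(), tgt.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```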
in the above embodiment, when the second setting condition epi code is satisfied in step 3>i control The method for updating the network parameters of reinforcement learning based on the Actor-Critic framework of TD3 comprises the following steps:
Step 31: select a control input, obtain a new measurement and the current reward value from it, assemble the original measurement, the control input, the new measurement and the current reward value into a state-transition quadruple, and store it in the experience pool.
Specifically: a control quantity u_t = π(s_t | θ^π) + ε is selected according to the action network policy and the exploration policy, where ε is exploration noise.
From the control quantity u_t, the new measurement s_{t+1} and the current reward value r_t are obtained, and the state-transition quadruple (s_t, u_t, r_t, s_{t+1}) is stored in the experience pool D.
Step 32: sample randomly from the experience pool, calculate the target values of the two evaluation networks in the TD3 Actor-Critic framework, and take the minimum.
Specifically: N groups of data are randomly sampled from the experience pool D, the target values of the evaluation networks are calculated, and the minimum is taken.
Step 33: update the evaluation network parameters by minimizing the loss function.
Step 34: update the action network by the policy gradient method, and then update the target evaluation networks and the target action network.
Specifically: every d rounds, the action network is updated by the policy gradient algorithm,
and the target evaluation networks and the target action network are updated:
θ^π′ ← τ θ^π + (1 - τ) θ^π′
This continues until the third setting condition episode = i_max is satisfied; a sketch of this phase follows.
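For completeness, a hedged sketch of this plain TD3 phase (Steps 31 to 34); the target-policy smoothing noise, the policy delay and all other hyperparameter values are assumptions rather than values disclosed in the patent.

```python
import torch
import torch.nn.functional as F

def td3_update(batch, actor, critic1, critic2, target_actor,
               target_critic1, target_critic2, opt_actor, opt_critics, step,
               gamma=0.99, tau=0.005, policy_delay=2, noise_std=0.2, noise_clip=0.5):
    s, u, r, s_next = batch

    # Target action with clipped smoothing noise; twin targets, take the minimum
    with torch.no_grad():
        noise = (torch.randn_like(u) * noise_std).clamp(-noise_clip, noise_clip)
        u_next = (target_actor(s_next) + noise).clamp(-1.0, 1.0)
        y = r + gamma * torch.min(target_critic1(s_next, u_next),
                                  target_critic2(s_next, u_next))

    # Update both evaluation networks by minimizing the TD loss
    critic_loss = F.mse_loss(critic1(s, u), y) + F.mse_loss(critic2(s, u), y)
    opt_critics.zero_grad(); critic_loss.backward(); opt_critics.step()

    # Delayed actor update by the deterministic policy gradient, then soft target update
    if step % policy_delay == 0:
        actor_loss = -critic1(s, actor(s)).mean()
        opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
        for net, tgt in [(actor, target_actor), (critic1, target_critic1), (critic2, target_critic2)]:
            for p, p_t in zip(net.parameters(), tgt.parameters()):
                p_t.data.mul_(1 - tau).add_(tau * p.data)
```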
In summary, FIG. 7 and FIG. 8 show the effect of the proposed deep-reinforcement-learning-based T-shaped emergency collision avoidance control method for an autonomous vehicle after training and testing in the simulation environment.
FIG. 7 shows the round-reward curve of the TD3 algorithm during learning, where the grey curve is the actual reward of each round and the dark curve is the average reward over every 200 rounds. As can be seen from FIG. 7, the return values obtained in the first 8000 rounds show an overall increasing trend as the number of rounds grows, which indicates that the control capability of the algorithm improves through the interaction process. The return values obtained in rounds 8000-12000 gradually level off, indicating that the algorithm has obtained a near-optimal policy at the end of training.
FIG. 8 shows the T-shaped collision avoidance trajectory. Under the set initial state, the collision cannot be avoided in this extreme condition, but the ego vehicle yaws through steering so that, at the moment of collision with the other vehicle, the two vehicle bodies are essentially parallel; this increases the collision contact area and reduces the collision loss.
In one embodiment of the present application, a T-shaped emergency collision avoidance control system for an autonomous vehicle is provided, comprising a calculation module, a first updating module and a second updating module;
the calculation module is used to calculate the control input of the rule-based optimal control problem according to the preset vehicle model, reward function and initial state;
the first updating module is used to update the reinforcement-learning network parameters based on this control input while the first setting condition is met, until the second setting condition is met;
and the second updating module is used to update the reinforcement-learning network parameters based on the TD3 Actor-Critic framework when the second setting condition is met, until the third setting condition is met, and to output the optimal control quantity.
The system provided in this embodiment is used to execute the above method embodiments, and specific flow and details refer to the above embodiments, which are not described herein.
As shown in fig. 9, a schematic structural diagram of a computing device provided in an embodiment of the present application, where the computing device may be a terminal, and may include: a processor (processor), a communication interface (Communications Interface), a memory (memory), a display screen, and an input device. The processor, the communication interface and the memory communicate with one another through a communication bus. The processor is configured to provide computing and control capabilities. The memory includes a non-volatile storage medium storing an operating system and a computer program which, when executed by the processor, implements the control method; the internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The communication interface is used for wired or wireless communication with an external terminal; the wireless mode can be realized through WIFI, a carrier network, NFC (near field communication) or other technologies. The display screen can be a liquid crystal display or an electronic ink display; the input device can be a touch layer covering the display screen, a key, a trackball or a touch pad arranged on the housing of the computing device, or an external keyboard, touch pad or mouse. The processor may call logic instructions in the memory to perform the following method:
calculating the control input quantity of the optimal control problem based on the rule according to a preset vehicle model, a reward function and an initial state; when the first setting condition is met, updating the network parameters of reinforcement learning based on the control input quantity until the second setting condition is met; and when the second setting condition is met, updating the network parameters of reinforcement learning based on the Actor-Critic framework of TD3 until the third setting condition is met, and outputting the optimal control quantity.
Further, the logic instructions in the memory described above may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It will be appreciated by those skilled in the art that the architecture shown in fig. 9 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting of the computing devices to which the present inventive arrangements may be applied, and that a particular computing device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment of the present application, there is provided a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, are capable of performing the methods provided by the method embodiments described above, for example comprising: calculating the control input quantity of the optimal control problem based on the rule according to a preset vehicle model, a reward function and an initial state; when the first setting condition is met, updating the network parameters of reinforcement learning based on the control input quantity until the second setting condition is met; and when the second setting condition is met, updating the network parameters of reinforcement learning based on the Actor-Critic framework of TD3 until the third setting condition is met, and outputting the optimal control quantity.
In one embodiment of the present application, there is provided a non-transitory computer-readable storage medium storing server instructions that cause a computer to perform the methods provided by the above embodiments, for example, including: calculating the control input quantity of the optimal control problem based on the rule according to a preset vehicle model, a reward function and an initial state; when the first setting condition is met, updating the network parameters of reinforcement learning based on the control input quantity until the second setting condition is met; and when the second setting condition is met, updating the network parameters of reinforcement learning based on the Actor-Critic framework of TD3 until the third setting condition is met, and outputting the optimal control quantity.
The foregoing embodiment provides a computer readable storage medium, which has similar principles and technical effects to those of the foregoing method embodiment, and will not be described herein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (7)

1. A T-shaped emergency collision avoidance control method for an autonomous vehicle, comprising:
calculating the control input quantity of the optimal control problem based on the rule according to a preset vehicle model, a reward function and an initial state;
when the first setting condition is met, updating the network parameters of reinforcement learning based on the control input quantity until the second setting condition is met;
updating the network parameters of reinforcement learning based on an Actor-Critic framework of TD3 when the second setting condition is met until a third setting condition is met, and outputting the optimal control quantity;
the setting of the reward function comprises: superimposing first-type rewards and second-type rewards to form the reward;
the first-type reward is an instant reward given after each decision in the collision avoidance process;
the second-type reward is a termination-state reward given, after each training round is finished, according to different state modes of the self-vehicle; the different state modes of the self-vehicle comprise collision and rollover in the collision avoidance process;
the calculating the control input quantity of the rule-based optimal control problem comprises:
the rule-based optimal control problem is that the vehicle first brakes at full force and, after the set time, steers at full force so as to produce the greatest possible yaw motion;
the control input quantity consists of the lateral force and the longitudinal force of the current tires;
setting the objective function of the rule-based optimal control problem as the termination-state reward;
the first setting condition is: epinode is less than or equal to i control
The second setting condition is: epi code>i control
The third setting condition is: epi code=i max
The epinode is the current trainingNumber of sequences of training, i control The number of sequences for learning optimal control; i.e max Is the set maximum training round number.
2. The control method as set forth in claim 1, further comprising: presetting a state space and an action space in a Markov decision model based on T-shaped collision avoidance of an automatic driving vehicle;
the state space contains all information required for T-shaped emergency collision avoidance of the automatic driving vehicle, including self-vehicle state information and surrounding environment information;
the action space comprises the steering angle of the front wheels of the self-vehicle and the longitudinal slip ratios of the left and right rear wheels of the self-vehicle.
3. The control method of claim 1, wherein updating the network parameters of reinforcement learning based on the control input quantity comprises:
obtaining a new measured value and a current reward value based on the control input quantity, forming a four-element state transition from the original measured value, the control input quantity, the new measured value and the current reward value, and storing the four elements in an experience pool;
randomly sampling from the experience pool, calculating the target values of the two evaluation networks in the Actor-Critic framework of TD3, and taking the minimum value;
updating the evaluation network parameters by minimizing the loss function;
updating the action network by minimizing the difference between the optimal control input quantity and the control quantity output by the action network, and then updating the target evaluation network and the target action network.
4. The control method of claim 1, wherein updating the network parameters of reinforcement learning based on the Actor-Critic framework of TD3 comprises:
selecting a control input quantity, obtaining a new measured value and a current reward value according to the control input quantity, forming a four-element state transition from the original measured value, the control input quantity, the new measured value and the current reward value, and storing the four elements in an experience pool;
randomly sampling from the experience pool, calculating the target values of the two evaluation networks in the Actor-Critic framework of TD3, and taking the minimum value;
updating the evaluation network parameters by minimizing the loss function;
updating the action network by the policy gradient method, and then updating the target evaluation network and the target action network.
5. A T-shaped emergency collision avoidance control system for an autonomous vehicle, comprising: a calculation module, a first updating module and a second updating module;
the calculation module calculates the control input quantity of the rule-based optimal control problem according to a preset vehicle model, a reward function and an initial state;
when the first setting condition is met, the first updating module updates the network parameters of reinforcement learning based on the control input quantity until the second setting condition is met;
when the second setting condition is met, the second updating module updates the network parameters of reinforcement learning based on the Actor-Critic framework of TD3 until a third setting condition is met, and outputs the optimal control quantity;
the setting of the reward function comprises: superimposing first-type rewards and second-type rewards to form the reward;
the first-type reward is an instant reward given after each decision in the collision avoidance process;
the second-type reward is a termination-state reward given, after each training round is finished, according to different state modes of the self-vehicle; the different state modes of the self-vehicle comprise collision and rollover in the collision avoidance process;
the calculating the control input quantity of the rule-based optimal control problem comprises:
the rule-based optimal control problem is that the vehicle first brakes at full force and, after the set time, steers at full force so as to produce the greatest possible yaw motion;
the control input quantity consists of the lateral force and the longitudinal force of the current tires;
setting the objective function of the rule-based optimal control problem as the termination-state reward;
the first setting condition is: epinode is less than or equal to i control
The second setting condition is: epi code>i control
The third setting condition is: epi code=i max
The epoode is the number of sequences, i, of the current training control The number of sequences for learning optimal control; i.e max Is the set maximum training round number.
6. A computer readable storage medium storing one or more programs, wherein the one or more programs comprise instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-4.
7. A computing device, comprising: one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-4.
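For concreteness, the state space, action space and reward composition recited in claims 1 and 2 can be organised as in the following Python sketch; the field names and the numerical values of the termination-state rewards are assumptions made for illustration only and are not values given in the patent.

from dataclasses import dataclass

@dataclass
class State:
    # self-vehicle state information
    x: float
    y: float
    yaw: float
    longitudinal_speed: float
    lateral_speed: float
    yaw_rate: float
    # surrounding environment information (e.g. the approaching vehicle)
    obstacle_x: float
    obstacle_y: float
    obstacle_speed: float

@dataclass
class Action:
    front_wheel_steer_angle: float   # steering angle of the self-vehicle front wheels
    rear_left_slip_ratio: float      # longitudinal slip ratio, left rear wheel
    rear_right_slip_ratio: float     # longitudinal slip ratio, right rear wheel

def episode_reward(instant_rewards, terminal_mode):
    """First-type (instant) rewards accumulated over the episode plus a
    second-type (termination-state) reward chosen by the final state mode."""
    termination_reward = {"collision": -100.0, "rollover": -100.0, "success": 100.0}
    return sum(instant_rewards) + termination_reward.get(terminal_mode, 0.0)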
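The two ways of updating the network parameters recited in claims 3 and 4 can be sketched with PyTorch as below; the network objects, the replay-batch format and all hyperparameter values are assumptions for illustration, not details taken from the patent.

import torch
import torch.nn.functional as F

GAMMA, TAU, POLICY_NOISE, NOISE_CLIP = 0.99, 0.005, 0.2, 0.5

def critic_update(batch, actor_target, critic1, critic2, critic1_t, critic2_t, critic_opt):
    # Shared step of claims 3 and 4: compute the targets of the two evaluation
    # networks, take the minimum, and update by minimizing the loss function.
    s, a, r, s2, done = batch
    with torch.no_grad():
        noise = (torch.randn_like(a) * POLICY_NOISE).clamp(-NOISE_CLIP, NOISE_CLIP)
        a2 = (actor_target(s2) + noise).clamp(-1.0, 1.0)
        q_target = torch.min(critic1_t(s2, a2), critic2_t(s2, a2))
        y = r + GAMMA * (1.0 - done) * q_target
    loss = F.mse_loss(critic1(s, a), y) + F.mse_loss(critic2(s, a), y)
    critic_opt.zero_grad(); loss.backward(); critic_opt.step()

def guided_actor_update(batch, actor, actor_opt, rule_based_control):
    # Claim 3: update the action network by minimizing the difference between
    # the optimal control input quantity and the action-network output.
    s = batch[0]
    optimal_action = rule_based_control(s)   # assumed to return a detached action tensor
    loss = F.mse_loss(actor(s), optimal_action)
    actor_opt.zero_grad(); loss.backward(); actor_opt.step()

def td3_actor_update(batch, actor, critic1, actor_opt):
    # Claim 4: update the action network by the policy gradient method.
    s = batch[0]
    loss = -critic1(s, actor(s)).mean()
    actor_opt.zero_grad(); loss.backward(); actor_opt.step()

def soft_update(online_net, target_net):
    # After either actor update, the target evaluation networks and the target
    # action network track the online networks.
    for p, p_t in zip(online_net.parameters(), target_net.parameters()):
        p_t.data.mul_(1.0 - TAU).add_(TAU * p.data)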
CN202110948176.XA 2021-08-18 2021-08-18 T-shaped emergency collision avoidance control method, system, medium and equipment for automatic driving vehicle Active CN113460090B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110948176.XA CN113460090B (en) 2021-08-18 2021-08-18 T-shaped emergency collision avoidance control method, system, medium and equipment for automatic driving vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110948176.XA CN113460090B (en) 2021-08-18 2021-08-18 T-shaped emergency collision avoidance control method, system, medium and equipment for automatic driving vehicle

Publications (2)

Publication Number Publication Date
CN113460090A (en) 2021-10-01
CN113460090B (en) 2023-09-12

Family

ID=77866713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110948176.XA Active CN113460090B (en) 2021-08-18 2021-08-18 T-shaped emergency collision avoidance control method, system, medium and equipment for automatic driving vehicle

Country Status (1)

Country Link
CN (1) CN113460090B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116946162B (en) * 2023-09-19 2023-12-15 东南大学 Intelligent network combined commercial vehicle safe driving decision-making method considering road surface attachment condition

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018052444A (en) * 2016-09-30 2018-04-05 株式会社Subaru Collision input reduction device of vehicle
CN110658829A (en) * 2019-10-30 2020-01-07 武汉理工大学 Intelligent collision avoidance method for unmanned surface vehicle based on deep reinforcement learning
CN111985614A (en) * 2020-07-23 2020-11-24 中国科学院计算技术研究所 Method, system and medium for constructing automatic driving decision system
CN112224202A (en) * 2020-10-14 2021-01-15 南京航空航天大学 Multi-vehicle cooperative collision avoidance system and method under emergency working condition
WO2021053474A1 (en) * 2019-09-17 2021-03-25 Kpit Technologies Limited System and method for dynamic evasive maneuver trajectory planning of a host vehicle
CN112633474A (en) * 2020-12-20 2021-04-09 东南大学 Backward collision avoidance driving decision method for heavy commercial vehicle
CN112906126A (en) * 2021-01-15 2021-06-04 北京航空航天大学 Vehicle hardware in-loop simulation training system and method based on deep reinforcement learning
CN112896170A (en) * 2021-01-30 2021-06-04 同济大学 Automatic driving transverse control method under vehicle-road cooperative environment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018052444A (en) * 2016-09-30 2018-04-05 株式会社Subaru Collision input reduction device of vehicle
WO2021053474A1 (en) * 2019-09-17 2021-03-25 Kpit Technologies Limited System and method for dynamic evasive maneuver trajectory planning of a host vehicle
CN110658829A (en) * 2019-10-30 2020-01-07 武汉理工大学 Intelligent collision avoidance method for unmanned surface vehicle based on deep reinforcement learning
CN111985614A (en) * 2020-07-23 2020-11-24 中国科学院计算技术研究所 Method, system and medium for constructing automatic driving decision system
CN112224202A (en) * 2020-10-14 2021-01-15 南京航空航天大学 Multi-vehicle cooperative collision avoidance system and method under emergency working condition
CN112633474A (en) * 2020-12-20 2021-04-09 东南大学 Backward collision avoidance driving decision method for heavy commercial vehicle
CN112906126A (en) * 2021-01-15 2021-06-04 北京航空航天大学 Vehicle hardware in-loop simulation training system and method based on deep reinforcement learning
CN112896170A (en) * 2021-01-30 2021-06-04 同济大学 Automatic driving transverse control method under vehicle-road cooperative environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Control method of an automatic cart based on a deep Q-value network; Wang Liqun; Zhu Shun; Han Xiao; He Jun; Electronic Measurement Technology (11); pp. 226-229 *

Also Published As

Publication number Publication date
CN113460090A (en) 2021-10-01

Similar Documents

Publication Publication Date Title
CN109849899B (en) Electro-hydraulic composite vehicle body stability control system and method for electric wheel vehicle
CN111890951B (en) Intelligent electric automobile trajectory tracking and motion control method
Li et al. Comprehensive tire–road friction coefficient estimation based on signal fusion method under complex maneuvering operations
Yoon et al. Design and evaluation of a unified chassis control system for rollover prevention and vehicle stability improvement on a virtual test track
CN103213582B (en) Anti-rollover pre-warning and control method based on body roll angular estimation
CN106004870A (en) Vehicle stability integrated control method based on variable-weight model prediction algorithm
EP4253181A1 (en) Vehicle front and rear drive torque distribution method and apparatus, and vehicle
Wang et al. Constrained H∞ control for road vehicles after a tire blow-out
Chakraborty et al. Vehicle posture control through aggressive maneuvering for mitigation of T-bone collisions
CN110606079A (en) Layered control vehicle rollover prevention method and multi-shaft distributed driving vehicle
JP2002087310A (en) Action to vehicle track based on measurement of lateral force
Singh et al. Trajectory tracking and integrated chassis control for obstacle avoidance with minimum jerk
US20190276009A1 (en) Control apparatus for vehicle and control method for vehicle
Chakraborty et al. Time-optimal vehicle posture control to mitigate unavoidable collisions using conventional control inputs
CN113733929B (en) Wheel torque coordination control method and device for in-wheel motor driven vehicle
CN113460090B (en) T-shaped emergency collision avoidance control method, system, medium and equipment for automatic driving vehicle
Mok et al. A post impact stability control for four hub-motor independent-drive electric vehicles
CN108569288B (en) definition and collision avoidance control method for dangerous working conditions of automobile
CN113002527B (en) Robust fault-tolerant control method for lateral stability of autonomous electric vehicle
Hajiloo et al. A model predictive control of electronic limited slip differential and differential braking for improving vehicle yaw stability
Zhang et al. A fuzzy control strategy and optimization for four wheel steering system
Guastadisegni et al. Vehicle stability control through pre-emptive braking
CN114162110B (en) Transverse stability control method for unmanned vehicle
CN115230687A (en) Vehicle drifting and collision avoidance control method and system under brake failure working condition
Hakima et al. Designing a fuzzy logic controller to adjust the angle of tires in four wheel steering vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant