CN114179835B - Automatic driving vehicle decision training method based on reinforcement learning in a real scene
Automatic driving vehicle decision training method based on reinforcement learning in a real scene
- Publication number
- CN114179835B (application CN202111653767.0A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- reinforcement learning
- automatic driving
- action
- real scene
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B60—VEHICLES IN GENERAL
- B60W—CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
- B60W60/00—Drive control systems specially adapted for autonomous road vehicles
- B60W60/001—Planning or execution of driving tasks
- B60W50/00—Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
Abstract
The invention discloses an automatic driving vehicle decision training method based on reinforcement learning in a real scene. The automatic driving vehicle is provided with a drive-by-wire chassis, a positioning device, a lidar device and an automatic driving controller, and the method comprises the following steps: while the vehicle travels in a real scene along the track points of a preset travel path, intermittently executing exploration behaviors and recording the input information of a reinforcement learning model, the input information comprising the input state, the action space, and the return after single-step execution; and training the reinforcement learning decision algorithm on this input information. Building on basic hardware (the drive-by-wire chassis, four lidars, an RTK positioning unit and a computing controller) and key techniques such as the preset travel trajectory, small-range sampled action exploration, reliable safety protection and automatic reset, the invention breaks through the reinforcement learning algorithm's dependence on virtual environments and realizes online automatic data acquisition, training and verification of the autonomous vehicle's reinforcement learning decision algorithm.
Description
Technical Field
The embodiment of the invention relates to the technical field of automatic driving, in particular to an automatic driving vehicle decision training method based on reinforcement learning in a real scene.
Background
Autonomous vehicles, also known as intelligent automobiles, are an important application of outdoor wheeled mobile robots in the traffic field. They sense the vehicle's surroundings with on-board sensors such as cameras, lidar, ultrasonic sensors, microwave radar, GPS, odometers and magnetic compasses, and control the vehicle's steering and speed according to the sensed road, vehicle position and obstacle information, enabling the vehicle to travel safely and reliably on the road.
The intelligent automobile fundamentally changes the traditional human-in-the-loop control mode: the uncontrollable driver is taken out of the closed-loop system, reducing human influence factors, and precise control is performed by a machine "driving brain", greatly improving the efficiency and safety of the traffic system.
Traditional prediction methods based on hand-crafted features or vehicle dynamics models cannot handle the high dynamics, uncertainty and strong nonlinearity of the real road traffic environment, which hinders and limits the industrialization of intelligent driving technology.
Deep reinforcement learning explores and addresses the random uncertainty of intelligent driving through the analysis and study of big data, laying theoretical groundwork for the further industrialization of intelligent driving automobiles. However, most reinforcement learning algorithms rely on virtual simulation environments for data acquisition and training, which greatly limits their application in real scenes.
Disclosure of Invention
The invention provides an automatic driving vehicle decision training method based on reinforcement learning in a real scene, which breaks through the limitation that the reinforcement learning algorithm depends on a virtual environment and realizes the online automatic acquisition, training and verification of the automatic driving vehicle reinforcement learning decision algorithm.
The invention provides an automatic driving vehicle decision training method based on reinforcement learning in a real scene. The automatic driving vehicle is provided with a drive-by-wire chassis, a positioning device, a lidar device and an automatic driving controller; the drive-by-wire chassis travels along the track points of a preset travel path after being started; the positioning device is used for acquiring the vehicle's position information; the lidar device is used for acquiring environmental data during the vehicle's travel; and the automatic driving controller is used for controlling the driving process according to a preset algorithm. The method comprises the following steps:
when the vehicle travels in a real scene along the track points of the preset travel path, intermittently executing exploration behaviors and recording the input information of a reinforcement learning model, wherein the input information comprises an input state S, an action space A and a return R after single-step execution;
training a reinforcement learning decision algorithm according to the input information.
Optionally, the input state S includes: track points S1 of the preset travel path and abstract information S2 of the surrounding environment acquired by the lidar device.
Optionally, the action space A comprises two component actions: a lateral action space A1 and a longitudinal action space A2;
wherein the lateral action space A1 is assumed to follow a Gaussian distribution, which serves as the basis for subsequent random action sampling;
and the longitudinal action space A2 is given a reference value.
Optionally, the return R after single-step execution is the evaluation obtained after executing action A for one step in the input state S;
factors determining the single-step return R include: the offset between the vehicle's actual path and the preset travel path, the offset between the vehicle's speed and the expected speed, and an evaluation of the vehicle's collision risk and lane departure.
Optionally, when the number of consecutively executed exploration actions reaches a set threshold, the vehicle is controlled to reset and travel along the preset track points.
Optionally, the reinforcement learning algorithm is an off-policy (offline) reinforcement learning algorithm.
The beneficial effects of the invention are:
1. The invention provides an automatic driving vehicle decision training method based on reinforcement learning in a real scene. Through key techniques such as a preset travel trajectory, small-range sampled action exploration, reliable safety protection and automatic reset, it breaks through the reinforcement learning algorithm's dependence on virtual environments and offers reference and guidance for applying reinforcement learning in real environments.
2. Throughout sample collection the vehicle drives fully automatically, which greatly reduces manual workload, improves sampling efficiency, and supports synchronous operation and sampling by multiple autonomous vehicles.
3. The invention applies a Gaussian-distribution constraint to the lateral action space and sets a reference value in the longitudinal action space; these designs aid rapid convergence of the agent model.
Drawings
FIG. 1 is a flow chart of an autonomous vehicle decision training method based on reinforcement learning in a real scene in the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
The embodiment of the invention provides an automatic driving vehicle decision training method based on reinforcement learning in a real scene. The automatic driving vehicle is provided with a drive-by-wire chassis, a positioning device, a lidar device and an automatic driving controller; the drive-by-wire chassis travels along the track points of a preset travel path after being started; the positioning device is used for acquiring the vehicle's position information; the lidar device is used for acquiring environmental data during the vehicle's travel; and the automatic driving controller is used for controlling the driving process according to a preset algorithm.
Preferably, the lidar device may comprise four 360-degree lidars mounted on the front, rear, left and right of the drive-by-wire chassis body, and the positioning device may be a roof-mounted RTK high-precision positioning unit.
Referring to fig. 1, the method includes:
S110: when the vehicle travels in a real scene along the track points of the preset travel path, intermittently execute exploration behaviors and record the input information of the reinforcement learning model, where the input information comprises the input state S, the action space A and the return R after single-step execution;
S120: train the reinforcement learning decision algorithm according to the input information.
The reinforcement learning model in this embodiment uses an off-policy reinforcement learning algorithm, such as DDPG (Deep Deterministic Policy Gradient), TD3 (Twin Delayed Deep Deterministic Policy Gradient) or SAC (Soft Actor-Critic) from deep reinforcement learning. Off-policy algorithms can make full use of historical data, since they can train on transitions collected by an earlier or exploratory policy.
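The way an off-policy algorithm reuses the recorded data can be sketched with a minimal experience replay buffer. This is an illustrative sketch and not code from the disclosure; the class name, capacity and batch size are assumptions.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer of (s, a, r, s_next, done) transitions.

    Off-policy algorithms such as DDPG, TD3 and SAC can train on
    transitions collected by an older or exploratory policy, which is
    what lets historical data recorded on the vehicle be reused.
    """

    def __init__(self, capacity=100_000):
        # Oldest transitions are evicted automatically once the buffer is full.
        self.buf = deque(maxlen=capacity)

    def add(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform random minibatch for one gradient update.
        return random.sample(list(self.buf), batch_size)

    def __len__(self):
        return len(self.buf)
```

In use, every recorded (S, A, R) step of the vehicle would be added to such a buffer, and the actor-critic networks would be updated on sampled minibatches.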
Further, a high-precision map of the target driving area and the preset travel track points are pre-stored in the automatic driving controller, and under normal conditions the vehicle drives along this track. The whole sampling process runs fully automatically, which improves sampling efficiency and allows parallel sampling with multiple vehicles. To explore more of the action space, a random-exploration and automatic-reset scheme is used so that the environment is explored fully.
Specifically, the input state S includes two parts, S1 and S2. S1 comprises the track points of the preset travel path and is global information; S2 is an abstraction of the surroundings perceived by the lidar, including dynamic and static obstacles and the drivable area around the vehicle.
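As a sketch of how such a state vector might be assembled (the function name, waypoint count and occupancy-grid encoding are illustrative assumptions, not the patent's exact representation):

```python
import numpy as np

def build_state(track_points, occupancy_grid, n_points=5):
    """Assemble the RL input state S from S1 and S2.

    S1: the next n_points (x, y) waypoints of the preset travel path
        (global information).
    S2: a flattened abstraction of the lidar-perceived surroundings,
        here an occupancy grid covering obstacles and drivable area.
    """
    s1 = np.asarray(track_points[:n_points], dtype=np.float32).ravel()
    s2 = np.asarray(occupancy_grid, dtype=np.float32).ravel()
    return np.concatenate([s1, s2])
```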
The action space A comprises two component actions: a lateral action space A1 and a longitudinal action space A2;
the lateral action space A1 is assumed to follow a Gaussian distribution, which serves as the basis for subsequent random action sampling, and the longitudinal action space A2 is given a reference value; this design promotes rapid convergence of the agent model.
The return R after single-step execution is the evaluation obtained after executing action A for one step in the input state S. R is related to three factors: 1) the offset from the preset path (the smaller the offset, the larger R); 2) the offset from the expected driving speed (the smaller the offset, the larger R); 3) an evaluation of collision risk and lane departure.
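A reward built from these three factors could be written, for illustration, as a weighted sum; the weights and the linear form are assumptions, since the disclosure does not fix them:

```python
def single_step_return(path_offset, speed_offset, collision_risk, lane_departure,
                       w_path=1.0, w_speed=0.5, w_safety=10.0):
    """Return R after one step: smaller path and speed offsets give a
    larger R, and collision risk or lane departure is penalized."""
    r = -w_path * abs(path_offset) - w_speed * abs(speed_offset)
    if collision_risk or lane_departure:
        r -= w_safety  # safety evaluation, factor 3)
    return r
```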
In this embodiment, a high-precision map of the specific scene and a closed travel trajectory τ for normal operation are preset in the automatic driving controller; under normal, obstacle-free conditions the vehicle always follows this trajectory according to the positioning information received from the RTK positioning unit. However, always following the track would prevent full exploration of the environment, so a lateral action A1_ sampled from the Gaussian distribution replaces the standard preset action A1, and random noise added to the speed command forms an action A2_ that replaces A2, thereby achieving full exploration of the environment.
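The substitution of the exploratory actions A1_ and A2_ for the preset actions can be sketched as follows; the noise scales sigma and speed_noise are illustrative assumptions, not values from the disclosure:

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed only for reproducibility

def explore_action(a1_preset, a2_reference, sigma=0.05, speed_noise=0.1):
    """Replace the preset actions with exploratory ones.

    Lateral: A1_ is drawn from a Gaussian centred on the preset steering
    command A1, matching the assumption that A1 follows a Gaussian
    distribution.
    Longitudinal: A2_ adds bounded random noise to the reference speed A2.
    """
    a1_explore = rng.normal(loc=a1_preset, scale=sigma)
    a2_explore = a2_reference + rng.uniform(-speed_noise, speed_noise)
    return a1_explore, a2_explore
```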
Because of the automatic exploration process, the vehicle's motion carries strong random uncertainty, and deviations from the path occur. To handle this, when the number of consecutively executed exploration actions reaches a set threshold, the vehicle is controlled to reset and travel along the preset track points. In this embodiment, the constraint of the preset travel trajectory, the RTK high-precision positioning unit and the data of the four lidars ensure the vehicle's safety during random exploration in the fully automatic driving state.
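The reset rule (reset after a set number of consecutive exploration steps) can be expressed as a small guard; the class name and the default threshold are hypothetical:

```python
class ExplorationGuard:
    """Trigger a reset to the preset trajectory once the count of
    consecutive exploration steps reaches a set threshold."""

    def __init__(self, max_consecutive=20):
        self.max_consecutive = max_consecutive
        self.count = 0

    def step(self, explored):
        """Record one step; return True when the vehicle should reset
        and resume tracking the preset track points."""
        self.count = self.count + 1 if explored else 0
        if self.count >= self.max_consecutive:
            self.count = 0  # counting restarts after the reset
            return True
        return False
```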
Examples
The embodiment of the invention provides an application case of an automatic driving vehicle decision training method based on reinforcement learning in a real scene, which comprises the following steps:
1. Prepare a debugged drive-by-wire chassis, mount four 360-degree lidars on the front, rear, left and right of the chassis body so that together they cover a full 360-degree field of view, install the RTK high-precision positioning unit on the roof, and fix the automatic driving controller to the vehicle body.
2. Download the high-precision map of the fixed campus and the preset travel track points into the automatic driving controller.
3. Download the written driving control and exploration algorithms into the automatic driving controller.
4. Download the written safe collision-avoidance and behavior evaluation algorithms into the automatic driving controller.
5. When the drive-by-wire chassis and the automatic driving controller are started, the chassis travels along the preset track and intermittently executes exploration, during which the data S, A, R are recorded. If the vehicle yaws off the path, it automatically resets to the preset track points and restarts behavior exploration.
6. If the exploration has not yet reached the automatic-reset condition (reset after a certain number of exploration steps), exploration continues.
7. Train until the training requirement is met, using an off-policy (offline) reinforcement learning algorithm such as DDPG or SAC.
8. Download the trained model into the automatic driving controller and evaluate its effect on the real vehicle.
Note that the above describes only a preferred embodiment of the invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, it is not limited to them and may be embodied in many other equivalent forms without departing from its spirit, the scope of which is defined by the appended claims.
Claims (3)
1. An automatic driving vehicle decision training method based on reinforcement learning in a real scene, characterized in that the automatic driving vehicle is provided with a drive-by-wire chassis, a positioning device, a lidar device and an automatic driving controller; the drive-by-wire chassis travels along the track points of a preset travel path after being started; the positioning device is used for acquiring the vehicle's position information; the lidar device is used for acquiring environmental data during the vehicle's travel; and the automatic driving controller is used for controlling the driving process according to a preset algorithm; the method comprising:
when the vehicle travels in a real scene along the track points of the preset travel path, intermittently executing exploration behaviors and recording input information of a reinforcement learning model, wherein the input information comprises an input state S, an action space A and a return R after single-step execution;
training a reinforcement learning decision algorithm according to the input information;
the input state S includes: track points S1 of the preset travel path and abstract information S2 of the surrounding environment acquired by the lidar device;
the action space A comprises two component actions: a lateral action space A1 and a longitudinal action space A2;
wherein the lateral action space A1 is assumed to follow a Gaussian distribution, which serves as the basis for subsequent random action sampling;
and a reference value is set for the longitudinal action space A2;
the return R after single-step execution is the evaluation obtained after executing action A for one step in the input state S;
factors determining the single-step return R include: the offset between the vehicle's actual path and the preset travel path, the offset between the vehicle's speed and the expected speed, and an evaluation of the vehicle's collision risk and lane departure.
2. The method according to claim 1, wherein when the number of consecutively executed exploration actions reaches the set threshold, the vehicle is controlled to reset and travel along the preset track points.
3. The method of claim 1, wherein the reinforcement learning algorithm is an off-policy (offline) reinforcement learning algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111653767.0A CN114179835B (en) | 2021-12-30 | 2021-12-30 | Automatic driving vehicle decision training method based on reinforcement learning in real scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114179835A CN114179835A (en) | 2022-03-15 |
CN114179835B true CN114179835B (en) | 2024-01-05 |
Family
ID=80606422
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111653767.0A Active CN114179835B (en) | 2021-12-30 | 2021-12-30 | Automatic driving vehicle decision training method based on reinforcement learning in real scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114179835B (en) |
Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106154834A (en) * | 2016-07-20 | 2016-11-23 | 百度在线网络技术(北京)有限公司 | For the method and apparatus controlling automatic driving vehicle |
CN106774291A (en) * | 2016-12-26 | 2017-05-31 | 清华大学苏州汽车研究院(吴江) | A kind of electric-control system of automatic Pilot electric automobile |
CN108196535A (en) * | 2017-12-12 | 2018-06-22 | 清华大学苏州汽车研究院(吴江) | Automated driving system based on enhancing study and Multi-sensor Fusion |
CN109597317A (en) * | 2018-12-26 | 2019-04-09 | 广州小鹏汽车科技有限公司 | A kind of Vehicular automatic driving method, system and electronic equipment based on self study |
CN109649390A (en) * | 2018-12-19 | 2019-04-19 | 清华大学苏州汽车研究院(吴江) | A kind of autonomous follow the bus system and method for autonomous driving vehicle |
CN110322017A (en) * | 2019-08-13 | 2019-10-11 | 吉林大学 | Automatic Pilot intelligent vehicle Trajectory Tracking Control strategy based on deeply study |
WO2020079066A1 (en) * | 2018-10-16 | 2020-04-23 | Five AI Limited | Autonomous vehicle planning and prediction |
CN112406904A (en) * | 2020-08-27 | 2021-02-26 | 腾讯科技(深圳)有限公司 | Method and device for training automatic driving strategy, automatic driving method, equipment, vehicle and computer readable storage medium |
CN112417756A (en) * | 2020-11-13 | 2021-02-26 | 清华大学苏州汽车研究院(吴江) | Interactive simulation test system of automatic driving algorithm |
WO2021103834A1 (en) * | 2019-11-27 | 2021-06-03 | 初速度(苏州)科技有限公司 | Method for generating lane changing decision model, lane changing decision method for driverless vehicle, and device |
CN113044064A (en) * | 2021-04-01 | 2021-06-29 | 南京大学 | Vehicle self-adaptive automatic driving decision method and system based on meta reinforcement learning |
CN113104050A (en) * | 2021-04-07 | 2021-07-13 | 天津理工大学 | Unmanned end-to-end decision method based on deep reinforcement learning |
CN113264059A (en) * | 2021-05-17 | 2021-08-17 | 北京工业大学 | Unmanned vehicle motion decision control method supporting multiple driving behaviors and based on deep reinforcement learning |
CN113420368A (en) * | 2021-05-24 | 2021-09-21 | 江苏大学 | Intelligent vehicle neural network dynamics model, reinforcement learning network model and automatic driving training method thereof |
WO2021212728A1 (en) * | 2020-04-24 | 2021-10-28 | 广州大学 | Unmanned vehicle lane changing decision-making method and system based on adversarial imitation learning |
CN113561986A (en) * | 2021-08-18 | 2021-10-29 | 武汉理工大学 | Decision-making method and device for automatically driving automobile |
CN113635909A (en) * | 2021-08-19 | 2021-11-12 | 崔建勋 | Automatic driving control method based on confrontation generation simulation learning |
CN113682312A (en) * | 2021-09-23 | 2021-11-23 | 中汽创智科技有限公司 | Autonomous lane changing method and system integrating deep reinforcement learning |
WO2021238303A1 (en) * | 2020-05-29 | 2021-12-02 | 华为技术有限公司 | Motion planning method and apparatus |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109765820B (en) * | 2019-01-14 | 2019-08-09 | 南栖仙策(南京)科技有限公司 | A kind of training system for automatic Pilot control strategy |
US11465650B2 (en) * | 2019-09-20 | 2022-10-11 | Honda Motor Co., Ltd. | Model-free reinforcement learning |
Non-Patent Citations (6)
Title |
---|
Driverless-driving simulation research based on reinforcement learning; Sun Jiahao; Chen Jingjie; Agricultural Equipment & Vehicle Engineering (06); full text *
End-to-end driverless decision-making based on deep reinforcement learning; Huang Zhiqing; Qu Zhiwei; Zhang Ji; Zhang Yanxin; Tian Rui; Acta Electronica Sinica (09); full text *
Application of an improved DDPG algorithm in autonomous driving; Zhang Bin; He Ming; Chen Xiliang; Wu Chunxiao; Liu Bin; Zhou Bo; Computer Engineering and Applications (10); full text *
Key technologies for semantic parsing of driving behavior in intelligent vehicle decision-making; Li Guofa; Chen Yaoyu; Lyu Chen; Tao Da; Cao Dongpu; Cheng Bo; Journal of Automotive Safety and Energy (04); full text *
Development status and trends of intelligent and connected vehicle (ICV) technology; Li Keqiang; Dai Yifan; Li Shengbo; Bian Mingyuan; Journal of Automotive Safety and Energy (01); full text *
A deep reinforcement learning method for driverless driving incorporating human-like driving behavior; Lyu Di; Xu Kun; Li Huiyun; Pan Zhongming; Journal of Integration Technology (05); full text *
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||