CN116476861A - Autonomous driving decision system based on multi-modal perception and hierarchical actions


Info

Publication number
CN116476861A
Authority
CN
China
Prior art keywords
lane
automatic driving
value
vehicle
information
Legal status
Pending
Application number
CN202310191784.XA
Other languages
Chinese (zh)
Inventor
郑凯
苏涵
刘顺程
夏宇阳
余泉霖
胡锐
许志
陈旭
Current Assignee
Yangtze River Delta Research Institute of UESTC Huzhou
Original Assignee
Yangtze River Delta Research Institute of UESTC Huzhou
Priority date
Filing date
Publication date
Application filed by Yangtze River Delta Research Institute of UESTC Huzhou
Priority to CN202310191784.XA
Publication of CN116476861A

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses an autonomous driving decision system based on multi-modal perception and hierarchical actions, comprising a data preprocessing module, a state characterization module, an actor-critic module, a hybrid reward function module and a multi-process training module. The data preprocessing module acquires state information of the surrounding environment, and parallel training accelerates model training and improves training performance. The invention has good applicability: multiple sensors and deep learning algorithms are used to comprehensively perceive the surrounding environment, so the autonomous vehicle can better cope with emergencies, and the lane-wise cross-modal aggregation model extracts useful features more effectively. The reinforcement learning model based on hierarchical actions better optimizes the lane-keeping and lane-changing behavior of the autonomous vehicle. Each decision of the autonomous vehicle satisfies safety, efficiency and comfort requirements, the traffic fluctuation caused by abrupt speed changes and lane changes of the autonomous vehicle is reduced, and the efficiency and stability of the overall traffic flow are improved.

Description

Autonomous driving decision system based on multi-modal perception and hierarchical actions
Technical Field
The invention relates to the technical field of traffic, and in particular to an autonomous driving decision system based on multi-modal perception and hierarchical actions.
Background
Owing to urbanization and the popularity of private cars, many large cities around the world suffer from frequent traffic congestion. Traffic congestion is mainly caused by the road environment and by drivers. Although environmental factors such as road construction and merging lanes easily cause congestion, researchers have found that poor driving behaviors (such as sudden braking and forced lane changes) are the main causes of traffic congestion and even traffic accidents. This is because human driving performance is limited by factors such as a restricted field of view and long reaction delays, so it is difficult for a human driver to maintain good driving performance at all times.
In the prior art, autonomous driving has attracted extensive attention. Whereas a human driver perceives the surroundings with eyes and brain, an autonomous vehicle uses a variety of sensors to acquire information about the surrounding environment and computer technology to compute ideal speed and lane-change decisions. Conventional decision algorithms use rule matching and mathematical calculation to determine the speed and lane changes of an autonomous vehicle, but this approach requires complex parameter tuning and adapts poorly to complex traffic scenarios. With the development of deep learning, researchers have used neural networks to imitate the driving behavior of human drivers, but deep learning models are prone to modeling errors, and the decisions of human drivers are themselves often suboptimal. Given that an autonomous driving task first acquires surrounding information and then makes a decision, reinforcement learning appears to be a promising approach, and many recent studies have used reinforcement learning algorithms to optimize the decisions of autonomous vehicles, achieving better decision performance than traditional algorithms and deep learning algorithms.
However, existing reinforcement learning decision frameworks have two drawbacks: poor environmental adaptability and limited optimization objectives. Regarding the first drawback, most existing algorithms are evaluated in a simple straight-road environment and assume that surrounding traffic information can be acquired accurately; they therefore cannot handle complex traffic scenes or the uncertainty of real traffic. Recently, some researchers have begun to conduct autonomous driving experiments on urban roads, perceiving complex traffic information with cameras and lidar. However, acquiring only image data and point cloud data of the surroundings is not sufficient for an autonomous vehicle to compute an ideal decision: it also needs the road information ahead and the dynamic information of the ego vehicle, such as its heading and its distance from the lane. Regarding the second drawback, existing algorithms mainly optimize the safety, efficiency and comfort of the autonomous vehicle itself, with little consideration for reducing its impact on other vehicles. In a high-density scenario, one bad driving behavior may trigger a chain reaction in the following vehicles, propagating through the traffic flow like a domino effect and eventually causing congestion. Recently, a tree-search-based decision framework has searched, via decision-tree algorithms, for decisions that reduce the impact on other vehicles. However, that method idealizes lane-change behavior into change-left, lane-keep and change-right, ignoring the intermediate states of a lane change. This means the autonomous vehicle cannot abort a lane change midway even when continuing would cause a safety problem; for example, if the vehicle behind in the target lane suddenly accelerates during the lane change, continuing the maneuver may lead to a collision.
Disclosure of Invention
The invention aims to provide an autonomous driving decision system based on multi-modal perception and hierarchical actions.
In order to achieve the above purpose, the invention is implemented according to the following technical scheme:
the system comprises a data preprocessing module, a state characterization module, an actor commentator module, a hybrid rewarding function module and a multi-process training module;
and a data processing module: acquiring state information of surrounding environment through a high-precision map and various sensors;
the state characterization module: extracting useful state information through a neural network and learning potential characterizations of the information;
actor commentator module: generating a decision of the automatic driving vehicle and a decision of evaluating the automatic driving vehicle respectively through an actor network model and a critique network model;
a hybrid bonus function module: guiding the reinforcement learning model to make good decision performance;
a multi-process training module: the training speed of the model is increased and the training performance of the model is improved through parallel training.
The beneficial effects of the invention are as follows:
the invention relates to an automatic driving decision system based on multi-mode sensing and layering actions, which has the following advantages compared with the prior art:
(1) The invention has better applicability, and applies various sensors and deep learning algorithms to comprehensively sense the surrounding environment information, so that the automatic driving vehicle can better cope with some emergency conditions, such as: sudden braking and forced plugging of surrounding vehicles.
(2) The invention provides a cross-mode aggregation model of lane division, so that an automatic driving vehicle can extract useful characteristics faster and better.
(3) The invention provides a reinforcement learning model based on layered actions, so that lane keeping behavior and lane changing behavior of an automatic driving vehicle are better optimized.
(4) The invention satisfies the safety, efficiency and comfort of each decision of the automatic driving vehicle, and greatly reduces the fluctuation of the traffic flow caused by abrupt speed change and lane change of the automatic driving vehicle, thereby improving the traffic efficiency and the traffic flow stability of the whole traffic flow.
Drawings
FIG. 1 is a diagram of the overall architecture of the system of the present invention;
FIG. 2 is a network block diagram of a state characterization module of the present invention;
FIG. 3 is a block diagram of the hierarchical actions in the actor-critic module of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments, wherein the exemplary embodiments and descriptions of the invention are for purposes of illustration, but are not intended to be limiting.
As shown in fig. 1, the invention is divided into five parts: a data preprocessing module, a state characterization module, an actor-critic module, a hybrid reward function module and a multi-process training module.
The data preprocessing module acquires information about the surrounding environment through a high-precision map and various sensors. Specifically, a camera is used to obtain traffic light information, a lidar to obtain surrounding vehicle information, a high-precision map to obtain the road topology and lane information, and positioning technology together with an inertial measurement unit to obtain the position, speed and other information of the autonomous vehicle itself. Finally, this information is combined into the state information of the autonomous vehicle and input into the reinforcement learning model.
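For illustration, the combined multi-sensor observation can be organized as a single structured state object before being encoded. The sketch below is a minimal illustration; all field names and types are assumptions, since the patent does not specify a concrete data layout:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VehicleObs:
    """One surrounding vehicle as perceived by lidar (fields are assumptions)."""
    x: float         # longitudinal position relative to the ego vehicle (m)
    y: float         # lateral position relative to the ego vehicle (m)
    speed: float     # speed (m/s)
    lane_id: int     # index of the occupied lane

@dataclass
class DrivingState:
    """Combined state passed to the reinforcement learning model."""
    traffic_light: int                                                # camera: 0 red, 1 yellow, 2 green
    nearby_vehicles: List[VehicleObs] = field(default_factory=list)   # lidar
    lane_topology: List[int] = field(default_factory=list)            # high-precision map
    ego_speed: float = 0.0                                            # positioning + IMU
    ego_heading: float = 0.0                                          # heading relative to the lane (rad)
    dist_to_centerline: float = 0.0                                   # lateral offset from lane center (m)
```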
The state characterization module extracts useful state information through the neural network and learns latent representations of the information. As shown in fig. 2, the multi-modal information around the autonomous vehicle (lanes, regular vehicles and traffic lights) is first composed into three star graphs centered on the autonomous vehicle (G_{k-1}, G_k, G_{k+1}). Each star graph corresponds to the multi-modal information on one lane. To aggregate this multi-modal information, a cross-modal attention mechanism is used to aggregate the information from the different modalities. The calculation of the attention scores is described below, taking the multi-modal features on the current lane of the autonomous vehicle (lane k) as an example:

α_k = softmax((W_1 e_A)^T (W_2 [e_l, e_tl, e_v]) / √D_f)

where e_A denotes the encoded feature vector of the autonomous vehicle, e_l the encoded structural features of lane k, e_tl the encoded feature vector of the traffic light belonging to lane k, e_v the sum of the encoded feature vectors of the regular vehicles on lane k, α_k the attention scores of the features of each modality, W_1 and W_2 the parameters of the linear layers that encode these feature vectors, D_f the dimension of the feature vectors, and softmax the activation function applied to obtain the attention scores. The attention scores α_k are then multiplied with the corresponding feature vectors (e_l, e_tl, e_v) to obtain the state representation E_k of the multi-modal information on lane k:
E_k = α_k (W_3 [e_l, e_tl, e_v])
where W_3 denotes the parameters of the linear layer that aggregates this multi-modal information.
The state representations of the multi-modal features on each lane are then concatenated, and a linear layer is used to compute a representation E_s comprising the multi-modal information on all lanes:

E_s = W_4 [E_{k-1}, E_k, E_{k+1}]

where W_4 denotes the parameters of the linear layer, and E_{k-1}, E_k and E_{k+1} denote the representations of the multi-modal information on lanes k-1, k and k+1, respectively.
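By way of illustration, the lane-wise cross-modal aggregation can be sketched as follows in PyTorch. This is a minimal sketch, not the patented implementation: the scaled-dot-product form of the attention score and all layer shapes are assumptions consistent with the symbols W_1 to W_4, D_f and softmax defined above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LaneCrossModalAttention(nn.Module):
    """Aggregate the three modal features of one lane into E_k."""
    def __init__(self, d_f: int):
        super().__init__()
        self.d_f = d_f
        self.w1 = nn.Linear(d_f, d_f, bias=False)  # encodes e_A (query)
        self.w2 = nn.Linear(d_f, d_f, bias=False)  # encodes modal features (keys)
        self.w3 = nn.Linear(d_f, d_f, bias=False)  # aggregates modal features (values)

    def forward(self, e_av, e_lane, e_tl, e_veh):
        modal = torch.stack([e_lane, e_tl, e_veh], dim=0)  # (3, d_f)
        q = self.w1(e_av)                                  # (d_f,)
        k = self.w2(modal)                                 # (3, d_f)
        scores = k @ q / self.d_f ** 0.5                   # (3,)
        alpha = F.softmax(scores, dim=0)                   # attention over modalities
        return alpha @ self.w3(modal)                      # E_k, shape (d_f,)

class StateEncoder(nn.Module):
    """Concatenate E_{k-1}, E_k, E_{k+1} and project to E_s."""
    def __init__(self, d_f: int):
        super().__init__()
        self.lane_attn = LaneCrossModalAttention(d_f)
        self.w4 = nn.Linear(3 * d_f, d_f)

    def forward(self, e_av, lanes):
        # lanes: three (e_lane, e_tl, e_veh) tuples for lanes k-1, k, k+1
        e_ks = [self.lane_attn(e_av, *lane) for lane in lanes]
        return self.w4(torch.cat(e_ks, dim=-1))            # E_s

# Usage with random features of dimension 64:
enc = StateEncoder(d_f=64)
e_av = torch.randn(64)
lanes = [(torch.randn(64), torch.randn(64), torch.randn(64)) for _ in range(3)]
e_s = enc(e_av, lanes)  # state representation E_s, shape (64,)
```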
The actor-critic module generates decisions for the autonomous vehicle and evaluates them via an actor network model and a critic network model, respectively. Because both lane-keeping decisions and lane-changing decisions require control of the steering angle of the autonomous vehicle, a hierarchical structure is designed to decouple the different decisions. As shown in fig. 3, the decision of the autonomous vehicle is first divided into three high-level decisions: lane change to the left, lane keeping, and lane change to the right. Each high-level decision then requires computation of independent action values, comprising a steering angle value, a throttle value and a brake value. In the actor network, a linear layer is used to compute the action values X under the three high-level decisions simultaneously:
X = Tanh(W_5 E_s)
where W_5 denotes the parameters of the linear layer, E_s the state representation, and Tanh the activation function of the linear layer. After the actor network has computed the action values, the critic network is used to compute the Q value of each action.
In reinforcement learning, the Q value represents the expected future return of an action: the larger the Q value, the better the performance of the action. Specifically, a linear layer is used to compute the Q values of the three sets of action values:

Q = W_6 [E_s, E_a]

where W_6 denotes the parameters of the linear layer, E_s the state representation, and E_a the representation of the action values X. After the action values and Q value of each decision are obtained, the action values under the decision with the largest Q value are selected as the action finally executed by the autonomous vehicle.
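A minimal sketch of the hierarchical action structure follows. The three-by-three action layout (one steering/throttle/brake triple per high-level decision) follows the description above; feeding the critic the concatenation of E_s and the action triple is an assumption:

```python
import torch
import torch.nn as nn

class HierarchicalActorCritic(nn.Module):
    """Actor emits one (steer, throttle, brake) triple per high-level
    decision; the critic scores each triple with a Q value."""

    def __init__(self, d_f: int):
        super().__init__()
        # 3 high-level decisions (left, keep, right) x 3 low-level values
        self.actor = nn.Linear(d_f, 3 * 3)
        self.critic = nn.Linear(d_f + 3, 1)

    def forward(self, e_s: torch.Tensor):
        # X: action values under the three high-level decisions, in [-1, 1]
        x = torch.tanh(self.actor(e_s)).view(3, 3)
        # Q value of each high-level decision given its action triple
        q = torch.stack([self.critic(torch.cat([e_s, x[i]])) for i in range(3)])
        best = int(torch.argmax(q))          # decision with the largest Q value
        return x[best], best                 # executed action, chosen decision

# Usage: pick the action triple of the highest-Q decision.
model = HierarchicalActorCritic(d_f=128)
action, decision = model(torch.randn(128))
```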
The hybrid reward function module guides the reinforcement learning model toward good decision performance. For an autonomous vehicle, an ideal decision must first satisfy safety, efficiency and comfort requirements; second, the decision should minimize the impact on surrounding vehicles. The hybrid reward function is therefore divided into four parts: safety, efficiency, comfort, and impact on surrounding vehicles.
The safety reward is computed from the collision risk of the autonomous vehicle and its distance from the lane center line. Here ttc denotes how long it would take the autonomous vehicle to collide with the preceding vehicle if it kept its current speed, ttc_m is a threshold on ttc used to identify collision risk, dis is the distance of the autonomous vehicle from the lane center line, and d is half the lane width. If the autonomous vehicle is too close to the preceding vehicle or strays too far from the lane center, it receives a small safety reward value.
The efficiency reward is computed from the speed of the autonomous vehicle and its lane-change choices. Here v denotes the speed of the autonomous vehicle, v_m its maximum speed, D' the distance to the preceding vehicle after a lane change, and D the distance to the preceding vehicle before the lane change. The lane-change term is computed only when the autonomous vehicle changes lane: the reward is positive when the vehicle changes from a crowded lane to an empty lane, and negative otherwise.
The comfort reward is computed from the change in acceleration of the autonomous vehicle, where acc' and acc denote the acceleration values of the autonomous vehicle at the current and the previous time step, respectively, and acc_m denotes a threshold on the acceleration of the autonomous vehicle.
The impact reward is computed from the speed change of the vehicle following the autonomous vehicle, where Δv denotes the speed change of the vehicle behind the autonomous vehicle, acc_m the acceleration threshold, and Δt the time interval between two time steps.
In addition, if the autonomous vehicle runs a red light, collides, or leaves the lane, it receives a reward value of -20.
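The following sketch illustrates the four-part hybrid reward. The patent text names the variables (ttc, ttc_m, dis, d, v, v_m, D', D, acc', acc, acc_m, Δv, Δt) but does not print the formulas, so each functional form below is an assumption chosen to match the described behavior:

```python
def safety_reward(ttc: float, ttc_m: float, dis: float, d: float) -> float:
    # Small when the time-to-collision is short or the vehicle drifts
    # away from the lane center line. Functional form is an assumption.
    return min(ttc / ttc_m, 1.0) - dis / d

def efficiency_reward(v: float, v_m: float, d_after: float, d_before: float,
                      lane_changed: bool) -> float:
    # Speed term plus, only when a lane change occurred, a term that is
    # positive when moving from a crowded lane to an emptier one.
    r = v / v_m
    if lane_changed:
        r += (d_after - d_before) / max(d_before, 1e-6)
    return r

def comfort_reward(acc_new: float, acc_old: float, acc_m: float) -> float:
    # Penalize abrupt changes of acceleration between two time steps.
    return -abs(acc_new - acc_old) / acc_m

def impact_reward(delta_v: float, acc_m: float, dt: float) -> float:
    # Penalize forcing the following vehicle to change its speed.
    return -abs(delta_v) / (acc_m * dt)

def hybrid_reward(r_safe: float, r_eff: float, r_comf: float, r_imp: float,
                  violated: bool = False) -> float:
    # A red-light violation, collision, or lane exit yields a flat -20.
    return -20.0 if violated else r_safe + r_eff + r_comf + r_imp
```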
The multi-process training module accelerates model training and improves training performance through parallel training. Because rendering the traffic environment is slow and a single model has insufficient exploration capability, multiple processes are used to collect exploration experience in parallel. In the present invention, each exploration experience comprises the state information of the autonomous vehicle at the current time, the action values, the reward value, and the state information at the next time. In addition, a dedicated process continually samples the collected experience and uses it to update the parameters of the network models in the framework.
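A minimal sketch of this parallel collection scheme follows; the toy environment dynamics below are placeholders (assumptions), standing in for the traffic simulator:

```python
import random
import multiprocessing as mp

def explore(queue: mp.Queue, seed: int, steps: int = 100):
    """Explorer process: push (state, action, reward, next_state) tuples."""
    rng = random.Random(seed)
    state = 0.0
    for _ in range(steps):
        action = rng.uniform(-1.0, 1.0)        # placeholder exploration policy
        next_state = state + action            # placeholder dynamics
        reward = -abs(next_state)              # placeholder reward
        queue.put((state, action, reward, next_state))
        state = next_state

def learn(queue: mp.Queue, total: int):
    """Learner process: drain the queue and periodically sample minibatches."""
    buffer = []
    while len(buffer) < total:
        buffer.append(queue.get())             # blocking: wait for experience
        if len(buffer) % 64 == 0:
            batch = random.sample(buffer, 32)  # minibatch of experiences
            # ... update actor and critic parameters from `batch` here ...

if __name__ == "__main__":
    q = mp.Queue()
    workers = [mp.Process(target=explore, args=(q, i)) for i in range(4)]
    for w in workers:
        w.start()
    learn(q, total=4 * 100)
    for w in workers:
        w.join()
```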
The technical scheme of the invention is not limited to the specific embodiment, and all technical modifications made according to the technical scheme of the invention fall within the protection scope of the invention.

Claims (6)

1. An autonomous driving decision system based on multi-modal perception and hierarchical actions, characterized in that: the system comprises a data preprocessing module, a state characterization module, an actor-critic module, a hybrid reward function module and a multi-process training module;
the data preprocessing module: acquires state information of the surrounding environment through a high-precision map and various sensors;
the state characterization module: extracts useful state information through a neural network and learns latent representations of the information;
the actor-critic module: generates decisions for the autonomous vehicle and evaluates those decisions through an actor network model and a critic network model, respectively;
the hybrid reward function module: guides the reinforcement learning model toward good decision performance;
the multi-process training module: accelerates model training and improves training performance through parallel training.
2. The autonomous driving decision system based on multi-modal perception and hierarchical actions of claim 1, wherein: the data preprocessing module acquires traffic light information using a camera, surrounding vehicle information using a lidar, road topology and lane information using a high-precision map, and the position and speed information of the autonomous vehicle itself using positioning technology and an inertial measurement unit; finally, this information is combined into the state information of the autonomous vehicle and input into the reinforcement learning model.
3. The autonomous driving decision system based on multi-modal perception and hierarchical actions of claim 1, wherein: the state characterization module first composes the multi-modal information around the autonomous vehicle into three star graphs centered on the autonomous vehicle (G_{k-1}, G_k, G_{k+1}); each star graph corresponds to the multi-modal information on one lane; information from different modalities is aggregated using a cross-modal attention mechanism; taking the multi-modal features on the current lane of the autonomous vehicle (lane k) as an example, the attention scores are calculated as follows:

α_k = softmax((W_1 e_A)^T (W_2 [e_l, e_tl, e_v]) / √D_f)

where e_A denotes the encoded feature vector of the autonomous vehicle, e_l the encoded structural features of lane k, e_tl the encoded feature vector of the traffic light belonging to lane k, e_v the sum of the encoded feature vectors of the regular vehicles on lane k, α_k the attention scores of the features of each modality, W_1 and W_2 the parameters of the linear layers that encode these feature vectors, D_f the dimension of the feature vectors, and softmax the activation function applied to obtain the attention scores; the attention scores α_k are then multiplied with the corresponding feature vectors (e_l, e_tl, e_v) to obtain the state representation E_k of the multi-modal information on lane k:
E_k = α_k (W_3 [e_l, e_tl, e_v])
where W_3 denotes the parameters of the linear layer aggregating the multi-modal information;
then, the state representations of the multi-modal features on each lane are stitched together and a linear layer is used to calculate a representation E comprising multi-modal information on all lanes s
Wherein W is 4 Parameters representing the linear layer, E k-1 Representation of multimodal information on lane k-1, E k Representation of multimodal information on lane k, E k+1 Representing a representation of the multimodal information on lane k+1.
4. The autonomous driving decision system based on multi-modal perception and hierarchical actions of claim 1, wherein: the actor-critic module first divides the decision of the autonomous vehicle into three high-level decisions: lane change to the left, lane keeping, and lane change to the right; each high-level decision then requires computation of independent action values comprising a steering angle value, a throttle value and a brake value; in the actor network, a linear layer is used to compute the action values X under the three high-level decisions simultaneously:
X = Tanh(W_5 E_s)
where W_5 denotes the parameters of the linear layer, E_s the state representation, and Tanh the activation function of the linear layer; after the actor network computes the action values, a critic network is used to compute the Q value of each action; in reinforcement learning, the Q value represents the expected future return of an action, and the larger the Q value, the better the performance of the action; a linear layer is used to compute the Q values of the three sets of action values:

Q = W_6 [E_s, E_a]

where W_6 denotes the parameters of the linear layer, E_s the state representation, and E_a the representation of the action values X; after the action values and Q value of each decision are obtained, the action values under the decision with the largest Q value are selected as the action finally executed by the autonomous vehicle.
5. The autonomous driving decision system based on multi-modal perception and hierarchical actions of claim 1, wherein: the hybrid reward function of the hybrid reward function module is divided into four parts: safety, efficiency, comfort, and impact on surrounding vehicles;
for the bonus value in terms of safety, the collision risk of the autonomous vehicle and the distance from the lane center line are used for calculation, the calculation formula is as follows:
where ttc indicates how long it takes for the autonomous vehicle to collide with the preceding vehicle if it remains at the current speed value,is a threshold value for ttc to identify collision risk, dis is the distance of the automatic driving vehicle from the center line of the lane, and d is half the width of the lane;
for the prize value in efficiency, the speed and lane change selection calculation of the automatic driving vehicle is used, and the calculation formula is as follows:
where v represents the speed of the autonomous vehicle, v m The maximum speed of the automatic driving vehicle is represented, D' represents the distance between the automatic driving vehicle and the front vehicle after lane change, and D represents the distance between the automatic driving vehicle and the front vehicle before lane change;
for the prize value in terms of comfort, a change calculation of the acceleration of the autonomous vehicle is used, the calculation formula of which is as follows:
wherein acc' and acc represent acceleration values of the automatic driving vehicle at the current time and the last time, respectively m A threshold value representing acceleration of the automated driving vehicle;
the prize value of the influence on the surrounding vehicle is calculated using the speed change of the rear vehicle of the automatic driving vehicle, and the calculation formula is as follows:
wherein Deltav represents the speed variation of the rear vehicle of the automatic driving vehicle, acc m Representing the threshold value of the acceleration, Δt represents the time interval between two time steps.
6. The autonomous driving decision system based on multi-modal perception and hierarchical actions of claim 1, wherein: the multi-process training module uses multiple processes to collect exploration experience in parallel, each exploration experience comprising the state information of the autonomous vehicle at the current time, the action values, the reward value, and the state information at the next time.
CN202310191784.XA, filed 2023-03-02: Autonomous driving decision system based on multi-modal perception and hierarchical actions (Pending)

Priority Applications (1)

Application Number: CN202310191784.XA; Priority date: 2023-03-02; Filing date: 2023-03-02

Publications (1)

Publication Number: CN116476861A; Publication date: 2023-07-25

Family ID: 87222146

Country Status (1)

CN: CN116476861A


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination