CN116476861A - Autonomous driving decision system based on multi-modal perception and hierarchical actions


Info

Publication number
CN116476861A
Authority
CN
China
Prior art keywords
lane
automatic driving
value
vehicle
information
Legal status
Pending
Application number
CN202310191784.XA
Other languages
Chinese (zh)
Inventor
郑凯
苏涵
刘顺程
夏宇阳
余泉霖
胡锐
许志
陈旭
Current Assignee
Yangtze River Delta Research Institute of UESTC Huzhou
Original Assignee
Yangtze River Delta Research Institute of UESTC Huzhou
Priority date
Filing date
Publication date
Application filed by Yangtze River Delta Research Institute of UESTC Huzhou
Priority to CN202310191784.XA
Publication of CN116476861A

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The invention discloses an autonomous driving decision system based on multi-modal perception and hierarchical actions, comprising a data preprocessing module, a state characterization module, an actor-critic module, a hybrid reward function module and a multi-process training module. The data preprocessing module acquires state information of the surrounding environment, and parallel training accelerates model training and improves training performance. The invention has good applicability: multiple sensors and deep learning algorithms are used to comprehensively perceive the surrounding environment, so the autonomous vehicle can better cope with emergencies, and the lane-wise cross-modal aggregation model extracts useful features more effectively. The reinforcement learning model based on hierarchical actions better optimizes the lane-keeping and lane-changing behavior of the autonomous vehicle. Each decision of the autonomous vehicle satisfies safety, efficiency and comfort requirements, the traffic fluctuation caused by abrupt speed changes and lane changes of the autonomous vehicle is reduced, and the efficiency and stability of the overall traffic flow are improved.

Description

Autonomous driving decision system based on multi-modal perception and hierarchical actions
Technical Field
The invention relates to the technical field of traffic, and in particular to an autonomous driving decision system based on multi-modal perception and hierarchical actions.
Background
Owing to urbanization and the popularity of private cars, many large cities around the world suffer from frequent traffic congestion. Traffic congestion is mainly caused by the road environment and by drivers. Although environmental factors such as road construction and merging lanes easily cause congestion, researchers have found that poor driving behaviors (such as sudden braking and forced lane changes) are the main causes of traffic congestion and even traffic accidents. This is because human driving performance is limited by factors such as a restricted field of view and long reaction delays, so it is difficult for a human driver to maintain good driving performance at all times.
In the prior art, autonomous driving has attracted extensive attention. Whereas a human driver perceives the surroundings with eyes and brain, an autonomous vehicle uses a variety of sensors to acquire information about the surrounding environment and computer technology to compute ideal speed and lane-change decisions. Conventional decision algorithms use rule matching and mathematical calculation to determine the speed and lane changes of an autonomous vehicle, but this approach requires complex parameter tuning and adapts poorly to complex traffic scenarios. With the development of deep learning, researchers have used neural networks to imitate the driving behavior of human drivers, but deep learning models are prone to modeling errors, and the decisions of human drivers are themselves often suboptimal. Given that an autonomous driving task first acquires surrounding information and then makes a decision, reinforcement learning appears to be a promising approach, and many recent studies have used reinforcement learning algorithms to optimize the decisions of autonomous vehicles, achieving better decision performance than traditional algorithms and deep learning algorithms.
However, existing reinforcement learning decision frameworks have two drawbacks: poor environmental adaptability and limited optimization objectives. Regarding the first drawback, most existing algorithms are evaluated in a simple straight-road environment and assume that surrounding traffic information can be acquired accurately; they therefore cannot handle complex traffic scenes or the uncertainty of real traffic. Recently, some researchers have begun to conduct autonomous driving experiments on urban roads, perceiving complex traffic information with cameras and lidar. However, acquiring only image data and point cloud data of the surroundings is not sufficient for an autonomous vehicle to compute an ideal decision: it also needs the road information ahead and the dynamic information of the ego vehicle, such as its heading and its distance from the lane. Regarding the second drawback, existing algorithms mainly optimize the safety, efficiency and comfort of the autonomous vehicle itself, with little consideration for reducing its impact on other vehicles. In a high-density scenario, one bad driving behavior may trigger a chain reaction in the following vehicles, propagating through the traffic flow like a domino effect and eventually causing congestion. Recently, a tree-search-based decision framework has searched, via decision-tree algorithms, for decisions that reduce the impact on other vehicles. However, that method idealizes lane-change behavior into change-left, lane-keep and change-right, ignoring the intermediate states of a lane change. This means the autonomous vehicle cannot abort a lane change midway even when continuing would cause a safety problem; for example, if the vehicle behind in the target lane suddenly accelerates during the lane change, continuing the maneuver may lead to a collision.
Disclosure of Invention
The invention aims to provide an autonomous driving decision system based on multi-modal perception and hierarchical actions.
In order to achieve the above purpose, the invention is implemented according to the following technical scheme:
the system comprises a data preprocessing module, a state characterization module, an actor commentator module, a hybrid rewarding function module and a multi-process training module;
and a data processing module: acquiring state information of surrounding environment through a high-precision map and various sensors;
the state characterization module: extracting useful state information through a neural network and learning potential characterizations of the information;
actor commentator module: generating a decision of the automatic driving vehicle and a decision of evaluating the automatic driving vehicle respectively through an actor network model and a critique network model;
a hybrid bonus function module: guiding the reinforcement learning model to make good decision performance;
a multi-process training module: the training speed of the model is increased and the training performance of the model is improved through parallel training.
The beneficial effects of the invention are as follows:
the invention relates to an automatic driving decision system based on multi-mode sensing and layering actions, which has the following advantages compared with the prior art:
(1) The invention has better applicability, and applies various sensors and deep learning algorithms to comprehensively sense the surrounding environment information, so that the automatic driving vehicle can better cope with some emergency conditions, such as: sudden braking and forced plugging of surrounding vehicles.
(2) The invention provides a cross-mode aggregation model of lane division, so that an automatic driving vehicle can extract useful characteristics faster and better.
(3) The invention provides a reinforcement learning model based on layered actions, so that lane keeping behavior and lane changing behavior of an automatic driving vehicle are better optimized.
(4) The invention satisfies the safety, efficiency and comfort of each decision of the automatic driving vehicle, and greatly reduces the fluctuation of the traffic flow caused by abrupt speed change and lane change of the automatic driving vehicle, thereby improving the traffic efficiency and the traffic flow stability of the whole traffic flow.
Drawings
FIG. 1 is a diagram of the overall architecture of the system of the present invention;
FIG. 2 is a network block diagram of a state characterization module of the present invention;
FIG. 3 is a block diagram of the hierarchical actions in the actor-critic module of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments, wherein the exemplary embodiments and descriptions of the invention are for purposes of illustration, but are not intended to be limiting.
As shown in fig. 1, the invention is divided into five parts: a data preprocessing module, a state characterization module, an actor-critic module, a hybrid reward function module and a multi-process training module.
The data preprocessing module acquires information about the surrounding environment through a high-precision map and various sensors. Specifically, a camera is used to obtain traffic light information, a lidar to obtain surrounding vehicle information, a high-precision map to obtain the road topology and lane information, and positioning technology together with an inertial measurement unit to obtain the position, speed and other information of the autonomous vehicle itself. Finally, this information is combined into the state information of the autonomous vehicle and input into the reinforcement learning model.
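For illustration, the combined multi-sensor observation can be organized as a single structured state object before being encoded. The sketch below is a minimal illustration; all field names and types are assumptions, since the patent does not specify a concrete data layout:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VehicleObs:
    """One surrounding vehicle as perceived by lidar (fields are assumptions)."""
    x: float         # longitudinal position relative to the ego vehicle (m)
    y: float         # lateral position relative to the ego vehicle (m)
    speed: float     # speed (m/s)
    lane_id: int     # index of the occupied lane

@dataclass
class DrivingState:
    """Combined state passed to the reinforcement learning model."""
    traffic_light: int                                                # camera: 0 red, 1 yellow, 2 green
    nearby_vehicles: List[VehicleObs] = field(default_factory=list)   # lidar
    lane_topology: List[int] = field(default_factory=list)            # high-precision map
    ego_speed: float = 0.0                                            # positioning + IMU
    ego_heading: float = 0.0                                          # heading relative to the lane (rad)
    dist_to_centerline: float = 0.0                                   # lateral offset from lane center (m)
```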
The state characterization module extracts useful state information through the neural network and learns latent representations of the information. As shown in fig. 2, the multi-modal information around the autonomous vehicle (lanes, regular vehicles and traffic lights) is first composed into three star graphs centered on the autonomous vehicle (G_{k-1}, G_k, G_{k+1}). Each star graph corresponds to the multi-modal information on one lane. To aggregate this multi-modal information, a cross-modal attention mechanism is used to aggregate the information from the different modalities. The calculation of the attention scores is described below, taking the multi-modal features on the current lane of the autonomous vehicle (lane k) as an example:

α_k = softmax((W_1 e_A)^T (W_2 [e_l, e_tl, e_v]) / √D_f)

where e_A denotes the encoded feature vector of the autonomous vehicle, e_l the encoded structural features of lane k, e_tl the encoded feature vector of the traffic light belonging to lane k, e_v the sum of the encoded feature vectors of the regular vehicles on lane k, α_k the attention scores of the features of each modality, W_1 and W_2 the parameters of the linear layers that encode these feature vectors, D_f the dimension of the feature vectors, and softmax the activation function applied to obtain the attention scores. The attention scores α_k are then multiplied with the corresponding feature vectors (e_l, e_tl, e_v) to obtain the state representation E_k of the multi-modal information on lane k:
E_k = α_k (W_3 [e_l, e_tl, e_v])
where W_3 denotes the parameters of the linear layer that aggregates this multi-modal information.
The state representations of the multi-modal features on each lane are then concatenated, and a linear layer is used to compute a representation E_s comprising the multi-modal information on all lanes:

E_s = W_4 [E_{k-1}, E_k, E_{k+1}]

where W_4 denotes the parameters of the linear layer, and E_{k-1}, E_k and E_{k+1} denote the representations of the multi-modal information on lanes k-1, k and k+1, respectively.
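By way of illustration, the lane-wise cross-modal aggregation can be sketched as follows in PyTorch. This is a minimal sketch, not the patented implementation: the scaled-dot-product form of the attention score and all layer shapes are assumptions consistent with the symbols W_1 to W_4, D_f and softmax defined above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LaneCrossModalAttention(nn.Module):
    """Aggregate the three modal features of one lane into E_k."""
    def __init__(self, d_f: int):
        super().__init__()
        self.d_f = d_f
        self.w1 = nn.Linear(d_f, d_f, bias=False)  # encodes e_A (query)
        self.w2 = nn.Linear(d_f, d_f, bias=False)  # encodes modal features (keys)
        self.w3 = nn.Linear(d_f, d_f, bias=False)  # aggregates modal features (values)

    def forward(self, e_av, e_lane, e_tl, e_veh):
        modal = torch.stack([e_lane, e_tl, e_veh], dim=0)  # (3, d_f)
        q = self.w1(e_av)                                  # (d_f,)
        k = self.w2(modal)                                 # (3, d_f)
        scores = k @ q / self.d_f ** 0.5                   # (3,)
        alpha = F.softmax(scores, dim=0)                   # attention over modalities
        return alpha @ self.w3(modal)                      # E_k, shape (d_f,)

class StateEncoder(nn.Module):
    """Concatenate E_{k-1}, E_k, E_{k+1} and project to E_s."""
    def __init__(self, d_f: int):
        super().__init__()
        self.lane_attn = LaneCrossModalAttention(d_f)
        self.w4 = nn.Linear(3 * d_f, d_f)

    def forward(self, e_av, lanes):
        # lanes: three (e_lane, e_tl, e_veh) tuples for lanes k-1, k, k+1
        e_ks = [self.lane_attn(e_av, *lane) for lane in lanes]
        return self.w4(torch.cat(e_ks, dim=-1))            # E_s

# Usage with random features of dimension 64:
enc = StateEncoder(d_f=64)
e_av = torch.randn(64)
lanes = [(torch.randn(64), torch.randn(64), torch.randn(64)) for _ in range(3)]
e_s = enc(e_av, lanes)  # state representation E_s, shape (64,)
```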
The actor-critic module generates decisions for the autonomous vehicle and evaluates them via an actor network model and a critic network model, respectively. Because both lane-keeping decisions and lane-changing decisions require control of the steering angle of the autonomous vehicle, a hierarchical structure is designed to decouple the different decisions. As shown in fig. 3, the decision of the autonomous vehicle is first divided into three high-level decisions: lane change to the left, lane keeping, and lane change to the right. Each high-level decision then requires computation of independent action values, comprising a steering angle value, a throttle value and a brake value. In the actor network, a linear layer is used to compute the action values X under the three high-level decisions simultaneously:
X = Tanh(W_5 E_s)
where W_5 denotes the parameters of the linear layer, E_s the state representation, and Tanh the activation function of the linear layer. After the actor network has computed the action values, the critic network is used to compute the Q value of each action.
In reinforcement learning, the Q value represents the expected future return of an action: the larger the Q value, the better the performance of the action. Specifically, a linear layer is used to compute the Q values of the three sets of action values:

Q = W_6 [E_s, E_a]

where W_6 denotes the parameters of the linear layer, E_s the state representation, and E_a the representation of the action values X. After the action values and Q value of each decision are obtained, the action values under the decision with the largest Q value are selected as the action finally executed by the autonomous vehicle.
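A minimal sketch of the hierarchical action structure follows. The three-by-three action layout (one steering/throttle/brake triple per high-level decision) follows the description above; feeding the critic the concatenation of E_s and the action triple is an assumption:

```python
import torch
import torch.nn as nn

class HierarchicalActorCritic(nn.Module):
    """Actor emits one (steer, throttle, brake) triple per high-level
    decision; the critic scores each triple with a Q value."""

    def __init__(self, d_f: int):
        super().__init__()
        # 3 high-level decisions (left, keep, right) x 3 low-level values
        self.actor = nn.Linear(d_f, 3 * 3)
        self.critic = nn.Linear(d_f + 3, 1)

    def forward(self, e_s: torch.Tensor):
        # X: action values under the three high-level decisions, in [-1, 1]
        x = torch.tanh(self.actor(e_s)).view(3, 3)
        # Q value of each high-level decision given its action triple
        q = torch.stack([self.critic(torch.cat([e_s, x[i]])) for i in range(3)])
        best = int(torch.argmax(q))          # decision with the largest Q value
        return x[best], best                 # executed action, chosen decision

# Usage: pick the action triple of the highest-Q decision.
model = HierarchicalActorCritic(d_f=128)
action, decision = model(torch.randn(128))
```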
The hybrid reward function module guides the reinforcement learning model toward good decision performance. For an autonomous vehicle, an ideal decision must first satisfy safety, efficiency and comfort requirements; second, the decision should minimize the impact on surrounding vehicles. The hybrid reward function is therefore divided into four parts: safety, efficiency, comfort, and impact on surrounding vehicles.
The safety reward is computed from the collision risk of the autonomous vehicle and its distance from the lane center line. Here ttc denotes how long it would take the autonomous vehicle to collide with the preceding vehicle if it kept its current speed, ttc_m is a threshold on ttc used to identify collision risk, dis is the distance of the autonomous vehicle from the lane center line, and d is half the lane width. If the autonomous vehicle is too close to the preceding vehicle or strays too far from the lane center, it receives a small safety reward value.
The efficiency reward is computed from the speed of the autonomous vehicle and its lane-change choices. Here v denotes the speed of the autonomous vehicle, v_m its maximum speed, D' the distance to the preceding vehicle after a lane change, and D the distance to the preceding vehicle before the lane change. The lane-change term is computed only when the autonomous vehicle changes lane: the reward is positive when the vehicle changes from a crowded lane to an empty lane, and negative otherwise.
The comfort reward is computed from the change in acceleration of the autonomous vehicle, where acc' and acc denote the acceleration values of the autonomous vehicle at the current and the previous time step, respectively, and acc_m denotes a threshold on the acceleration of the autonomous vehicle.
The impact reward is computed from the speed change of the vehicle following the autonomous vehicle, where Δv denotes the speed change of the vehicle behind the autonomous vehicle, acc_m the acceleration threshold, and Δt the time interval between two time steps.
In addition, if the autonomous vehicle runs a red light, collides, or leaves the lane, it receives a reward value of -20.
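The following sketch illustrates the four-part hybrid reward. The patent text names the variables (ttc, ttc_m, dis, d, v, v_m, D', D, acc', acc, acc_m, Δv, Δt) but does not print the formulas, so each functional form below is an assumption chosen to match the described behavior:

```python
def safety_reward(ttc: float, ttc_m: float, dis: float, d: float) -> float:
    # Small when the time-to-collision is short or the vehicle drifts
    # away from the lane center line. Functional form is an assumption.
    return min(ttc / ttc_m, 1.0) - dis / d

def efficiency_reward(v: float, v_m: float, d_after: float, d_before: float,
                      lane_changed: bool) -> float:
    # Speed term plus, only when a lane change occurred, a term that is
    # positive when moving from a crowded lane to an emptier one.
    r = v / v_m
    if lane_changed:
        r += (d_after - d_before) / max(d_before, 1e-6)
    return r

def comfort_reward(acc_new: float, acc_old: float, acc_m: float) -> float:
    # Penalize abrupt changes of acceleration between two time steps.
    return -abs(acc_new - acc_old) / acc_m

def impact_reward(delta_v: float, acc_m: float, dt: float) -> float:
    # Penalize forcing the following vehicle to change its speed.
    return -abs(delta_v) / (acc_m * dt)

def hybrid_reward(r_safe: float, r_eff: float, r_comf: float, r_imp: float,
                  violated: bool = False) -> float:
    # A red-light violation, collision, or lane exit yields a flat -20.
    return -20.0 if violated else r_safe + r_eff + r_comf + r_imp
```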
The multi-process training module accelerates model training and improves training performance through parallel training. Because rendering the traffic environment is slow and a single model has insufficient exploration capability, multiple processes are used to collect exploration experience in parallel. In the present invention, each exploration experience comprises the state information of the autonomous vehicle at the current time, the action values, the reward value, and the state information at the next time. In addition, a dedicated process continually samples the collected experience and uses it to update the parameters of the network models in the framework.
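A minimal sketch of this parallel collection scheme follows; the toy environment dynamics below are placeholders (assumptions), standing in for the traffic simulator:

```python
import random
import multiprocessing as mp

def explore(queue: mp.Queue, seed: int, steps: int = 100):
    """Explorer process: push (state, action, reward, next_state) tuples."""
    rng = random.Random(seed)
    state = 0.0
    for _ in range(steps):
        action = rng.uniform(-1.0, 1.0)        # placeholder exploration policy
        next_state = state + action            # placeholder dynamics
        reward = -abs(next_state)              # placeholder reward
        queue.put((state, action, reward, next_state))
        state = next_state

def learn(queue: mp.Queue, total: int):
    """Learner process: drain the queue and periodically sample minibatches."""
    buffer = []
    while len(buffer) < total:
        buffer.append(queue.get())             # blocking: wait for experience
        if len(buffer) % 64 == 0:
            batch = random.sample(buffer, 32)  # minibatch of experiences
            # ... update actor and critic parameters from `batch` here ...

if __name__ == "__main__":
    q = mp.Queue()
    workers = [mp.Process(target=explore, args=(q, i)) for i in range(4)]
    for w in workers:
        w.start()
    learn(q, total=4 * 100)
    for w in workers:
        w.join()
```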
The technical scheme of the invention is not limited to the specific embodiment, and all technical modifications made according to the technical scheme of the invention fall within the protection scope of the invention.

Claims (6)

1. An autonomous driving decision system based on multi-modal perception and hierarchical actions, characterized in that: the system comprises a data preprocessing module, a state characterization module, an actor-critic module, a hybrid reward function module and a multi-process training module;
the data preprocessing module: acquires state information of the surrounding environment through a high-precision map and various sensors;
the state characterization module: extracts useful state information through a neural network and learns latent representations of the information;
the actor-critic module: generates decisions for the autonomous vehicle and evaluates those decisions through an actor network model and a critic network model, respectively;
the hybrid reward function module: guides the reinforcement learning model toward good decision performance;
the multi-process training module: accelerates model training and improves training performance through parallel training.
2. The autonomous driving decision system based on multi-modal perception and hierarchical actions of claim 1, wherein: the data preprocessing module acquires traffic light information using a camera, surrounding vehicle information using a lidar, road topology and lane information using a high-precision map, and the position and speed information of the autonomous vehicle itself using positioning technology and an inertial measurement unit; finally, this information is combined into the state information of the autonomous vehicle and input into the reinforcement learning model.
3. The autonomous driving decision system based on multi-modal perception and hierarchical actions of claim 1, wherein: the state characterization module first composes the multi-modal information around the autonomous vehicle into three star graphs centered on the autonomous vehicle (G_{k-1}, G_k, G_{k+1}); each star graph corresponds to the multi-modal information on one lane; information from different modalities is aggregated using a cross-modal attention mechanism; taking the multi-modal features on the current lane of the autonomous vehicle (lane k) as an example, the attention scores are calculated as follows:

α_k = softmax((W_1 e_A)^T (W_2 [e_l, e_tl, e_v]) / √D_f)

where e_A denotes the encoded feature vector of the autonomous vehicle, e_l the encoded structural features of lane k, e_tl the encoded feature vector of the traffic light belonging to lane k, e_v the sum of the encoded feature vectors of the regular vehicles on lane k, α_k the attention scores of the features of each modality, W_1 and W_2 the parameters of the linear layers that encode these feature vectors, D_f the dimension of the feature vectors, and softmax the activation function applied to obtain the attention scores; the attention scores α_k are then multiplied with the corresponding feature vectors (e_l, e_tl, e_v) to obtain the state representation E_k of the multi-modal information on lane k:
E_k = α_k (W_3 [e_l, e_tl, e_v])
where W_3 denotes the parameters of the linear layer aggregating the multi-modal information;
then, the state representations of the multi-modal features on each lane are stitched together and a linear layer is used to calculate a representation E comprising multi-modal information on all lanes s
Wherein W is 4 Parameters representing the linear layer, E k-1 Representation of multimodal information on lane k-1, E k Representation of multimodal information on lane k, E k+1 Representing a representation of the multimodal information on lane k+1.
4. The autonomous driving decision system based on multi-modal perception and hierarchical actions of claim 1, wherein: the actor-critic module first divides the decision of the autonomous vehicle into three high-level decisions: lane change to the left, lane keeping, and lane change to the right; each high-level decision then requires computation of independent action values comprising a steering angle value, a throttle value and a brake value; in the actor network, a linear layer is used to compute the action values X under the three high-level decisions simultaneously:
X = Tanh(W_5 E_s)
where W_5 denotes the parameters of the linear layer, E_s the state representation, and Tanh the activation function of the linear layer; after the actor network computes the action values, a critic network is used to compute the Q value of each action; in reinforcement learning, the Q value represents the expected future return of an action, and the larger the Q value, the better the performance of the action; a linear layer is used to compute the Q values of the three sets of action values:

Q = W_6 [E_s, E_a]

where W_6 denotes the parameters of the linear layer, E_s the state representation, and E_a the representation of the action values X; after the action values and Q value of each decision are obtained, the action values under the decision with the largest Q value are selected as the action finally executed by the autonomous vehicle.
5. The autonomous driving decision system based on multi-modal perception and hierarchical actions of claim 1, wherein: the hybrid reward function of the hybrid reward function module is divided into four parts: safety, efficiency, comfort, and impact on surrounding vehicles;
for the bonus value in terms of safety, the collision risk of the autonomous vehicle and the distance from the lane center line are used for calculation, the calculation formula is as follows:
where ttc indicates how long it takes for the autonomous vehicle to collide with the preceding vehicle if it remains at the current speed value,is a threshold value for ttc to identify collision risk, dis is the distance of the automatic driving vehicle from the center line of the lane, and d is half the width of the lane;
for the prize value in efficiency, the speed and lane change selection calculation of the automatic driving vehicle is used, and the calculation formula is as follows:
where v represents the speed of the autonomous vehicle, v m The maximum speed of the automatic driving vehicle is represented, D' represents the distance between the automatic driving vehicle and the front vehicle after lane change, and D represents the distance between the automatic driving vehicle and the front vehicle before lane change;
for the prize value in terms of comfort, a change calculation of the acceleration of the autonomous vehicle is used, the calculation formula of which is as follows:
wherein acc' and acc represent acceleration values of the automatic driving vehicle at the current time and the last time, respectively m A threshold value representing acceleration of the automated driving vehicle;
the prize value of the influence on the surrounding vehicle is calculated using the speed change of the rear vehicle of the automatic driving vehicle, and the calculation formula is as follows:
wherein Deltav represents the speed variation of the rear vehicle of the automatic driving vehicle, acc m Representing the threshold value of the acceleration, Δt represents the time interval between two time steps.
6. The autonomous driving decision system based on multi-modal perception and hierarchical actions of claim 1, wherein: the multi-process training module uses multiple processes to collect exploration experience in parallel, each exploration experience comprising the state information of the autonomous vehicle at the current time, the action values, the reward value, and the state information at the next time.
CN202310191784.XA, filed 2023-03-02: Autonomous driving decision system based on multi-modal perception and hierarchical actions (Pending)

Priority Applications (1)

Application Number: CN202310191784.XA; Priority date: 2023-03-02; Filing date: 2023-03-02

Publications (1)

Publication Number: CN116476861A; Publication date: 2023-07-25

Family ID: 87222146

Country Status (1)

CN: CN116476861A


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination