CN113353102B - Unprotected left-turn driving control method based on deep reinforcement learning - Google Patents

Unprotected left-turn driving control method based on deep reinforcement learning

Info

Publication number
CN113353102B
CN113353102B
Authority
CN
China
Prior art keywords
function
fuzzy
deep
unprotected
vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110773027.4A
Other languages
Chinese (zh)
Other versions
CN113353102A (en)
Inventor
Zhao Min (赵敏)
Sun Dihua (孙棣华)
Chen Jin (陈进)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202110773027.4A
Publication of CN113353102A
Application granted
Publication of CN113353102B

Classifications

    • B — PERFORMING OPERATIONS; TRANSPORTING
    • B60 — VEHICLES IN GENERAL
    • B60W — CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 60/00 — Drive control systems specially adapted for autonomous road vehicles
    • B60W 60/001 — Planning or execution of driving tasks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/043 — Architecture based on fuzzy logic, fuzzy membership or fuzzy inference, e.g. adaptive neuro-fuzzy inference systems [ANFIS]
    • G06N 3/045 — Combinations of networks
    • G06N 3/08 — Learning methods
    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems


Abstract

The invention discloses an unprotected left-turn driving control method based on deep reinforcement learning, which comprises the following steps: 1. establishing a simulation and training environment, specifically: 1) constructing two identical closed-road environment simulation scenes; 2) setting a suitable simulation running time and generating any number of unprotected LTAP/OD events; 3) setting a plurality of straight-driving vehicles and three candidate paths for the left-turning vehicle; 2. designing a reward function that draws on the driving skills of human drivers to handle complex unprotected LTAP/OD events; 3. designing a strategy structure, updating the parameters of a deep convolutional fuzzy system with the learning algorithm, and searching for an optimal value function; 4. designing a learning algorithm that improves training efficiency by using human driver data together with the deep convolutional fuzzy system algorithm. Combining the driving skills of human drivers with the deep convolutional fuzzy algorithm effectively improves the interpretability of the deep reinforcement learning algorithm, the training efficiency and error-correction capability of the model, and the traffic efficiency of the vehicle.

Description

Unprotected left-turn driving control method based on deep reinforcement learning
Technical Field
The invention belongs to the field of motion control of medium- and high-level automated driving vehicles, and particularly relates to a method for training an unprotected left-turn control model that generates an automatic driving strategy.
Background
At an intersection without traffic signals or other stop-sign guidance, where a straight-driving vehicle (SDV) and a left-turning vehicle (TV) approach from opposite directions (left turn across path/opposite direction, LTAP/OD; figure 1), completing an unprotected left turn efficiently and safely is highly challenging for an autonomous vehicle, as it is for human drivers. Existing autonomous vehicles completing an unprotected left turn emphasize the robustness of the algorithm and mainly rely on manually customized rules, often adopting an over-conservative strategy; although safety is ensured to a certain extent, traffic efficiency is low. By contrast, experienced human drivers "negotiate" with the straight-driving vehicle during the competition for right of way, primarily through vehicle motions such as steering, braking, and acceleration, in an attempt to complete the left turn quickly.
In research on imitating human driving strategies, the industry commonly adopts a reinforcement learning paradigm based on deep neural networks: patent CN110824912B directly obtains usable automatic driving strategies from high-dimensional data; patent CN112784485A discloses a method for generating key automatic driving scenes based on reinforcement learning; and patent CN108009587B discloses a method and apparatus for determining a driving strategy based on reinforcement learning and rules. However, owing to the lack of interpretability of deep neural network models, the training efficiency and error-correction capability of such models are greatly limited.
Disclosure of Invention
The invention aims to provide a reinforcement learning method based on a deep convolutional fuzzy system, which learns the driving skills of human drivers, improves traffic efficiency, and improves the interpretability of the deep reinforcement learning algorithm.
In order to achieve the above object, the technical scheme of the invention is as follows: an unprotected left-turn driving control method based on deep reinforcement learning, characterized by comprising the following steps:
Step (1): establish a simulation and training environment, specifically:
(1.1) constructing two identical closed-road environment simulation scenes;
(1.2) setting a suitable simulation running time and generating any number of unprotected LTAP/OD events;
(1.3) setting a plurality of straight-driving vehicles (SDV) and three left-turning vehicle (TV) candidate paths;
Step (2): design a reward function, drawing on the driving skills of human drivers to handle complex unprotected LTAP/OD events;
Step (3): design a strategy structure, update the parameters of the deep convolutional fuzzy system with the learning algorithm, and search for an optimal value function;
Step (4): design a learning algorithm, improving training efficiency by using human driver data together with the deep convolutional fuzzy system algorithm, specifically:
(4.1) setting a function Q that records the learning algorithm;
(4.2) initializing the function Q using human driver data;
(4.3) obtaining new values of the function Q through deep convolutional fuzzy system operations;
(4.4) updating the values of the function Q using deep reinforcement learning to obtain an optimal solution.
In step (1), each unprotected LTAP/OD event constitutes one deep reinforcement learning training round.
In step (2), the reward function is as follows:

$$r(s_t, a_t) = \begin{cases} c_1\,\lvert v_{TV} - v_{SDV}\rvert - c_2\, d_{TV}^{conflict}, & d_{TV}^{conflict} > 0 \\ c_3\, v_{TV}, & d_{TV}^{conflict} \le 0 \\ -c_4, & D \le 3.5\ \text{m} \end{cases}$$

where $s_t$ is the state of the environment at time t; $a_t$ is the action taken by the agent at time t; $c_1, c_2, c_3, c_4 > 0$ are its weight parameters, with $c_1 = 0.5$, $c_2 = 4$, $c_3 = 0.5$, $c_4 = 4$, and the maximum vehicle speed limit is 17 m/s ≈ 60 km/h; $\lvert v_{TV} - v_{SDV}\rvert$ is the absolute value of the speed difference between the TV and the SDV; $d_{TV}^{conflict}$ denotes the distance from the TV to the border of the conflict area; when $d_{TV}^{conflict} > 0$, i.e. before the TV passes the conflict area, the first reward function is active, and after the TV passes the conflict area the second reward function is active, in which a larger TV speed $v_{TV}$ means higher traffic efficiency; D denotes the distance between the centers of gravity of the TV and the SDV, and the larger the distance, the smaller the collision risk; when D ≤ 3.5 m, the third reward function acts.
In step (2), the driving skills of the human driver include vehicle-body actions such as steering, braking, and accelerating.
In step (3), the membership functions of the fuzzy system are $A_1, A_2, \ldots, A_q$, and the mathematical expression of the $i$-th fuzzy subsystem of the $l$-th layer is:

$$f^{l,i}\left(x^{l,i}\right) = \frac{\sum_{j_1=1}^{q} \cdots \sum_{j_m=1}^{q} \bar{y}^{l,i}_{j_1 \cdots j_m} \prod_{k=1}^{m} \mu_{A_{j_k}}\left(x_k^{l,i}\right)}{\sum_{j_1=1}^{q} \cdots \sum_{j_m=1}^{q} \prod_{k=1}^{m} \mu_{A_{j_k}}\left(x_k^{l,i}\right)}$$

The input set $x^{l,i} = \left(x_{(i-1)s+1}^{l}, x_{(i-1)s+2}^{l}, \ldots, x_{(i-1)s+m}^{l}\right)$ corresponding to this fuzzy subsystem is selected from the input space of the $l$-th layer through a sliding window with width m and moving step s, and the input of the $l$-th layer consists of all the outputs of the $(l-1)$-th layer. The fuzzy system $f^{l,i}$ can then be constituted by the following $q^m$ fuzzy IF-THEN rules:

IF $x_1^{l,i}$ is $A_{j_1}$ and $\cdots$ and $x_m^{l,i}$ is $A_{j_m}$, THEN $y^{l,i}$ is $\bar{y}^{l,i}_{j_1 \cdots j_m}$,

where the parameters $\bar{y}^{l,i}_{j_1 \cdots j_m}$ are the centers of the fuzzy sets $\bar{Y}^{l,i}_{j_1 \cdots j_m}$ and are the core parameters of the deep convolutional fuzzy system.

For the value function based on the deep convolutional fuzzy system, the collected data form the input-output pairs $(x_1, x_2, x_3, x_4, x_5, x_6, x_7; y) = (x_{TV}, y_{TV}, v_{TV}, v_{SDV}, D, a_{SDV}, \text{action}; \text{value})$, i.e. 7 inputs and 1 output, where $x_{TV}$ is the lateral position of the left-turning vehicle in the geodetic coordinate system, $y_{TV}$ is the longitudinal position of the left-turning vehicle in the geodetic coordinate system, $v_{TV}$ is the speed of the left-turning vehicle, $v_{SDV}$ is the speed of the straight-driving vehicle, D is the distance between the straight-driving and left-turning vehicles, $a_{SDV}$ is the acceleration of the straight-driving vehicle, action is the control action taken by the agent, and value is the value of the action-value function. The deep convolutional fuzzy system structure is divided into three layers with 9 fuzzy subsystems in total, where each fuzzy subsystem $f^{l,i}$ has 3 inputs, i.e. m = 3, and the convolution window moves with step size s = 1.
The invention has the following beneficial effects: a deep convolutional fuzzy system model is adopted; the universal approximation property of fuzzy systems is used to fit the nonlinear mapping between input and output, and the high-dimensional input space is processed with a layered structure and a convolution window, overcoming the curse of dimensionality. By adopting the driving skills of human drivers and the deep convolutional fuzzy system algorithm, the interpretability of the deep reinforcement learning algorithm, the training efficiency and error-correction capability of the model, and the traffic efficiency of vehicles are improved.
Drawings
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings:
FIG. 1 is a schematic diagram of two cases of successful negotiation between a left-turn vehicle and a straight-ahead vehicle;
FIG. 2 is a schematic diagram of the traffic simulation scenario in Prescan: (a) the double-loop scene, (b) the scene with multiple straight-driving vehicles, and (c) the three candidate paths;
FIG. 3 is a schematic diagram of an ensemble structure of reinforcement learning based on a deep convolution fuzzy system;
FIG. 4 is a schematic diagram of membership functions of a fuzzy subsystem in a deep convolutional fuzzy system;
FIG. 5 is a diagram of the learning paradigm based on the value function.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. On the contrary, the embodiments of the application include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
Example 1: As shown in figs. 1 to 5, the invention provides an unprotected left-turn driving control method based on deep reinforcement learning. To create enough unprotected LTAP/OD events in one simulation, two identical closed-road loop simulation scenes are constructed, as shown in fig. 2a. After the two vehicles pass through the target intersection of interest (the boxed location), they return through the loop, forming a cycle. By setting an appropriate simulation runtime, any number of unprotected LTAP/OD events can be obtained. For the deep reinforcement learning training process, each unprotected LTAP/OD event becomes one round (episode). Training and testing also require a scenario with multiple straight-driving vehicles (see fig. 2b) and three candidate paths (see fig. 2c).
The agent is expected to master human-like negotiation skills to handle complex unprotected LTAP/OD events. In particular, the TV should be able to complete the left turn safely while avoiding the inefficiency that results from overly conservative driving decisions. The reward function is therefore as follows:
$$r(s_t, a_t) = \begin{cases} c_1\,\lvert v_{TV} - v_{SDV}\rvert - c_2\, d_{TV}^{conflict}, & d_{TV}^{conflict} > 0 \\ c_3\, v_{TV}, & d_{TV}^{conflict} \le 0 \\ -c_4, & D \le 3.5\ \text{m} \end{cases} \tag{1}$$

Here $s_t$ is the state of the environment at time t, and $a_t$ is the action taken by the agent at time t. In the first reward function, $c_1, c_2, c_3, c_4 > 0$ are its weight parameters, with $c_1 = 0.5$, $c_2 = 4$, $c_3 = 0.5$, $c_4 = 4$; the maximum vehicle speed limit is 17 m/s ≈ 60 km/h. For the first term, the greater the absolute value $\lvert v_{TV} - v_{SDV}\rvert$ of the speed difference between the TV and the SDV, the more effectively a "standoff" between the two vehicles can be avoided, since a large speed difference means that the two vehicles are not accelerating or decelerating in synchronization. The second term $d_{TV}^{conflict}$ represents the distance from the TV to the border of the conflict area; a smaller distance represents higher TV efficiency. When $d_{TV}^{conflict} > 0$, i.e. before the TV passes through the conflict area, the first reward function is active; after the TV passes through the conflict area, the second reward function is active in pursuit of efficient passage, and the larger the TV speed $v_{TV}$, the higher the traffic efficiency. The last term ensures safety: D represents the distance between the centers of gravity of the TV and the SDV, and the greater the distance, the smaller the collision risk. When D ≤ 3.5 m, meaning the two vehicles have collided, the third reward function acts.
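For illustration only, the piecewise reward above can be sketched in Python as follows; the exact arrangement of the terms and the form of the collision penalty are assumptions inferred from the description, and all names are illustrative:

```python
def reward(v_tv, v_sdv, d_conflict, d_gravity,
           c1=0.5, c2=4.0, c3=0.5, c4=4.0):
    """Sketch of the piecewise reward r(s_t, a_t); term forms are assumed.

    v_tv, v_sdv : speeds of the left-turning (TV) and straight-driving (SDV)
                  vehicles [m/s], capped by the 17 m/s speed limit
    d_conflict  : distance from the TV to the border of the conflict area [m]
    d_gravity   : distance between the centers of gravity of TV and SDV [m]
    """
    if d_gravity <= 3.5:
        # Third reward function: the vehicles collide -- penalize.
        return -c4
    if d_conflict > 0:
        # First reward function (before the conflict area): reward a large
        # speed difference (avoids a standoff) and a small remaining distance.
        return c1 * abs(v_tv - v_sdv) - c2 * d_conflict
    # Second reward function (after the conflict area): reward a high TV speed.
    return c3 * v_tv
```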
The value of each action is evaluated using a nonlinear function approximator (the deep convolutional fuzzy system) as the critic, and the action with the maximum value is selected. The parameters of the deep convolutional fuzzy system are therefore updated with a learning algorithm, searching for the optimal value function.
The main modeling idea of the deep convolutional fuzzy system model structure is to use the universal approximation property of fuzzy systems to fit the nonlinear mapping between input and output, and to process the high-dimensional input space with a layered structure and a convolution window, solving the rule-explosion problem caused by the curse of dimensionality.
As shown in fig. 3, in the DCFS-based value function each square represents a fuzzy system. The membership functions $A_1, A_2, \ldots, A_q$ are given in fig. 4. The mathematical expression of the $i$-th fuzzy subsystem of the $l$-th layer is:

$$f^{l,i}\left(x^{l,i}\right) = \frac{\sum_{j_1=1}^{q} \cdots \sum_{j_m=1}^{q} \bar{y}^{l,i}_{j_1 \cdots j_m} \prod_{k=1}^{m} \mu_{A_{j_k}}\left(x_k^{l,i}\right)}{\sum_{j_1=1}^{q} \cdots \sum_{j_m=1}^{q} \prod_{k=1}^{m} \mu_{A_{j_k}}\left(x_k^{l,i}\right)} \tag{2}$$

The input set $x^{l,i} = \left(x_{(i-1)s+1}^{l}, x_{(i-1)s+2}^{l}, \ldots, x_{(i-1)s+m}^{l}\right)$ corresponding to this fuzzy subsystem is selected from the input space of the $l$-th layer through a sliding window with width m and moving step s, and the input of the $l$-th layer consists of all the outputs of the $(l-1)$-th layer. The fuzzy system $f^{l,i}$ can then be constituted by the following $q^m$ fuzzy IF-THEN rules:

IF $x_1^{l,i}$ is $A_{j_1}$ and $\cdots$ and $x_m^{l,i}$ is $A_{j_m}$, THEN $y^{l,i}$ is $\bar{y}^{l,i}_{j_1 \cdots j_m}$, (3)

where the parameters $\bar{y}^{l,i}_{j_1 \cdots j_m}$ are the centers of the fuzzy sets $\bar{Y}^{l,i}_{j_1 \cdots j_m}$. They are the core parameters of the deep convolutional fuzzy system and are designed by the online training algorithm introduced in the next step, so the parameters of the deep convolutional fuzzy system have clear physical meanings, which is why the method is interpretable.
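As a concrete illustration of rules (3) and the center-average defuzzification in equation (2), the following minimal Python sketch evaluates one fuzzy subsystem; the triangular membership functions uniformly spaced on [0, 1] are an assumption in the spirit of fig. 4, and all names are illustrative:

```python
import itertools
import numpy as np

def membership(x, q):
    """Degrees of q triangular membership functions A_1..A_q spread over
    [0, 1] (an assumed shape in the spirit of fig. 4)."""
    centers = np.linspace(0.0, 1.0, q)
    width = 1.0 / (q - 1)
    return np.maximum(0.0, 1.0 - np.abs(x - centers) / width)   # shape (q,)

def fuzzy_subsystem(x, y_bar):
    """Evaluate one fuzzy subsystem f^{l,i} on an input window x of length m.

    y_bar : array of shape (q,)*m holding the rule centers y_bar_{j1..jm},
            the core parameters of the deep convolutional fuzzy system.
    """
    q = y_bar.shape[0]
    mu = [membership(xk, q) for xk in x]                  # per-input memberships
    num = den = 0.0
    for idx in itertools.product(range(q), repeat=len(x)):  # the q**m rules
        w = np.prod([mu[k][j] for k, j in enumerate(idx)])  # rule firing strength
        num += y_bar[idx] * w
        den += w
    return num / max(den, 1e-12)                          # center-average output
```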
According to the collected data, the input-output pairs for the value function based on the deep convolutional fuzzy system are $(x_1, x_2, x_3, x_4, x_5, x_6, x_7; y) = (x_{TV}, y_{TV}, v_{TV}, v_{SDV}, D, a_{SDV}, \text{action}; \text{value})$; the deep convolutional fuzzy system structure composed of these 7 inputs and 1 output is shown in fig. 5, where $x_{TV}$ is the lateral position of the left-turning vehicle in the geodetic coordinate system, $y_{TV}$ is the longitudinal position of the left-turning vehicle in the geodetic coordinate system, $v_{TV}$ is the speed of the left-turning vehicle, $v_{SDV}$ is the speed of the straight-driving vehicle, D is the distance between the straight-driving and left-turning vehicles, $a_{SDV}$ is the acceleration of the straight-driving vehicle, action is the control action taken by the agent, and value is the value of the action-value function. Overall, the system is divided into three layers with 9 fuzzy subsystems in total, where each fuzzy subsystem $f^{l,i}$ has 3 inputs, i.e. m = 3, and the convolution window moves with step size s = 1.
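With m = 3 and s = 1 the window arithmetic gives layer widths of 7 → 5 → 3 → 1, i.e. 5 + 3 + 1 = 9 subsystems. A minimal sketch of the forward pass, reusing `fuzzy_subsystem` from the previous listing, with hypothetical parameter arrays `params[l][i]` and an assumed q = 3 (the patent does not state q):

```python
def dcfs_forward(x, params, m=3, s=1):
    """Forward pass of the deep convolutional fuzzy system (sketch).

    x      : the 7 inputs (x_TV, y_TV, v_TV, v_SDV, D, a_SDV, action),
             each scaled to [0, 1]
    params : params[l][i] is the rule-center array of subsystem i in layer l
    """
    layer = list(x)
    for layer_params in params:                          # three layers
        windows = [layer[i:i + m] for i in range(0, len(layer) - m + 1, s)]
        layer = [fuzzy_subsystem(np.array(w), layer_params[i])
                 for i, w in enumerate(windows)]
    return layer[0]                                      # scalar action value

# Illustrative initialization: q = 3 membership functions per input (assumed),
# random rule centers for the 5 + 3 + 1 = 9 subsystems.
q, m = 3, 3
params = [[np.random.rand(*(q,) * m) for _ in range(n)] for n in (5, 3, 1)]
value = dcfs_forward(np.random.rand(7), params)
```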
To improve training efficiency, the Q function (i.e. the action-state value function) is initialized with human driver data; see the pseudocode of Algorithm 1 for details. After the initial parameters of the deep convolutional fuzzy system are obtained to form the Q function, the parameters of the Q function are updated with Algorithm 2 to obtain the optimal solution.
[Algorithm 1 (initialization of the Q function from human driver data) and Algorithm 2 (update of the Q function) — pseudocode images not reproduced.]
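In place of the pseudocode images, the following hedged sketch shows the shape of the two algorithms: Algorithm 1 fits the DCFS-based Q function to recorded human driver transitions, and Algorithm 2 refines it with a standard Q-learning temporal-difference update. The discrete action set, the parameter-update rule, and all hyperparameters are illustrative assumptions rather than the patent's exact procedure; `dcfs_forward` is reused from the listing above.

```python
ACTIONS = [-2.0, 0.0, 2.0]        # assumed discrete TV accelerations [m/s^2]

def q_value(params, state, action):
    """DCFS critic: the 6 state features plus the action form the 7 inputs."""
    return dcfs_forward(np.array(list(state) + [action]), params)

def nudge_output_centers(params, step):
    """Crude illustrative update: the defuzzified output is a convex combination
    of the output subsystem's rule centers, so shifting them all by `step`
    shifts the prediction by `step`."""
    params[-1][0] += step

def init_q_from_human_data(params, human_data, lr=0.1, epochs=10):
    """Algorithm 1 (sketch): move Q toward the returns observed from a human
    driver; human_data holds (state, action, return) triples."""
    for _ in range(epochs):
        for state, action, ret in human_data:
            nudge_output_centers(params, lr * (ret - q_value(params, state, action)))
    return params

def q_learning_step(params, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Algorithm 2 (sketch): one temporal-difference update of the Q function."""
    target = r + gamma * max(q_value(params, s_next, b) for b in ACTIONS)
    nudge_output_centers(params, alpha * (target - q_value(params, s, a)))
    return params
```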
The above-mentioned embodiments are merely preferred embodiments that fully illustrate the present invention, and the scope of the present invention is not limited thereto. Equivalent substitutions or changes made by those skilled in the art on the basis of the invention all fall within the protection scope of the invention, which is defined by the claims.

Claims (4)

1. An unprotected left-turn driving control method based on deep reinforcement learning, characterized by comprising the following steps:
Step (1): establish a simulation and training environment, specifically:
(1.1) constructing two identical closed-road environment simulation scenes;
(1.2) setting a suitable simulation running time and generating any number of unprotected LTAP/OD events;
(1.3) setting a plurality of straight-driving vehicles (SDV) and three left-turning vehicle (TV) candidate paths;
Step (2): design a reward function, drawing on the driving skills of human drivers to handle complex unprotected LTAP/OD events;
Step (3): design a strategy structure, update the parameters of the deep convolutional fuzzy system with the learning algorithm, and search for an optimal value function;
Step (4): design a learning algorithm, improving training efficiency by using human driver data together with the deep convolutional fuzzy system algorithm, specifically:
(4.1) setting a function Q that records the learning algorithm;
(4.2) initializing the function Q using human driver data;
(4.3) obtaining new values of the function Q through deep convolutional fuzzy system operations;
(4.4) updating the values of the function Q using deep reinforcement learning to obtain an optimal solution;
in step (2), the reward function functions are as follows:
Figure FDA0003851541720000011
s is t The state of the environment at the moment t;
a is a t An action taken by the agent at time t;
c is said 1 ,c 2 ,c 3 ,c 4 0 is its weight parameter, where c1=0.5, c2=4, c3=0.5, c4=4, the vehicle maximum speed limit is 17m/s ≈ 60km/h;
the | v TV -v SDV I is the absolute value of the speed difference between TV and SDV;
the above-mentioned
Figure FDA0003851541720000012
Indicating the distance of the TV to the border of the collision area,
Figure FDA0003851541720000013
i.e. the first bonus function is active before the TV passes the conflict area and the second bonus function is active after the TV passes the conflict area, the TV speed v TV The larger the traffic efficiency is;
d represents the distance between the centers of gravity of the TV and the SDV, the larger the distance is, the smaller the collision risk is, and when D is less than or equal to 3.5m, a third reward function acts.
2. The unprotected left-turn driving control method based on deep reinforcement learning according to claim 1, wherein in step (1), each unprotected LTAP/OD event is one deep reinforcement learning training round.
3. The unprotected left-turn driving control method based on deep reinforcement learning according to claim 1, wherein in step (2), the driving skills of the human driver comprise vehicle-body actions such as steering, braking, and accelerating.
4. The unprotected left-turn driving control method based on deep reinforcement learning according to claim 1, wherein in step (3), the membership functions of the fuzzy system are $A_1, A_2, \ldots, A_q$, and the mathematical expression of the $i$-th fuzzy subsystem of the $l$-th layer is:

$$f^{l,i}\left(x^{l,i}\right) = \frac{\sum_{j_1=1}^{q} \cdots \sum_{j_m=1}^{q} \bar{y}^{l,i}_{j_1 \cdots j_m} \prod_{k=1}^{m} \mu_{A_{j_k}}\left(x_k^{l,i}\right)}{\sum_{j_1=1}^{q} \cdots \sum_{j_m=1}^{q} \prod_{k=1}^{m} \mu_{A_{j_k}}\left(x_k^{l,i}\right)}$$

the input set $x^{l,i} = \left(x_{(i-1)s+1}^{l}, x_{(i-1)s+2}^{l}, \ldots, x_{(i-1)s+m}^{l}\right)$ corresponding to this fuzzy subsystem is selected from the input space of the $l$-th layer through a sliding window with width m and moving step s, and the input of the $l$-th layer consists of all the outputs of the $(l-1)$-th layer; the fuzzy system $f^{l,i}$ can then be constituted by the following $q^m$ fuzzy IF-THEN rules:

IF $x_1^{l,i}$ is $A_{j_1}$ and $\cdots$ and $x_m^{l,i}$ is $A_{j_m}$, THEN $y^{l,i}$ is $\bar{y}^{l,i}_{j_1 \cdots j_m}$,

where the parameters $\bar{y}^{l,i}_{j_1 \cdots j_m}$ are the centers of the fuzzy sets $\bar{Y}^{l,i}_{j_1 \cdots j_m}$ and are the core parameters of the deep convolutional fuzzy system;

for the value function based on the deep convolutional fuzzy system, the collected data form the input-output pairs $(x_1, x_2, x_3, x_4, x_5, x_6, x_7; y) = (x_{TV}, y_{TV}, v_{TV}, v_{SDV}, D, a_{SDV}, \text{action}; \text{value})$, a deep convolutional fuzzy system structure with 7 inputs and 1 output, where $x_{TV}$ is the lateral position of the left-turning vehicle in the geodetic coordinate system, $y_{TV}$ is the longitudinal position of the left-turning vehicle in the geodetic coordinate system, $v_{TV}$ is the speed of the left-turning vehicle, $v_{SDV}$ is the speed of the straight-driving vehicle, D is the distance between the straight-driving and left-turning vehicles, $a_{SDV}$ is the acceleration of the straight-driving vehicle, action is the control action taken by the agent, and value is the value of the action-value function; the deep convolutional fuzzy system structure is divided into three layers with 9 fuzzy subsystems in total, where each fuzzy subsystem $f^{l,i}$ has 3 inputs, i.e. m = 3, and the convolution window moves with step size s = 1.
CN202110773027.4A 2021-07-08 2021-07-08 Unprotected left-turn driving control method based on deep reinforcement learning Active CN113353102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110773027.4A CN113353102B (en) 2021-07-08 2021-07-08 Unprotected left-turn driving control method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110773027.4A CN113353102B (en) 2021-07-08 2021-07-08 Unprotected left-turn driving control method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113353102A CN113353102A (en) 2021-09-07
CN113353102B (en) 2022-11-25

Family

ID=77539020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110773027.4A Active CN113353102B (en) 2021-07-08 2021-07-08 Unprotected left-turn driving control method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113353102B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330109A (en) * 2021-12-14 2022-04-12 深圳先进技术研究院 Interpretability method and system of deep reinforcement learning model under unmanned scene

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537746A (en) * 2018-03-21 2018-09-14 华南理工大学 A kind of fuzzy variable method for blindly restoring image based on depth convolutional network
CN109709956A (en) * 2018-12-26 2019-05-03 同济大学 A kind of automatic driving vehicle speed control multiple-objection optimization with algorithm of speeding
CN110647839A (en) * 2019-09-18 2020-01-03 深圳信息职业技术学院 Method and device for generating automatic driving strategy and computer readable storage medium
CN110659692A (en) * 2019-09-26 2020-01-07 重庆大学 Pathological image automatic labeling method based on reinforcement learning and deep neural network
CN111222630A (en) * 2020-01-17 2020-06-02 北京工业大学 Autonomous driving rule learning method based on deep reinforcement learning
CN111605565A (en) * 2020-05-08 2020-09-01 昆山小眼探索信息科技有限公司 Automatic driving behavior decision method based on deep reinforcement learning
CN112232490A (en) * 2020-10-26 2021-01-15 大连大学 Deep simulation reinforcement learning driving strategy training method based on vision
WO2021008798A1 (en) * 2019-07-12 2021-01-21 Elektrobit Automotive Gmbh Training of a convolutional neural network

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5648627A (en) * 1995-09-27 1997-07-15 Yamaha Corporation Musical performance control apparatus for processing a user's swing motion with fuzzy inference or a neural network
US7751713B2 (en) * 2007-01-19 2010-07-06 Infinera Corporation Communication network with skew path monitoring and adjustment
US10845815B2 (en) * 2018-07-27 2020-11-24 GM Global Technology Operations LLC Systems, methods and controllers for an autonomous vehicle that implement autonomous driver agents and driving policy learners for generating and improving policies based on collective driving experiences of the autonomous driver agents
US11733703B2 (en) * 2019-01-30 2023-08-22 Perceptive Automata, Inc. Automatic braking of autonomous vehicles using machine learning based prediction of behavior of a traffic entity
US11074480B2 (en) * 2019-01-31 2021-07-27 StradVision, Inc. Learning method and learning device for supporting reinforcement learning by using human driving data as training data to thereby perform personalized path planning
KR20200135630A (en) * 2019-05-23 2020-12-03 현대자동차주식회사 Apparatus and method for controlling an autonomous vehicle
US20200372410A1 (en) * 2019-05-23 2020-11-26 Uber Technologies, Inc. Model based reinforcement learning based on generalized hidden parameter markov decision processes
JP2022536030A (en) * 2019-06-03 2022-08-12 エヌビディア コーポレーション Multiple Object Tracking Using Correlation Filters in Video Analytics Applications
EP3832420B1 (en) * 2019-12-06 2024-02-07 Elektrobit Automotive GmbH Deep learning based motion control of a group of autonomous vehicles
CN111462019A (en) * 2020-04-20 2020-07-28 武汉大学 Image deblurring method and system based on deep neural network parameter estimation
CN112464820A (en) * 2020-11-30 2021-03-09 江苏金鑫信息技术有限公司 Intelligent identification method for unmanned vehicle

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108537746A (en) * 2018-03-21 2018-09-14 华南理工大学 A kind of fuzzy variable method for blindly restoring image based on depth convolutional network
CN109709956A (en) * 2018-12-26 2019-05-03 同济大学 A kind of automatic driving vehicle speed control multiple-objection optimization with algorithm of speeding
WO2021008798A1 (en) * 2019-07-12 2021-01-21 Elektrobit Automotive Gmbh Training of a convolutional neural network
CN110647839A (en) * 2019-09-18 2020-01-03 深圳信息职业技术学院 Method and device for generating automatic driving strategy and computer readable storage medium
CN110659692A (en) * 2019-09-26 2020-01-07 重庆大学 Pathological image automatic labeling method based on reinforcement learning and deep neural network
CN111222630A (en) * 2020-01-17 2020-06-02 北京工业大学 Autonomous driving rule learning method based on deep reinforcement learning
CN111605565A (en) * 2020-05-08 2020-09-01 昆山小眼探索信息科技有限公司 Automatic driving behavior decision method based on deep reinforcement learning
CN112232490A (en) * 2020-10-26 2021-01-15 大连大学 Deep simulation reinforcement learning driving strategy training method based on vision

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Min Zhao et al. DCFS-based deep learning supervisory control for modeling lane keeping of expert drivers. Physica A: Statistical Mechanics and its Applications. 2021, Vol. 567. *
Lyu Di et al. A deep reinforcement learning method for unmanned driving that incorporates human-like driving behaviors. Journal of Integration Technology. 2020, No. 5, 36-49. *
Chen Dewang et al. Prospects for the development of fuzzy systems oriented to interpretable artificial intelligence and big data. Chinese Journal of Intelligent Science and Technology. 2019, No. 4, 12-19. *

Also Published As

Publication number Publication date
CN113353102A (en) 2021-09-07

Similar Documents

Publication Publication Date Title
Mo et al. Safe reinforcement learning for autonomous vehicle using monte carlo tree search
CN114013443B (en) Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
CN111679660B (en) Unmanned deep reinforcement learning method integrating human-like driving behaviors
Huang et al. Collision-probability-aware human-machine cooperative planning for safe automated driving
CN110956851B (en) Intelligent networking automobile cooperative scheduling lane changing method
CN112183288B (en) Multi-agent reinforcement learning method based on model
CN113581182A (en) Method and system for planning track change of automatic driving vehicle based on reinforcement learning
CN116679719A (en) Unmanned vehicle self-adaptive path planning method based on dynamic window method and near-end strategy
Yuan et al. Multi-reward architecture based reinforcement learning for highway driving policies
CN113511222A (en) Scene self-adaptive vehicle interactive behavior decision and prediction method and device
CN113353102B (en) Unprotected left-turn driving control method based on deep reinforcement learning
Al-Sharman et al. Self-learned autonomous driving at unsignalized intersections: A hierarchical reinforced learning approach for feasible decision-making
CN117227755A (en) Automatic driving decision method and system based on reinforcement learning under complex traffic scene
CN111824182A (en) Three-axis heavy vehicle self-adaptive cruise control algorithm based on deep reinforcement learning
Hu et al. A rear anti-collision decision-making methodology based on deep reinforcement learning for autonomous commercial vehicles
CN113033902B (en) Automatic driving lane change track planning method based on improved deep learning
CN113110359A (en) Online training method and device for constraint type intelligent automobile autonomous decision system
Chen et al. Attention-based highway safety planner for autonomous driving via deep reinforcement learning
Sukthankar et al. Adaptive intelligent vehicle modules for tactical driving
CN116127853A (en) Unmanned driving overtaking decision method based on DDPG (distributed data base) with time sequence information fused
CN114701517A (en) Multi-target complex traffic scene automatic driving solution based on reinforcement learning
DE102022109385A1 (en) Reward feature for vehicles
Yu et al. Lane change decision-making of autonomous driving based on interpretable Soft Actor-Critic algorithm with safety awareness
Wang et al. An end-to-end deep reinforcement learning model based on proximal policy optimization algorithm for autonomous driving of off-road vehicle
CN118269929B (en) Longitudinal and transverse control method and device for automatic driving automobile

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant