CN112644516A - Unmanned control system and control method suitable for roundabout scene - Google Patents


Info

Publication number
CN112644516A
Authority
CN
China
Prior art keywords: vehicle, lane, time, driving, relative
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN202011482837.6A
Other languages
Chinese (zh)
Other versions
CN112644516B
Inventor
张羽翔
李鑫
丛岩峰
王玉海
高炳钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Automotive Research Institute Jilin University
Jilin University
Original Assignee
Qingdao Automotive Research Institute Jilin University
Application filed by Qingdao Automotive Research Institute Jilin University
Priority to CN202011482837.6A
Publication of CN112644516A
Application granted
Publication of CN112644516B
Legal status: Active
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 Details of the control system
    • B60W2050/0019 Control system elements or transfer functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

The invention discloses an unmanned control system and control method suitable for a roundabout scene. The perception and cognition module of the control system obtains the running-state information of the host vehicle and the environmental vehicles and performs signal processing; the driving control module learns appropriate decision-parameter values; and the trajectory control module obtains a feasible trajectory after optimized planning. The method belongs to the technical field of automatic driving and relates to a driving-decision method based on reinforcement learning, in which the reinforcement-learning states and actions are specially designed according to the driving-decision characteristics and the Actor network framework of the reinforcement-learning Actor-Critic architecture is optimized, so that the decision method is better suited to driving decisions in the unmanned roundabout scene.

Description

Unmanned control system and control method suitable for roundabout scene
Technical Field
The invention relates to the technical field of unmanned driving, in particular to an unmanned driving control system and method suitable for a roundabout scene.
Background
Because of its interactive learning mode, reinforcement learning is increasingly applied to the driving decisions of unmanned vehicles, and the application scenes studied have expanded from highways with relatively simple road conditions to the relatively complex roundabout scene. In a roundabout the driving task is relatively complex, and the intelligent vehicle needs to consider multiple factors simultaneously to make more refined driving decisions: the inner lanes have a shorter driving distance and higher traffic efficiency, but the vehicle can only enter or exit from the outermost lane. Therefore, the vehicle's driving strategy must adopt different decision strategies on different road sections of the roundabout and strike a balance between high traffic efficiency and the task of entering and exiting. Based on these practical requirements, when a reinforcement learning algorithm is used for the driving-decision problem in the roundabout scene, the states and actions need to be specially designed according to the driving-decision characteristics, and the Actor network framework of the reinforcement-learning Actor-Critic architecture needs to be optimized, so that the decision method is better suited to the driving-decision problem of the unmanned roundabout scene.
Disclosure of Invention
The invention provides an unmanned control system and control method suitable for a roundabout scene. Aiming at the driving requirements of the roundabout driving scene, the reinforcement-learning states and actions are specially designed according to the driving-decision characteristics, and the Actor network framework of the reinforcement-learning Actor-Critic architecture is optimized, so that the decision method is better suited to the driving-decision problem of the unmanned roundabout scene.
The invention provides an unmanned control system suitable for a roundabout scene, which comprises a perception and cognition module, a driving control module and a track control module;
the perception cognition module is used for acquiring the running state information of the current vehicle and the environmental vehicle and processing signals;
the driving control module is used for learning appropriate decision parameter values;
and the track control module is used for obtaining the feasible track after the optimization planning.
Another aspect of the present invention provides an unmanned control method for a roundabout scene, which is implemented by an unmanned control system for a roundabout scene according to an aspect of the present invention, comprising the following steps,
step one, designing states and actions in a Markov driving decision process;
the driving decision is modeled as a Markov decision process based on a reinforcement learning method, comprising a state vector S representing the factors that influence the agent's driving decision and the design of an action vector A enabling the agent's refined decisions;
step two, designing a network framework of the Actor;
in the reinforcement-learning Actor-Critic framework, the Actor selects an action according to the state vector, i.e., it represents the driving decision; the state vector comprises two parts, an environment representation and a task representation; through the redesign of the Actor's network framework, the state vector supports different policies at different stages and the different dimensions of the environment representation and the task representation are balanced, so that the intelligent vehicle driving in the roundabout can accurately identify driving environments under different conditions and accurately complete the driving task;
step three, designing a return function;
the agent selects an action A in the environment according to the state vector S to obtain a return signal, and updates the strategy according to the return signal.
The invention relates to an unmanned control method suitable for a roundabout scene, further comprising the following, in the state and action design of the Markov driving-decision process of step one,
firstly, designing a state variable;
the state variables are used for action selection and value-function estimation in the reinforcement learning algorithm, and comprise an environment representation (ER) related to the relative states of the host vehicle and the surrounding vehicles and a task representation (TR) related to the vehicle's driving task, wherein the environment representation enables the agent to make safe decisions and the task representation enables the agent to complete the driving task;
secondly, designing action variables;
taking multi-layer driving behaviors into consideration at the decision layer, the action vector A representing the vehicle's driving decision comprises a discrete macroscopic driving behavior, namely the lateral offset T_y of the terminal relative to the lane center line, and continuous microscopic and mesoscopic driving behaviors, namely the added decision variables of desired acceleration a_tar and action time t_a; the lateral offset of the terminal relative to the lane center line T_y ∈ {−L, 0, L} represents left lane change, lane keeping, and right lane change respectively; L is the spacing between two adjacent lanes; the action vector a = (T_y, a_tar, t_a)^T then comprehensively represents the driving decision and is input as a variable to the lower trajectory-planning and vehicle-control layers.
The unmanned control method suitable for the roundabout scene further comprises, in the state variable design of the first step: for the environment representation, part of the surrounding vehicles in the roundabout are adjacent to the host vehicle; these are the vehicles in direct contact interaction that require attention, at positions P_1, P_2, ..., P_7. The relative lane ΔL_n(k), relative speed Δv_n(k), acceleration a_n(k), relative distance d_n(k), and driving intention I_n(k) of the vehicles at these positions at time k are considered in the environment representation; the subscript n corresponds to the vehicle information at position P_n. Here the relative lane ΔL_n(k) is calculated by ΔL_n(k) = L_n(k) − L_h(k), where L_n(k), L_h(k) are respectively the lane of the vehicle at position P_n and the lane of the host vehicle at time k. The relative speed Δv_n(k) is calculated by Δv_n(k) = v_n(k) − v_h(k), where v_n(k), v_h(k) are respectively the speed of the vehicle at position P_n and the speed of the host vehicle at time k. The driving intention I_n(k) ∈ {−1, 0, 1} represents the intention of the vehicle at position P_n at time k to change lanes left, keep its lane, or change lanes right. Meanwhile, a human driver makes decisions according to the states of the surrounding vehicles and selects an unobstructed lane according to the traffic-flow information on a lane, which reduces the probability of congestion and stopping; the nearby forward and backward traffic flows, at positions P_8, P_9, ..., P_12, therefore form the other part of the environment representation. The state at positions P_8, P_9, ..., P_12 is represented by the average relative speed Δv̄_n(k) and the average headway TH̄_n(k) of the traffic flow at time k. Here the headway between vehicle j at position P_n and its preceding vehicle at time k is TH_n,j(k) = d_n,j(k)/v_n,j(k), where d_n,j(k), v_n,j(k) are respectively the relative distance between vehicle j and its preceding vehicle at time k and the speed of vehicle j. Then at time k, the state of each position P_n among positions P_1, P_2, ..., P_7 is expressed by equation (1),
S_Pn(k) = (F_n(k), ΔL_n(k), Δv_n(k), a_n(k), d_n(k), I_n(k))^T,  (1)
where F_n(k) ∈ {1, 0} indicates whether the corresponding position is on a feasible lane; at time k, the state variable at positions P_8, P_9, ..., P_12 is expressed as equation (2),
S_Pn(k) = (Δv̄_n(k), TH̄_n(k))^T,  (2)
then at time k, the environment representation (ER) is expressed as equation (3),
S_ER(k) = (S_P1(k), S_P2(k), ..., S_P12(k))^T,  (3)
for the task representation, in the roundabout, the driving control module completes the driving task set in the route-navigation plan, so that the intelligent vehicle enters the roundabout from one entrance and then exits from another exit; then at time k, the relative longitudinal distance Δl_h(k) and the relative lane ΔL_h(k) of the host vehicle with respect to the exit are included in the task representation; the relative longitudinal distance Δl_h(k) of the host vehicle with respect to the exit is represented by equation (4),
Δl_h(k) = Δα_h(k) · D_h(k)/2,  Δα_h(k) = α_E − α_h(k),  (4)
wherein Δ αh(k),DE,Dh(k),αEh(k) The central angles corresponding to the central angle of the vehicle at the moment k relative to the exit position E, the diameter of a lane where the vehicle is located at the moment k, the exit position E and the position of the vehicle at the moment k are respectively the central angles; relative lane Δ Lh(k)=LE-Lh(k) Wherein L isE,Lh(k) Respectively as an exit position E and a lane where the vehicle k is located at the moment; then at time k, the task characterization (TR) is expressed as equation (5),
S_TR(k) = (Δl_h(k), ΔL_h(k))^T.  (5)
the state vector S is then jointly characterized using the environmental characterization and task characterization of the above design.
The unmanned control method suitable for the roundabout scene further comprises, in the return-function design of step three, three layers: the safety return r_s, the task return r_t, and the executive return r_e. The safety return r_s(k) at time k considers the distances from the host vehicle to the vehicles on the host lane L_h(k) and on the target lane L_tar(k) = L_h(k) + sign(T_y(k)), where sign(T_y(k)) indicates the left or right lane-change action selected by the host vehicle at time k; vehicles that will cut into either of the two lanes within the next 5 s are also included. When the lateral offset of the terminal relative to the lane center line T_y(k) = 0, the host vehicle performs a lane-keeping action and only the vehicle at position P_4 ahead of the host vehicle is considered; when T_y(k) < 0, the vehicles at the four positions P_1, P_2, P_3, P_4 are considered. Suppose that at time k the vehicle at position P_n is at distance d_n(k) from the host vehicle along the lane direction; then the safety return r_s(k) at this moment can be incrementally calculated as equation (6),
[Equation (6): rendered only as an image in the source; not reproduced]
where d_e is the danger distance and d_c is the collision distance;
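Equation (6) survives only as an image in the source. The following is a plausible incremental form consistent with the surrounding description (no penalty beyond the danger distance d_e, a graded penalty inside it, a large penalty at or inside the collision distance d_c); the piecewise shape and weights are assumptions, not the patent's formula:

```python
def safety_return_increment(distances, d_e=20.0, d_c=5.0):
    """Assumed incremental safety return r_s(k) over the considered
    vehicles' lane-direction distances d_n(k) from the host vehicle."""
    r_s = 0.0
    for d_n in distances:
        d = abs(d_n)
        if d <= d_c:
            r_s += -10.0                      # collision-level penalty
        elif d < d_e:
            r_s += -(d_e - d) / (d_e - d_c)   # graded danger penalty in (0, 1)
        # d >= d_e contributes nothing
    return r_s

# Lane keeping (T_y = 0): only the vehicle ahead at P_4 is considered.
r = safety_return_increment([30.0])
```

A lane-change decision (T_y < 0) would instead pass the distances of the vehicles at P_1 through P_4.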
time k mission-specific reporting rt(k) The calculation is carried out from the following three aspects, the first aspect is the final completion situation of the intelligent vehicle for the driving task of going out of the roundabout, the incremental calculation is the formula (7),
[Equation (7): rendered only as an image in the source; not reproduced]
where |Δl_h(k)| = |(α_E − α_h(k)) · D_E/2| is the longitudinal distance of the host vehicle from the exit E along the lane, with α_E, α_h(k) the central angles of the exit position E and of the host vehicle at time k, and D_E the diameter of the lane at the exit position E. The relative lane is ΔL_h(k) = L_E − L_h(k), with L_E, L_h(k) respectively the lane of the exit position E and the lane of the host vehicle at time k;
the second aspect relates to decisions at different positions of the intelligent vehicle: since the inner lanes have higher traffic efficiency, the vehicle tends to select an inner lane to pass through the roundabout faster; the expected relative lane ΔL_exp(k) at time k is then calculated as equation (8),
[Equation (8): rendered only as an image in the source; not reproduced]
where α_E and α_lc are respectively the central angle of the exit position E and the central angle required to complete one lane-change operation, ⌊·⌋ is the floor (rounding-down) operator, and the relative lane is ΔL_h(k) = L_E − L_h(k), with L_E, L_h(k) respectively the lane of the exit position E and the lane of the host vehicle at time k; then another portion of the task return r_t(k) at time k is incrementally calculated as equation (9),
[Equation (9): rendered only as an image in the source; not reproduced]
where ΔL_exp(k) is the expected relative lane at time k and T_y(k) is the lateral offset of the terminal relative to the lane center line. Meanwhile, when the host vehicle selects a lane-change decision behavior, the preceding vehicles and traffic-flow conditions of the target lane L_tar(k) and the host lane L_h(k) are compared. Suppose the preceding vehicles to compare are at positions P_1 and P_4, and the traffic flows to compare are at P_8 and P_9; the rewards are then calculated as equations (10a), (10b), (10c), and (10d),
[Equations (10a)-(10d): rendered only as images in the source; not reproduced]
where v_1(k), v_4(k), TH_1(k), TH_4(k), d_1(k), d_4(k), TH̄_8(k), TH̄_9(k) are respectively the speeds of the vehicles at positions P_1 and P_4 at time k, their headways relative to the host vehicle, their longitudinal distances, and the average headways of the traffic flows at positions P_8 and P_9 at time k;
the last portion of the task return r_t(k) at time k is incrementally calculated as equation (11),
r_t(k) = r_t(k) + k_1·r_t,1 + k_2·r_t,2 + k_3·r_t,3 + k_4·r_t,4  (11)
where k_1, k_2, k_3, k_4 are weighting parameters;
finally, the executive return r_e(k) at time k is given as equation (12),
[Equation (12): rendered only as an image in the source; not reproduced]
where k_5 and k_6 are parameters, L_T is the total number of lanes in the roundabout, L_h(k) is the lane of the host vehicle at time k, and T_y(k) is the lateral offset of the terminal relative to the lane center line;
finally, the return r(k) at time k is given by equation (13),
r(k) = r_s(k) + r_t(k) + r_e(k)  (13)
where r_s(k), r_t(k), r_e(k) are respectively the safety return, the task return, and the executive return at time k.
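The weighted composition in equations (11) and (13) reduces to simple sums; a minimal sketch in which the component values and the weights k_1, ..., k_4 are placeholders, not values from the patent:

```python
def task_return(r_t, components, weights):
    """Equation (11): r_t(k) <- r_t(k) + k_1*r_t1 + ... + k_4*r_t4."""
    return r_t + sum(k_i * r_i for k_i, r_i in zip(weights, components))

def total_return(r_s, r_t, r_e):
    """Equation (13): r(k) = r_s(k) + r_t(k) + r_e(k)."""
    return r_s + r_t + r_e

# Placeholder component values and weights for illustration:
r_t = task_return(0.5, components=[1.0, -0.5, 0.0, 2.0],
                  weights=[0.1, 0.2, 0.3, 0.4])
r = total_return(-0.2, r_t, 0.1)
```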
The unmanned control system and the unmanned control method suitable for the roundabout scene can achieve the following beneficial effects:
the unmanned control system and the unmanned control method suitable for the roundabout scene have the following advantages: (1) considering an Environment Representation (ER) related to the relative state of the vehicle and the surrounding vehicle and a Task Representation (TR) related to a vehicle driving task aiming at the driving requirement of the roundabout driving scene so as to better adapt to the driving decision problem of the roundabout unmanned driving scene; (2) the method is based on refined driving decision requirements, and the decided action vector simultaneously comprises a discrete variable pointing to macro driving behavior of lane changing and a continuous variable pointing to micro driving behavior of lane changing, so that better system performance is realized; (3) according to different characteristics and characteristics of an environment characterization (ER) and a task characterization (TR), an Actor network framework of an Actor-critical framework established by a reinforcement learning decision algorithm is specially designed to balance the dimension difference of the two characterization modes; and (4) the return function is designed by considering the performance indexes of safety return, mission return and executive return, so that the intelligent agent can effectively learn to obtain the driving strategy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a block diagram of an unmanned control system suitable for a roundabout scene according to the present invention;
FIG. 2 is a schematic view of a vehicle and its surroundings;
fig. 3 is a network configuration diagram of an Actor.
In the figures, 1 is the environment representation and 2 is the task representation.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the specific embodiments of the present invention and the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical solutions provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Example 1
The unmanned control system suitable for the roundabout scene, as shown in fig. 1, comprises a perception and cognition module, a driving control module and a track control module;
the perception cognition module is used for acquiring the running state information of the current vehicle and the environmental vehicle and processing signals;
the driving control module is used for learning appropriate decision parameter values;
and the track control module is used for obtaining the feasible track after the optimization planning.
Example 2
The unmanned control method suitable for the roundabout scene is realized by the unmanned control system suitable for the roundabout scene as described in embodiment 1, and comprises the following steps,
step one, designing states and actions in a Markov driving decision process;
the driving decision may be modeled as a markov decision process based on a reinforcement learning approach. The method comprises a state vector S representing factors influencing intelligent driving decision factors, and can enhance the design of an action vector A for the refined decision of intelligent decision making intelligence of the intelligent agent. The specific method comprises the following steps:
the first step, the state variable design,
the state variables are used for action selection and value function estimation in the reinforcement learning algorithm, so that the relation between the current state of the intelligent agent and the environment and the characteristics among tasks required to be completed by the current state of the intelligent agent can be accurately represented in the design of the state variables, the sensitivity of the intelligent agent to the environment and the state of the intelligent agent can be improved, the intelligent agent can be helped to reasonably act in the changing environment, and the learning process can be more effective. Meanwhile, the efficiency of the learning algorithm and the learning result are not only related to the design of the return function, but also have a certain degree of relation with the design of the state variable.
In the design of the state variables, the present embodiment considers two parts: an environment representation 1 related to the relative states of the host vehicle and the surrounding vehicles, and a task representation 2 related to the driving task of the host vehicle. The environment representation 1 helps the agent make safe decisions, and the task representation 2 helps the agent smoothly complete the driving task.
For the environment representation 1, the surrounding vehicles in the roundabout can be divided into two parts, numbered as shown in fig. 2. The ranges of the positions P_1 to P_7 are shown in Table 1.
Table 1:

Position          Range              Position          Range
P_4               TH_n ∈ [0, 3]      P_1, P_5          d_n ∈ [10, 40]
P_2, P_6          d_n ∈ [-10, 10]    P_3, P_7          d_n ∈ [-40, -10]
P_8, P_9, P_10    d_n < 40           P_11, P_12        d_n > -40
In Table 1, TH_n(k) = d_n(k)/v_h(k), where TH_n(k), d_n(k), v_h(k) are respectively the headway of position P_n relative to the host vehicle, the relative distance, and the speed of the host vehicle at time k.
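The headway definition above can be computed directly; the numbers below are made up for illustration:

```python
def headway(d_n, v_h):
    """Time headway (s) of position P_n relative to the host vehicle:
    relative distance d_n (m) over host-vehicle speed v_h (m/s)."""
    if v_h <= 0:
        raise ValueError("host speed must be positive")
    return d_n / v_h

# A vehicle 30 m ahead while the host travels at 15 m/s:
th = headway(30.0, 15.0)   # 2.0 s, inside Table 1's [0, 3] band for P_4
```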
Some of the vehicles shown in fig. 2 are adjacent to the host vehicle; these interact with it in direct contact and require close attention. They are located at positions P_1, P_2, ..., P_7. The relative lane ΔL_n(k), relative speed Δv_n(k), acceleration a_n(k), relative distance d_n(k), and driving intention I_n(k) of the vehicles at these positions at time k are considered in the environment representation 1; the subscript n corresponds to the vehicle information at position P_n. Here the relative lane ΔL_n(k) is calculated by ΔL_n(k) = L_n(k) − L_h(k), where L_n(k), L_h(k) are respectively the lane of the vehicle at position P_n and the lane of the host vehicle at time k. The relative speed Δv_n(k) is calculated by Δv_n(k) = v_n(k) − v_h(k), where v_n(k), v_h(k) are respectively the speed of the vehicle at position P_n and the speed of the host vehicle at time k. The driving intention I_n(k) ∈ {−1, 0, 1} represents the intention of the vehicle at position P_n at time k to change lanes left, keep its lane, or change lanes right. Meanwhile, a human driver makes decisions according to the states of the surrounding vehicles and also considers the traffic-flow information on a lane; selecting an unobstructed lane reduces the probability of congestion and stopping. Thus the nearby forward and backward traffic flows, at positions P_8, P_9, ..., P_12, form the other part of the environment representation 1. The state at positions P_8, P_9, ..., P_12 is represented by the average relative speed Δv̄_n(k) and the average headway TH̄_n(k) of the traffic flow at time k. Here the headway between vehicle j at position P_n and its preceding vehicle at time k is TH_n,j(k) = d_n,j(k)/v_n,j(k), where d_n,j(k), v_n,j(k) are respectively the relative distance between vehicle j and its preceding vehicle at time k and the speed of vehicle j. From the above, at time k, the state of each position P_n among positions P_1, P_2, ..., P_7 can be expressed as equation (1),
S_Pn(k) = (F_n(k), ΔL_n(k), Δv_n(k), a_n(k), d_n(k), I_n(k))^T,  (1)
where F_n(k) ∈ {1, 0} indicates whether the corresponding position is on a feasible lane. At time k, the state variable at positions P_8, P_9, ..., P_12 may be represented by equation (2),
S_Pn(k) = (Δv̄_n(k), TH̄_n(k))^T,  (2)
therefore, at time k, the environment representation 1 can be expressed as equation (3),
S_ER(k) = (S_P1(k), S_P2(k), ..., S_P12(k))^T,  (3)
for the task characterization 2, in the roundabout, the driving decision module needs to complete a specific driving task in the route navigation planning, that is, the intelligent vehicle enters the roundabout from a certain entrance and then exits from another exit. Thus, at time k, the relative longitudinal distance Δ l of the host vehicle with respect to the exith(k) And relative lane Δ Lh(k) Are considered in task characterization 2. Relative longitudinal distance Deltal of the vehicle relative to the exith(k) Can be represented by the formula (4),
Figure RE-GDA0002959541870000081
where Δα_h(k) is the central angle of the host vehicle at time k relative to the exit position E; D_E, D_h(k) are the diameters of the lane at the exit position E and of the lane where the host vehicle is located at time k; and α_E, α_h(k) are the central angles corresponding to the exit position E and to the host-vehicle position at time k. The relative lane is ΔL_h(k) = L_E − L_h(k), where L_E, L_h(k) are respectively the lane of the exit position E and the lane of the host vehicle at time k. Therefore, at time k, the task representation 2 can be expressed as equation (5),
S_TR(k) = (Δl_h(k), ΔL_h(k))^T.  (5)
finally, the state vector S is jointly characterized using the environment representation 1 and the task representation 2 designed above.
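The state vector assembled from equations (1), (2), and (5) has 7 × 6 + 5 × 2 = 52 environment dimensions plus 2 task dimensions; a minimal sketch with placeholder values:

```python
def build_state_vector(adjacent, traffic, task):
    """Concatenate the environment representation (eq. 3) and the task
    representation (eq. 5) into the state vector S.
    adjacent: 7 tuples (F_n, dL_n, dv_n, a_n, d_n, I_n) for P_1..P_7
    traffic:  5 tuples (mean_dv_n, mean_TH_n) for P_8..P_12
    task:     (dl_h, dL_h)
    """
    assert len(adjacent) == 7 and all(len(s) == 6 for s in adjacent)
    assert len(traffic) == 5 and all(len(s) == 2 for s in traffic)
    s_er = [x for s in adjacent for x in s] + [x for s in traffic for x in s]
    return s_er + list(task)

# Placeholder values for illustration only:
adjacent = [(1, 0, -1.5, 0.2, 25.0, 0)] * 7
traffic = [(-0.8, 2.1)] * 5
S = build_state_vector(adjacent, traffic, task=(120.0, -1))
```

The resulting 52-dimensional environment part and 2-dimensional task part are exactly the dimensions balanced by the Actor network design in step two.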
The second step, the design of the action variables,
the refined driving decision should consider more driving behaviors in the decision layer. The motion vector A representing the driving decision of the vehicle comprises discrete macroscopic driving behaviors, namely the lateral deviation T of the terminal relative to the central line of the vehicle channelyAnd continuous microscopic driving behavior, i.e. adding a decision variable to the desired acceleration atarTime of action ta. Lateral offset T of terminal relative to central line of laneyAnd e { -L,0, L }, which respectively represent a left lane change, a lane keeping and a right lane change. And L is the distance between two adjacent lanes. Final use motion vector a ═ Ty,atar,ta)TAnd comprehensively representing a more refined driving decision, and inputting the driving decision as an input variable into a lower track planning layer and a vehicle control layer. In particular, when the motion vector a takes different values, it can be described as different driving behaviors as shown in table 2.
TABLE 2

(T_y, a_tar, t_a)^T    Description
(-L, 0.5, 4)^T         Gentle accelerating left lane change
(0, 1, 1)^T            Accelerating lane keeping
(0, -1, 1)^T           Decelerating lane keeping
(L, 0, 2)^T            Speed-keeping fast right lane change
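The rows of Table 2 can be read as concrete instances of a = (T_y, a_tar, t_a)^T; a small sketch that decodes the discrete component T_y into its macroscopic behavior (the lane spacing L = 3.5 m is an assumed value, not taken from the patent):

```python
L = 3.5  # assumed lane spacing in metres

def macro_behavior(action, lane_width=L):
    """Interpret the discrete component T_y of a = (T_y, a_tar, t_a)^T."""
    t_y, a_tar, t_a = action
    if t_y == -lane_width:
        return "left lane change"
    if t_y == lane_width:
        return "right lane change"
    return "lane keeping"

# Rows of Table 2, with L = 3.5:
assert macro_behavior((-L, 0.5, 4)) == "left lane change"   # gentle accelerating left change
assert macro_behavior((L, 0, 2)) == "right lane change"     # speed-keeping fast right change
assert macro_behavior((0, -1, 1)) == "lane keeping"         # decelerating lane keeping
```

The continuous components a_tar and t_a would then parameterize the trajectory-planning layer below the decision layer.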
Step two, designing a network framework of the Actor;
the reinforcement learning decision algorithm of the embodiment is built on an Actor-Critic framework. In a reinforcement learning Actor-critical framework, an Actor selects an action according to a state vector, namely, a driving decision is represented. The state vector considered by this patent contains two parts, environment characterization 1 and task characterization 2. These two parts have equal effect in driving decision. For example, when the intelligent vehicle enters a lane change scene, the intelligent agent has more freedom to select actions with higher returns, for example, entering an inner lane or a lane with sparse traffic flow to obtain higher traffic efficiency, and when approaching an exit of the roundabout, the intelligent agent should change the lane outside the lane as much as possible so as to smoothly leave the roundabout from a given exit. These cases cause the state vector to have different policies at different stages. As described in step 21), the dimension of the environment characterization 1 is 52 and the dimension of the task characterization 2 is 2. Such dimension differences can make it difficult for a few-dimensional task representation 2 to function as a state representation as does environment representation 1 in a fully connected BP neural network. Therefore, in order to balance the dimension difference, the patent redesigns the network framework of the Actor, and the specific method is as follows:
as shown in fig. 3, at the input layer, task characterization 2 is replicated 26 times and input to the Actor network together with environment characterization 1; at the first and second hidden layers, assuming the previous layer outputs 2m neurons, task characterization 2 is replicated m times and appended to the current layer. This is done once for each of the two hidden layers. Through this redesign of the Actor network framework, the dimension imbalance between environment characterization 1 and task characterization 2 is compensated, so that the intelligent vehicle can accurately recognize the driving environment under different conditions and accurately complete the driving task while travelling in the roundabout.
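A minimal sketch of this dimension balancing, assuming (as the description suggests) that the 2-dimensional task characterization is the part being replicated to match the 52-dimensional environment characterization and the 2m-neuron hidden layers:

```python
import numpy as np

def balanced_input(env_feat, task_feat):
    """Input layer: replicate the 2-dim task characterization 26 times so it
    matches the 52-dim environment characterization before concatenation."""
    assert env_feat.shape == (52,) and task_feat.shape == (2,)
    return np.concatenate([env_feat, np.tile(task_feat, 26)])  # shape (104,)

def balanced_hidden(prev_out, task_feat):
    """Hidden layer: if the previous layer outputs 2m neurons, replicate the
    task characterization m times and append it, as described above."""
    m = prev_out.shape[0] // 2
    return np.concatenate([prev_out, np.tile(task_feat, m)])  # shape (4m,)

task = np.array([1.0, -1.0])
x = balanced_input(np.zeros(52), task)
h1 = balanced_hidden(np.zeros(64), task)  # previous layer outputs 2m = 64 neurons
print(x.shape, h1.shape)  # -> (104,) (128,)
```

With this scheme each half of every concatenated layer carries equal dimensionality, which is the balancing effect the redesign aims at; the exact layer sizes are assumptions for illustration.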
Step three, designing a return function;
the agent selects an action A in the environment according to the state vector S, obtains a return signal, and updates its strategy according to that signal. The design of the reward function is therefore closely tied to the driving problem and is the key to effectively learning the driving strategy.
The specific method for designing the return function under the roundabout scene considered in the patent is as follows:
the design of the reward function mainly considers three levels: the safety reward rs, the task reward rt, and the execution reward re. The safety reward rs(k) at time k mainly considers the distance from the host vehicle to the vehicles in the host lane Lh(k) and in the target lane Ltar(k) = Lh(k) + sign(Ty(k)), where sign(Ty(k)) indicates the left or right lane-change action selected by the host vehicle at time k. Vehicles that will cut into either of these two lanes within the next 5 s are also included. In particular, when the terminal lateral offset Ty(k) relative to the lane center line is 0, the host vehicle performs lane keeping and only the vehicle ahead at position P4 needs to be considered; when Ty(k) < 0, the vehicles at the four positions P1, P2, P3, P4 are considered. Suppose the vehicle at position Pn at time k is at distance dn(k) from the host vehicle along the lane direction; the safety reward rs(k) at this moment can then be incrementally calculated as equation (6),
[Equation (6): rendered as an image in the original patent]
where de is the dangerous distance and dc is the collision distance.
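Equation (6) is only available as an image in this text, so its exact form is not reproducible here; the sketch below shows one plausible incremental scheme consistent with the surrounding description (dangerous distance de, collision distance dc) and is an assumption, not the patented formula:

```python
def safety_increment(dn, de=30.0, dc=5.0):
    """Illustrative incremental safety-reward term for one considered vehicle
    at longitudinal distance dn from the host vehicle: zero beyond the
    dangerous distance de, a penalty growing linearly from 0 to -1 between
    de and the collision distance dc, and a large fixed penalty inside dc.
    The distances and the piecewise form are assumed, not taken from eq. (6)."""
    if dn >= de:
        return 0.0
    if dn > dc:
        return -(de - dn) / (de - dc)
    return -10.0

# Sum the increments over all considered positions (e.g. P1..P4 when changing lane).
rs_k = sum(safety_increment(d) for d in [40.0, 17.5, 3.0])
print(rs_k)  # 0.0 + (-0.5) + (-10.0) = -10.5
```

The graded penalty between dc and de and the fixed in-collision penalty are illustrative design choices only.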
The task reward rt(k) at time k is calculated from the following three aspects. The first is the final completion of the driving task of exiting the roundabout by the intelligent vehicle, which can be incrementally calculated as equation (7),
[Equation (7): rendered as an image in the original patent]
where |Δlh(k)| = |(αE − αh(k))·DE| is the longitudinal distance of the host vehicle from exit E along the lane, αE and αh(k) are the central angles corresponding to exit position E and to the host-vehicle position at time k, and DE is the diameter of the lane containing exit E. The relative lane is ΔLh(k) = LE − Lh(k), where LE and Lh(k) are the lane of exit E and the lane of the host vehicle at time k.
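These quantities can be computed directly; the sketch below takes the arc-length relation |Δlh(k)| = |(αE − αh(k))·DE| literally as written above (whether DE should enter as a diameter or a radius is not stated unambiguously in the text, so treat the scale as an assumption):

```python
import math

def distance_to_exit(alpha_E, alpha_h, D_E):
    """Longitudinal (arc) distance |dlh(k)| from the host vehicle to exit E
    along the lane, from the central angles (radians) and the lane
    diameter D_E, taken literally as |(alpha_E - alpha_h) * D_E|."""
    return abs((alpha_E - alpha_h) * D_E)

def relative_lane(L_E, L_h):
    """Relative lane dLh(k) = LE - Lh(k)."""
    return L_E - L_h

print(distance_to_exit(math.pi / 2, math.pi / 4, 60.0))  # pi/4 * 60 ~ 47.12
print(relative_lane(1, 3))  # -> -2 (host is two lanes outside the exit lane)
```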
The second relates to decisions at different positions of the intelligent vehicle. Because the inner lane has higher traffic efficiency, vehicles tend to select the inner lane to pass through the roundabout faster. The desired relative lane ΔLexp(k) at time k can be calculated as equation (8),
[Equation (8): rendered as an image in the original patent]
where αE and αlc are the central angle of exit position E and the central angle required to complete one lane-change operation, respectively,
and ⌊·⌋ is the floor (round-down) operator. The relative lane is ΔLh(k) = LE − Lh(k), where LE and Lh(k) are the lane of exit E and the lane of the host vehicle at time k. Another portion of the task reward rt(k) at time k can then be incrementally calculated as equation (9),
[Equation (9): rendered as an image in the original patent]
where ΔLexp(k) is the desired relative lane at time k and Ty(k) is the terminal lateral offset relative to the lane center line. Meanwhile, when the host vehicle selects a lane-change behavior, the preceding vehicles and the traffic-flow conditions of the target lane Ltar(k) and the host lane Lh(k) are compared. Assume the preceding vehicles to be compared are at positions P1 and P4 and the traffic-flow conditions to be compared are at P8 and P9; the reward is then calculated as equations (10a) to (10d),
[Equations (10a) to (10d): rendered as images in the original patent]
where v1(k) and v4(k), TH1(k) and TH4(k), d1(k) and d4(k) are respectively the speeds, the time headways relative to the host vehicle, and the longitudinal distances of the vehicles at positions P1 and P4 at time k, and the average time headways of the traffic flow at positions P8 and P9 at time k (the barred TH symbols are rendered as images in the original patent).
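Equations (8) and (10a) to (10d) appear only as images; as a purely hypothetical illustration of the desired-relative-lane idea described above (inner lanes preferred, but limited by how many lane changes of angle αlc still fit before the exit angle αE, using the floor operator), consider:

```python
import math

def expected_relative_lane(alpha_E, alpha_h, alpha_lc, L_E, L_h, L_T):
    """Hypothetical sketch of dLexp(k): prefer the innermost lane, but the
    vehicle can still move at most floor(|alpha_E - alpha_h| / alpha_lc)
    lanes before reaching exit E (alpha_lc = central angle needed for one
    lane change, L_T = total number of lanes, lane 1 = innermost).
    This is NOT the exact equation (8) of the patent."""
    changes_left = math.floor(abs(alpha_E - alpha_h) / alpha_lc)
    # Far from the exit the vehicle may reach the inner lanes; close to the
    # exit it is pinned near the exit lane L_E.
    target_lane = min(max(L_E - changes_left, 1), L_T)
    return target_lane - L_h

# Far from exit: two lane changes still possible, so aim for the inner lane.
print(expected_relative_lane(math.pi, 0.0, math.pi / 2, L_E=3, L_h=3, L_T=3))  # -> -2
# Near the exit: no lane-change margin left, so aim for the exit lane.
print(expected_relative_lane(0.1, 0.0, math.pi / 2, L_E=3, L_h=2, L_T=3))      # -> 1
```

The two calls reproduce the qualitative behavior the description demands: inner-lane preference far from the exit, outward movement near it.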
Correspondingly, the last portion of the task reward rt(k) at time k can be incrementally calculated as equation (11),
rt(k) = rt(k) + k1·rt,1 + k2·rt,2 + k3·rt,3 + k4·rt,4 (11)
where k1, k2, k3, k4 are weighting parameters.
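Equation (11) itself is explicit: the task reward accumulates a weighted sum of its components. A direct sketch (the component values and weights below are arbitrary examples):

```python
def task_reward(rt_acc, components, weights):
    """Equation (11): rt(k) = rt(k) + k1*rt,1 + k2*rt,2 + k3*rt,3 + k4*rt,4,
    where rt_acc is the task reward already accumulated from equations
    (7) and (9), and weights = (k1, k2, k3, k4)."""
    assert len(components) == len(weights) == 4
    return rt_acc + sum(k * r for k, r in zip(weights, components))

rt_k = task_reward(0.2, [1.0, -0.5, 0.0, 0.3], [0.5, 0.5, 1.0, 1.0])
print(rt_k)  # 0.2 + 0.5 - 0.25 + 0.0 + 0.3 = 0.75
```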
Finally, the execution reward re(k) at time k is given by equation (12),
[Equation (12): rendered as an image in the original patent]
where k5 and k6 are parameters, LT is the total number of lanes in the roundabout, Lh(k) is the lane of the host vehicle at time k, and Ty(k) is the terminal lateral offset relative to the lane center line.
Finally, the total reward r(k) at time k is given by equation (13),
r(k) = rs(k) + rt(k) + re(k) (13)
where rs(k), rt(k), re(k) are the safety reward, task reward, and execution reward at time k defined above.
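Equation (13) then sums the three levels into the scalar return the agent uses to update its policy; as a one-line sketch:

```python
def total_reward(rs_k, rt_k, re_k):
    """Equation (13): r(k) = rs(k) + rt(k) + re(k), combining the safety,
    task, and execution rewards into the return signal."""
    return rs_k + rt_k + re_k

print(total_reward(-0.5, 0.75, 0.25))  # -> 0.5
```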
The unmanned control system and control method suitable for the roundabout scene belong to the technical field of automatic driving. They provide a driving decision method based on reinforcement learning, in which the reinforcement learning states and actions are specially designed according to driving-decision characteristics and the network framework of the reinforcement learning Actor-Critic architecture is optimized, so that the decision method is better suited to driving decisions in the unmanned roundabout scene. Each sub-control system of the automatic driving control system of the unmanned vehicle realizes automatic control through system design; as shown in fig. 1, the system comprises a perception and cognition module, a driving control module, and a track control module, and this embodiment mainly concerns the driving control module.
The above description is only an example of the present invention and is not intended to limit the present invention. Various modifications and alterations will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the scope of the claims of the present invention.

Claims (5)

1. An unmanned control system suitable for a roundabout scene, comprising a perception and cognition module, a driving control module, and a track control module; characterized in that,
the perception and cognition module is used for acquiring the running-state information of the host vehicle and surrounding vehicles and for signal processing;
the driving control module is used for learning appropriate decision-parameter values;
and the track control module is used for obtaining the feasible trajectory after optimization planning.
2. An unmanned control method suitable for a roundabout scene, implemented by the unmanned control system suitable for the roundabout scene of claim 1, characterized by comprising the following steps,
step one, designing states and actions in a Markov driving decision process;
the driving decision is modeled as a Markov decision process based on a reinforcement learning method, comprising the design of a state vector S representing the factors that influence the agent's driving decision and of an action vector A that refines the agent's decision-making;
step two, designing a network framework of the Actor;
in the reinforcement learning Actor-Critic framework, the Actor selects an action, i.e. a driving decision, according to the state vector; the state vector comprises two parts, an environment characterization and a task characterization; through the redesign of the Actor network framework, the state vector supports different strategies at different stages and the different dimensions of the environment characterization and the task characterization are balanced, so that the intelligent vehicle can accurately recognize the driving environment under different conditions and accurately complete the driving task while travelling in the roundabout;
step three, designing a return function;
the agent selects an action A in the environment according to the state vector S to obtain a return signal, and updates the strategy according to the return signal.
3. The unmanned control method for roundabout scenes according to claim 2, wherein the Markov driving decision process state and action design of the first step comprises the following steps,
firstly, designing a state variable;
the state variables are used for action selection and value-function estimation in the reinforcement learning algorithm and comprise the design of an environment characterization related to the relative states of the host vehicle and surrounding vehicles and of a task characterization related to the driving task of the host vehicle, wherein the environment characterization enables the agent to make safe decisions and the task characterization enables the agent to complete the driving task;
secondly, designing action variables;
taking multi-layer driving behaviors into consideration at the decision layer; the motion vector A representing the driving decision of the host vehicle comprises a discrete macroscopic driving behavior, namely the terminal lateral offset Ty relative to the lane center line, and continuous microscopic driving behaviors, namely the desired acceleration atar and its action time ta added as decision variables; the terminal lateral offset Ty ∈ {−L, 0, L} represents a left lane change, lane keeping, and a right lane change, respectively; L is the distance between two adjacent lanes; the motion vector A = (Ty, atar, ta)T then comprehensively represents the driving decision and is input as an input variable to the lower trajectory-planning layer and vehicle-control layer.
4. The unmanned control method for the roundabout scene as claimed in claim 3, wherein in the state-variable design of the first step: for the environment characterization, in the roundabout, a part of the surrounding vehicles are adjacent to the host vehicle; these are the vehicles in direct interaction that require attention, located at positions P1, P2, ..., P7; the relative lane ΔLn(k), relative speed Δvn(k), acceleration an(k), relative distance dn(k), and driving intention In(k) of the vehicles at these positions at time k are considered in the environment characterization, the subscript n corresponding to the vehicle information at position Pn; here the relative lane is calculated by ΔLn(k) = Ln(k) − Lh(k), where Ln(k) and Lh(k) are the lane of the vehicle at position Pn and the lane of the host vehicle at time k; the relative speed is calculated by Δvn(k) = vn(k) − vh(k), where vn(k) and vh(k) are the speed of the vehicle at position Pn and the speed of the host vehicle at time k; the driving intention In(k) ∈ {−1, 0, 1} represents the left lane-change, lane-keeping, or right lane-change intention of the vehicle at position Pn at time k; meanwhile, a human driver makes decisions according to the states of surrounding vehicles and selects an unobstructed lane according to the traffic-flow information of each lane, reducing the probability of congestion and stopping; the nearby forward and backward traffic flows, at positions P8, P9, ..., P12, form another part of the environment characterization; the state of positions P8, P9, ..., P12 is represented by the average relative speed and the average time headway of the traffic at time k (both symbols are rendered as images in the original patent); here the time headway between vehicle j at position Pn and its preceding vehicle at time k is THn,j(k) = dn,j(k)/vn,j(k), where dn,j(k) and vn,j(k) are the relative distance of vehicle j to its preceding vehicle and the speed of vehicle j at time k; then at time k the state of each position Pn among P1, P2, ..., P7 is expressed by equation (1),
SPn(k)=(Fn(k),ΔLn(k),Δvn(k),an(k),dn(k),In(k))T, (1)
where Fn ∈ {1, 0} indicates whether the corresponding position is in a feasible lane; at time k, the state variable at positions P8, P9, ..., P12 is expressed as equation (2),
[Equation (2): rendered as an image in the original patent]
then at time k, the environment characterization is expressed as equation (3),
[Equation (3): rendered as an image in the original patent]
for the task characterization, in the roundabout, the driving control module completes the driving task set by route navigation planning, so that the intelligent vehicle enters the roundabout from one entrance and leaves from another exit; then at time k the relative longitudinal distance Δlh(k) of the host vehicle to the exit and the relative lane ΔLh(k) are included in the task characterization; the relative longitudinal distance Δlh(k) of the host vehicle relative to the exit is given by equation (4),
[Equation (4): rendered as an image in the original patent]
where Δαh(k) is the central angle of the host vehicle at time k relative to exit position E, DE and Dh(k) are the diameters of the lane containing exit E and of the lane the host vehicle occupies at time k, and αE and αh(k) are the central angles corresponding to exit position E and to the host-vehicle position at time k; the relative lane is ΔLh(k) = LE − Lh(k), where LE and Lh(k) are the lane of exit E and the lane of the host vehicle at time k; then at time k, the task characterization (TR) is expressed as equation (5),
STR(k)=(Δlh(k),ΔLh(k))T. (5)
the state vector S is then formed jointly from the environment characterization and the task characterization designed above.
5. The method as claimed in claim 2, wherein the reward function in step three is designed at three levels: the safety reward rs, the task reward rt, and the execution reward re; the safety reward rs(k) at time k considers the distance from the host vehicle to the vehicles in the host lane Lh(k) and in the target lane Ltar(k) = Lh(k) + sign(Ty(k)), where sign(Ty(k)) indicates the left or right lane-change action selected by the host vehicle at time k; vehicles that will cut into either of these two lanes within the next 5 s are also included; when the terminal lateral offset Ty(k) relative to the lane center line is 0, the host vehicle performs lane keeping and only the vehicle ahead at position P4 is considered; when Ty(k) < 0, the vehicles at positions P1, P2, P3, P4 are considered; suppose the vehicle at position Pn at time k is at distance dn(k) from the host vehicle along the lane direction; the safety reward rs(k) at this moment can then be incrementally calculated as equation (6),
[Equation (6): rendered as an image in the original patent]
where de is the dangerous distance and dc is the collision distance;
the task reward rt(k) at time k is calculated from the following three aspects; the first aspect is the final completion of the driving task of exiting the roundabout by the intelligent vehicle, incrementally calculated as equation (7),
[Equation (7): rendered as an image in the original patent]
where |Δlh(k)| = |(αE − αh(k))·DE| is the longitudinal distance of the host vehicle from exit E along the lane, αE and αh(k) are the central angles corresponding to exit position E and to the host-vehicle position at time k, and DE is the diameter of the lane containing exit E; the relative lane is ΔLh(k) = LE − Lh(k), where LE and Lh(k) are the lane of exit E and the lane of the host vehicle at time k;
the second aspect relates to decisions at different positions of the intelligent vehicle; because the inner lane has higher traffic efficiency, vehicles tend to select the inner lane to pass through the roundabout faster, and the desired relative lane ΔLexp(k) at time k is calculated as equation (8),
[Equation (8): rendered as an image in the original patent]
where αE and αlc are the central angle of exit position E and the central angle required to complete one lane-change operation, respectively,
and ⌊·⌋ is the floor (round-down) operator; the relative lane is ΔLh(k) = LE − Lh(k), where LE and Lh(k) are the lane of exit E and the lane of the host vehicle at time k; another portion of the task reward rt(k) at time k is then incrementally calculated as equation (9),
[Equation (9): rendered as an image in the original patent]
where ΔLexp(k) is the desired relative lane at time k and Ty(k) is the terminal lateral offset relative to the lane center line; meanwhile, when the host vehicle selects a lane-change behavior, the preceding vehicles and the traffic-flow conditions of the target lane Ltar(k) and the host lane Lh(k) are compared; assume the preceding vehicles to be compared are at positions P1 and P4 and the traffic-flow conditions to be compared are at P8 and P9; the reward is then calculated as equations (10a), (10b), (10c) and (10d),
[Equations (10a) to (10d): rendered as images in the original patent]
where v1(k) and v4(k), TH1(k) and TH4(k), d1(k) and d4(k) are respectively the speeds, the time headways relative to the host vehicle, and the longitudinal distances of the vehicles at positions P1 and P4 at time k, and the average time headways of the traffic flow at positions P8 and P9 at time k (the barred TH symbols are rendered as images in the original patent);
the last part of the task reward rt(k) at time k is incrementally calculated as equation (11),
rt(k) = rt(k) + k1·rt,1 + k2·rt,2 + k3·rt,3 + k4·rt,4 (11)
where k1, k2, k3, k4 are weighting parameters;
finally, the execution reward re(k) at time k is given by equation (12),
[Equation (12): rendered as an image in the original patent]
where k5 and k6 are parameters, LT is the total number of lanes in the roundabout, Lh(k) is the lane of the host vehicle at time k, and Ty(k) is the terminal lateral offset relative to the lane center line;
finally, the total reward r(k) at time k is given by equation (13),
r(k) = rs(k) + rt(k) + re(k) (13)
where rs(k), rt(k), re(k) are the safety reward, task reward, and execution reward at time k defined above.
CN202011482837.6A 2020-12-16 2020-12-16 Unmanned control system and control method suitable for roundabout scene Active CN112644516B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011482837.6A CN112644516B (en) 2020-12-16 2020-12-16 Unmanned control system and control method suitable for roundabout scene

Publications (2)

Publication Number Publication Date
CN112644516A true CN112644516A (en) 2021-04-13
CN112644516B CN112644516B (en) 2022-03-29

Family

ID=75354529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011482837.6A Active CN112644516B (en) 2020-12-16 2020-12-16 Unmanned control system and control method suitable for roundabout scene

Country Status (1)

Country Link
CN (1) CN112644516B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113071524A (en) * 2021-04-29 2021-07-06 深圳大学 Decision control method, decision control device, autonomous driving vehicle and storage medium
CN114153213A (en) * 2021-12-01 2022-03-08 吉林大学 Deep reinforcement learning intelligent vehicle behavior decision method based on path planning
CN114162140A (en) * 2021-12-08 2022-03-11 武汉中海庭数据技术有限公司 Optimal lane matching method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110187639A (en) * 2019-06-27 2019-08-30 吉林大学 A kind of trajectory planning control method based on Parameter Decision Making frame
US20190291726A1 (en) * 2018-03-20 2019-09-26 Mobileye Vision Technologies Ltd. Systems and methods for navigating a vehicle
US20200062262A1 (en) * 2018-08-24 2020-02-27 Ford Global Technologies, Llc Vehicle action control
US20200139973A1 (en) * 2018-11-01 2020-05-07 GM Global Technology Operations LLC Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle
EP3716285A1 (en) * 2019-03-29 2020-09-30 Tata Consultancy Services Limited Modeling a neuronal controller exhibiting human postural sway
CN111833597A (en) * 2019-04-15 2020-10-27 哲内提 Autonomous decision making in traffic situations with planning control
CN111845773A (en) * 2020-07-06 2020-10-30 北京邮电大学 Automatic driving vehicle micro-decision-making method based on reinforcement learning


Similar Documents

Publication Publication Date Title
CN112644516B (en) Unmanned control system and control method suitable for roundabout scene
Sun et al. A fast integrated planning and control framework for autonomous driving via imitation learning
Yu et al. A human-like game theory-based controller for automatic lane changing
Li et al. Shared control driver assistance system based on driving intention and situation assessment
Zhan et al. Spatially-partitioned environmental representation and planning architecture for on-road autonomous driving
Zhang et al. Adaptive decision-making for automated vehicles under roundabout scenarios using optimization embedded reinforcement learning
Sun et al. Behavior planning of autonomous cars with social perception
CN111679660B (en) Unmanned deep reinforcement learning method integrating human-like driving behaviors
CN114013443B (en) Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
Liu et al. Enabling safe freeway driving for automated vehicles
Aradi et al. Policy gradient based reinforcement learning approach for autonomous highway driving
CN114564016A (en) Navigation obstacle avoidance control method, system and model combining path planning and reinforcement learning
WO2024088068A1 (en) Automatic parking decision making method based on fusion of model predictive control and reinforcement learning
CN113627239A (en) Remote driving vehicle track prediction method combined with driver lane changing intention
CN115257745A (en) Automatic driving lane change decision control method based on rule fusion reinforcement learning
CN112550314A (en) Embedded optimization type control method suitable for unmanned driving, driving control module and automatic driving control system thereof
US20220144309A1 (en) Navigation trajectory using reinforcement learning for an ego vehicle in a navigation network
Capasso et al. End-to-end intersection handling using multi-agent deep reinforcement learning
CN113581182A (en) Method and system for planning track change of automatic driving vehicle based on reinforcement learning
CN114580302A (en) Decision planning method for automatic driving automobile based on maximum entropy reinforcement learning
Yan et al. A multi-vehicle game-theoretic framework for decision making and planning of autonomous vehicles in mixed traffic
Duan et al. Encoding distributional soft actor-critic for autonomous driving in multi-lane scenarios
Hang et al. Human-like lane-change decision making for automated driving with a game theoretic approach
Miura et al. Toward vision-based intelligent navigator: its concept and prototype
Wille et al. Comprehensive treated sections in a trajectory planner for realizing autonomous driving in Braunschweig's urban traffic

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240331

Address after: 266000 No.1 Loushan Road, Licang District, Qingdao City, Shandong Province

Patentee after: QINGDAO AUTOMOTIVE RESEARCH INSTITUTE, JILIN University

Country or region after: China

Patentee after: Jilin University

Address before: 266000 No.1 Loushan Road, Licang District, Qingdao City, Shandong Province

Patentee before: QINGDAO AUTOMOTIVE RESEARCH INSTITUTE, JILIN University

Country or region before: China