CN112644516A - Unmanned control system and control method suitable for roundabout scene - Google Patents


Info

Publication number
CN112644516A
Authority
CN
China
Prior art keywords: vehicle, lane, time, driving, relative
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number
CN202011482837.6A
Other languages
Chinese (zh)
Other versions
CN112644516B
Inventor
张羽翔
李鑫
丛岩峰
王玉海
高炳钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Automotive Research Institute Jilin University
Jilin University
Original Assignee
Qingdao Automotive Research Institute Jilin University
Application filed by Qingdao Automotive Research Institute Jilin University
Priority to CN202011482837.6A
Publication of CN112644516A
Application granted
Publication of CN112644516B
Legal status: Active
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0001 Details of the control system
    • B60W2050/0019 Control system elements or transfer functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

The invention discloses an unmanned control system and control method suitable for a roundabout scene. The perception and cognition module of the control system obtains the running-state information of the host vehicle and the environmental vehicles and performs signal processing; the driving control module learns appropriate decision-parameter values; and the trajectory control module obtains a feasible trajectory after optimized planning. The method belongs to the technical field of automatic driving and relates to a driving-decision method based on reinforcement learning, in which the reinforcement-learning states and actions are specially designed according to the driving-decision characteristics and the Actor network framework of the reinforcement-learning Actor-Critic architecture is optimized, so that the decision method is better suited to driving decisions in the unmanned roundabout scene.

Description

Unmanned control system and control method suitable for roundabout scene
Technical Field
The invention relates to the technical field of unmanned driving, in particular to an unmanned driving control system and method suitable for a roundabout scene.
Background
Because of its interactive learning mode, reinforcement learning is increasingly applied to the driving decisions of unmanned vehicles, and the application scenes studied have expanded from highways with relatively simple road conditions to the relatively complex roundabout scene. In a roundabout the driving task is relatively complex, and the intelligent vehicle needs to consider multiple factors simultaneously to make more refined driving decisions: the inner lanes have a shorter driving distance and higher traffic efficiency, but the vehicle can only enter or exit from the outermost lane. Therefore, the vehicle's driving strategy must adopt different decision strategies on different road sections of the roundabout and strike a balance between high traffic efficiency and the task of entering and exiting. Based on these practical requirements, when a reinforcement learning algorithm is used for the driving-decision problem in the roundabout scene, the states and actions need to be specially designed according to the driving-decision characteristics, and the Actor network framework of the reinforcement-learning Actor-Critic architecture needs to be optimized, so that the decision method is better suited to the driving-decision problem of the unmanned roundabout scene.
Disclosure of Invention
The invention provides an unmanned control system and control method suitable for a roundabout scene. Aiming at the driving requirements of the roundabout driving scene, the reinforcement-learning states and actions are specially designed according to the driving-decision characteristics, and the Actor network framework of the reinforcement-learning Actor-Critic architecture is optimized, so that the decision method is better suited to the driving-decision problem of the unmanned roundabout scene.
The invention provides an unmanned control system suitable for a roundabout scene, which comprises a perception and cognition module, a driving control module and a track control module;
the perception cognition module is used for acquiring the running state information of the current vehicle and the environmental vehicle and processing signals;
the driving control module is used for learning appropriate decision parameter values;
and the track control module is used for obtaining the feasible track after the optimization planning.
Another aspect of the present invention provides an unmanned control method for a roundabout scene, which is implemented by an unmanned control system for a roundabout scene according to an aspect of the present invention, comprising the following steps,
step one, designing states and actions in a Markov driving decision process;
the driving decision is modeled as a Markov decision process based on a reinforcement learning method, comprising a state vector S representing the factors that influence the agent's driving decision and the design of an action vector A enabling the agent's refined decisions;
step two, designing a network framework of the Actor;
in the reinforcement-learning Actor-Critic framework, the Actor selects an action according to the state vector, i.e., it represents the driving decision; the state vector comprises two parts, an environment representation and a task representation; through the redesign of the Actor's network framework, the state vector supports different policies at different stages and the different dimensions of the environment representation and the task representation are balanced, so that the intelligent vehicle driving in the roundabout can accurately identify driving environments under different conditions and accurately complete the driving task;
step three, designing a return function;
the agent selects an action A in the environment according to the state vector S to obtain a return signal, and updates the strategy according to the return signal.
The invention relates to an unmanned control method suitable for a roundabout scene, further comprising the following, in the state and action design of the Markov driving-decision process of step one,
firstly, designing a state variable;
the state variables are used for action selection and value-function estimation in the reinforcement learning algorithm, and comprise an environment representation (ER) related to the relative states of the host vehicle and the surrounding vehicles and a task representation (TR) related to the vehicle's driving task, wherein the environment representation enables the agent to make safe decisions and the task representation enables the agent to complete the driving task;
secondly, designing action variables;
taking multi-layer driving behaviors into consideration at the decision layer, the action vector A representing the vehicle's driving decision comprises a discrete macroscopic driving behavior, namely the lateral offset T_y of the terminal relative to the lane center line, and continuous microscopic and mesoscopic driving behaviors, namely the added decision variables of desired acceleration a_tar and action time t_a; the lateral offset of the terminal relative to the lane center line T_y ∈ {−L, 0, L} represents left lane change, lane keeping, and right lane change respectively; L is the spacing between two adjacent lanes; the action vector a = (T_y, a_tar, t_a)^T then comprehensively represents the driving decision and is input as a variable to the lower trajectory-planning and vehicle-control layers.
The unmanned control method suitable for the roundabout scene further comprises, in the state variable design of the first step: for the environment representation, part of the surrounding vehicles in the roundabout are adjacent to the host vehicle; these are the vehicles in direct contact interaction that require attention, at positions P_1, P_2, ..., P_7. The relative lane ΔL_n(k), relative speed Δv_n(k), acceleration a_n(k), relative distance d_n(k), and driving intention I_n(k) of the vehicles at these positions at time k are considered in the environment representation; the subscript n corresponds to the vehicle information at position P_n. Here the relative lane ΔL_n(k) is calculated by ΔL_n(k) = L_n(k) − L_h(k), where L_n(k), L_h(k) are respectively the lane of the vehicle at position P_n and the lane of the host vehicle at time k. The relative speed Δv_n(k) is calculated by Δv_n(k) = v_n(k) − v_h(k), where v_n(k), v_h(k) are respectively the speed of the vehicle at position P_n and the speed of the host vehicle at time k. The driving intention I_n(k) ∈ {−1, 0, 1} represents the intention of the vehicle at position P_n at time k to change lanes left, keep its lane, or change lanes right. Meanwhile, a human driver makes decisions according to the states of the surrounding vehicles and selects an unobstructed lane according to the traffic-flow information on a lane, which reduces the probability of congestion and stopping; the nearby forward and backward traffic flows, at positions P_8, P_9, ..., P_12, therefore form the other part of the environment representation. The state at positions P_8, P_9, ..., P_12 is represented by the average relative speed Δv̄_n(k) and the average headway TH̄_n(k) of the traffic flow at time k. Here the headway between vehicle j at position P_n and its preceding vehicle at time k is TH_n,j(k) = d_n,j(k)/v_n,j(k), where d_n,j(k), v_n,j(k) are respectively the relative distance between vehicle j and its preceding vehicle at time k and the speed of vehicle j. Then at time k, the state of each position P_n among positions P_1, P_2, ..., P_7 is expressed by equation (1),
S_Pn(k) = (F_n(k), ΔL_n(k), Δv_n(k), a_n(k), d_n(k), I_n(k))^T,  (1)
where F_n(k) ∈ {1, 0} indicates whether the corresponding position is on a feasible lane; at time k, the state variable at positions P_8, P_9, ..., P_12 is expressed as equation (2),
S_Pn(k) = (Δv̄_n(k), TH̄_n(k))^T,  (2)
then at time k, the environment representation (ER) is expressed as equation (3),
S_ER(k) = (S_P1(k), S_P2(k), ..., S_P12(k))^T,  (3)
for the task representation, in the roundabout, the driving control module completes the driving task set in the route-navigation plan, so that the intelligent vehicle enters the roundabout from one entrance and then exits from another exit; then at time k, the relative longitudinal distance Δl_h(k) and the relative lane ΔL_h(k) of the host vehicle with respect to the exit are included in the task representation; the relative longitudinal distance Δl_h(k) of the host vehicle with respect to the exit is represented by equation (4),
Δl_h(k) = Δα_h(k) · D_h(k)/2,  Δα_h(k) = α_E − α_h(k),  (4)
wherein Δ αh(k),DE,Dh(k),αEh(k) The central angles corresponding to the central angle of the vehicle at the moment k relative to the exit position E, the diameter of a lane where the vehicle is located at the moment k, the exit position E and the position of the vehicle at the moment k are respectively the central angles; relative lane Δ Lh(k)=LE-Lh(k) Wherein L isE,Lh(k) Respectively as an exit position E and a lane where the vehicle k is located at the moment; then at time k, the task characterization (TR) is expressed as equation (5),
S_TR(k) = (Δl_h(k), ΔL_h(k))^T.  (5)
the state vector S is then jointly characterized using the environmental characterization and task characterization of the above design.
The unmanned control method suitable for the roundabout scene further comprises, in the return-function design of step three, three layers: the safety return r_s, the task return r_t, and the executive return r_e. The safety return r_s(k) at time k considers the distances from the host vehicle to the vehicles on the host lane L_h(k) and on the target lane L_tar(k) = L_h(k) + sign(T_y(k)), where sign(T_y(k)) indicates the left or right lane-change action selected by the host vehicle at time k; vehicles that will cut into either of the two lanes within the next 5 s are also included. When the lateral offset of the terminal relative to the lane center line T_y(k) = 0, the host vehicle performs a lane-keeping action and only the vehicle at position P_4 ahead of the host vehicle is considered; when T_y(k) < 0, the vehicles at the four positions P_1, P_2, P_3, P_4 are considered. Suppose that at time k the vehicle at position P_n is at distance d_n(k) from the host vehicle along the lane direction; then the safety return r_s(k) at this moment can be incrementally calculated as equation (6),
[Equation (6): rendered only as an image in the source; not reproduced]
where d_e is the danger distance and d_c is the collision distance;
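Equation (6) survives only as an image in the source. The following is a plausible incremental form consistent with the surrounding description (no penalty beyond the danger distance d_e, a graded penalty inside it, a large penalty at or inside the collision distance d_c); the piecewise shape and weights are assumptions, not the patent's formula:

```python
def safety_return_increment(distances, d_e=20.0, d_c=5.0):
    """Assumed incremental safety return r_s(k) over the considered
    vehicles' lane-direction distances d_n(k) from the host vehicle."""
    r_s = 0.0
    for d_n in distances:
        d = abs(d_n)
        if d <= d_c:
            r_s += -10.0                      # collision-level penalty
        elif d < d_e:
            r_s += -(d_e - d) / (d_e - d_c)   # graded danger penalty in (0, 1)
        # d >= d_e contributes nothing
    return r_s

# Lane keeping (T_y = 0): only the vehicle ahead at P_4 is considered.
r = safety_return_increment([30.0])
```

A lane-change decision (T_y < 0) would instead pass the distances of the vehicles at P_1 through P_4.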
time k mission-specific reporting rt(k) The calculation is carried out from the following three aspects, the first aspect is the final completion situation of the intelligent vehicle for the driving task of going out of the roundabout, the incremental calculation is the formula (7),
[Equation (7): rendered only as an image in the source; not reproduced]
where |Δl_h(k)| = |(α_E − α_h(k)) · D_E/2| is the longitudinal distance of the host vehicle from the exit E along the lane, with α_E, α_h(k) the central angles of the exit position E and of the host vehicle at time k, and D_E the diameter of the lane at the exit position E. The relative lane is ΔL_h(k) = L_E − L_h(k), with L_E, L_h(k) respectively the lane of the exit position E and the lane of the host vehicle at time k;
the second aspect relates to decisions at different positions of the intelligent vehicle: since the inner lanes have higher traffic efficiency, the vehicle tends to select an inner lane to pass through the roundabout faster; the expected relative lane ΔL_exp(k) at time k is then calculated as equation (8),
[Equation (8): rendered only as an image in the source; not reproduced]
where α_E and α_lc are respectively the central angle of the exit position E and the central angle required to complete one lane-change operation, ⌊·⌋ is the floor (rounding-down) operator, and the relative lane is ΔL_h(k) = L_E − L_h(k), with L_E, L_h(k) respectively the lane of the exit position E and the lane of the host vehicle at time k; then another portion of the task return r_t(k) at time k is incrementally calculated as equation (9),
[Equation (9): rendered only as an image in the source; not reproduced]
where ΔL_exp(k) is the expected relative lane at time k and T_y(k) is the lateral offset of the terminal relative to the lane center line. Meanwhile, when the host vehicle selects a lane-change decision behavior, the preceding vehicles and traffic-flow conditions of the target lane L_tar(k) and the host lane L_h(k) are compared. Suppose the preceding vehicles to compare are at positions P_1 and P_4, and the traffic flows to compare are at P_8 and P_9; the rewards are then calculated as equations (10a), (10b), (10c), and (10d),
[Equations (10a)-(10d): rendered only as images in the source; not reproduced]
where v_1(k), v_4(k), TH_1(k), TH_4(k), d_1(k), d_4(k), TH̄_8(k), TH̄_9(k) are respectively the speeds of the vehicles at positions P_1 and P_4 at time k, their headways relative to the host vehicle, their longitudinal distances, and the average headways of the traffic flows at positions P_8 and P_9 at time k;
the last portion of the task return r_t(k) at time k is incrementally calculated as equation (11),
r_t(k) = r_t(k) + k_1·r_t,1 + k_2·r_t,2 + k_3·r_t,3 + k_4·r_t,4  (11)
where k_1, k_2, k_3, k_4 are weighting parameters;
finally, the executive return r_e(k) at time k is given as equation (12),
[Equation (12): rendered only as an image in the source; not reproduced]
where k_5 and k_6 are parameters, L_T is the total number of lanes in the roundabout, L_h(k) is the lane of the host vehicle at time k, and T_y(k) is the lateral offset of the terminal relative to the lane center line;
finally, the return r(k) at time k is given by equation (13),
r(k) = r_s(k) + r_t(k) + r_e(k)  (13)
where r_s(k), r_t(k), r_e(k) are respectively the safety return, the task return, and the executive return at time k.
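The weighted composition in equations (11) and (13) reduces to simple sums; a minimal sketch in which the component values and the weights k_1, ..., k_4 are placeholders, not values from the patent:

```python
def task_return(r_t, components, weights):
    """Equation (11): r_t(k) <- r_t(k) + k_1*r_t1 + ... + k_4*r_t4."""
    return r_t + sum(k_i * r_i for k_i, r_i in zip(weights, components))

def total_return(r_s, r_t, r_e):
    """Equation (13): r(k) = r_s(k) + r_t(k) + r_e(k)."""
    return r_s + r_t + r_e

# Placeholder component values and weights for illustration:
r_t = task_return(0.5, components=[1.0, -0.5, 0.0, 2.0],
                  weights=[0.1, 0.2, 0.3, 0.4])
r = total_return(-0.2, r_t, 0.1)
```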
The unmanned control system and the unmanned control method suitable for the roundabout scene can achieve the following beneficial effects:
the unmanned control system and the unmanned control method suitable for the roundabout scene have the following advantages: (1) considering an Environment Representation (ER) related to the relative state of the vehicle and the surrounding vehicle and a Task Representation (TR) related to a vehicle driving task aiming at the driving requirement of the roundabout driving scene so as to better adapt to the driving decision problem of the roundabout unmanned driving scene; (2) the method is based on refined driving decision requirements, and the decided action vector simultaneously comprises a discrete variable pointing to macro driving behavior of lane changing and a continuous variable pointing to micro driving behavior of lane changing, so that better system performance is realized; (3) according to different characteristics and characteristics of an environment characterization (ER) and a task characterization (TR), an Actor network framework of an Actor-critical framework established by a reinforcement learning decision algorithm is specially designed to balance the dimension difference of the two characterization modes; and (4) the return function is designed by considering the performance indexes of safety return, mission return and executive return, so that the intelligent agent can effectively learn to obtain the driving strategy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a block diagram of an unmanned control system suitable for a roundabout scene according to the present invention;
FIG. 2 is a schematic view of a vehicle and its surroundings;
fig. 3 is a network configuration diagram of an Actor.
In the figures, 1 is the environment representation and 2 is the task representation.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the specific embodiments of the present invention and the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The technical solutions provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Example 1
The unmanned control system suitable for the roundabout scene, as shown in fig. 1, comprises a perception and cognition module, a driving control module and a track control module;
the perception cognition module is used for acquiring the running state information of the current vehicle and the environmental vehicle and processing signals;
the driving control module is used for learning appropriate decision parameter values;
and the track control module is used for obtaining the feasible track after the optimization planning.
Example 2
The unmanned control method suitable for the roundabout scene is realized by the unmanned control system suitable for the roundabout scene as described in embodiment 1, and comprises the following steps,
step one, designing states and actions in a Markov driving decision process;
the driving decision may be modeled as a markov decision process based on a reinforcement learning approach. The method comprises a state vector S representing factors influencing intelligent driving decision factors, and can enhance the design of an action vector A for the refined decision of intelligent decision making intelligence of the intelligent agent. The specific method comprises the following steps:
the first step, the state variable design,
the state variables are used for action selection and value function estimation in the reinforcement learning algorithm, so that the relation between the current state of the intelligent agent and the environment and the characteristics among tasks required to be completed by the current state of the intelligent agent can be accurately represented in the design of the state variables, the sensitivity of the intelligent agent to the environment and the state of the intelligent agent can be improved, the intelligent agent can be helped to reasonably act in the changing environment, and the learning process can be more effective. Meanwhile, the efficiency of the learning algorithm and the learning result are not only related to the design of the return function, but also have a certain degree of relation with the design of the state variable.
In the design of the state variables, the present embodiment considers two parts: an environment representation 1 related to the relative states of the host vehicle and the surrounding vehicles, and a task representation 2 related to the driving task of the host vehicle. The environment representation 1 helps the agent make safe decisions, and the task representation 2 helps the agent smoothly complete the driving task.
For the environment representation 1, the surrounding vehicles in the roundabout can be divided into two parts, numbered as shown in fig. 2. The ranges of the positions P_1 to P_7 are shown in Table 1.
Table 1:

Position          Range              Position          Range
P_4               TH_n ∈ [0, 3]      P_1, P_5          d_n ∈ [10, 40]
P_2, P_6          d_n ∈ [-10, 10]    P_3, P_7          d_n ∈ [-40, -10]
P_8, P_9, P_10    d_n < 40           P_11, P_12        d_n > -40
In Table 1, TH_n(k) = d_n(k)/v_h(k), where TH_n(k), d_n(k), v_h(k) are respectively the headway of position P_n relative to the host vehicle, the relative distance, and the speed of the host vehicle at time k.
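The headway definition above can be computed directly; the numbers below are made up for illustration:

```python
def headway(d_n, v_h):
    """Time headway (s) of position P_n relative to the host vehicle:
    relative distance d_n (m) over host-vehicle speed v_h (m/s)."""
    if v_h <= 0:
        raise ValueError("host speed must be positive")
    return d_n / v_h

# A vehicle 30 m ahead while the host travels at 15 m/s:
th = headway(30.0, 15.0)   # 2.0 s, inside Table 1's [0, 3] band for P_4
```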
Some of the vehicles shown in fig. 2 are adjacent to the host vehicle; these interact with it in direct contact and require close attention. They are located at positions P_1, P_2, ..., P_7. The relative lane ΔL_n(k), relative speed Δv_n(k), acceleration a_n(k), relative distance d_n(k), and driving intention I_n(k) of the vehicles at these positions at time k are considered in the environment representation 1; the subscript n corresponds to the vehicle information at position P_n. Here the relative lane ΔL_n(k) is calculated by ΔL_n(k) = L_n(k) − L_h(k), where L_n(k), L_h(k) are respectively the lane of the vehicle at position P_n and the lane of the host vehicle at time k. The relative speed Δv_n(k) is calculated by Δv_n(k) = v_n(k) − v_h(k), where v_n(k), v_h(k) are respectively the speed of the vehicle at position P_n and the speed of the host vehicle at time k. The driving intention I_n(k) ∈ {−1, 0, 1} represents the intention of the vehicle at position P_n at time k to change lanes left, keep its lane, or change lanes right. Meanwhile, a human driver makes decisions according to the states of the surrounding vehicles and also considers the traffic-flow information on a lane; selecting an unobstructed lane reduces the probability of congestion and stopping. Thus the nearby forward and backward traffic flows, at positions P_8, P_9, ..., P_12, form the other part of the environment representation 1. The state at positions P_8, P_9, ..., P_12 is represented by the average relative speed Δv̄_n(k) and the average headway TH̄_n(k) of the traffic flow at time k. Here the headway between vehicle j at position P_n and its preceding vehicle at time k is TH_n,j(k) = d_n,j(k)/v_n,j(k), where d_n,j(k), v_n,j(k) are respectively the relative distance between vehicle j and its preceding vehicle at time k and the speed of vehicle j. From the above, at time k, the state of each position P_n among positions P_1, P_2, ..., P_7 can be expressed as equation (1),
S_Pn(k) = (F_n(k), ΔL_n(k), Δv_n(k), a_n(k), d_n(k), I_n(k))^T,  (1)
where F_n(k) ∈ {1, 0} indicates whether the corresponding position is on a feasible lane. At time k, the state variable at positions P_8, P_9, ..., P_12 may be represented by equation (2),
S_Pn(k) = (Δv̄_n(k), TH̄_n(k))^T,  (2)
therefore, at time k, the environment representation 1 can be expressed as equation (3),
S_ER(k) = (S_P1(k), S_P2(k), ..., S_P12(k))^T,  (3)
for the task characterization 2, in the roundabout, the driving decision module needs to complete a specific driving task in the route navigation planning, that is, the intelligent vehicle enters the roundabout from a certain entrance and then exits from another exit. Thus, at time k, the relative longitudinal distance Δ l of the host vehicle with respect to the exith(k) And relative lane Δ Lh(k) Are considered in task characterization 2. Relative longitudinal distance Deltal of the vehicle relative to the exith(k) Can be represented by the formula (4),
Figure RE-GDA0002959541870000081
where Δα_h(k) is the central angle of the host vehicle at time k relative to the exit position E; D_E, D_h(k) are the diameters of the lane at the exit position E and of the lane where the host vehicle is located at time k; and α_E, α_h(k) are the central angles corresponding to the exit position E and to the host-vehicle position at time k. The relative lane is ΔL_h(k) = L_E − L_h(k), where L_E, L_h(k) are respectively the lane of the exit position E and the lane of the host vehicle at time k. Therefore, at time k, the task representation 2 can be expressed as equation (5),
S_TR(k) = (Δl_h(k), ΔL_h(k))^T.  (5)
finally, the state vector S is jointly characterized using the environment representation 1 and the task representation 2 designed above.
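The state vector assembled from equations (1), (2), and (5) has 7 × 6 + 5 × 2 = 52 environment dimensions plus 2 task dimensions; a minimal sketch with placeholder values:

```python
def build_state_vector(adjacent, traffic, task):
    """Concatenate the environment representation (eq. 3) and the task
    representation (eq. 5) into the state vector S.
    adjacent: 7 tuples (F_n, dL_n, dv_n, a_n, d_n, I_n) for P_1..P_7
    traffic:  5 tuples (mean_dv_n, mean_TH_n) for P_8..P_12
    task:     (dl_h, dL_h)
    """
    assert len(adjacent) == 7 and all(len(s) == 6 for s in adjacent)
    assert len(traffic) == 5 and all(len(s) == 2 for s in traffic)
    s_er = [x for s in adjacent for x in s] + [x for s in traffic for x in s]
    return s_er + list(task)

# Placeholder values for illustration only:
adjacent = [(1, 0, -1.5, 0.2, 25.0, 0)] * 7
traffic = [(-0.8, 2.1)] * 5
S = build_state_vector(adjacent, traffic, task=(120.0, -1))
```

The resulting 52-dimensional environment part and 2-dimensional task part are exactly the dimensions balanced by the Actor network design in step two.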
The second step, the design of the action variables,
the refined driving decision should consider more driving behaviors in the decision layer. The motion vector A representing the driving decision of the vehicle comprises discrete macroscopic driving behaviors, namely the lateral deviation T of the terminal relative to the central line of the vehicle channelyAnd continuous microscopic driving behavior, i.e. adding a decision variable to the desired acceleration atarTime of action ta. Lateral offset T of terminal relative to central line of laneyAnd e { -L,0, L }, which respectively represent a left lane change, a lane keeping and a right lane change. And L is the distance between two adjacent lanes. Final use motion vector a ═ Ty,atar,ta)TAnd comprehensively representing a more refined driving decision, and inputting the driving decision as an input variable into a lower track planning layer and a vehicle control layer. In particular, when the motion vector a takes different values, it can be described as different driving behaviors as shown in table 2.
TABLE 2

(T_y, a_tar, t_a)^T    Description
(-L, 0.5, 4)^T         Gentle accelerating left lane change
(0, 1, 1)^T            Accelerating lane keeping
(0, -1, 1)^T           Decelerating lane keeping
(L, 0, 2)^T            Speed-keeping fast right lane change
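The rows of Table 2 can be read as concrete instances of a = (T_y, a_tar, t_a)^T; a small sketch that decodes the discrete component T_y into its macroscopic behavior (the lane spacing L = 3.5 m is an assumed value, not taken from the patent):

```python
L = 3.5  # assumed lane spacing in metres

def macro_behavior(action, lane_width=L):
    """Interpret the discrete component T_y of a = (T_y, a_tar, t_a)^T."""
    t_y, a_tar, t_a = action
    if t_y == -lane_width:
        return "left lane change"
    if t_y == lane_width:
        return "right lane change"
    return "lane keeping"

# Rows of Table 2, with L = 3.5:
assert macro_behavior((-L, 0.5, 4)) == "left lane change"   # gentle accelerating left change
assert macro_behavior((L, 0, 2)) == "right lane change"     # speed-keeping fast right change
assert macro_behavior((0, -1, 1)) == "lane keeping"         # decelerating lane keeping
```

The continuous components a_tar and t_a would then parameterize the trajectory-planning layer below the decision layer.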
Step two, designing a network framework of the Actor;
the reinforcement learning decision algorithm of the embodiment is built on an Actor-Critic framework. In a reinforcement learning Actor-critical framework, an Actor selects an action according to a state vector, namely, a driving decision is represented. The state vector considered by this patent contains two parts, environment characterization 1 and task characterization 2. These two parts have equal effect in driving decision. For example, when the intelligent vehicle enters a lane change scene, the intelligent agent has more freedom to select actions with higher returns, for example, entering an inner lane or a lane with sparse traffic flow to obtain higher traffic efficiency, and when approaching an exit of the roundabout, the intelligent agent should change the lane outside the lane as much as possible so as to smoothly leave the roundabout from a given exit. These cases cause the state vector to have different policies at different stages. As described in step 21), the dimension of the environment characterization 1 is 52 and the dimension of the task characterization 2 is 2. Such dimension differences can make it difficult for a few-dimensional task representation 2 to function as a state representation as does environment representation 1 in a fully connected BP neural network. Therefore, in order to balance the dimension difference, the patent redesigns the network framework of the Actor, and the specific method is as follows:
as shown in fig. 3, at the input layer, task characterization 2 is replicated 26 times and input to the Actor network together with environment characterization 1; at the first and second hidden layers, assuming the previous layer outputs 2m neurons, task characterization 2 is replicated m times and appended to the current layer. This is done once for each of the two hidden layers. Through this redesign of the Actor network framework, the dimension imbalance between environment characterization 1 and task characterization 2 is compensated, so that the intelligent vehicle can accurately recognize the driving environment under different conditions and accurately complete the driving task while travelling in the roundabout.
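A minimal sketch of this dimension balancing, assuming (as the description suggests) that the 2-dimensional task characterization is the part being replicated to match the 52-dimensional environment characterization and the 2m-neuron hidden layers:

```python
import numpy as np

def balanced_input(env_feat, task_feat):
    """Input layer: replicate the 2-dim task characterization 26 times so it
    matches the 52-dim environment characterization before concatenation."""
    assert env_feat.shape == (52,) and task_feat.shape == (2,)
    return np.concatenate([env_feat, np.tile(task_feat, 26)])  # shape (104,)

def balanced_hidden(prev_out, task_feat):
    """Hidden layer: if the previous layer outputs 2m neurons, replicate the
    task characterization m times and append it, as described above."""
    m = prev_out.shape[0] // 2
    return np.concatenate([prev_out, np.tile(task_feat, m)])  # shape (4m,)

task = np.array([1.0, -1.0])
x = balanced_input(np.zeros(52), task)
h1 = balanced_hidden(np.zeros(64), task)  # previous layer outputs 2m = 64 neurons
print(x.shape, h1.shape)  # -> (104,) (128,)
```

With this scheme each half of every concatenated layer carries equal dimensionality, which is the balancing effect the redesign aims at; the exact layer sizes are assumptions for illustration.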
Step three, designing a return function;
the agent selects an action A in the environment according to the state vector S, obtains a return signal, and updates its strategy according to that signal. The design of the reward function is therefore closely tied to the driving problem and is the key to effectively learning the driving strategy.
The specific method for designing the return function under the roundabout scene considered in the patent is as follows:
the design of the reward function mainly considers three levels: the safety reward rs, the task reward rt, and the execution reward re. The safety reward rs(k) at time k mainly considers the distance from the host vehicle to the vehicles in the host lane Lh(k) and in the target lane Ltar(k) = Lh(k) + sign(Ty(k)), where sign(Ty(k)) indicates the left or right lane-change action selected by the host vehicle at time k. Vehicles that will cut into either of these two lanes within the next 5 s are also included. In particular, when the terminal lateral offset Ty(k) relative to the lane center line is 0, the host vehicle performs lane keeping and only the vehicle ahead at position P4 needs to be considered; when Ty(k) < 0, the vehicles at the four positions P1, P2, P3, P4 are considered. Suppose the vehicle at position Pn at time k is at distance dn(k) from the host vehicle along the lane direction; the safety reward rs(k) at this moment can then be incrementally calculated as equation (6),
[Equation (6): rendered as an image in the original patent]
where de is the dangerous distance and dc is the collision distance.
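Equation (6) is only available as an image in this text, so its exact form is not reproducible here; the sketch below shows one plausible incremental scheme consistent with the surrounding description (dangerous distance de, collision distance dc) and is an assumption, not the patented formula:

```python
def safety_increment(dn, de=30.0, dc=5.0):
    """Illustrative incremental safety-reward term for one considered vehicle
    at longitudinal distance dn from the host vehicle: zero beyond the
    dangerous distance de, a penalty growing linearly from 0 to -1 between
    de and the collision distance dc, and a large fixed penalty inside dc.
    The distances and the piecewise form are assumed, not taken from eq. (6)."""
    if dn >= de:
        return 0.0
    if dn > dc:
        return -(de - dn) / (de - dc)
    return -10.0

# Sum the increments over all considered positions (e.g. P1..P4 when changing lane).
rs_k = sum(safety_increment(d) for d in [40.0, 17.5, 3.0])
print(rs_k)  # 0.0 + (-0.5) + (-10.0) = -10.5
```

The graded penalty between dc and de and the fixed in-collision penalty are illustrative design choices only.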
The task reward rt(k) at time k is calculated from the following three aspects. The first is the final completion of the driving task of exiting the roundabout by the intelligent vehicle, which can be incrementally calculated as equation (7),
[Equation (7): rendered as an image in the original patent]
where |Δlh(k)| = |(αE − αh(k))·DE| is the longitudinal distance of the host vehicle from exit E along the lane, αE and αh(k) are the central angles corresponding to exit position E and to the host-vehicle position at time k, and DE is the diameter of the lane containing exit E. The relative lane is ΔLh(k) = LE − Lh(k), where LE and Lh(k) are the lane of exit E and the lane of the host vehicle at time k.
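These quantities can be computed directly; the sketch below takes the arc-length relation |Δlh(k)| = |(αE − αh(k))·DE| literally as written above (whether DE should enter as a diameter or a radius is not stated unambiguously in the text, so treat the scale as an assumption):

```python
import math

def distance_to_exit(alpha_E, alpha_h, D_E):
    """Longitudinal (arc) distance |dlh(k)| from the host vehicle to exit E
    along the lane, from the central angles (radians) and the lane
    diameter D_E, taken literally as |(alpha_E - alpha_h) * D_E|."""
    return abs((alpha_E - alpha_h) * D_E)

def relative_lane(L_E, L_h):
    """Relative lane dLh(k) = LE - Lh(k)."""
    return L_E - L_h

print(distance_to_exit(math.pi / 2, math.pi / 4, 60.0))  # pi/4 * 60 ~ 47.12
print(relative_lane(1, 3))  # -> -2 (host is two lanes outside the exit lane)
```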
The second relates to decisions at different positions of the intelligent vehicle. Because the inner lane has higher traffic efficiency, vehicles tend to select the inner lane to pass through the roundabout faster. The desired relative lane ΔLexp(k) at time k can be calculated as equation (8),
[Equation (8): rendered as an image in the original patent]
where αE and αlc are the central angle of exit position E and the central angle required to complete one lane-change operation, respectively,
and ⌊·⌋ is the floor (round-down) operator. The relative lane is ΔLh(k) = LE − Lh(k), where LE and Lh(k) are the lane of exit E and the lane of the host vehicle at time k. Another portion of the task reward rt(k) at time k can then be incrementally calculated as equation (9),
[Equation (9): rendered as an image in the original patent]
where ΔLexp(k) is the desired relative lane at time k and Ty(k) is the terminal lateral offset relative to the lane center line. Meanwhile, when the host vehicle selects a lane-change behavior, the preceding vehicles and the traffic-flow conditions of the target lane Ltar(k) and the host lane Lh(k) are compared. Assume the preceding vehicles to be compared are at positions P1 and P4 and the traffic-flow conditions to be compared are at P8 and P9; the reward is then calculated as equations (10a) to (10d),
[Equations (10a) to (10d): rendered as images in the original patent]
where v1(k) and v4(k), TH1(k) and TH4(k), d1(k) and d4(k) are respectively the speeds, the time headways relative to the host vehicle, and the longitudinal distances of the vehicles at positions P1 and P4 at time k, and the average time headways of the traffic flow at positions P8 and P9 at time k (the barred TH symbols are rendered as images in the original patent).
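Equations (8) and (10a) to (10d) appear only as images; as a purely hypothetical illustration of the desired-relative-lane idea described above (inner lanes preferred, but limited by how many lane changes of angle αlc still fit before the exit angle αE, using the floor operator), consider:

```python
import math

def expected_relative_lane(alpha_E, alpha_h, alpha_lc, L_E, L_h, L_T):
    """Hypothetical sketch of dLexp(k): prefer the innermost lane, but the
    vehicle can still move at most floor(|alpha_E - alpha_h| / alpha_lc)
    lanes before reaching exit E (alpha_lc = central angle needed for one
    lane change, L_T = total number of lanes, lane 1 = innermost).
    This is NOT the exact equation (8) of the patent."""
    changes_left = math.floor(abs(alpha_E - alpha_h) / alpha_lc)
    # Far from the exit the vehicle may reach the inner lanes; close to the
    # exit it is pinned near the exit lane L_E.
    target_lane = min(max(L_E - changes_left, 1), L_T)
    return target_lane - L_h

# Far from exit: two lane changes still possible, so aim for the inner lane.
print(expected_relative_lane(math.pi, 0.0, math.pi / 2, L_E=3, L_h=3, L_T=3))  # -> -2
# Near the exit: no lane-change margin left, so aim for the exit lane.
print(expected_relative_lane(0.1, 0.0, math.pi / 2, L_E=3, L_h=2, L_T=3))      # -> 1
```

The two calls reproduce the qualitative behavior the description demands: inner-lane preference far from the exit, outward movement near it.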
Correspondingly, the last portion of the task reward rt(k) at time k can be incrementally calculated as equation (11),
rt(k) = rt(k) + k1·rt,1 + k2·rt,2 + k3·rt,3 + k4·rt,4 (11)
where k1, k2, k3, k4 are weighting parameters.
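Equation (11) itself is explicit: the task reward accumulates a weighted sum of its components. A direct sketch (the component values and weights below are arbitrary examples):

```python
def task_reward(rt_acc, components, weights):
    """Equation (11): rt(k) = rt(k) + k1*rt,1 + k2*rt,2 + k3*rt,3 + k4*rt,4,
    where rt_acc is the task reward already accumulated from equations
    (7) and (9), and weights = (k1, k2, k3, k4)."""
    assert len(components) == len(weights) == 4
    return rt_acc + sum(k * r for k, r in zip(weights, components))

rt_k = task_reward(0.2, [1.0, -0.5, 0.0, 0.3], [0.5, 0.5, 1.0, 1.0])
print(rt_k)  # 0.2 + 0.5 - 0.25 + 0.0 + 0.3 = 0.75
```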
Finally, the execution reward re(k) at time k is given by equation (12),
[Equation (12): rendered as an image in the original patent]
where k5 and k6 are parameters, LT is the total number of lanes in the roundabout, Lh(k) is the lane of the host vehicle at time k, and Ty(k) is the terminal lateral offset relative to the lane center line.
Finally, the total reward r(k) at time k is given by equation (13),
r(k) = rs(k) + rt(k) + re(k) (13)
where rs(k), rt(k), re(k) are the safety reward, task reward, and execution reward at time k defined above.
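Equation (13) then sums the three levels into the scalar return the agent uses to update its policy; as a one-line sketch:

```python
def total_reward(rs_k, rt_k, re_k):
    """Equation (13): r(k) = rs(k) + rt(k) + re(k), combining the safety,
    task, and execution rewards into the return signal."""
    return rs_k + rt_k + re_k

print(total_reward(-0.5, 0.75, 0.25))  # -> 0.5
```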
The unmanned control system and control method suitable for the roundabout scene belong to the technical field of automatic driving. They provide a driving decision method based on reinforcement learning, in which the reinforcement learning states and actions are specially designed according to driving-decision characteristics and the network framework of the reinforcement learning Actor-Critic architecture is optimized, so that the decision method is better suited to driving decisions in the unmanned roundabout scene. Each sub-control system of the automatic driving control system of the unmanned vehicle realizes automatic control through system design; as shown in fig. 1, the system comprises a perception and cognition module, a driving control module, and a track control module, and this embodiment mainly concerns the driving control module.
The above description is only an example of the present invention and is not intended to limit the present invention. Various modifications and alterations will become apparent to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the scope of the claims of the present invention.

Claims (5)

1. An unmanned control system suitable for a roundabout scene, comprising a perception and cognition module, a driving control module, and a track control module; characterized in that,
the perception and cognition module is used for acquiring the running-state information of the host vehicle and surrounding vehicles and for signal processing;
the driving control module is used for learning appropriate decision-parameter values;
and the track control module is used for obtaining the feasible trajectory after optimization planning.
2. An unmanned control method suitable for a roundabout scene, implemented by the unmanned control system suitable for the roundabout scene of claim 1, characterized by comprising the following steps,
step one, designing states and actions in a Markov driving decision process;
the driving decision is modeled as a Markov decision process based on a reinforcement learning method, comprising the design of a state vector S representing the factors that influence the agent's driving decision and of an action vector A that refines the agent's decision-making;
step two, designing a network framework of the Actor;
in the reinforcement learning Actor-Critic framework, the Actor selects an action, i.e. a driving decision, according to the state vector; the state vector comprises two parts, an environment characterization and a task characterization; through the redesign of the Actor network framework, the state vector supports different strategies at different stages and the different dimensions of the environment characterization and the task characterization are balanced, so that the intelligent vehicle can accurately recognize the driving environment under different conditions and accurately complete the driving task while travelling in the roundabout;
step three, designing a return function;
the agent selects an action A in the environment according to the state vector S to obtain a return signal, and updates the strategy according to the return signal.
3. The unmanned control method for roundabout scenes according to claim 2, wherein the Markov driving decision process state and action design of the first step comprises the following steps,
firstly, designing a state variable;
the state variables are used for action selection and value-function estimation in the reinforcement learning algorithm and comprise the design of an environment characterization related to the relative states of the host vehicle and surrounding vehicles and of a task characterization related to the driving task of the host vehicle, wherein the environment characterization enables the agent to make safe decisions and the task characterization enables the agent to complete the driving task;
secondly, designing action variables;
taking multi-layer driving behaviors into consideration at the decision layer; the motion vector A representing the driving decision of the host vehicle comprises a discrete macroscopic driving behavior, namely the terminal lateral offset Ty relative to the lane center line, and continuous microscopic driving behaviors, namely the desired acceleration atar and its action time ta added as decision variables; the terminal lateral offset Ty ∈ {−L, 0, L} represents a left lane change, lane keeping, and a right lane change, respectively; L is the distance between two adjacent lanes; the motion vector A = (Ty, atar, ta)T then comprehensively represents the driving decision and is input as an input variable to the lower trajectory-planning layer and vehicle-control layer.
4. The unmanned control method for the roundabout scene as claimed in claim 3, wherein in the state-variable design of the first step: for the environment characterization, in the roundabout, a part of the surrounding vehicles are adjacent to the host vehicle; these are the vehicles in direct interaction that require attention, located at positions P1, P2, ..., P7; the relative lane ΔLn(k), relative speed Δvn(k), acceleration an(k), relative distance dn(k), and driving intention In(k) of the vehicles at these positions at time k are considered in the environment characterization, the subscript n corresponding to the vehicle information at position Pn; here the relative lane is calculated by ΔLn(k) = Ln(k) − Lh(k), where Ln(k) and Lh(k) are the lane of the vehicle at position Pn and the lane of the host vehicle at time k; the relative speed is calculated by Δvn(k) = vn(k) − vh(k), where vn(k) and vh(k) are the speed of the vehicle at position Pn and the speed of the host vehicle at time k; the driving intention In(k) ∈ {−1, 0, 1} represents the left lane-change, lane-keeping, or right lane-change intention of the vehicle at position Pn at time k; meanwhile, a human driver makes decisions according to the states of surrounding vehicles and selects an unobstructed lane according to the traffic-flow information of each lane, reducing the probability of congestion and stopping; the nearby forward and backward traffic flows, at positions P8, P9, ..., P12, form another part of the environment characterization; the state of positions P8, P9, ..., P12 is represented by the average relative speed and the average time headway of the traffic at time k (both symbols are rendered as images in the original patent); here the time headway between vehicle j at position Pn and its preceding vehicle at time k is THn,j(k) = dn,j(k)/vn,j(k), where dn,j(k) and vn,j(k) are the relative distance of vehicle j to its preceding vehicle and the speed of vehicle j at time k; then at time k the state of each position Pn among P1, P2, ..., P7 is expressed by equation (1),
SPn(k)=(Fn(k),ΔLn(k),Δvn(k),an(k),dn(k),In(k))T, (1)
where Fn ∈ {1, 0} indicates whether the corresponding position is in a feasible lane; at time k, the state variable at positions P8, P9, ..., P12 is expressed as equation (2),
[Equation (2): rendered as an image in the original patent]
then at time k, the environment characterization is expressed as equation (3),
[Equation (3): rendered as an image in the original patent]
for the task characterization, in the roundabout, the driving control module completes the driving task set by route navigation planning, so that the intelligent vehicle enters the roundabout from one entrance and leaves from another exit; then at time k the relative longitudinal distance Δlh(k) of the host vehicle to the exit and the relative lane ΔLh(k) are included in the task characterization; the relative longitudinal distance Δlh(k) of the host vehicle relative to the exit is given by equation (4),
[Equation (4): rendered as an image in the original patent]
where Δαh(k) is the central angle of the host vehicle at time k relative to exit position E, DE and Dh(k) are the diameters of the lane containing exit E and of the lane the host vehicle occupies at time k, and αE and αh(k) are the central angles corresponding to exit position E and to the host-vehicle position at time k; the relative lane is ΔLh(k) = LE − Lh(k), where LE and Lh(k) are the lane of exit E and the lane of the host vehicle at time k; then at time k, the task characterization (TR) is expressed as equation (5),
STR(k)=(Δlh(k),ΔLh(k))T. (5)
the state vector S is then formed jointly from the environment characterization and the task characterization designed above.
5. The method as claimed in claim 2, wherein the reward function in step three is designed at three levels: the safety reward rs, the task reward rt, and the execution reward re; the safety reward rs(k) at time k considers the distance from the host vehicle to the vehicles in the host lane Lh(k) and in the target lane Ltar(k) = Lh(k) + sign(Ty(k)), where sign(Ty(k)) indicates the left or right lane-change action selected by the host vehicle at time k; vehicles that will cut into either of these two lanes within the next 5 s are also included; when the terminal lateral offset Ty(k) relative to the lane center line is 0, the host vehicle performs lane keeping and only the vehicle ahead at position P4 is considered; when Ty(k) < 0, the vehicles at positions P1, P2, P3, P4 are considered; suppose the vehicle at position Pn at time k is at distance dn(k) from the host vehicle along the lane direction; the safety reward rs(k) at this moment can then be incrementally calculated as equation (6),
[Equation (6): rendered as an image in the original patent]
where de is the dangerous distance and dc is the collision distance;
the task reward rt(k) at time k is calculated from the following three aspects; the first aspect is the final completion of the driving task of exiting the roundabout by the intelligent vehicle, incrementally calculated as equation (7),
[Equation (7): rendered as an image in the original patent]
where |Δlh(k)| = |(αE − αh(k))·DE| is the longitudinal distance of the host vehicle from exit E along the lane, αE and αh(k) are the central angles corresponding to exit position E and to the host-vehicle position at time k, and DE is the diameter of the lane containing exit E; the relative lane is ΔLh(k) = LE − Lh(k), where LE and Lh(k) are the lane of exit E and the lane of the host vehicle at time k;
the second aspect relates to decisions at different positions of the intelligent vehicle; because the inner lane has higher traffic efficiency, vehicles tend to select the inner lane to pass through the roundabout faster, and the desired relative lane ΔLexp(k) at time k is calculated as equation (8),
[Equation (8): rendered as an image in the original patent]
where αE and αlc are the central angle of exit position E and the central angle required to complete one lane-change operation, respectively,
and ⌊·⌋ is the floor (round-down) operator; the relative lane is ΔLh(k) = LE − Lh(k), where LE and Lh(k) are the lane of exit E and the lane of the host vehicle at time k; another portion of the task reward rt(k) at time k is then incrementally calculated as equation (9),
[Equation (9): rendered as an image in the original patent]
where ΔLexp(k) is the desired relative lane at time k and Ty(k) is the terminal lateral offset relative to the lane center line; meanwhile, when the host vehicle selects a lane-change behavior, the preceding vehicles and the traffic-flow conditions of the target lane Ltar(k) and the host lane Lh(k) are compared; assume the preceding vehicles to be compared are at positions P1 and P4 and the traffic-flow conditions to be compared are at P8 and P9; the reward is then calculated as equations (10a), (10b), (10c) and (10d),
[Equations (10a) to (10d): rendered as images in the original patent]
where v1(k) and v4(k), TH1(k) and TH4(k), d1(k) and d4(k) are respectively the speeds, the time headways relative to the host vehicle, and the longitudinal distances of the vehicles at positions P1 and P4 at time k, and the average time headways of the traffic flow at positions P8 and P9 at time k (the barred TH symbols are rendered as images in the original patent);
the last part of the task reward rt(k) at time k is incrementally calculated as equation (11),
rt(k) = rt(k) + k1·rt,1 + k2·rt,2 + k3·rt,3 + k4·rt,4 (11)
where k1, k2, k3, k4 are weighting parameters;
finally, the execution reward re(k) at time k is given by equation (12),
[Equation (12): rendered as an image in the original patent]
where k5 and k6 are parameters, LT is the total number of lanes in the roundabout, Lh(k) is the lane of the host vehicle at time k, and Ty(k) is the terminal lateral offset relative to the lane center line;
finally, the total reward r(k) at time k is given by equation (13),
r(k) = rs(k) + rt(k) + re(k) (13)
where rs(k), rt(k), re(k) are the safety reward, task reward, and execution reward at time k defined above.
CN202011482837.6A 2020-12-16 2020-12-16 Unmanned control system and control method suitable for roundabout scene Active CN112644516B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011482837.6A CN112644516B (en) 2020-12-16 2020-12-16 Unmanned control system and control method suitable for roundabout scene

Publications (2)

Publication Number Publication Date
CN112644516A true CN112644516A (en) 2021-04-13
CN112644516B CN112644516B (en) 2022-03-29

Family

ID=75354529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011482837.6A Active CN112644516B (en) 2020-12-16 2020-12-16 Unmanned control system and control method suitable for roundabout scene

Country Status (1)

Country Link
CN (1) CN112644516B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113071524A (en) * 2021-04-29 2021-07-06 深圳大学 Decision control method, decision control device, autonomous driving vehicle and storage medium
CN114153213A (en) * 2021-12-01 2022-03-08 吉林大学 Deep reinforcement learning intelligent vehicle behavior decision method based on path planning
CN114162140A (en) * 2021-12-08 2022-03-11 武汉中海庭数据技术有限公司 Optimal lane matching method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110187639A (en) * 2019-06-27 2019-08-30 吉林大学 A kind of trajectory planning control method based on Parameter Decision Making frame
US20190291726A1 (en) * 2018-03-20 2019-09-26 Mobileye Vision Technologies Ltd. Systems and methods for navigating a vehicle
US20200062262A1 (en) * 2018-08-24 2020-02-27 Ford Global Technologies, Llc Vehicle action control
US20200139973A1 (en) * 2018-11-01 2020-05-07 GM Global Technology Operations LLC Spatial and temporal attention-based deep reinforcement learning of hierarchical lane-change policies for controlling an autonomous vehicle
EP3716285A1 (en) * 2019-03-29 2020-09-30 Tata Consultancy Services Limited Modeling a neuronal controller exhibiting human postural sway
CN111833597A (en) * 2019-04-15 2020-10-27 哲内提 Autonomous decision making in traffic situations with planning control
CN111845773A (en) * 2020-07-06 2020-10-30 北京邮电大学 Automatic driving vehicle micro-decision-making method based on reinforcement learning


Similar Documents

Publication Publication Date Title
CN112644516B (en) Unmanned control system and control method suitable for roundabout scene
Sun et al. A fast integrated planning and control framework for autonomous driving via imitation learning
Yu et al. A human-like game theory-based controller for automatic lane changing
Li et al. Shared control driver assistance system based on driving intention and situation assessment
Zhan et al. Spatially-partitioned environmental representation and planning architecture for on-road autonomous driving
Zhang et al. Adaptive decision-making for automated vehicles under roundabout scenarios using optimization embedded reinforcement learning
Sun et al. Behavior planning of autonomous cars with social perception
CN111679660B (en) Unmanned deep reinforcement learning method integrating human-like driving behaviors
CN114013443B (en) Automatic driving vehicle lane change decision control method based on hierarchical reinforcement learning
Liu et al. Enabling safe freeway driving for automated vehicles
Aradi et al. Policy gradient based reinforcement learning approach for autonomous highway driving
CN114564016A (en) Navigation obstacle avoidance control method, system and model combining path planning and reinforcement learning
WO2024088068A1 (en) Automatic parking decision making method based on fusion of model predictive control and reinforcement learning
CN113627239A (en) Remote driving vehicle track prediction method combined with driver lane changing intention
CN115257745A (en) Automatic driving lane change decision control method based on rule fusion reinforcement learning
CN112550314A (en) Embedded optimization type control method suitable for unmanned driving, driving control module and automatic driving control system thereof
US20220144309A1 (en) Navigation trajectory using reinforcement learning for an ego vehicle in a navigation network
Capasso et al. End-to-end intersection handling using multi-agent deep reinforcement learning
CN113581182A (en) Method and system for planning track change of automatic driving vehicle based on reinforcement learning
CN114580302A (en) Decision planning method for automatic driving automobile based on maximum entropy reinforcement learning
Yan et al. A multi-vehicle game-theoretic framework for decision making and planning of autonomous vehicles in mixed traffic
Duan et al. Encoding distributional soft actor-critic for autonomous driving in multi-lane scenarios
Hang et al. Human-like lane-change decision making for automated driving with a game theoretic approach
Miura et al. Toward vision-based intelligent navigator: its concept and prototype
Wille et al. Comprehensive treated sections in a trajectory planner for realizing autonomous driving in Braunschweig's urban traffic

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240331

Address after: 266000 No.1 Loushan Road, Licang District, Qingdao City, Shandong Province

Patentee after: QINGDAO AUTOMOTIVE RESEARCH INSTITUTE, JILIN University

Country or region after: China

Patentee after: Jilin University

Address before: 266000 No.1 Loushan Road, Licang District, Qingdao City, Shandong Province

Patentee before: QINGDAO AUTOMOTIVE RESEARCH INSTITUTE, JILIN University

Country or region before: China