CN109598934B - Rule and learning model-based method for enabling unmanned vehicle to drive away from high speed - Google Patents

Rule and learning model-based method for enabling unmanned vehicle to drive away from high speed

Info

Publication number
CN109598934B
CN109598934B (application CN201811524283.4A)
Authority
CN
China
Prior art keywords
vehicle
model
action
unmanned
speed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811524283.4A
Other languages
Chinese (zh)
Other versions
CN109598934A (en)
Inventor
杨殿阁 (Yang Diange)
曹重 (Cao Zhong)
江昆 (Jiang Kun)
封硕 (Feng Shuo)
王思佳 (Wang Sijia)
肖中阳 (Xiao Zhongyang)
谢诗超 (Xie Shichao)
焦新宇 (Jiao Xinyu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Chaoxing Future Technology Co., Ltd
Original Assignee
Beijing Chaoxing Future Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Chaoxing Future Technology Co ltd filed Critical Beijing Chaoxing Future Technology Co ltd
Priority to CN201811524283.4A priority Critical patent/CN109598934B/en
Publication of CN109598934A publication Critical patent/CN109598934A/en
Application granted granted Critical
Publication of CN109598934B publication Critical patent/CN109598934B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G1/00 - Traffic control systems for road vehicles
    • G08G1/01 - Detecting movement of traffic to be counted or controlled
    • G08G1/0104 - Measuring and analyzing of parameters relative to traffic conditions
    • G08G1/0125 - Traffic data processing
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 - Control of position or course in two dimensions
    • G05D1/021 - Control of position or course in two dimensions specially adapted to land vehicles
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 - Control of position or course in two dimensions
    • G05D1/021 - Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 - Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 - Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08G - TRAFFIC CONTROL SYSTEMS
    • G08G1/00 - Traffic control systems for road vehicles
    • G08G1/09 - Arrangements for giving variable traffic instructions
    • G08G1/0962 - Arrangements for giving variable traffic instructions having an indicator mounted inside the vehicle, e.g. giving voice messages
    • G08G1/0968 - Systems involving transmission of navigation instructions to the vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention relates to a method, based on fused rule and learning models, for enabling an unmanned vehicle to drive away from a highway, comprising the following steps. While the unmanned vehicle travels on the highway, an off-ramp motivation is generated according to the navigation system's distance to the ramp; the exit is first attempted with a rule model, and whether the exit success rate of the rule-based decision model is decreasing is judged; if not, the rule model continues to decide the action, otherwise the next step is entered: a hybrid decision model drives with the rule model while far from the ramp and, while driving toward the ramp, adjusts the vehicle's actions with a reinforcement learning decision model according to the urgency of the exit. The method improves the driving efficiency and stability of the unmanned vehicle during the exit process and achieves efficient, highly stable exit decisions under a limited sensing range and hard-to-predict surrounding-vehicle behavior.

Description

Rule and learning model-based method for enabling unmanned vehicle to drive away from high speed
Technical Field
The invention relates to the technical field of unmanned vehicle decision making, and in particular to a rule and learning model-based method for enabling an unmanned vehicle to drive away from a highway.
Background
Autonomous decision making is an important component of an unmanned vehicle system, and the expressway is an important application scenario. The process of driving away from the expressway (taking the off-ramp) strongly affects driving efficiency: efficiency drops markedly when the vehicle switches to the rightmost lane too early to wait for the ramp, or when it misses the ramp. Mainstream off-ramp methods generate a lane-change motivation at a suitable location and complete the exit through several lane changes. However, such lane-change operations cannot adapt to the urgency of the upcoming exit, so the success rate of leaving the highway is low, the required preparation distance is long, and the efficiency of the unmanned vehicle suffers. On the other hand, because the perception range of an unmanned vehicle is limited and driver behavior on the highway is highly uncertain, it is difficult to estimate how simple enumerated lane-change rules affect the exit success rate, and such rules cannot cover all environment states; meanwhile, the results produced by a pure learning method are difficult to control, which affects the safety and stability of driving.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a rule and learning model-based method for enabling an unmanned vehicle to drive away from a high speed. The method fully exploits the decision-making capability of reinforcement learning toward a definite target in a highly uncertain environment while retaining the safety and stability of a rule-based decision model, thereby improving the driving efficiency and stability of the unmanned vehicle during the exit process and achieving efficient, highly stable exit decisions under a limited sensing range and hard-to-predict surrounding-vehicle behavior.
In order to achieve the above object, the invention adopts the following technical scheme: a method for enabling an unmanned vehicle to drive away from a high speed based on fused rule and learning models, comprising the following steps: 1) while the unmanned vehicle travels on the expressway, an off-ramp motivation is generated according to the navigation system's distance to the ramp ahead; the exit is first attempted with the rule model, and whether the exit success rate of the rule-based decision model is decreasing is judged; if not, the rule model continues to decide the action, and if so, step 2) is entered. A rectangular coordinate system is established with the starting point of the ramp mouth as the origin, the vehicle driving direction as x and the direction perpendicular to the driving direction as y, in meters (m). The position, velocity and acceleration of the unmanned vehicle are (x_e, y_e), (ẋ_e, ẏ_e) and (ẍ_e, ÿ_e); the position, velocity and acceleration of the surrounding vehicles are (x_i, y_i), (ẋ_i, ẏ_i) and (ẍ_i, ÿ_i), i = 1, 2, …, n. The time interval of the rule model is Δt, and the output of the rule model is the longitudinal and lateral accelerations a_t = (ẍ_e^t, ÿ_e^t) that the unmanned vehicle is expected to maintain over the next Δt, wherein the dotted quantities denote the longitudinal and lateral velocity and acceleration of the vehicle and t denotes the current moment; 2) the hybrid decision model drives with the rule model while far from the ramp and, while driving toward the ramp, adjusts the vehicle's actions with the reinforcement learning decision model according to the urgency of the exit, thereby improving the exit success rate.
Further, in step 1), the method for establishing the rule model comprises the following steps: 1.1) the decision in the x direction comprehensively analyzes the unmanned vehicle's expected driving speed, the distance it is expected to keep from the leading vehicle, and its dynamic characteristics; 1.2) the vehicle's decision in the y direction determines whether to change lanes; the y-direction decision during the lane change is preset, and after a lane-change motivation is generated, the lane change is started as soon as a safe position is found, otherwise the vehicle keeps driving in its lane; 1.3) with the unmanned vehicle's current position and speed and the next-moment target position and target speed as boundary conditions, a fifth-order polynomial is used to generate a smooth curve, which is discretized into guide points at 20 Hz, and the guide signal is sent to the unmanned vehicle to generate its local trajectory.
Further, in step 1.1), the decision in the x direction comprises the following steps: 1.1.1) the desired driving speed v̂_e^x of the unmanned vehicle is given by a safe-following rule [formula given as an image in the original]: when a leading vehicle exists, v̂_e^x is the largest speed from which the unmanned vehicle, after driving for one interval Δt and then braking at its maximum deceleration, can still stop behind the leading vehicle braking at its own maximum deceleration; otherwise v̂_e^x equals the normal desired driving speed; wherein b_e is the maximum deceleration of the unmanned vehicle; Δt is the time interval; d_f is the current distance between the unmanned vehicle and the vehicle ahead in its lane; ẋ_e is the current speed of the unmanned vehicle; ẋ_f is the current speed of the leading vehicle; b_f is the maximum deceleration of the leading vehicle; and v_normal is the desired driving speed of the unmanned vehicle in normal driving; 1.1.2) the desired acceleration â_e^x with which the unmanned vehicle reaches its desired speed is â_e^x = (v̂_e^x − ẋ_e^t)/Δt; 1.1.3) according to the desired acceleration, the final decision in the x direction is adjusted to the clamped value ẍ_e^t = min(max(â_e^x, a_min), a_max), wherein a_min is the maximum deceleration and a_max the maximum acceleration of the unmanned vehicle in normal driving.
Further, in step 1.2), the y-direction decision comprises the following steps: 1.2.1) whether the current lane change is safe is determined by judging the motion states of the vehicles ahead of and behind on the target lane, and the lane change is started when any one of the following conditions is met: (1) no vehicle exists within the observation range ahead of or behind on the target lane; (2) the target lane has a leading vehicle, and the current vehicle speed satisfies a safe-following condition with respect to it [formula given as an image in the original], wherein d_f,j is the following distance between the unmanned vehicle and the leading vehicle on the target lane, ẋ_f,j is the speed of that leading vehicle, and b_f,j is its maximum deceleration; (3) the target lane has a rear vehicle, and the rear vehicle's speed satisfies the corresponding safe-following condition [formula given as an image in the original], wherein b_r,j is the maximum deceleration of the rear vehicle on the target lane, d_r,j is the following distance between the unmanned vehicle and that rear vehicle, and ẋ_r,j is its speed; (4) the target lane has both a leading and a rear vehicle, and their speeds satisfy conditions (2) and (3) respectively; 1.2.2) once the unmanned vehicle decides to change lanes, the y-direction decision during the lane change is fixed, as follows: the whole lane change is set to span two time intervals 2Δt, so the lateral motion consists of an acceleration phase followed by a deceleration phase; when a feasible lane-change moment is obtained, the y-direction decision is set to ÿ_e^t = w/Δt², wherein w is the lane width; when the lane change was started at the previous moment, the next y-direction decision is set to ÿ_e^{t+Δt} = −w/Δt², at which point the unmanned vehicle completes one lane change; 1.2.3) the speed and position of the unmanned vehicle at the next moment are computed from the decided action: ẋ_e^{t+Δt} = ẋ_e^t + ẍ_e^t·Δt and x_e^{t+Δt} = x_e^t + ẋ_e^t·Δt + ẍ_e^t·Δt²/2, and likewise for the y direction.
further, in the step 2), the method for establishing and training the hybrid decision model includes the following steps: 2.1) defining an environment state space, an action space and a reward mechanism; 2.2) the action output by the reinforcement learning model must meet the limits of safety and traffic regulation speed, so that the action of reinforcement learning is limited; 2.3) the hybrid decision model is trained in a highly uncertain simulation environment through a constantly repeated off-ramp process.
Further, in step 2.1), the environment state space, action space and reward mechanism are defined as follows: 2.1.1) the environment state is constructed from the positions, driving states and driving strategies of the vehicles in the environment together with the distance to the ramp, and is defined as s = (l, q_e, q_1, θ_1, …, q_n, θ_n) ∈ S, wherein the coordinate system is the same as that of the rule model; l = |x_e| is the distance between the current vehicle and the ramp; q_e is the driving state of the unmanned vehicle; q_i is the driving state and θ_i the driving strategy of environment vehicle i; s denotes an environment state and S denotes the environment state space formed by all environment states; for every vehicle in the state, the difference between its x coordinate and that of the unmanned vehicle is less than 50 m, i.e. |x_e − x_i| ≤ 50 m; 2.1.2) an action is defined by the vehicle accelerations in the x and y directions, and the selectable action space is enumerated from the following components [formula given as an image in the original]: the x component includes the action a_rule generated by the rule model and the maximum deceleration a_brake of the unmanned vehicle; the y component follows the rule model's lane-change profile, i.e. ÿ_e^t = w/Δt² is adopted when the vehicle starts a lane change and ÿ_e^{t+Δt} = −w/Δt² at the next moment, realizing the lane change; from each action the position and speed of the unmanned vehicle at the next moment can be computed, a fifth-order polynomial is constructed with these as boundary conditions, and the polynomial is discretized into a 20 Hz local guide trajectory that guides the unmanned vehicle to complete the action; 2.1.3) the hybrid decision model's reward mechanism comprises two parts, an off-ramp completion reward and a rule-model heuristic reward, set as follows: the off-ramp completion reward r_1 is awarded at the end of an episode, positive when the unmanned vehicle enters the ramp and not when it misses it [piecewise definition given as an image in the original]; the rule-model heuristic reward r_2 is a bonus granted when the executed action coincides with the rule model's action [piecewise definition given as an image in the original]; the reward finally obtained by an action is r = r_1 + r_2.
further, in the step 2.2), the limiting method comprises the following steps: 2.2.1) for satisfying the security demand that current lane traveled, the distance that needs to guarantee unmanned vehicle and its front truck can satisfy: when the front vehicle decelerates at the maximum deceleration until the vehicle stops, the unmanned vehicle can stop without a collision by decelerating the vehicle with the maximum deceleration, and therefore the vehicle speed v of the unmanned vehicle is limited to:
Figure GDA0002566426770000061
when a certain item in the action space can cause the speed of the next moment not to meet the constraint, deleting the action from the action space; when there is no front vehicle, there is no safety speed limit; when the lane is changed, when the states of the front vehicle, the rear vehicle and the unmanned vehicle on the target lane do not meet the lane changing condition, the lane changing action is deleted from the action space, and the generated action can ensure the driving safety of the vehicle; 2.2.2) to meet the speed requirement of the traffic rules, when a certain action in the action space causes the vehicle speed to not meet the speed limit of the traffic rules at the next moment, the action is deleted from the action space.
Further, in step 2.3), the training method is implemented with particle filtering and Monte Carlo tree search, specifically: 2.3.1) the driving strategies of environment vehicles are modeled with IDM and MOBIL and fitted with a particle filtering method, wherein IDM is the Intelligent Driver Model and MOBIL is the lane-change model Minimizing Overall Braking Induced by Lane Changes; 2.3.2) the hybrid decision model uses a reinforcement learning model; because the state space is continuous and high-dimensional, the reinforcement learning model is trained with a Monte Carlo tree search method; 2.3.3) the above process is repeated many times to complete the training.

Further, in step 2.3.1), fitting with the particle filtering method comprises: (1) establishing a particle library for each newly observed environment vehicle; (2) randomly selecting 50 groups of driving-strategy model parameters as initial particles; (3) propagating all environment vehicles to their next-moment states according to the driving models formed by the 50 particle groups; (4) comparing, against the actually observed next state of each environment vehicle, the difference between the 50 particle groups and the vehicle's real driving model, and resampling the new 50 groups so they concentrate near the particles closest to the real driving model; (5) repeating this process and, at each moment, selecting the particle closest to the real driving model as the driving model entered into the state space.

Further, in step 2.3.2), the reinforcement learning model is trained by Monte Carlo tree search as follows: (1) each state has several alternative actions, all of which satisfy the safety and traffic-rule requirements, and every action value of the Monte Carlo tree is initialized to the same value; (2) in each simulation, while all action values are equal, the action generated by the rule model is preferentially adopted; (3) if the action values differ, the action selected is a* = argmax_a [ Q(s, a) + c·sqrt( ln N(s) / N(s, a) ) ] [formula given as an image in the original; the standard UCT form is assumed], wherein Q(s, a) is the value function of action a in environment state s; N(s, a) is the number of times action a was taken in state s during past simulations; N(s) = Σ_a N(s, a); and c is a constant expressing the intent to explore new actions; (4) after each simulation ends, the mapping between states, actions and values along the episode is adjusted according to the reward finally obtained, updating the value function Q(s, a).
Owing to the above technical scheme, the invention has the following advantages: 1. The unmanned vehicle's driving strategy can be adjusted according to the urgency of the exit, improving the exit success rate. 2. The rule-based unmanned driving decision model is adopted preferentially, and the reinforcement-learning-based decision model adjusts the behavior when the rule model may fail, improving driving stability. 3. The method can complete the exit process under a limited sensing range and highly uncertain environment-vehicle behavior, conditions similar to real traffic scenes, which ensures its practicality. 4. Both the rule model and the reinforcement learning model satisfy the safety requirements, guaranteeing the safety of the unmanned vehicle. 5. The unmanned driving decision model generated by the invention outputs a smooth curve at 20 Hz, meeting the requirements of the vehicle dynamics model and of vehicle trajectory tracking.

In summary, on the basis of the rule-based unmanned vehicle decision model, the invention trains the unmanned vehicle for the off-ramp problem by means of reinforcement learning so that it can adjust its driving strategy according to the urgency of the exit; this is an effective way to improve the driving efficiency and stability of unmanned vehicles and further promotes their development.
Drawings
Fig. 1 is a schematic view of a framework of a driverless automobile down-ramp decision model (hybrid down-ramp decision model) based on rules and reinforcement learning;
FIG. 2 is a schematic diagram of the decision-to-action linking method;
FIG. 3 is an algorithmic schematic of a hybrid down-ramp decision model;
FIG. 4 is a schematic diagram of a reinforcement learning environment state space;
FIG. 5 is a schematic illustration of the impact of the reward mechanism on the model;
FIG. 6 is a schematic diagram of a Monte Carlo tree search method.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
As shown in FIG. 1, the invention provides a method for enabling an unmanned vehicle to drive away from a high speed based on a fusion rule and a learning model, which comprises the following steps:
1) While the unmanned vehicle travels on the highway, an off-ramp motivation is generated according to the navigation system's distance to the ramp; the exit is first attempted with the rule-based decision model (the rule model), and it is judged whether the rule-based decision model's exit success rate is decreasing; if not, the rule model continues to decide the action, and if so, step 2) is entered;

For convenience of description, a rectangular coordinate system is first established with the starting point of the ramp mouth as the origin, the vehicle driving direction as x and the direction perpendicular to the driving direction as y, in meters (m). The driving position, velocity and acceleration of the unmanned vehicle can be expressed as (x_e, y_e), (ẋ_e, ẏ_e) and (ẍ_e, ÿ_e); the position, velocity and acceleration of the surrounding vehicles can be expressed as (x_i, y_i), (ẋ_i, ẏ_i) and (ẍ_i, ÿ_i), i = 1, 2, …, n. The time interval of the rule model is Δt (= 0.75 s), and the output of the rule model is a_t = (ẍ_e^t, ÿ_e^t), the longitudinal and lateral accelerations that the unmanned vehicle is expected to maintain during the next Δt, wherein the subscripts e and i denote the unmanned vehicle and the environment vehicles respectively, x and y denote the position of a vehicle in the coordinate system, the dotted quantities denote its longitudinal and lateral velocity and acceleration, and t denotes the current moment.
The establishment method of the rule model comprises the following steps:
1.1) The decision in the x (longitudinal) direction comprehensively analyzes the unmanned vehicle's expected driving speed, the distance it is expected to keep from the leading vehicle, and its dynamic characteristics; the specific steps are as follows:
1.1.1) The desired driving speed v̂_e^x of the unmanned vehicle is given by a safe-following rule [formula given as an image in the original]: when a leading vehicle exists, v̂_e^x is the largest speed from which the unmanned vehicle, after driving for one interval Δt and then braking at its maximum deceleration, can still stop behind the leading vehicle braking at its own maximum deceleration; when there is no leading vehicle, v̂_e^x equals the normal desired driving speed;

wherein b_e is the maximum deceleration of the unmanned vehicle; Δt is the time interval; d_f is the current distance between the unmanned vehicle and the vehicle ahead in its lane; ẋ_e is the current speed of the unmanned vehicle; ẋ_f is the current speed of the leading vehicle; b_f is the maximum deceleration of the leading vehicle; and v_normal is the desired driving speed of the unmanned vehicle in normal driving;

1.1.2) The desired acceleration â_e^x with which the unmanned vehicle reaches its desired speed is â_e^x = (v̂_e^x − ẋ_e^t)/Δt;

1.1.3) Because of dynamics limits and driving-comfort requirements, the final desired speed in the x direction may not be reachable in one decision step, so the final x-direction decision is adjusted to the clamped value ẍ_e^t = min(max(â_e^x, a_min), a_max), wherein a_min is the maximum deceleration and a_max the maximum acceleration of the unmanned vehicle in normal driving; both are set to 0.1 times the vehicle-dynamics maximum deceleration and acceleration.
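The following Python sketch illustrates steps 1.1.1) to 1.1.3). The desired-speed formula in the original is given only as an image; the Gipps-style safe-speed bound used here is an assumption reconstructed from the stopping-distance argument of step 2.2.1), and all numeric parameter values are illustrative.

```python
import math

def longitudinal_decision(v_e, v_f, d_f, dt=0.75, b_e=6.0, b_f=6.0,
                          v_normal=30.0, a_min=-4.0, a_max=3.0):
    """Rule-model x-direction decision (steps 1.1.1-1.1.3), a minimal sketch.

    v_e, v_f : current speeds of the unmanned and leading vehicle [m/s]
    d_f      : gap to the leading vehicle [m], or None if there is no leader
    b_e, b_f : maximum decelerations (magnitudes) of ego / leader [m/s^2]
    """
    if d_f is None:
        v_hat = v_normal                 # no leader: drive at the normal desired speed
    else:
        # Assumed Gipps-style bound: after driving at v_hat for dt and then
        # braking at b_e, the ego must stop behind the leader braking at b_f.
        v_safe = -b_e * dt + math.sqrt((b_e * dt) ** 2
                                       + (b_e / b_f) * v_f ** 2
                                       + 2.0 * b_e * d_f)
        v_hat = min(v_normal, v_safe)
    a_hat = (v_hat - v_e) / dt           # desired acceleration (step 1.1.2)
    return max(a_min, min(a_max, a_hat)) # clamp to comfort limits (step 1.1.3)
```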
1.2) The vehicle's decision in the y (lateral) direction determines whether to change lanes; the lateral decision during the lane change itself is preset. After the rule model generates a lane-change motivation for the upcoming exit, the lane change is started as soon as a safe position is found; otherwise the vehicle keeps driving in its lane. The specific settings are as follows (see the sketch following these conditions):

1.2.1) Whether the current lane change is safe is determined by judging the motion states of the vehicles ahead of and behind on the target lane, and the lane change is started when any one of the following conditions is met:

(1) no vehicle exists within the observation range ahead of or behind on the target lane;

(2) the target lane has a leading vehicle, and the current speed of the ego vehicle (i.e. the unmanned vehicle) satisfies a safe-following condition with respect to it [formula given as an image in the original], wherein d_f,j is the following distance between the unmanned vehicle and the leading vehicle on the target lane, ẋ_f,j is the speed of that leading vehicle, and b_f,j is its maximum deceleration;

(3) the target lane has a rear vehicle, and the rear vehicle's speed satisfies the corresponding safe-following condition [formula given as an image in the original], wherein b_r,j is the maximum deceleration of the rear vehicle on the target lane, d_r,j is the following distance between the unmanned vehicle and that rear vehicle, and ẋ_r,j is its speed;

(4) the target lane has both a leading and a rear vehicle, and their speeds satisfy conditions (2) and (3) respectively;
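A sketch of this safety check follows. The exact inequalities for conditions (2) and (3) appear only as images in the original; the stopping-distance comparisons below are assumptions chosen to be consistent with the safe-speed constraint of step 2.2.1).

```python
def lane_change_safe(v_e, front, rear, b_e=6.0):
    """Lane-change safety check, conditions (1)-(4) of step 1.2.1.

    front / rear: None when no vehicle is observed in that direction on the
    target lane, otherwise a (speed, gap, max_deceleration) tuple.
    """
    if front is not None:
        v_fj, d_fj, b_fj = front
        # Assumed condition (2): ego stopping distance must not exceed the
        # gap plus the target-lane leader's stopping distance.
        if v_e ** 2 / (2 * b_e) > d_fj + v_fj ** 2 / (2 * b_fj):
            return False
    if rear is not None:
        v_rj, d_rj, b_rj = rear
        # Assumed condition (3): the rear vehicle must be able to stop
        # behind the ego vehicle without a collision.
        if v_rj ** 2 / (2 * b_rj) > d_rj + v_e ** 2 / (2 * b_e):
            return False
    return True  # condition (1) when both are None, condition (4) when both pass
```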
1.2.2) Once the unmanned vehicle decides to change lanes, the y-direction decision during the lane change is fixed;

the lane-change decision is as follows:

The whole lane change is set to span two time intervals, i.e. the lane-change time is 2Δt (= 1.5 s), so the lateral motion consists of an acceleration phase followed by a deceleration phase. After a feasible lane-change moment is obtained from step 1.2.1), the y-direction decision is set to ÿ_e^t = w/Δt², wherein w is the lane width;

when the lane change was started at the previous moment, the next y-direction decision is set to ÿ_e^{t+Δt} = −w/Δt²;

at this point the unmanned vehicle has completed one lane change. (This profile moves the vehicle laterally by exactly w with zero final lateral speed: each phase contributes w/2.)

1.2.3) The speed and position of the unmanned vehicle at the next moment are computed from the decided action: ẋ_e^{t+Δt} = ẋ_e^t + ẍ_e^t·Δt and x_e^{t+Δt} = x_e^t + ẋ_e^t·Δt + ẍ_e^t·Δt²/2, and likewise for the y direction.
1.3) With the unmanned vehicle's current position and speed and the next-moment target position and target speed as boundary conditions, a fifth-order polynomial is used to generate a smooth curve, which is discretized into guide points at 20 Hz; the guide signal is sent to the unmanned vehicle to generate its local trajectory, as shown in FIG. 2.
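A minimal sketch of this step is given below. A quintic has six coefficients while the text names four boundary conditions, so the boundary accelerations a0 and a1 are an added assumption (zero by default); the 0.75 s horizon matches the interval Δt above.

```python
import numpy as np

def quintic_guide_points(p0, v0, p1, v1, a0=0.0, a1=0.0, T=0.75, hz=20):
    """Fit x(t) = c5*t^5 + ... + c0 through the boundary conditions and
    discretize it into 20 Hz guide points (step 1.3); run once per axis."""
    A = np.array([[0,       0,       0,      0,    0, 1],   # x(0)   = p0
                  [T**5,    T**4,    T**3,   T**2, T, 1],   # x(T)   = p1
                  [0,       0,       0,      0,    1, 0],   # x'(0)  = v0
                  [5*T**4,  4*T**3,  3*T**2, 2*T,  1, 0],   # x'(T)  = v1
                  [0,       0,       0,      2,    0, 0],   # x''(0) = a0
                  [20*T**3, 12*T**2, 6*T,    2,    0, 0]])  # x''(T) = a1
    c = np.linalg.solve(A, np.array([p0, p1, v0, v1, a0, a1]))
    ts = np.arange(0.0, T, 1.0 / hz)       # 20 Hz sampling over one interval
    return np.polyval(c, ts)               # guide points along the smooth curve
```

For example, quintic_guide_points(0.0, 20.0, 15.2, 20.5) yields the 15 longitudinal guide points for one 0.75 s interval.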
The above is the rule-based decision model for the unmanned vehicle's exit. It accounts for safety, driving comfort and similar concerns and can generate a smooth vehicle guide trajectory that accomplishes the exit, but its local lane-change decisions cannot respond to the urgency of the exit, which limits the unmanned vehicle's traffic efficiency.
2) As shown in FIG. 3, a reinforcement-learning-based decision model (the hybrid decision model) and its training method are established on a reinforcement learning framework together with the decision rule model. The hybrid decision model drives with the rule model while far from the ramp and, while driving toward the ramp, adjusts the vehicle's actions with the reinforcement learning decision model according to the urgency of the exit, improving the exit success rate;
the establishment and training method of the hybrid decision model comprises the following steps:
2.1) The purpose of reinforcement learning is to establish a mapping model from environment states to actions; the model is trained continuously with the rewards obtained by different actions until the actions it generates obtain the greatest possible reward. Therefore the environment state space, the action space and the reward mechanism must first be defined.
2.1.1) The environment state in FIG. 3 is constructed from the positions, driving states and driving strategies of the vehicles in the environment together with the distance to the ramp, as shown in FIG. 4, and is defined as s = (l, q_e, q_1, θ_1, …, q_n, θ_n) ∈ S,

wherein the coordinate system is the same as that of the rule model; l = |x_e| is the distance between the current vehicle and the ramp; q_e is the driving state of the unmanned vehicle; q_i is the driving state and θ_i the driving strategy of environment vehicle i; s denotes an environment state and S denotes the environment state space formed by all environment states;

the driving strategies of the environment vehicles cannot be observed directly and must be estimated continuously while driving. In addition, because of the limited observation range, the unmanned vehicle can only observe environment vehicles within 50 m ahead and behind, so the difference between the x coordinate of any environment vehicle and that of the unmanned vehicle is less than 50 m, i.e. |x_e − x_i| ≤ 50 m;
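For concreteness, the state can be carried in a small container like the following; the field layout (each driving state as position plus velocity, each strategy as a fitted parameter vector) is an illustrative assumption, since the patent defines the tuple only symbolically.

```python
from dataclasses import dataclass
from typing import List, Tuple

State = Tuple[float, float, float, float]   # assumed q = (x, y, vx, vy)

@dataclass
class EnvState:
    """Environment state s = (l, q_e, q_1, θ_1, ..., q_n, θ_n) of step 2.1.1."""
    l: float                  # |x_e|, distance from the current vehicle to the ramp [m]
    q_e: State                # driving state of the unmanned vehicle
    q: List[State]            # states of observed vehicles (|x_e - x_i| <= 50 m)
    theta: List[List[float]]  # fitted IDM/MOBIL strategy parameters per vehicle
```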
2.1.2) An action is defined by the vehicle accelerations in the x and y directions; the selectable action space is enumerated from the following components [formula given as an image in the original]:

wherein a_brake is the maximum deceleration of the unmanned vehicle and a_rule is the action generated by the rule model; the lateral motion follows the rule model, i.e. when the vehicle starts a lane change it adopts ÿ_e^t = w/Δt² and at the next moment ÿ_e^{t+Δt} = −w/Δt², realizing the lane change.

From each action the position and speed of the unmanned vehicle at the next moment can be computed; a fifth-order polynomial is constructed with these as boundary conditions and discretized into a 20 Hz local guide trajectory that guides the unmanned vehicle to complete the action, as shown in FIG. 2.
2.1.3) The hybrid decision model's reward mechanism comprises two parts, an off-ramp completion reward and a rule-model heuristic reward, set as follows:

the off-ramp completion reward r_1 is awarded when an episode ends: positive when the unmanned vehicle enters the ramp, and not when it misses it [piecewise definition given as an image in the original];

the rule-model heuristic reward r_2 is a bonus granted when the executed action coincides with the action of the rule model [piecewise definition given as an image in the original];

the reward finally obtained by an action is r = r_1 + r_2.
when the unmanned automobile is far away from the ramp port, the influence of vehicle decision on the lower ramp is small, so a rule model needs to be adopted, and the rule model elicitation reward mechanism can help to maintain the unmanned automobile to adopt the rule model. As shown in FIG. 5, the value f of the action generated by the rule model when the unmanned vehicle is away from the rampdIs promoted to f due to the inspiring rewardd' significantly greater than the value of the other actions, so the vehicle will always take the action of the rule model;
when the unmanned automobile approaches the ramp junction, the influence of the action on the success rate of the lower ramp is enhanced, namely the probability of obtaining the completion of the reward of the lower ramp is increased. When there is an action with a value higher than the value f of the action of the rule model after being promoteddWhen the action is more beneficial to getting off the ramp than the action of the regular model, the unmanned automobile adopts the reinforcement learning decision model to get off the ramp;
by the mode, the mixed decision model can adopt a regular model to drive when the mixed decision model is far away from the ramp, and the action of the vehicle is adjusted by utilizing the reinforcement learning decision model according to the urgency of the lower ramp in the process of driving to the ramp, so that the success rate of the lower ramp is improved.
2.2) Actions output by the reinforcement learning model must respect safety and legal speed limits, so the reinforcement learning actions need to be restricted;
the limiting method comprises the following steps:
2.2.1) To satisfy the safety requirement for driving in the current lane, the distance between the unmanned vehicle and its leading vehicle must guarantee that, if the leading vehicle brakes at its maximum deceleration until it stops, the unmanned vehicle can also brake at its maximum deceleration and stop without collision. The vehicle speed v of the unmanned vehicle is therefore limited to v ≤ sqrt((b_e/b_f)·ẋ_f² + 2·b_e·d_f), i.e. the unmanned vehicle's stopping distance v²/(2b_e) may not exceed the gap d_f plus the leading vehicle's stopping distance ẋ_f²/(2b_f).

When an item in the action space of step 2.1.2) would cause the speed at the next moment to violate this constraint, the action is deleted from the action space. When there is no leading vehicle, there is no safety speed limit. For lane changes, when the states of the leading vehicle, the rear vehicle and the unmanned vehicle on the target lane do not satisfy the lane-change conditions of step 1.2.1), the lane-change action is deleted from the action space. In this way, the generated actions guarantee driving safety.

2.2.2) To meet the speed requirement of the traffic rules, when an action in the action space would cause the vehicle speed at the next moment to exceed the legal speed limit, that action is deleted from the action space. The actions generated by reinforcement learning thus keep the unmanned vehicle's speed within the traffic-rule limits at all times.
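A sketch of the action restriction of steps 2.2.1) and 2.2.2) follows, combining the safe-speed bound, the traffic-rule limit and the lane-change conditions into a single mask over the action space; parameter values are illustrative.

```python
import math

def mask_actions(actions, v_e, v_f, d_f, dt=0.75, b_e=6.0, b_f=6.0,
                 v_limit=33.3, lane_change_ok=False):
    """Delete unsafe or illegal actions (steps 2.2.1-2.2.2), a minimal sketch.

    actions: iterable of (ax, ay) candidates from step 2.1.2.
    """
    # Safe speed from the stopping-distance argument of step 2.2.1
    # (no safety limit when there is no leading vehicle).
    v_safe = math.inf if d_f is None else math.sqrt(
        (b_e / b_f) * v_f ** 2 + 2.0 * b_e * d_f)
    kept = []
    for ax, ay in actions:
        v_next = v_e + ax * dt
        if v_next > min(v_safe, v_limit):     # violates safety or traffic rules
            continue
        if ay != 0.0 and not lane_change_ok:  # fails the step-1.2.1 conditions
            continue
        kept.append((ax, ay))
    return kept
```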
2.3) The hybrid decision model is trained through a continuously repeated off-ramp process in a highly uncertain simulation environment (the environment vehicles have different driving strategies, and the next action of a vehicle under the same driving strategy is random);
the training method is realized by utilizing a particle filtering and Monte Carlo tree searching method, and comprises the following specific steps:
2.3.1) Since the driving strategies in the state space cannot be observed directly, they must be supplemented by online fitting. In this embodiment, IDM (the Intelligent Driver Model) and MOBIL (Minimizing Overall Braking Induced by Lane Changes) are adopted to model the vehicle driving strategy; the two models have 8 parameters in total, which must be fitted from the observed driving behavior. The invention fits them with a particle filtering method, as follows:
(1) establishing a particle library for each new environmental vehicle;
(2) randomly selecting 50 groups of driving strategy model parameters as initial particles;
(3) transferring all the environmental vehicles to the next moment state according to a driving model formed by 50 groups of particles;
(4) comparing, against the actually observed next state of the environment vehicle, the difference between the 50 particle groups and the vehicle's real driving model, and resampling the new 50 groups so they concentrate near the particles closest to the real driving model;

(5) repeating this process and, at each moment, selecting the particle closest to the real driving model as the driving model entered into the state space.
The particle filtering yields the maximum-likelihood driving model (i.e. the environment vehicle's driving strategy) θ_i, which is fed into the reinforcement learning model as part of the environment state for training. At this point, all environment states required for reinforcement learning have been fully acquired.
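One filtering step can be sketched as follows; the 50-particle size comes from the text, while the Gaussian weighting and jitter magnitudes are illustrative assumptions.

```python
import numpy as np

def particle_filter_step(particles, observed_next, predict_next, sigma=1.0):
    """One update of the driving-strategy particle filter (step 2.3.1).

    particles     : (50, 8) array of IDM/MOBIL parameter sets
    predict_next  : function mapping a parameter set to the predicted next state
    observed_next : the actually observed next state of the environment vehicle
    Returns the resampled particles and the current best-fitting strategy.
    """
    preds = np.array([predict_next(p) for p in particles])
    err = np.linalg.norm(preds - observed_next, axis=1)  # prediction error per particle
    w = np.exp(-0.5 * (err / sigma) ** 2)                # assumed Gaussian likelihood
    w /= w.sum()
    idx = np.random.choice(len(particles), size=len(particles), p=w)
    resampled = particles[idx] + np.random.normal(0.0, 0.05, particles.shape)
    theta_i = particles[np.argmin(err)]  # most likely driving model, fed to the RL state
    return resampled, theta_i
```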
2.3.2) The hybrid decision model in the invention uses a reinforcement learning model; because the state space is continuous and high-dimensional, the reinforcement learning model is trained with a Monte Carlo tree search method, as follows:

(1) as shown in FIG. 6, each state has several alternative actions, all of which meet the safety and traffic-rule requirements of step 2.2); every action value of the Monte Carlo tree is initialized to the same value;

(2) in each simulation, while all action values are equal, the action generated by the rule model is preferentially adopted;

(3) if the action values differ, the action selected is a* = argmax_a [ Q(s, a) + c·sqrt( ln N(s) / N(s, a) ) ] [formula given as an image in the original; the standard UCT form is assumed],

wherein Q(s, a) is the value function of action a in environment state s; N(s, a) is the number of times action a was taken in state s during past simulations; N(s) = Σ_a N(s, a); c is a constant expressing the intent to explore new actions, preferably 5 in this embodiment;

(4) after each simulation ends (the unmanned vehicle enters the ramp or misses it), the mapping between states, actions and values along the episode is adjusted according to the reward finally obtained, updating the value function Q(s, a).
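The selection and update rules can be sketched as below; the standard UCT form is assumed for the image-only selection formula, with the exploration constant c = 5 from the embodiment.

```python
import math
from collections import defaultdict

Q = defaultdict(float)   # Q[(s, a)]: value of action a in state s
N = defaultdict(int)     # N[(s, a)]: visit count of (s, a)

def select_action(s, actions, rule_action, c=5.0):
    """UCT selection (steps (2)-(3)): prefer the rule action while all
    values are equal, otherwise maximize the upper confidence bound."""
    if all(N[(s, a)] == 0 for a in actions):
        return rule_action
    n_s = sum(N[(s, a)] for a in actions)
    return max(actions, key=lambda a: Q[(s, a)]
               + c * math.sqrt(math.log(n_s + 1) / (N[(s, a)] + 1)))

def backup(episode, r):
    """Step (4): after a simulation ends, update Q along the episode
    toward the finally obtained reward (incremental mean)."""
    for s, a in episode:
        N[(s, a)] += 1
        Q[(s, a)] += (r - Q[(s, a)]) / N[(s, a)]
```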
2.3.3) The above process is repeated many times to complete the training.
In conclusion, the invention was tested on the off-ramp task in a highly random simulation environment, with the unmanned vehicle starting in the leftmost lane of a four-lane highway and preparing to exit. For comparison with the rule model, lane changing was forbidden until 1000 m, 1500 m and 2000 m before the ramp respectively; the rule model then received the off-ramp motivation and attempted the exit 500 times using the method of step 1), and the hybrid decision model attempted the exit 500 times under the same conditions. The results are shown in Table 1: the hybrid decision model effectively improves the exit success rate by 5-50% while guaranteeing vehicle safety and compliance with traffic rules throughout.
Table 1: Comparison of results for the rule-based model and the hybrid off-ramp model [table given as an image in the original]
The above embodiments are only for illustrating the present invention; the steps may be varied, and, on the basis of the technical solution of the invention, modifications and equivalent changes to individual steps according to the principle of the invention should not be excluded from its protection scope.

Claims (10)

1. A method for enabling an unmanned vehicle to drive away from a high speed based on fused rule and learning models, characterized by comprising the following steps:
1) while the unmanned vehicle travels on the expressway, an off-ramp motivation is generated according to the navigation system's distance to the ramp ahead; the exit is first attempted with the rule model, and whether the exit success rate of the rule-based decision model is decreasing is judged; if not, the rule model decides the action, and if so, step 2) is entered;

a rectangular coordinate system is established with the starting point of the ramp mouth as the origin, the vehicle driving direction as x, the direction perpendicular to the driving direction as y, and meters as the unit; the position, velocity and acceleration of the unmanned vehicle are (x_e, y_e), (ẋ_e, ẏ_e) and (ẍ_e, ÿ_e); the position, velocity and acceleration of the surrounding vehicles are (x_i, y_i), (ẋ_i, ẏ_i) and (ẍ_i, ÿ_i), i = 1, 2, …, n; in addition, the time interval of the rule model is Δt, and the output of the rule model is the longitudinal and lateral accelerations a_t = (ẍ_e^t, ÿ_e^t) that the unmanned vehicle is expected to maintain during the next Δt, wherein the dotted quantities are the longitudinal and lateral velocity and acceleration of the vehicle and t denotes the current moment;

2) the hybrid decision model drives with the rule model while far from the ramp and, while driving toward the ramp, adjusts the vehicle's actions with the reinforcement learning decision model according to the urgency of the exit, thereby improving the exit success rate.
2. The method for enabling an unmanned vehicle to drive away from a high speed according to claim 1, wherein in step 1) the method for establishing the rule model comprises the following steps:

1.1) the decision in the x direction comprehensively analyzes the unmanned vehicle's expected driving speed, the distance it is expected to keep from the leading vehicle, and its dynamic characteristics;

1.2) the vehicle's decision in the y direction determines whether to change lanes; the y-direction decision during the lane change is preset; after a lane-change motivation is generated, the lane change is started as soon as a safe position is found, otherwise the vehicle keeps driving in its lane;

1.3) with the unmanned vehicle's current position and speed and the next-moment target position and target speed as boundary conditions, a fifth-order polynomial is used to generate a smooth curve, which is discretized into guide points at 20 Hz, and the guide signal is sent to the unmanned vehicle to generate its local trajectory.
3. The method for enabling an unmanned vehicle to drive away from a high speed according to claim 2, wherein in step 1.1) the decision in the x direction comprises the following steps:

1.1.1) the desired driving speed v̂_e^x of the unmanned vehicle is given by a safe-following rule [formula given as an image in the original] if a leading vehicle exists, and equals the normal desired driving speed otherwise;

wherein b_e is the maximum deceleration of the unmanned vehicle; Δt is the time interval; d_f is the current distance between the unmanned vehicle and the vehicle ahead in its lane; ẋ_e is the current speed of the unmanned vehicle; ẋ_f is the current speed of the leading vehicle; b_f is the maximum deceleration of the leading vehicle; v_normal is the desired driving speed of the unmanned vehicle in normal driving;

1.1.2) the desired acceleration â_e^x with which the unmanned vehicle reaches its desired speed is â_e^x = (v̂_e^x − ẋ_e^t)/Δt;

1.1.3) according to the desired acceleration, the final decision in the x direction is adjusted to ẍ_e^t = min(max(â_e^x, a_min), a_max), wherein a_min is the maximum deceleration and a_max the maximum acceleration of the unmanned vehicle in normal driving.
4. The method for enabling an unmanned vehicle to drive away from a high speed according to claim 3, wherein in step 1.2) the y-direction decision comprises the following steps:

1.2.1) whether the current lane change is safe is determined by judging the motion states of the vehicles ahead of and behind on the target lane, and the lane change is started when any one of the following conditions is met:

(1) no vehicle exists within the observation range ahead of or behind on the target lane;

(2) the target lane has a leading vehicle, and the current vehicle speed satisfies a safe-following condition with respect to it [formula given as an image in the original], wherein d_f,j is the following distance between the unmanned vehicle and the leading vehicle on the target lane, ẋ_f,j is the speed of the leading vehicle on the target lane, and b_f,j is the maximum deceleration of the vehicle ahead on the target lane;

(3) the target lane has a rear vehicle, and the rear vehicle's speed satisfies the corresponding safe-following condition [formula given as an image in the original], wherein b_r,j is the maximum deceleration of the vehicle behind on the target lane, d_r,j is the following distance between the unmanned vehicle and the rear vehicle on the target lane, and ẋ_r,j is the speed of the rear vehicle on the target lane;

(4) the target lane has both a leading and a rear vehicle, and their speeds satisfy conditions (2) and (3) respectively;

1.2.2) once the unmanned vehicle decides to change lanes, the y-direction decision during the lane change is fixed, and the lane-change decision is as follows:

the whole lane change is set to span two time intervals 2Δt, so the lateral motion consists of an acceleration phase followed by a deceleration phase; when a feasible lane-change moment is obtained, the y-direction decision is set to ÿ_e^t = w/Δt²,

wherein w is the lane width;

when the lane change was started at the previous moment, the next y-direction decision is set to ÿ_e^{t+Δt} = −w/Δt²,

at which point the unmanned vehicle completes one lane change;

1.2.3) the speed and position of the unmanned vehicle at the next moment are computed from the decided action: ẋ_e^{t+Δt} = ẋ_e^t + ẍ_e^t·Δt, x_e^{t+Δt} = x_e^t + ẋ_e^t·Δt + ẍ_e^t·Δt²/2, and likewise for the y direction.
5. the method for enabling an unmanned vehicle to drive away from a high speed according to any one of claims 3 or 4, wherein: in the step 2), the method for establishing and training the hybrid decision model comprises the following steps:
2.1) defining an environment state space, an action space and a reward mechanism;
2.2) the action output by the reinforcement learning model must meet the limits of safety and traffic regulation speed, so that the action of reinforcement learning is limited;
2.3) the hybrid decision model is trained in a highly uncertain simulation environment through a constantly repeated off-ramp process.
6. The method for enabling an unmanned vehicle to drive away from a high speed according to claim 5, wherein in step 2.1) the environment state space, the action space and the reward mechanism are defined as follows:

2.1.1) the environment state is constructed from the positions, driving states and driving strategies of the vehicles in the environment together with the distance to the ramp, and is defined as s = (l, q_e, q_1, θ_1, …, q_n, θ_n) ∈ S,

wherein the coordinate system is the same as that of the rule model; l = |x_e| is the distance between the current vehicle and the ramp; q_e is the driving state of the unmanned vehicle; q_i is the driving state and θ_i the driving strategy of environment vehicle i; s denotes an environment state and S denotes the environment state space formed by all environment states;

for every vehicle in the state, the difference between its x coordinate and that of the unmanned vehicle is less than 50 m, i.e. |x_e − x_i| ≤ 50 m;

2.1.2) an action is defined by the vehicle accelerations in the x and y directions, and the selectable action space is enumerated from the following components [formula given as an image in the original]:

wherein a_brake is the maximum deceleration of the unmanned vehicle; a_rule is the action generated by the rule model; the y-direction action taken when the vehicle starts changing lanes is ÿ_e^t = w/Δt², and ÿ_e^{t+Δt} = −w/Δt² is adopted at the next moment to realize the lane change;

from each action the position and speed of the unmanned vehicle at the next moment can be computed; a fifth-order polynomial is constructed with these as boundary conditions and discretized into a 20 Hz local guide trajectory that guides the unmanned vehicle to complete the action;

2.1.3) the hybrid decision model's reward mechanism comprises two parts, an off-ramp completion reward and a rule-model heuristic reward, set as follows:

the off-ramp completion reward r_1 is positive when the unmanned vehicle enters the ramp and not when it misses it [piecewise definition given as an image in the original];

the rule-model heuristic reward r_2 is a bonus granted when the executed action coincides with the rule model's action [piecewise definition given as an image in the original];

the reward finally obtained by an action is r = r_1 + r_2.
7. the method for enabling an unmanned vehicle to drive away from a high speed as set forth in claim 5, wherein: in the step 2.2), the limiting method comprises the following steps:
2.2.1) for satisfying the security demand that current lane traveled, the distance that needs to guarantee unmanned vehicle and its front truck can satisfy: when the front vehicle decelerates at the maximum deceleration until the vehicle stops, the unmanned vehicle can stop without a collision by decelerating the vehicle with the maximum deceleration, and therefore the vehicle speed v of the unmanned vehicle is limited to:
Figure FDA0002566426760000053
when a certain item in the action space can cause the speed of the next moment not to meet the constraint, deleting the action from the action space; when there is no front vehicle, there is no safety speed limit; when the lane is changed, when the states of the front vehicle, the rear vehicle and the unmanned vehicle on the target lane do not meet the lane changing condition, the lane changing action is deleted from the action space, and the generated action can ensure the driving safety of the vehicle;
2.2.2) to meet the speed requirement of the traffic rules, when a certain action in the action space causes the vehicle speed to not meet the speed limit of the traffic rules at the next moment, the action is deleted from the action space.
8. The method for enabling an unmanned vehicle to drive away from a high speed according to claim 5, wherein in step 2.3) the training method is implemented with particle filtering and Monte Carlo tree search, specifically:

2.3.1) modeling the vehicle driving strategies with IDM and MOBIL and fitting them with a particle filtering method, wherein IDM is the Intelligent Driver Model and MOBIL is the lane-change model Minimizing Overall Braking Induced by Lane Changes;

2.3.2) training the reinforcement learning model used by the hybrid decision model with a Monte Carlo tree search method, since the state space is continuous and high-dimensional;

2.3.3) repeating the above process many times to complete the training.
9. The method for enabling an unmanned vehicle to drive away from a high speed according to claim 8, wherein in step 2.3.1) fitting with the particle filtering method comprises:
(1) establishing a particle library for each new environmental vehicle;
(2) randomly selecting 50 groups of driving strategy model parameters as initial particles;
(3) transferring all the environmental vehicles to the next moment state according to a driving model formed by 50 groups of particles;
(4) comparing, against the actually observed next state of the environment vehicle, the difference between the 50 particle groups and the vehicle's real driving model, and resampling the new 50 groups so they concentrate near the particles closest to the real driving model;

(5) repeating this process and, at each moment, selecting the particle closest to the real driving model as the driving model entered into the state space.
10. The method for enabling an unmanned vehicle to drive away from a high speed as recited in claim 8, wherein: in the step 2.3.2), a Monte Carlo tree searching method is adopted to train the reinforcement learning model, and the specific steps are as follows:
(1) each state has several candidate actions, all of which satisfy the safety and traffic-rule requirements; when the Monte Carlo tree is initialized, every action has the same value;
(2) during each simulation, while all action values are still equal, the action generated by the rule model is preferentially used for simulation;
(3) once the action values differ, the action is selected as follows:
$a^{*} = \arg\max_{a}\left[\, Q(s,a) + c\sqrt{\frac{\ln N(s)}{N(s,a)}} \,\right]$
wherein Q(s, a) is the value function of action a in environmental state s, and N(s, a) is the number of times action a was taken in state s during past simulations;
$N(s) = \sum_{a} N(s,a)$ is the total number of times state s has been visited in past simulations; and
c is a constant controlling the tendency to explore new actions;
(4) after each simulation finishes, the mapping between states and action values along the visited path is adjusted according to the final reward, and the value function Q(s, a) is updated.
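A minimal sketch of the selection and backup rules above, assuming a dict-based tree node; the exploration-constant value and the running-mean update are illustrative assumptions:

```python
import math

C = 1.4  # exploration constant c; this value is an assumption

def select_action(node):
    """Steps (2)-(3): while all action values are tied, fall back to the
    rule-model action; otherwise maximize Q(s,a) + c*sqrt(ln N(s)/N(s,a))."""
    Q, N = node["Q"], node["N"]  # dicts mapping action -> value / visit count
    if len(set(Q.values())) <= 1:
        return node["rule_action"]
    n_s = sum(N.values())        # N(s): total visits of this state
    return max(Q, key=lambda a: Q[a] + C * math.sqrt(math.log(n_s) / max(N[a], 1)))

def backup(path, reward):
    """Step (4): propagate the final reward back along the visited
    (node, action) pairs, updating Q(s, a) as a running mean."""
    for node, action in path:
        node["N"][action] = node["N"].get(action, 0) + 1
        q = node["Q"].get(action, 0.0)
        node["Q"][action] = q + (reward - q) / node["N"][action]
```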
CN201811524283.4A 2018-12-13 2018-12-13 Rule and learning model-based method for enabling unmanned vehicle to drive away from high speed Active CN109598934B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811524283.4A CN109598934B (en) 2018-12-13 2018-12-13 Rule and learning model-based method for enabling unmanned vehicle to drive away from high speed

Publications (2)

Publication Number Publication Date
CN109598934A CN109598934A (en) 2019-04-09
CN109598934B true CN109598934B (en) 2020-11-06

Family

ID=65961837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811524283.4A Active CN109598934B (en) 2018-12-13 2018-12-13 Rule and learning model-based method for enabling unmanned vehicle to drive away from high speed

Country Status (1)

Country Link
CN (1) CN109598934B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109991987B (en) * 2019-04-29 2023-08-04 北京智行者科技股份有限公司 Automatic driving decision-making method and device
CN110427682B (en) * 2019-07-26 2020-05-19 清华大学 Traffic scene simulation experiment platform and method based on virtual reality
CN111413974B (en) * 2020-03-30 2021-03-30 清华大学 Automobile automatic driving motion planning method and system based on learning sampling type
CN111483468B (en) * 2020-04-24 2021-09-07 广州大学 Unmanned vehicle lane change decision-making method and system based on confrontation and imitation learning
CN111605565A (en) * 2020-05-08 2020-09-01 昆山小眼探索信息科技有限公司 Automatic driving behavior decision method based on deep reinforcement learning
CN111645687A (en) * 2020-06-11 2020-09-11 知行汽车科技(苏州)有限公司 Lane changing strategy determining method, device and storage medium
TWI750762B (en) * 2020-08-06 2021-12-21 財團法人車輛研究測試中心 Hybrid planniing method in autonomous vehicles and system thereof
CN112198794A (en) * 2020-09-18 2021-01-08 哈尔滨理工大学 Unmanned driving method based on human-like driving rule and improved depth certainty strategy gradient
CN112099515A (en) * 2020-11-16 2020-12-18 北京鼎翰科技有限公司 Automatic driving method for lane change avoidance
CN112896166A (en) * 2021-03-01 2021-06-04 苏州挚途科技有限公司 Vehicle lane changing method and device and electronic equipment
CN113120003B (en) * 2021-05-18 2022-06-03 同济大学 Unmanned vehicle motion behavior decision method
CN113511215B (en) * 2021-05-31 2022-10-04 西安电子科技大学 Hybrid automatic driving decision method, device and computer storage medium
CN113324556B (en) * 2021-06-04 2024-03-26 苏州智加科技有限公司 Path planning method and device based on vehicle-road collaborative reinforcement learning and application system
CN113345268B (en) * 2021-07-16 2022-03-18 长沙理工大学 CAV lane change decision-making method for expressway down-ramp shunting area
CN113593228B (en) * 2021-07-26 2022-06-03 广东工业大学 Automatic driving cooperative control method for bottleneck area of expressway
CN113682312B (en) * 2021-09-23 2023-07-25 中汽创智科技有限公司 Autonomous channel switching method and system integrating deep reinforcement learning
EP4209963A1 (en) 2022-01-11 2023-07-12 Ford Global Technologies, LLC Method for autonomous driving of a vehicle, a data processing circuit, a computer program, and a computer-readable medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874597A (en) * 2017-02-16 2017-06-20 北理慧动(常熟)车辆科技有限公司 A kind of highway passing behavior decision-making technique for being applied to automatic driving vehicle
CN107145936A (en) * 2017-04-22 2017-09-08 大连理工大学 A kind of vehicle following-model method for building up based on intensified learning
CN107161155A (en) * 2017-04-27 2017-09-15 大连理工大学 A kind of vehicle collaboration lane-change method and its system based on artificial neural network
CN107315411A (en) * 2017-07-04 2017-11-03 合肥工业大学 A kind of lane-change method for planning track based on automatic driving vehicle under collaborative truck
CN108897313A (en) * 2018-05-23 2018-11-27 清华大学 A kind of end-to-end Vehicular automatic driving system construction method of layer-stepping

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180052812A (en) * 2016-11-10 2018-05-21 한국전자통신연구원 Method for building a database for driving experience


Also Published As

Publication number Publication date
CN109598934A (en) 2019-04-09

Similar Documents

Publication Publication Date Title
CN109598934B (en) Rule and learning model-based method for enabling unmanned vehicle to drive away from high speed
US11970168B2 (en) Vehicle trajectory modification for following
US11754408B2 (en) Methods and systems for topological planning in autonomous driving
CN110244713B (en) Intelligent vehicle lane change track planning system and method based on artificial potential field method
JP6791905B2 (en) Systems and methods for dynamic vehicle control according to traffic
US11137766B2 (en) State machine for traversing junctions
US20190317499A1 (en) Automatic Driving Device
US6873911B2 (en) Method and system for vehicle operator assistance improvement
EP3794572A1 (en) Drive envelope determination
US20210300348A1 (en) Vehicle control device, vehicle control method, and storage medium
CN108919795A (en) A kind of autonomous driving vehicle lane-change decision-making technique and device
CN112046484B (en) Q learning-based vehicle lane-changing overtaking path planning method
CN111301419A (en) Reinforcement learning based method for SAE4 level automated lane change
EP3425341B1 (en) Information processing apparatus, vehicle information processing method, and computer-readable medium
CN113071487B (en) Automatic driving vehicle control method and device and cloud equipment
JP7216766B2 (en) vehicle controller
US11433924B2 (en) System and method for controlling one or more vehicles with one or more controlled vehicles
CN114830055A (en) Occlusion zone guidance
Kim et al. Multiple vehicle driving control for traffic flow efficiency
JP7379033B2 (en) Driving support method and driving support device
EP4237300A1 (en) Collision avoidance planning system
JP2024520301A (en) Vehicle trajectory determination
Guo et al. Toward human-like behavior generation in urban environment based on Markov decision process with hybrid potential maps
Althoff et al. Stochastic reachable sets of interacting traffic participants
CN112542061B (en) Lane borrowing and overtaking control method, device and system based on Internet of vehicles and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20191231

Address after: 100083 401a, floor 4, building 6, yard 1, Zhongguancun East Road, Haidian District, Beijing

Applicant after: Beijing Chaoxing Future Technology Co., Ltd

Address before: 100084 Tsinghua University, Haidian District, Beijing

Applicant before: Tsinghua University

GR01 Patent grant