CN111413974B - Automobile automatic driving motion planning method and system based on learning sampling type - Google Patents

Automobile automatic driving motion planning method and system based on learning sampling type

Info

Publication number
CN111413974B
CN111413974B (application CN202010236474.1A)
Authority
CN
China
Prior art keywords
track
forward simulation
trajectory
optimal
vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010236474.1A
Other languages
Chinese (zh)
Other versions
CN111413974A (en)
Inventor
江昆
周伟韬
杨殿阁
严瑞东
黄晋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202010236474.1A
Publication of CN111413974A
Application granted
Publication of CN111413974B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Steering Control In Accordance With Driving Conditions (AREA)

Abstract

The invention relates to an automobile automatic driving motion planning method and system based on a learning sampling type, which comprises the following steps: establishing a vehicle kinematic model; initializing an Open table and a Closed table; calculating the evaluation value of each forward simulation track, and selecting the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, and storing the initial optimal track into a Closed table; screening a non-collision forward simulation track by using a collision detection method, and storing the non-collision forward simulation track into an Open table; calculating the evaluation value of each forward simulation track, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, and storing the candidate optimal track in a Closed table; ending the motion planning process when the candidate optimal track end point is within the end point range required by the motion planning; and connecting the initial optimal trajectory and the candidate optimal trajectory in the Closed table to form a final planning trajectory.

Description

Automobile automatic driving motion planning method and system based on learning sampling type
Technical Field
The invention relates to the field of intelligent vehicles, in particular to an automobile automatic driving motion planning method and system based on a learning sampling mode.
Background
In recent years, artificial intelligence technology has gradually begun to be commercialized in the fields of intelligent transportation and vehicles, and intelligent connected vehicles are gradually entering the public view. Generally, the automated driving software system of an intelligent vehicle can be divided into four modules: perception, positioning, decision and control. Motion planning is the most important part of the decision module and determines the decision quality of the intelligent vehicle. Since the control module generally only performs motion/trajectory tracking, the outcome of motion planning is crucial to the final driving behavior of the vehicle.
Existing motion planning methods can be broadly divided into sampling-based methods, optimization-based methods, and end-to-end learning-based methods. End-to-end learning methods establish a mapping from sensor data to driving actions, but their black-box nature makes engineering practice and optimization difficult; optimization-based methods generally depend on lane lines or other prior road information, and their solving time is often hard to guarantee; sampling-based methods are widely applied to automated driving motion planning because they solve quickly and adapt to varied environments.
Sampling-based methods generally select a sampled trajectory or motion state through a cost function, which is essentially the selection of an optimal trajectory/motion state by manually set rules; however, a manually set cost function has difficulty adapting to a complex and changeable real environment.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a method and a system for planning an automatic driving motion of an automobile based on a learning sampling mode, which can better consider uncertainty and randomness in the environment and can improve the safety and robustness of the automatic driving motion planning.
In order to achieve the purpose, the invention adopts the following technical scheme: an automobile automatic driving motion planning method based on a learning sampling mode comprises the following steps:
S1: establishing a vehicle kinematic model according to vehicle parameters;
S2: initializing a storage table of a heuristic motion planning method: an Open table and a Closed table;
S3: generating a series of forward simulation tracks from a starting point based on a learning sampling method, calculating an evaluation value of each forward simulation track through a heuristic function, and selecting the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning;
S4: generating a series of forward simulation tracks from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point;
S5: repeating the step S4 until the candidate optimal trajectory end point in the step S4 is within the end point range required by the motion planning, and ending the motion planning process;
S6: connecting the initial optimal trajectory and the candidate optimal trajectory in the Closed table to form a final planning trajectory.
Further, in step S3, the forward simulation trajectory is generated by using the steering wheel angular acceleration and the accelerator/brake input of the vehicle.
Further, the forward simulation trajectory generation method comprises: determining a simulation step length Δt according to the usage scenario, solving the vehicle kinematic model to obtain the state derivatives (dx/dt, dy/dt, dθ/dt, dδ/dt, dv/dt), updating the vehicle positions x and y and the vehicle heading θ, continuously iterating to update the vehicle trajectory, and finally obtaining the forward simulation trajectory.
Further, in step S3, before the forward simulation trajectory is stored in the Open table, collision detection is performed on the forward simulation trajectory, it is detected whether the generated forward simulation trajectory collides with the boundary of the obstacle in the sensing result, and if the trajectory collides, the trajectory is directly deleted, and the trajectory that does not collide is stored in the Open table.
Further, in the step S3, a reinforcement learning trajectory is selected based on a reinforcement learning method, specifically, a reinforcement learning method based on a Q learning algorithm.
Further, the reinforcement learning trajectory selection method based on the reinforcement learning method comprises the following steps: S31, initializing the reinforcement learning algorithm: determining a state space, an action space and a reward function R; S32, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, the rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), where s_t is the state at time t and a_t is the action at time t; S33, starting from the state s_t at the current time t, different actions a_t generate trajectories, wherein an action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, the steering wheel angle expected at the next time t+Δt is calculated as θ_{t+Δt} = θ_t + γ·Δt; the expected steering wheel angle and the expected longitudinal acceleration are input into the vehicle model to generate a trajectory, and the Q value at this time, Q(s_t, a_t), is taken as the Q value of that trajectory; the trajectory generated by the action with the maximum Q value in the current state is taken as the reinforcement learning trajectory.
Further, the selection method of the initial optimal trajectory comprises the following steps: a series of forward simulation tracks are generated by inputting different steering wheel turning angles and accelerator/brake opening degrees into a vehicle dynamics model, and the evaluation values F of the forward simulation tracks are as follows:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point;
selecting the track with the minimum evaluation value F from a series of forward simulation tracks as a rule optimal track, and estimating the Q value of the rule optimal track by using a reinforcement learning Q network; and comparing the Q values of the regular optimal track and the reinforcement learning track, and selecting the track with a larger Q value as an initial optimal track.
Further, in step S4, starting from the current state s_t, different actions a_t generate trajectories; from the steering wheel angle θ_t at the current time, the steering wheel angle expected at the next moment is calculated as θ_{t+Δt} = θ_t + γ·Δt, and the expected steering wheel angle and expected longitudinal acceleration are input into the vehicle model to generate a trajectory; the evaluation values of these forward simulation trajectories are:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point.
An automotive autopilot motion planning system based on learning sampling, comprising: the system comprises a vehicle kinematics model establishing module, a storage table initializing module, an initial optimal track selecting module, a candidate optimal track selecting module, a motion planning ending module and a final planning track forming module; the vehicle kinematic model building module builds a vehicle kinematic model according to vehicle parameters; the storage table initialization module is used for initializing a storage table of a heuristic motion planning method: an Open table and a Closed table; the initial segment optimal track selection module generates a series of forward simulation tracks from a starting point based on a learning sampling method, calculates the evaluation value of each forward simulation track through a heuristic function, and selects the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning; the candidate optimal track selection module generates a series of forward simulation tracks from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point; the motion planning ending module ends the motion planning process until the candidate optimal trajectory end point is within the end point range required by the motion planning; and the final planning track forming module connects the initial optimal track and the candidate optimal track in the Closed table to form a final planning track.
Further, in the initial segment optimal trajectory selection module, the reinforcement learning trajectory selection based on the reinforcement learning method includes the following steps: S1, initializing the reinforcement learning algorithm: determining a state space, an action space and a reward function R; S2, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, the rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), where s_t is the state at time t and a_t is the action at time t; S3, starting from the state s_t at the current time t, different actions a_t generate trajectories, wherein an action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, the steering wheel angle expected at the next time t+Δt is calculated as θ_{t+Δt} = θ_t + γ·Δt; the expected steering wheel angle and the expected longitudinal acceleration are input into the vehicle model to generate a trajectory, and the Q value at this time, Q(s_t, a_t), is taken as the Q value of that trajectory; the trajectory generated by the action with the maximum Q value in the current state is taken as the reinforcement learning trajectory.
Due to the adoption of the technical scheme, the invention has the following advantages: the method combines the vehicle kinematics model, and replaces the traditional rule-based candidate track selection mode by adding the reinforcement learning module in the sampling type motion planning method based on the vehicle kinematics model, so that the track selection is more reasonable, the uncertainty and the randomness in the environment can be better considered, and the safety and the robustness of the automatic driving motion planning can be improved.
Drawings
FIG. 1 is a schematic overall flow diagram of the process of the present invention;
FIG. 2 is a simplified vehicle dynamics model schematic of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
As shown in fig. 1, the present invention provides a learning sampling based vehicle automatic driving motion planning method, which combines reinforcement learning, receives automatic driving perception results, i.e. information such as the position and speed of an obstacle in the surrounding environment, and outputs a series of actions that an intelligent vehicle can perform under the constraint of the starting point and the ending point.
The method specifically comprises the following steps:
s1: establishing a vehicle kinematic model according to vehicle parameters;
the vehicle parameters include vehicle size, minimum turning radius, maximum acceleration/deceleration, and the like.
S2: initializing a storage table of a heuristic motion planning method: open and Closed tables.
S3: generating a series of forward simulation tracks from a starting point based on a learning sampling method, calculating the evaluation value of each forward simulation track through a heuristic function, and selecting the track with the highest evaluation value from the forward simulation tracks as a regular optimal track; and performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track. And selecting the initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking the end point of the initial optimal track as the starting point of subsequent planning.
S4: based on a heuristic planning method, a series of forward simulation tracks are generated from a planning starting point: firstly, screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; and calculating the evaluation value of each forward simulation track through a heuristic function. And then selecting the forward simulation track with the highest evaluation value from the Open table as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point.
S5: and repeating the step S4 until the candidate optimal track end point in the step S4 is within the end point range required by the motion planning, and ending the motion planning process.
S6: and connecting the initial optimal trajectory and the candidate optimal trajectory in the Closed table to form a final planning trajectory.
In step S1, the vehicle kinematics model may be the simplest two-degree-of-freedom bicycle model; as shown in fig. 2, the kinematics model can be described by a set of state equations [given as a formula image in the original] relating the state derivatives to the current state and the inputs. An xOy coordinate system is established; (x, y) are the coordinates of the vehicle in this coordinate system, L is the distance between the front and rear axles, θ is the vehicle heading angle, δ is the steering wheel angle, γ is the steering wheel angular acceleration of the vehicle, and a is the longitudinal acceleration of the vehicle. v represents the speed of the vehicle, and the dotted quantities are the time derivatives of the respective variables.
In step S1, other vehicle kinematics or dynamics models may also be established; the only requirement is that the model takes the vehicle accelerator/brake opening and the vehicle steering wheel angle as inputs and outputs the vehicle forward simulation trajectory.
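As a concrete illustration of such a model interface, the sketch below implements a single step of a standard two-degree-of-freedom kinematic bicycle model in Python. Since the patent reproduces its own equations only as a formula image, the heading equation v·tan(δ)/L, the forward-Euler integration and the wheelbase value are assumptions of the usual textbook form; γ is treated as the rate of change of the steering angle, matching θ_{t+Δt} = θ_t + γ·Δt used later.

```python
import math

def bicycle_model_step(state, gamma, a, dt, L=2.7):
    """One forward-Euler step of a 2-DOF kinematic bicycle model (assumed standard form).

    state = (x, y, theta, delta, v): position, heading, steering angle, speed.
    gamma: steering angular rate input, a: longitudinal acceleration input,
    dt: simulation step length, L: wheelbase in metres (illustrative value).
    """
    x, y, theta, delta, v = state
    x_dot = v * math.cos(theta)          # dx/dt
    y_dot = v * math.sin(theta)          # dy/dt
    theta_dot = v * math.tan(delta) / L  # dtheta/dt (standard bicycle kinematics)
    return (x + x_dot * dt,
            y + y_dot * dt,
            theta + theta_dot * dt,
            delta + gamma * dt,          # steering angle driven by gamma
            v + a * dt)                  # speed driven by throttle/brake acceleration
```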
In step S2, the Open table and the Closed table have the same format and store the trajectories generated in step S3 together with their corresponding evaluation values; each evaluation value is a numerical value greater than 0.
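A minimal sketch of what such storage tables could look like, assuming the F = g + h evaluation used later is treated as a cost (smaller is better); the heap-based Open table and the list-based Closed table are implementation choices, not specified by the patent.

```python
import heapq

class OpenTable:
    """Stores trajectories with their evaluation values; pop_best returns the
    trajectory with the smallest evaluation (assumed cost-like F = g + h)."""
    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker so trajectories themselves are never compared

    def push(self, evaluation, trajectory):
        heapq.heappush(self._heap, (evaluation, self._count, trajectory))
        self._count += 1

    def pop_best(self):
        evaluation, _, trajectory = heapq.heappop(self._heap)
        return evaluation, trajectory

    def empty(self):
        return not self._heap

# The Closed table shares the same (evaluation, trajectory) format,
# kept here simply in expansion order.
closed_table = []
```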
In step S3, the forward simulation trajectory is generated by using the vehicle steering wheel angular acceleration and the accelerator/brake input, and the specific generation method is as follows:
A simulation step length Δt is determined according to the usage scenario, and solving the vehicle kinematic model yields the state derivatives (dx/dt, dy/dt, dθ/dt, dδ/dt, dv/dt); integrating them updates the vehicle position x, y and heading θ, and continuous iteration updates the vehicle trajectory, so that the forward simulation trajectory is finally obtained.
As shown in the vehicle kinematics model in step S1, γ and a are the steering wheel angular acceleration and the longitudinal acceleration of the vehicle, respectively; the longitudinal acceleration may be derived directly from the throttle/brake input. In the model equations, x, y, θ, δ and v are known quantities, so the derivative values can be obtained by solving the equations simultaneously. Using these derivatives, the vehicle position x, y and heading θ are updated, and the vehicle trajectory is updated through continuous iteration.
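A sketch of this iteration, reusing bicycle_model_step from the earlier sketch; the horizon length and the assumption that one candidate input pair is held constant over the whole segment are illustrative choices.

```python
def forward_simulate(start_state, gamma, a, dt=0.1, horizon=2.0):
    """Generate one forward simulation trajectory by repeatedly integrating the
    vehicle model over the step length dt (default values are hypothetical)."""
    trajectory = [start_state]
    state = start_state
    for _ in range(int(horizon / dt)):
        state = bicycle_model_step(state, gamma, a, dt)
        trajectory.append(state)
    return trajectory
```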
In step S3, before the forward simulation trajectories are stored in the Open table, collision detection is performed on them: it is checked whether each generated forward simulation trajectory collides with an obstacle boundary in the perception result. Colliding trajectories are deleted directly, and non-colliding trajectories are stored in the Open table.
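A minimal collision-screening sketch; representing perceived obstacles as circles and inflating them by a safety radius are simplifying assumptions, since the patent does not fix the obstacle representation.

```python
def collides(trajectory, obstacles, safety_radius=1.0):
    """True if any trajectory point lies inside an (inflated) obstacle boundary.
    obstacles: iterable of (ox, oy, r) circles approximating perceived obstacles."""
    for x, y, *_ in trajectory:
        for ox, oy, r in obstacles:
            if (x - ox) ** 2 + (y - oy) ** 2 <= (r + safety_radius) ** 2:
                return True
    return False

def filter_collision_free(trajectories, obstacles):
    """Keep only the trajectories that pass collision detection."""
    return [t for t in trajectories if not collides(t, obstacles)]
```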
In step S3, selecting a reinforcement learning trajectory based on a reinforcement learning method, specifically, reinforcement learning based on a Q learning algorithm;
the reinforcement learning track selection method based on the reinforcement learning method comprises the following steps:
s31, initializing a reinforcement learning algorithm: determining a state space, an action space and a reward function R:
the state space includes information of 3 obstacles closest to the host vehicle in the environment and the current state of the trajectory. Wherein the obstacle information includes a position and a speed (x) of the obstaclen,yn,vxn,vyn),xn,ynCoordinates in the xOy coordinate system, v, of the nth obstacle, respectivelyxn,vynThe speed of the obstacle in the x and y directions in the xOy coordinate system, respectively.
The track current state comprises the coordinates x of the current pointt,ytDesired velocity vxt,vytDesired track state time t, vehicle throttle/brake opening for track state, vehicle steering wheel angle. The action space comprises the steering wheel rotation acceleration gamma and the longitudinal acceleration a of the vehicle at the next moment;
in the off-line training process, the reward function R of the reinforcement learning method needs to be determined:
R=A-B
wherein A is the reward for successfully reaching the end point when the vehicle executes the finally calculated trajectory, and B is the collision penalty incurred if a collision occurs while the vehicle executes the finally calculated trajectory.
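A one-line sketch of the R = A - B reward; the numerical magnitudes are illustrative, the patent only fixes the structure (goal reward minus collision penalty).

```python
def reward(reached_goal, collided, A=100.0, B=200.0):
    """R = A - B: A is granted when the executed trajectory reaches the end point,
    B is charged when a collision occurs during execution (magnitudes assumed)."""
    return (A if reached_goal else 0.0) - (B if collided else 0.0)
```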
S32, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states; the rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), where s_t is the state at time t and a_t is the action at time t.
S33, starting from the state s_t at the current time t, different actions a_t can generate trajectories. As described in step S31, an action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment. From the steering wheel angle θ_t at the current time t, the steering wheel angle expected at the next time t+Δt can be calculated as θ_{t+Δt} = θ_t + γ·Δt; inputting the expected steering wheel angle and the expected longitudinal acceleration into the vehicle model generates the trajectory. The Q value at this time, Q(s_t, a_t), is taken as the Q value of the generated trajectory. The trajectory generated by the action with the maximum Q value in the current state is taken as the reinforcement learning trajectory.
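A sketch of step S33: every candidate action (γ, a) is rolled out through the vehicle model and scored by the Q network, and the rollout with the largest Q value becomes the reinforcement learning trajectory. The action grid and the q_network(state, action) calling convention are assumptions; forward_simulate is the helper sketched earlier, which integrates the steering angle internally and thereby realizes θ_{t+Δt} = θ_t + γ·Δt.

```python
def select_rl_trajectory(q_network, state_t, start_state, dt=0.1):
    """Return the trajectory generated by the action with the maximum Q value
    in the current state. q_network(state, action) -> float is assumed."""
    candidate_actions = [(g, acc)                       # hypothetical action grid
                         for g in (-0.4, -0.2, 0.0, 0.2, 0.4)
                         for acc in (-1.0, 0.0, 1.0)]
    best_q, best_traj = float("-inf"), None
    for gamma, a in candidate_actions:
        traj = forward_simulate(start_state, gamma, a, dt=dt)
        q = q_network(state_t, (gamma, a))
        if q > best_q:
            best_q, best_traj = q, traj
    return best_traj, best_q
```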
Meanwhile, a series of forward simulation tracks are generated by inputting different steering wheel turning angles and accelerator/brake opening degrees into the vehicle dynamics model, and the evaluation values F of the forward simulation tracks are as follows:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point.
And selecting the track with the minimum evaluation value F from a series of forward simulation tracks as a regular optimal track. And estimating the Q value of the regular optimal track by using a reinforcement learning Q network. And comparing the Q values of the regular optimal track and the reinforcement learning track, and selecting the track with a larger Q value as an initial optimal track.
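The selection just described can be sketched as follows; the Euclidean distances for g and h follow the definition above, while q_of(trajectory), the Q-network estimate for a whole trajectory, is an assumed interface.

```python
import math

def evaluation_f(trajectory, start, goal):
    """F = g + h: g is the distance from the trajectory end point to the starting
    point, h the distance from the trajectory end point to the final goal point."""
    x_end, y_end = trajectory[-1][0], trajectory[-1][1]
    g = math.hypot(x_end - start[0], y_end - start[1])
    h = math.hypot(x_end - goal[0], y_end - goal[1])
    return g + h

def select_initial_optimal(sampled_trajectories, rl_trajectory, q_of, start, goal):
    """Rule-optimal trajectory = minimum F; keep whichever of the rule-optimal and
    reinforcement learning trajectories has the larger estimated Q value."""
    rule_optimal = min(sampled_trajectories, key=lambda t: evaluation_f(t, start, goal))
    return rule_optimal if q_of(rule_optimal) > q_of(rl_trajectory) else rl_trajectory
```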
S34, during offline training, training data are collected and common reinforcement learning training methods are used to calculate, for each state s_t and each action a_t, the expectation of the future reward Q(s_t, a_t); the Q values for the different states and actions are used as training data, and the parameters of the Q network are updated by gradient descent.
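A PyTorch sketch of the offline training in step S34. The 19-dimensional state matches the description in step S31 (3 obstacles × 4 values plus 7 trajectory-state values) and the 2-dimensional action is (γ, a); the network architecture, optimizer and learning rate are assumptions.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small MLP mapping a concatenated (state, action) vector to a scalar Q value."""
    def __init__(self, state_dim=19, action_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def train_q_network(q_net, batches, lr=1e-3):
    """Gradient-descent update of step S34: regress Q(s_t, a_t) toward the expected
    future reward computed offline for each (state, action) pair in the training data."""
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for states, actions, target_q in batches:
        optimizer.zero_grad()
        loss = loss_fn(q_net(states, actions), target_q)
        loss.backward()
        optimizer.step()
```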
In step S4, starting from the current state s_t, different actions a_t can generate trajectories. As described in step S31, an action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment. From the steering wheel angle θ_t at the current time, the steering wheel angle expected at the next time can be calculated as θ_{t+Δt} = θ_t + γ·Δt; the expected steering wheel angle and the expected longitudinal acceleration are input into the vehicle model to generate the trajectory. The evaluation values of these forward simulation trajectories are:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point.
In step S5, if the number of times of repeating step S4 exceeds the predetermined number of times and no solution is found, the motion planning outputs a planning failure result.
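Putting steps S3 through S6 together, a simplified end-to-end sketch (reusing OpenTable, forward_simulate, filter_collision_free and evaluation_f from the earlier sketches) might look as follows; the action grid, goal tolerance and iteration limit are illustrative, and the step S3 comparison between the rule-optimal and reinforcement learning trajectories is omitted here for brevity.

```python
import math

def plan(start_state, goal, obstacles, goal_tolerance=2.0, max_iterations=200):
    """Repeat the step-S4 expansion until the candidate optimal trajectory ends
    inside the goal range (step S5), then concatenate the stored trajectories
    (step S6); return None on failure after the predetermined number of repeats."""
    open_table, closed = OpenTable(), []
    start_xy = (start_state[0], start_state[1])
    current = start_state
    for _ in range(max_iterations):
        # Step S4: sample forward trajectories, drop colliding ones, score the rest.
        candidates = [forward_simulate(current, g, a)
                      for g in (-0.4, -0.2, 0.0, 0.2, 0.4)
                      for a in (-1.0, 0.0, 1.0)]
        for traj in filter_collision_free(candidates, obstacles):
            open_table.push(evaluation_f(traj, start_xy, goal), traj)
        if open_table.empty():
            return None  # every sampled trajectory collided
        f_value, best = open_table.pop_best()
        closed.append((f_value, best))
        current = best[-1]  # candidate optimal end point becomes the next start
        # Step S5: stop once the end point lies within the required goal range.
        if math.hypot(current[0] - goal[0], current[1] - goal[1]) < goal_tolerance:
            # Step S6: connect the optimal trajectories stored in the Closed table.
            return [point for _, traj in closed for point in traj]
    return None  # planning failure after the predetermined number of repeats
```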
The invention also provides an automobile automatic driving motion planning system based on the learning sampling type, which comprises a vehicle kinematics model establishing module, a storage table initializing module, an initial optimal track selecting module, a candidate optimal track selecting module, a motion planning finishing module and a final planning track forming module;
the vehicle kinematic model building module builds a vehicle kinematic model according to the vehicle parameters;
the storage table initialization module is used for initializing a storage table of the heuristic motion planning method: an Open table and a Closed table;
the initial optimal track selection module generates a series of forward simulation tracks from a starting point based on a learning sampling method, calculates the evaluation value of each forward simulation track through a heuristic function, and selects the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning;
the candidate optimal trajectory selection module generates a series of forward simulation trajectories from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point;
the motion planning ending module is used for ending the motion planning process until the candidate optimal track end point is in the end point range required by the motion planning;
and the final planning track forming module connects the initial optimal track and the candidate optimal track in the Closed table to form a final planning track.
In the above embodiment, in the initial optimal trajectory selection module, the reinforcement learning trajectory selection based on the reinforcement learning method includes the following steps:
s1, initializing a reinforcement learning algorithm: determining a state space, an action space and a reward function R;
s2, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, the rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), where s_t is the state at time t and a_t is the action at time t;
s3, starting from the state s_t at the current time t, different actions a_t generate trajectories, wherein an action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, the steering wheel angle expected at the next time t+Δt is calculated as θ_{t+Δt} = θ_t + γ·Δt; the expected steering wheel angle and the expected longitudinal acceleration are input into the vehicle model to generate a trajectory, and the Q value at this time, Q(s_t, a_t), is taken as the Q value of that trajectory; the trajectory generated by the action with the maximum Q value in the current state is taken as the reinforcement learning trajectory.
The above embodiments are only for illustrating the present invention, and the steps may be changed, and on the basis of the technical solution of the present invention, the modification and equivalent changes of the individual steps according to the principle of the present invention should not be excluded from the protection scope of the present invention.

Claims (7)

1. An automobile automatic driving motion planning method based on a learning sampling mode is characterized by comprising the following steps:
s1: establishing a vehicle kinematic model according to vehicle parameters;
s2: initializing a storage table of a heuristic motion planning method: an Open table and a Closed table;
s3: generating a series of forward simulation tracks from a starting point based on a learning sampling method, calculating an evaluation value of each forward simulation track through a heuristic function, and selecting the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning;
s4: generating a series of forward simulation tracks from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point;
s5: repeating the step S4 until the candidate optimal trajectory end point in the step S4 is within the end point range required by the motion planning, and ending the motion planning process;
s6: connecting the initial optimal trajectory and the candidate optimal trajectory in the Closed table to form a final planning trajectory;
in the step S3, a reinforcement learning trajectory is selected based on a reinforcement learning method, specifically, a reinforcement learning method based on a Q learning algorithm;
the reinforcement learning track selection method based on the reinforcement learning method comprises the following steps:
s31, initializing a reinforcement learning algorithm: determining a state space, an action space and a reward function R;
s32, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, the rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), where s_t is the state at time t and a_t is the action at time t;
s33, starting from the state s_t at the current time t, different actions a_t generate trajectories, wherein an action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, the steering wheel angle expected at the next time t+Δt is calculated as θ_{t+Δt} = θ_t + γ·Δt; the expected steering wheel angle and the expected longitudinal acceleration are input into the vehicle model to generate a trajectory, and the Q value at this time, Q(s_t, a_t), is taken as the Q value of that trajectory; the trajectory generated by the action with the maximum Q value in the current state is taken as the reinforcement learning trajectory; where Δt is the simulation step size.
2. The automotive autonomous driving motion planning method of claim 1, wherein: in step S3, the forward simulation trajectory is generated using the steering wheel angular acceleration and the accelerator/brake input of the vehicle.
3. The automotive autonomous driving motion planning method of claim 2, wherein the forward simulation trajectory generation method comprises the following steps: determining a simulation step length Δt according to the usage scenario, solving the vehicle kinematic model to obtain the state derivatives (dx/dt, dy/dt, dθ/dt, dδ/dt, dv/dt), updating the vehicle positions x and y and the vehicle heading θ, continuously iterating to update the vehicle trajectory, and finally obtaining the forward simulation trajectory.
4. The automotive autonomous driving motion planning method of claim 1, wherein: in step S3, before the forward simulation trajectory is stored in the Open table, collision detection is performed on the forward simulation trajectory, it is detected whether the generated forward simulation trajectory collides with the boundary of the obstacle in the sensing result, and if the trajectory collides, the trajectory is directly deleted, and the trajectory that does not collide is stored in the Open table.
5. The automotive autonomous driving motion planning method of claim 1, wherein: the selection method of the initial optimal track comprises the following steps:
a series of forward simulation tracks are generated by inputting different steering wheel turning angles and accelerator/brake opening degrees into a vehicle dynamics model, and the evaluation values F of the forward simulation tracks are as follows:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point;
selecting the track with the minimum evaluation value F from a series of forward simulation tracks as a rule optimal track, and estimating the Q value of the rule optimal track by using a reinforcement learning Q network; and comparing the Q values of the regular optimal track and the reinforcement learning track, and selecting the track with a larger Q value as an initial optimal track.
6. The automotive autonomous driving motion planning method of claim 1, wherein: in step S4, starting from the current state s_t, different actions a_t generate trajectories; from the steering wheel angle θ_t at the current time, the steering wheel angle expected at the next moment is calculated as θ_{t+Δt} = θ_t + γ·Δt, and the expected steering wheel angle and expected longitudinal acceleration are input into the vehicle model to generate a trajectory; the evaluation values of these forward simulation trajectories are:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point.
7. An automobile automatic driving motion planning system based on learning sampling type is characterized by comprising: the system comprises a vehicle kinematics model establishing module, a storage table initializing module, an initial optimal track selecting module, a candidate optimal track selecting module, a motion planning ending module and a final planning track forming module;
the vehicle kinematic model building module builds a vehicle kinematic model according to vehicle parameters;
the storage table initialization module is used for initializing a storage table of a heuristic motion planning method: an Open table and a Closed table;
the initial segment optimal track selection module generates a series of forward simulation tracks from a starting point based on a learning sampling method, calculates the evaluation value of each forward simulation track through a heuristic function, and selects the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning;
the candidate optimal track selection module generates a series of forward simulation tracks from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point;
the motion planning ending module ends the motion planning process until the candidate optimal trajectory end point is within the end point range required by the motion planning;
the final planning track forming module connects the initial optimal track and the candidate optimal track in the Closed table to form a final planning track;
in the initial segment optimal trajectory selection module, the reinforcement learning trajectory selection based on the reinforcement learning method comprises the following steps:
s1, initializing a reinforcement learning algorithm: determining a state space, an action space and a reward function R;
s2, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, the rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), where s_t is the state at time t and a_t is the action at time t;
s3, starting from the state s_t at the current time t, different actions a_t generate trajectories, wherein an action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, the steering wheel angle expected at the next time t+Δt is calculated as θ_{t+Δt} = θ_t + γ·Δt; the expected steering wheel angle and the expected longitudinal acceleration are input into the vehicle model to generate a trajectory, and the Q value at this time, Q(s_t, a_t), is taken as the Q value of that trajectory; the trajectory generated by the action with the maximum Q value in the current state is taken as the reinforcement learning trajectory; where Δt is the simulation step size.
CN202010236474.1A 2020-03-30 2020-03-30 Automobile automatic driving motion planning method and system based on learning sampling type Active CN111413974B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010236474.1A CN111413974B (en) 2020-03-30 2020-03-30 Automobile automatic driving motion planning method and system based on learning sampling type

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010236474.1A CN111413974B (en) 2020-03-30 2020-03-30 Automobile automatic driving motion planning method and system based on learning sampling type

Publications (2)

Publication Number Publication Date
CN111413974A CN111413974A (en) 2020-07-14
CN111413974B true CN111413974B (en) 2021-03-30

Family

ID=71494691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010236474.1A Active CN111413974B (en) 2020-03-30 2020-03-30 Automobile automatic driving motion planning method and system based on learning sampling type

Country Status (1)

Country Link
CN (1) CN111413974B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239986B (en) * 2021-04-25 2023-04-18 浙江吉利控股集团有限公司 Training method and device for vehicle track evaluation network model and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139072A (en) * 2015-09-09 2015-12-09 东华大学 Reinforcement learning algorithm applied to non-tracking intelligent trolley barrier-avoiding system
CN106595671A (en) * 2017-02-22 2017-04-26 南方科技大学 Method and apparatus for planning route of unmanned aerial vehicle based on reinforcement learning
CN107194612A (en) * 2017-06-20 2017-09-22 清华大学 A kind of train operation dispatching method learnt based on deeply and system
CN107423761A (en) * 2017-07-24 2017-12-01 清华大学 Feature based selects and the rail locomotive energy saving optimizing method of operating of machine learning
CN108153585A (en) * 2017-12-01 2018-06-12 北京大学 A kind of method and apparatus of the operational efficiency based on locality expression function optimization MapReduce frames
CN109521774A (en) * 2018-12-27 2019-03-26 南京芊玥机器人科技有限公司 A kind of spray robot track optimizing method based on intensified learning
CN109598934A (en) * 2018-12-13 2019-04-09 清华大学 A kind of rule-based method for sailing out of high speed with learning model pilotless automobile
CN109726676A (en) * 2018-12-28 2019-05-07 苏州大学 The planing method of automated driving system
CN109936865A (en) * 2018-06-30 2019-06-25 北京工业大学 A kind of mobile sink paths planning method based on deeply learning algorithm
CN110083160A (en) * 2019-05-16 2019-08-02 哈尔滨工业大学(深圳) A kind of method for planning track of robot based on deep learning
CN110472738A (en) * 2019-08-16 2019-11-19 北京理工大学 A kind of unmanned boat Real Time Obstacle Avoiding algorithm based on deeply study
KR102055141B1 (en) * 2018-12-31 2019-12-12 한국기술교육대학교 산학협력단 System for remote controlling of devices based on reinforcement learning
CN110666793A (en) * 2019-09-11 2020-01-10 大连理工大学 Method for realizing robot square part assembly based on deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190331920A1 (en) * 2016-11-18 2019-10-31 Eyedapfei inc. Improved Systems for Augmented Reality Visual Aids and Tools

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139072A (en) * 2015-09-09 2015-12-09 东华大学 Reinforcement learning algorithm applied to non-tracking intelligent trolley barrier-avoiding system
CN106595671A (en) * 2017-02-22 2017-04-26 南方科技大学 Method and apparatus for planning route of unmanned aerial vehicle based on reinforcement learning
CN107194612A (en) * 2017-06-20 2017-09-22 清华大学 A kind of train operation dispatching method learnt based on deeply and system
CN107423761A (en) * 2017-07-24 2017-12-01 清华大学 Feature based selects and the rail locomotive energy saving optimizing method of operating of machine learning
CN108153585A (en) * 2017-12-01 2018-06-12 北京大学 A kind of method and apparatus of the operational efficiency based on locality expression function optimization MapReduce frames
CN109936865A (en) * 2018-06-30 2019-06-25 北京工业大学 A kind of mobile sink paths planning method based on deeply learning algorithm
CN109598934A (en) * 2018-12-13 2019-04-09 清华大学 A kind of rule-based method for sailing out of high speed with learning model pilotless automobile
CN109521774A (en) * 2018-12-27 2019-03-26 南京芊玥机器人科技有限公司 A kind of spray robot track optimizing method based on intensified learning
CN109726676A (en) * 2018-12-28 2019-05-07 苏州大学 The planing method of automated driving system
KR102055141B1 (en) * 2018-12-31 2019-12-12 한국기술교육대학교 산학협력단 System for remote controlling of devices based on reinforcement learning
CN110083160A (en) * 2019-05-16 2019-08-02 哈尔滨工业大学(深圳) A kind of method for planning track of robot based on deep learning
CN110472738A (en) * 2019-08-16 2019-11-19 北京理工大学 A kind of unmanned boat Real Time Obstacle Avoiding algorithm based on deeply study
CN110666793A (en) * 2019-09-11 2020-01-10 大连理工大学 Method for realizing robot square part assembly based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Reinforcement Learning Versus Model Predictive Control: A Comparison on a Power System Problem; Damien Ernst; IEEE; 2009-12-31; pp. 517-529 *
Autonomous driving policy learning method based on deep reinforcement learning; Xia Wei, et al.; Integration Technology; 2017-05-21 (No. 3); pp. 29-35 *

Also Published As

Publication number Publication date
CN111413974A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
Zhang et al. Human-like autonomous vehicle speed control by deep reinforcement learning with double Q-learning
CN110136481B (en) Parking strategy based on deep reinforcement learning
CN107063280B (en) Intelligent vehicle path planning system and method based on control sampling
Rosolia et al. Autonomous racing using learning model predictive control
CN112356830B (en) Intelligent parking method based on model reinforcement learning
CN111098852B (en) Parking path planning method based on reinforcement learning
CN110969848A (en) Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes
EP3864574A1 (en) Autonomous vehicle planning and prediction
Rubies-Royo et al. A classification-based approach for approximate reachability
CN114312830B (en) Intelligent vehicle coupling decision model and method considering dangerous driving conditions
CN112162555A (en) Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet
CN111679660B (en) Unmanned deep reinforcement learning method integrating human-like driving behaviors
CN112286218B (en) Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient
CN114564016A (en) Navigation obstacle avoidance control method, system and model combining path planning and reinforcement learning
CN110861651B (en) Method for estimating longitudinal and lateral motion states of front vehicle
CN115257745A (en) Automatic driving lane change decision control method based on rule fusion reinforcement learning
Firl et al. Probabilistic Maneuver Prediction in Traffic Scenarios.
CN116679719A (en) Unmanned vehicle self-adaptive path planning method based on dynamic window method and near-end strategy
CN113296523A (en) Mobile robot obstacle avoidance path planning method
CN117222915A (en) System and method for tracking an expanded state of a moving object using a composite measurement model
CN113391633A (en) Urban environment-oriented mobile robot fusion path planning method
CN116486356A (en) Narrow scene track generation method based on self-adaptive learning technology
CN111413974B (en) Automobile automatic driving motion planning method and system based on learning sampling type
CN115416024A (en) Moment-controlled mechanical arm autonomous trajectory planning method and system
Gaskett et al. Reinforcement learning for visual servoing of a mobile robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant