CN111413974B - Automobile automatic driving motion planning method and system based on learning sampling type - Google Patents

Automobile automatic driving motion planning method and system based on learning sampling type

Info

Publication number
CN111413974B
CN111413974B (application CN202010236474.1A)
Authority
CN
China
Prior art keywords
track
forward simulation
trajectory
optimal
vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010236474.1A
Other languages
Chinese (zh)
Other versions
CN111413974A (en)
Inventor
江昆
周伟韬
杨殿阁
严瑞东
黄晋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202010236474.1A
Publication of CN111413974A
Application granted
Publication of CN111413974B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Steering Control In Accordance With Driving Conditions (AREA)

Abstract

The invention relates to an automobile automatic driving motion planning method and system based on a learning sampling type, which comprises the following steps: establishing a vehicle kinematic model; initializing an Open table and a Closed table; calculating the evaluation value of each forward simulation track, and selecting the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, and storing the initial optimal track into a Closed table; screening a non-collision forward simulation track by using a collision detection method, and storing the non-collision forward simulation track into an Open table; calculating the evaluation value of each forward simulation track, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, and storing the candidate optimal track in a Closed table; ending the motion planning process when the candidate optimal track end point is within the end point range required by the motion planning; and connecting the initial optimal trajectory and the candidate optimal trajectory in the Closed table to form a final planning trajectory.

Description

Automobile automatic driving motion planning method and system based on learning sampling type
Technical Field
The invention relates to the field of intelligent vehicles, in particular to an automobile automatic driving motion planning method and system based on a learning sampling mode.
Background
In recent years, artificial intelligence technology has gradually begun to be commercialized in the fields of intelligent transportation and vehicles, and intelligent connected vehicles are gradually entering the public view. Generally, the automated driving software system of an intelligent vehicle can be divided into four modules: perception, positioning, decision and control. Motion planning is the most important part of the decision module and determines the decision quality of the intelligent vehicle. Since the control module generally only performs motion/trajectory tracking, the outcome of motion planning is crucial to the final driving behavior of the vehicle.
Existing motion planning methods can be broadly divided into sampling-based methods, optimization-based methods, and end-to-end learning-based methods. End-to-end learning methods establish a mapping from sensor data to driving actions, but their black-box nature makes engineering practice and optimization difficult; optimization-based methods generally depend on lane lines or other prior road information, and their solving time is often hard to guarantee; sampling-based methods are widely applied to automated driving motion planning because they solve quickly and adapt to varied environments.
Sampling-based methods generally select a sampled trajectory or motion state through a cost function, which is essentially the selection of an optimal trajectory/motion state by manually set rules; however, a manually set cost function has difficulty adapting to a complex and changeable real environment.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a method and a system for planning an automatic driving motion of an automobile based on a learning sampling mode, which can better consider uncertainty and randomness in the environment and can improve the safety and robustness of the automatic driving motion planning.
In order to achieve the purpose, the invention adopts the following technical scheme: an automobile automatic driving motion planning method based on a learning sampling mode comprises the following steps:
S1: establishing a vehicle kinematic model according to vehicle parameters;
S2: initializing a storage table of a heuristic motion planning method: an Open table and a Closed table;
S3: generating a series of forward simulation tracks from a starting point based on a learning sampling method, calculating an evaluation value of each forward simulation track through a heuristic function, and selecting the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning;
S4: generating a series of forward simulation tracks from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point;
S5: repeating the step S4 until the candidate optimal trajectory end point in the step S4 is within the end point range required by the motion planning, and ending the motion planning process;
S6: connecting the initial optimal trajectory and the candidate optimal trajectory in the Closed table to form a final planning trajectory.
Further, in step S3, the forward simulation trajectory is generated by using the steering wheel angular acceleration and the accelerator/brake input of the vehicle.
Further, the forward simulation trajectory generation method comprises: determining a simulation step length Δt according to the usage scenario, solving the vehicle kinematic model to obtain the state derivatives (dx/dt, dy/dt, dθ/dt, dδ/dt, dv/dt), updating the vehicle positions x and y and the vehicle heading θ, continuously iterating to update the vehicle trajectory, and finally obtaining the forward simulation trajectory.
Further, in step S3, before the forward simulation trajectory is stored in the Open table, collision detection is performed on the forward simulation trajectory, it is detected whether the generated forward simulation trajectory collides with the boundary of the obstacle in the sensing result, and if the trajectory collides, the trajectory is directly deleted, and the trajectory that does not collide is stored in the Open table.
Further, in the step S3, a reinforcement learning trajectory is selected based on a reinforcement learning method, specifically, a reinforcement learning method based on a Q learning algorithm.
Further, the reinforcement learning trajectory selection method based on the reinforcement learning method comprises the following steps: S31, initializing the reinforcement learning algorithm: determining a state space, an action space and a reward function R; S32, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, the rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), where s_t is the state at time t and a_t is the action at time t; S33, starting from the state s_t at the current time t, different actions a_t generate trajectories, wherein an action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, the steering wheel angle expected at the next time t+Δt is calculated as θ_{t+Δt} = θ_t + γ·Δt; the expected steering wheel angle and the expected longitudinal acceleration are input into the vehicle model to generate a trajectory, and the Q value at this time, Q(s_t, a_t), is taken as the Q value of that trajectory; the trajectory generated by the action with the maximum Q value in the current state is taken as the reinforcement learning trajectory.
Further, the selection method of the initial optimal trajectory comprises the following steps: a series of forward simulation tracks are generated by inputting different steering wheel turning angles and accelerator/brake opening degrees into a vehicle dynamics model, and the evaluation values F of the forward simulation tracks are as follows:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point;
selecting the track with the minimum evaluation value F from a series of forward simulation tracks as a rule optimal track, and estimating the Q value of the rule optimal track by using a reinforcement learning Q network; and comparing the Q values of the regular optimal track and the reinforcement learning track, and selecting the track with a larger Q value as an initial optimal track.
Further, in step S4, starting from the current state s_t, different actions a_t generate trajectories; from the steering wheel angle θ_t at the current time, the steering wheel angle expected at the next moment is calculated as θ_{t+Δt} = θ_t + γ·Δt, and the expected steering wheel angle and expected longitudinal acceleration are input into the vehicle model to generate a trajectory; the evaluation values of these forward simulation trajectories are:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point.
An automotive autopilot motion planning system based on learning sampling, comprising: the system comprises a vehicle kinematics model establishing module, a storage table initializing module, an initial optimal track selecting module, a candidate optimal track selecting module, a motion planning ending module and a final planning track forming module; the vehicle kinematic model building module builds a vehicle kinematic model according to vehicle parameters; the storage table initialization module is used for initializing a storage table of a heuristic motion planning method: an Open table and a Closed table; the initial segment optimal track selection module generates a series of forward simulation tracks from a starting point based on a learning sampling method, calculates the evaluation value of each forward simulation track through a heuristic function, and selects the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning; the candidate optimal track selection module generates a series of forward simulation tracks from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point; the motion planning ending module ends the motion planning process until the candidate optimal trajectory end point is within the end point range required by the motion planning; and the final planning track forming module connects the initial optimal track and the candidate optimal track in the Closed table to form a final planning track.
Further, in the initial segment optimal trajectory selection module, the reinforcement learning trajectory selection based on the reinforcement learning method includes the following steps: S1, initializing the reinforcement learning algorithm: determining a state space, an action space and a reward function R; S2, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, the rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), where s_t is the state at time t and a_t is the action at time t; S3, starting from the state s_t at the current time t, different actions a_t generate trajectories, wherein an action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, the steering wheel angle expected at the next time t+Δt is calculated as θ_{t+Δt} = θ_t + γ·Δt; the expected steering wheel angle and the expected longitudinal acceleration are input into the vehicle model to generate a trajectory, and the Q value at this time, Q(s_t, a_t), is taken as the Q value of that trajectory; the trajectory generated by the action with the maximum Q value in the current state is taken as the reinforcement learning trajectory.
Due to the adoption of the technical scheme, the invention has the following advantages: the method combines the vehicle kinematics model, and replaces the traditional rule-based candidate track selection mode by adding the reinforcement learning module in the sampling type motion planning method based on the vehicle kinematics model, so that the track selection is more reasonable, the uncertainty and the randomness in the environment can be better considered, and the safety and the robustness of the automatic driving motion planning can be improved.
Drawings
FIG. 1 is a schematic overall flow diagram of the process of the present invention;
FIG. 2 is a simplified vehicle dynamics model schematic of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
As shown in fig. 1, the present invention provides a learning sampling based vehicle automatic driving motion planning method, which combines reinforcement learning, receives automatic driving perception results, i.e. information such as the position and speed of an obstacle in the surrounding environment, and outputs a series of actions that an intelligent vehicle can perform under the constraint of the starting point and the ending point.
The method specifically comprises the following steps:
s1: establishing a vehicle kinematic model according to vehicle parameters;
the vehicle parameters include vehicle size, minimum turning radius, maximum acceleration/deceleration, and the like.
S2: initializing a storage table of a heuristic motion planning method: open and Closed tables.
S3: generating a series of forward simulation tracks from a starting point based on a learning sampling method, calculating the evaluation value of each forward simulation track through a heuristic function, and selecting the track with the highest evaluation value from the forward simulation tracks as a regular optimal track; and performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track. And selecting the initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking the end point of the initial optimal track as the starting point of subsequent planning.
S4: based on a heuristic planning method, a series of forward simulation tracks are generated from a planning starting point: firstly, screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; and calculating the evaluation value of each forward simulation track through a heuristic function. And then selecting the forward simulation track with the highest evaluation value from the Open table as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point.
S5: and repeating the step S4 until the candidate optimal track end point in the step S4 is within the end point range required by the motion planning, and ending the motion planning process.
S6: and connecting the initial optimal trajectory and the candidate optimal trajectory in the Closed table to form a final planning trajectory.
In step S1, the vehicle kinematics model may be the simplest two-degree-of-freedom bicycle model; as shown in fig. 2, the kinematics model can be described by a set of state equations [given as a formula image in the original] relating the state derivatives to the current state and the inputs. An xOy coordinate system is established; (x, y) are the coordinates of the vehicle in this coordinate system, L is the distance between the front and rear axles, θ is the vehicle heading angle, δ is the steering wheel angle, γ is the steering wheel angular acceleration of the vehicle, and a is the longitudinal acceleration of the vehicle. v represents the speed of the vehicle, and the dotted quantities are the time derivatives of the respective variables.
In step S1, other vehicle kinematics or dynamics models may also be established; the only requirement is that the model takes the vehicle accelerator/brake opening and the vehicle steering wheel angle as inputs and outputs the vehicle forward simulation trajectory.
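As a concrete illustration of such a model interface, the sketch below implements a single step of a standard two-degree-of-freedom kinematic bicycle model in Python. Since the patent reproduces its own equations only as a formula image, the heading equation v·tan(δ)/L, the forward-Euler integration and the wheelbase value are assumptions of the usual textbook form; γ is treated as the rate of change of the steering angle, matching θ_{t+Δt} = θ_t + γ·Δt used later.

```python
import math

def bicycle_model_step(state, gamma, a, dt, L=2.7):
    """One forward-Euler step of a 2-DOF kinematic bicycle model (assumed standard form).

    state = (x, y, theta, delta, v): position, heading, steering angle, speed.
    gamma: steering angular rate input, a: longitudinal acceleration input,
    dt: simulation step length, L: wheelbase in metres (illustrative value).
    """
    x, y, theta, delta, v = state
    x_dot = v * math.cos(theta)          # dx/dt
    y_dot = v * math.sin(theta)          # dy/dt
    theta_dot = v * math.tan(delta) / L  # dtheta/dt (standard bicycle kinematics)
    return (x + x_dot * dt,
            y + y_dot * dt,
            theta + theta_dot * dt,
            delta + gamma * dt,          # steering angle driven by gamma
            v + a * dt)                  # speed driven by throttle/brake acceleration
```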
In step S2, the Open table and the Closed table have the same format and store the trajectories generated in step S3 together with their corresponding evaluation values; each evaluation value is a numerical value greater than 0.
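A minimal sketch of what such storage tables could look like, assuming the F = g + h evaluation used later is treated as a cost (smaller is better); the heap-based Open table and the list-based Closed table are implementation choices, not specified by the patent.

```python
import heapq

class OpenTable:
    """Stores trajectories with their evaluation values; pop_best returns the
    trajectory with the smallest evaluation (assumed cost-like F = g + h)."""
    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker so trajectories themselves are never compared

    def push(self, evaluation, trajectory):
        heapq.heappush(self._heap, (evaluation, self._count, trajectory))
        self._count += 1

    def pop_best(self):
        evaluation, _, trajectory = heapq.heappop(self._heap)
        return evaluation, trajectory

    def empty(self):
        return not self._heap

# The Closed table shares the same (evaluation, trajectory) format,
# kept here simply in expansion order.
closed_table = []
```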
In step S3, the forward simulation trajectory is generated by using the vehicle steering wheel angular acceleration and the accelerator/brake input, and the specific generation method is as follows:
A simulation step length Δt is determined according to the usage scenario, and solving the vehicle kinematic model yields the state derivatives (dx/dt, dy/dt, dθ/dt, dδ/dt, dv/dt); integrating them updates the vehicle position x, y and heading θ, and continuous iteration updates the vehicle trajectory, so that the forward simulation trajectory is finally obtained.
As shown in the vehicle kinematics model in step S1, γ and a are the steering wheel angular acceleration and the longitudinal acceleration of the vehicle, respectively; the longitudinal acceleration may be derived directly from the throttle/brake input. In the model equations, x, y, θ, δ and v are known quantities, so the derivative values can be obtained by solving the equations simultaneously. Using these derivatives, the vehicle position x, y and heading θ are updated, and the vehicle trajectory is updated through continuous iteration.
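A sketch of this iteration, reusing bicycle_model_step from the earlier sketch; the horizon length and the assumption that one candidate input pair is held constant over the whole segment are illustrative choices.

```python
def forward_simulate(start_state, gamma, a, dt=0.1, horizon=2.0):
    """Generate one forward simulation trajectory by repeatedly integrating the
    vehicle model over the step length dt (default values are hypothetical)."""
    trajectory = [start_state]
    state = start_state
    for _ in range(int(horizon / dt)):
        state = bicycle_model_step(state, gamma, a, dt)
        trajectory.append(state)
    return trajectory
```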
In step S3, before the forward simulation trajectories are stored in the Open table, collision detection is performed on them: it is checked whether each generated forward simulation trajectory collides with an obstacle boundary in the perception result. Colliding trajectories are deleted directly, and non-colliding trajectories are stored in the Open table.
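A minimal collision-screening sketch; representing perceived obstacles as circles and inflating them by a safety radius are simplifying assumptions, since the patent does not fix the obstacle representation.

```python
def collides(trajectory, obstacles, safety_radius=1.0):
    """True if any trajectory point lies inside an (inflated) obstacle boundary.
    obstacles: iterable of (ox, oy, r) circles approximating perceived obstacles."""
    for x, y, *_ in trajectory:
        for ox, oy, r in obstacles:
            if (x - ox) ** 2 + (y - oy) ** 2 <= (r + safety_radius) ** 2:
                return True
    return False

def filter_collision_free(trajectories, obstacles):
    """Keep only the trajectories that pass collision detection."""
    return [t for t in trajectories if not collides(t, obstacles)]
```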
In step S3, selecting a reinforcement learning trajectory based on a reinforcement learning method, specifically, reinforcement learning based on a Q learning algorithm;
the reinforcement learning track selection method based on the reinforcement learning method comprises the following steps:
s31, initializing a reinforcement learning algorithm: determining a state space, an action space and a reward function R:
the state space includes information of 3 obstacles closest to the host vehicle in the environment and the current state of the trajectory. Wherein the obstacle information includes a position and a speed (x) of the obstaclen,yn,vxn,vyn),xn,ynCoordinates in the xOy coordinate system, v, of the nth obstacle, respectivelyxn,vynThe speed of the obstacle in the x and y directions in the xOy coordinate system, respectively.
The track current state comprises the coordinates x of the current pointt,ytDesired velocity vxt,vytDesired track state time t, vehicle throttle/brake opening for track state, vehicle steering wheel angle. The action space comprises the steering wheel rotation acceleration gamma and the longitudinal acceleration a of the vehicle at the next moment;
in the off-line training process, the reward function R of the reinforcement learning method needs to be determined:
R=A-B
wherein A is the reward for successfully reaching the end point when the vehicle executes the finally calculated trajectory, and B is the collision penalty incurred if a collision occurs while the vehicle executes the finally calculated trajectory.
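A one-line sketch of the R = A - B reward; the numerical magnitudes are illustrative, the patent only fixes the structure (goal reward minus collision penalty).

```python
def reward(reached_goal, collided, A=100.0, B=200.0):
    """R = A - B: A is granted when the executed trajectory reaches the end point,
    B is charged when a collision occurs during execution (magnitudes assumed)."""
    return (A if reached_goal else 0.0) - (B if collided else 0.0)
```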
S32, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states; the rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), where s_t is the state at time t and a_t is the action at time t.
S33, starting from the state s_t at the current time t, different actions a_t can generate trajectories. As described in step S31, an action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment. From the steering wheel angle θ_t at the current time t, the steering wheel angle expected at the next time t+Δt can be calculated as θ_{t+Δt} = θ_t + γ·Δt; inputting the expected steering wheel angle and the expected longitudinal acceleration into the vehicle model generates the trajectory. The Q value at this time, Q(s_t, a_t), is taken as the Q value of the generated trajectory. The trajectory generated by the action with the maximum Q value in the current state is taken as the reinforcement learning trajectory.
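A sketch of step S33: every candidate action (γ, a) is rolled out through the vehicle model and scored by the Q network, and the rollout with the largest Q value becomes the reinforcement learning trajectory. The action grid and the q_network(state, action) calling convention are assumptions; forward_simulate is the helper sketched earlier, which integrates the steering angle internally and thereby realizes θ_{t+Δt} = θ_t + γ·Δt.

```python
def select_rl_trajectory(q_network, state_t, start_state, dt=0.1):
    """Return the trajectory generated by the action with the maximum Q value
    in the current state. q_network(state, action) -> float is assumed."""
    candidate_actions = [(g, acc)                       # hypothetical action grid
                         for g in (-0.4, -0.2, 0.0, 0.2, 0.4)
                         for acc in (-1.0, 0.0, 1.0)]
    best_q, best_traj = float("-inf"), None
    for gamma, a in candidate_actions:
        traj = forward_simulate(start_state, gamma, a, dt=dt)
        q = q_network(state_t, (gamma, a))
        if q > best_q:
            best_q, best_traj = q, traj
    return best_traj, best_q
```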
Meanwhile, a series of forward simulation tracks are generated by inputting different steering wheel turning angles and accelerator/brake opening degrees into the vehicle dynamics model, and the evaluation values F of the forward simulation tracks are as follows:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point.
And selecting the track with the minimum evaluation value F from a series of forward simulation tracks as a regular optimal track. And estimating the Q value of the regular optimal track by using a reinforcement learning Q network. And comparing the Q values of the regular optimal track and the reinforcement learning track, and selecting the track with a larger Q value as an initial optimal track.
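The selection just described can be sketched as follows; the Euclidean distances for g and h follow the definition above, while q_of(trajectory), the Q-network estimate for a whole trajectory, is an assumed interface.

```python
import math

def evaluation_f(trajectory, start, goal):
    """F = g + h: g is the distance from the trajectory end point to the starting
    point, h the distance from the trajectory end point to the final goal point."""
    x_end, y_end = trajectory[-1][0], trajectory[-1][1]
    g = math.hypot(x_end - start[0], y_end - start[1])
    h = math.hypot(x_end - goal[0], y_end - goal[1])
    return g + h

def select_initial_optimal(sampled_trajectories, rl_trajectory, q_of, start, goal):
    """Rule-optimal trajectory = minimum F; keep whichever of the rule-optimal and
    reinforcement learning trajectories has the larger estimated Q value."""
    rule_optimal = min(sampled_trajectories, key=lambda t: evaluation_f(t, start, goal))
    return rule_optimal if q_of(rule_optimal) > q_of(rl_trajectory) else rl_trajectory
```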
S34, during offline training, training data are collected and common reinforcement learning training methods are used to calculate, for each state s_t and each action a_t, the expectation of the future reward Q(s_t, a_t); the Q values for the different states and actions are used as training data, and the parameters of the Q network are updated by gradient descent.
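A PyTorch sketch of the offline training in step S34. The 19-dimensional state matches the description in step S31 (3 obstacles × 4 values plus 7 trajectory-state values) and the 2-dimensional action is (γ, a); the network architecture, optimizer and learning rate are assumptions.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small MLP mapping a concatenated (state, action) vector to a scalar Q value."""
    def __init__(self, state_dim=19, action_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def train_q_network(q_net, batches, lr=1e-3):
    """Gradient-descent update of step S34: regress Q(s_t, a_t) toward the expected
    future reward computed offline for each (state, action) pair in the training data."""
    optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for states, actions, target_q in batches:
        optimizer.zero_grad()
        loss = loss_fn(q_net(states, actions), target_q)
        loss.backward()
        optimizer.step()
```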
In step S4, starting from the current state s_t, different actions a_t can generate trajectories. As described in step S31, an action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment. From the steering wheel angle θ_t at the current time, the steering wheel angle expected at the next time can be calculated as θ_{t+Δt} = θ_t + γ·Δt; the expected steering wheel angle and the expected longitudinal acceleration are input into the vehicle model to generate the trajectory. The evaluation values of these forward simulation trajectories are:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point.
In step S5, if the number of times of repeating step S4 exceeds the predetermined number of times and no solution is found, the motion planning outputs a planning failure result.
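Putting steps S3 through S6 together, a simplified end-to-end sketch (reusing OpenTable, forward_simulate, filter_collision_free and evaluation_f from the earlier sketches) might look as follows; the action grid, goal tolerance and iteration limit are illustrative, and the step S3 comparison between the rule-optimal and reinforcement learning trajectories is omitted here for brevity.

```python
import math

def plan(start_state, goal, obstacles, goal_tolerance=2.0, max_iterations=200):
    """Repeat the step-S4 expansion until the candidate optimal trajectory ends
    inside the goal range (step S5), then concatenate the stored trajectories
    (step S6); return None on failure after the predetermined number of repeats."""
    open_table, closed = OpenTable(), []
    start_xy = (start_state[0], start_state[1])
    current = start_state
    for _ in range(max_iterations):
        # Step S4: sample forward trajectories, drop colliding ones, score the rest.
        candidates = [forward_simulate(current, g, a)
                      for g in (-0.4, -0.2, 0.0, 0.2, 0.4)
                      for a in (-1.0, 0.0, 1.0)]
        for traj in filter_collision_free(candidates, obstacles):
            open_table.push(evaluation_f(traj, start_xy, goal), traj)
        if open_table.empty():
            return None  # every sampled trajectory collided
        f_value, best = open_table.pop_best()
        closed.append((f_value, best))
        current = best[-1]  # candidate optimal end point becomes the next start
        # Step S5: stop once the end point lies within the required goal range.
        if math.hypot(current[0] - goal[0], current[1] - goal[1]) < goal_tolerance:
            # Step S6: connect the optimal trajectories stored in the Closed table.
            return [point for _, traj in closed for point in traj]
    return None  # planning failure after the predetermined number of repeats
```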
The invention also provides an automobile automatic driving motion planning system based on the learning sampling type, which comprises a vehicle kinematics model establishing module, a storage table initializing module, an initial optimal track selecting module, a candidate optimal track selecting module, a motion planning finishing module and a final planning track forming module;
the vehicle kinematic model building module builds a vehicle kinematic model according to the vehicle parameters;
the storage table initialization module is used for initializing a storage table of the heuristic motion planning method: an Open table and a Closed table;
the initial optimal track selection module generates a series of forward simulation tracks from a starting point based on a learning sampling method, calculates the evaluation value of each forward simulation track through a heuristic function, and selects the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning;
the candidate optimal trajectory selection module generates a series of forward simulation trajectories from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point;
the motion planning ending module is used for ending the motion planning process until the candidate optimal track end point is in the end point range required by the motion planning;
and the final planning track forming module connects the initial optimal track and the candidate optimal track in the Closed table to form a final planning track.
In the above embodiment, in the initial optimal trajectory selection module, the reinforcement learning trajectory selection based on the reinforcement learning method includes the following steps:
s1, initializing a reinforcement learning algorithm: determining a state space, an action space and a reward function R;
s2, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, the rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), where s_t is the state at time t and a_t is the action at time t;
s3, starting from the state s_t at the current time t, different actions a_t generate trajectories, wherein an action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, the steering wheel angle expected at the next time t+Δt is calculated as θ_{t+Δt} = θ_t + γ·Δt; the expected steering wheel angle and the expected longitudinal acceleration are input into the vehicle model to generate a trajectory, and the Q value at this time, Q(s_t, a_t), is taken as the Q value of that trajectory; the trajectory generated by the action with the maximum Q value in the current state is taken as the reinforcement learning trajectory.
The above embodiments are only for illustrating the present invention, and the steps may be changed, and on the basis of the technical solution of the present invention, the modification and equivalent changes of the individual steps according to the principle of the present invention should not be excluded from the protection scope of the present invention.

Claims (7)

1. An automobile automatic driving motion planning method based on a learning sampling mode is characterized by comprising the following steps:
s1: establishing a vehicle kinematic model according to vehicle parameters;
s2: initializing a storage table of a heuristic motion planning method: an Open table and a Closed table;
s3: generating a series of forward simulation tracks from a starting point based on a learning sampling method, calculating an evaluation value of each forward simulation track through a heuristic function, and selecting the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning;
s4: generating a series of forward simulation tracks from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point;
s5: repeating the step S4 until the candidate optimal trajectory end point in the step S4 is within the end point range required by the motion planning, and ending the motion planning process;
s6: connecting the initial optimal trajectory and the candidate optimal trajectory in the Closed table to form a final planning trajectory;
in the step S3, a reinforcement learning trajectory is selected based on a reinforcement learning method, specifically, a reinforcement learning method based on a Q learning algorithm;
the reinforcement learning track selection method based on the reinforcement learning method comprises the following steps:
s31, initializing a reinforcement learning algorithm: determining a state space, an action space and a reward function R;
s32, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, the rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), where s_t is the state at time t and a_t is the action at time t;
s33, starting from the state s_t at the current time t, different actions a_t generate trajectories, wherein an action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, the steering wheel angle expected at the next time t+Δt is calculated as θ_{t+Δt} = θ_t + γ·Δt; the expected steering wheel angle and the expected longitudinal acceleration are input into the vehicle model to generate a trajectory, and the Q value at this time, Q(s_t, a_t), is taken as the Q value of that trajectory; the trajectory generated by the action with the maximum Q value in the current state is taken as the reinforcement learning trajectory; where Δt is the simulation step size.
2. The automotive autonomous driving motion planning method of claim 1, wherein: in step S3, the forward simulation trajectory is generated using the steering wheel angular acceleration and the accelerator/brake input of the vehicle.
3. The automotive autonomous driving motion planning method of claim 2, wherein the forward simulation trajectory generation method comprises the following steps: determining a simulation step length Δt according to the usage scenario, solving the vehicle kinematic model to obtain the state derivatives (dx/dt, dy/dt, dθ/dt, dδ/dt, dv/dt), updating the vehicle positions x and y and the vehicle heading θ, continuously iterating to update the vehicle trajectory, and finally obtaining the forward simulation trajectory.
4. The automotive autonomous driving motion planning method of claim 1, wherein: in step S3, before the forward simulation trajectory is stored in the Open table, collision detection is performed on the forward simulation trajectory, it is detected whether the generated forward simulation trajectory collides with the boundary of the obstacle in the sensing result, and if the trajectory collides, the trajectory is directly deleted, and the trajectory that does not collide is stored in the Open table.
5. The automotive autonomous driving motion planning method of claim 1, wherein: the selection method of the initial optimal track comprises the following steps:
a series of forward simulation tracks are generated by inputting different steering wheel turning angles and accelerator/brake opening degrees into a vehicle dynamics model, and the evaluation values F of the forward simulation tracks are as follows:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point;
selecting the track with the minimum evaluation value F from a series of forward simulation tracks as a rule optimal track, and estimating the Q value of the rule optimal track by using a reinforcement learning Q network; and comparing the Q values of the regular optimal track and the reinforcement learning track, and selecting the track with a larger Q value as an initial optimal track.
6. The automotive autonomous driving motion planning method of claim 1, wherein: in step S4, starting from the current state s_t, different actions a_t generate trajectories; from the steering wheel angle θ_t at the current time, the steering wheel angle expected at the next moment is calculated as θ_{t+Δt} = θ_t + γ·Δt, and the expected steering wheel angle and expected longitudinal acceleration are input into the vehicle model to generate a trajectory; the evaluation values of these forward simulation trajectories are:
F=g+h
wherein g is the distance from the candidate track end point to the starting point, and h is the distance from the candidate track end point to the final track end point.
7. An automobile automatic driving motion planning system based on learning sampling type is characterized by comprising: the system comprises a vehicle kinematics model establishing module, a storage table initializing module, an initial optimal track selecting module, a candidate optimal track selecting module, a motion planning ending module and a final planning track forming module;
the vehicle kinematic model building module builds a vehicle kinematic model according to vehicle parameters;
the storage table initialization module is used for initializing a storage table of a heuristic motion planning method: an Open table and a Closed table;
the initial segment optimal track selection module generates a series of forward simulation tracks from a starting point based on a learning sampling method, calculates the evaluation value of each forward simulation track through a heuristic function, and selects the track with the highest evaluation value as a regular optimal track; performing Q value function estimation on the forward simulation track by using a reinforcement learning method, and selecting the track with the maximum Q value as a reinforcement learning track; selecting an initial optimal track from the regular optimal track and the reinforcement learning track, storing the initial optimal track into a Closed table, and taking an end point of the initial optimal track as a starting point of subsequent planning;
the candidate optimal track selection module generates a series of forward simulation tracks from a planning starting point based on a heuristic planning method; screening non-collision forward simulation tracks by using a collision detection method, and storing the non-collision forward simulation tracks into an Open table; calculating the evaluation value of each forward simulation track through a heuristic function, selecting the forward simulation track with the highest evaluation value as a candidate optimal track, storing the candidate optimal track and the evaluation value thereof into a Closed table, and taking the candidate optimal track end point as a subsequent planning start point;
the motion planning ending module ends the motion planning process until the candidate optimal trajectory end point is within the end point range required by the motion planning;
the final planning track forming module connects the initial optimal track and the candidate optimal track in the Closed table to form a final planning track;
in the initial segment optimal trajectory selection module, the reinforcement learning trajectory selection based on the reinforcement learning method comprises the following steps:
s1, initializing a reinforcement learning algorithm: determining a state space, an action space and a reward function R;
s2, establishing a Q network, wherein the Q network stores the expected rewards obtained by taking different actions in different states, the rewards are called Q values, and the Q value parameters in the Q network are randomly initialized before offline training starts: Q(s_t, a_t), where s_t is the state at time t and a_t is the action at time t;
s3, starting from the state s_t at the current time t, different actions a_t generate trajectories, wherein an action comprises the steering wheel angular acceleration γ and the longitudinal acceleration a of the vehicle at the next moment; from the steering wheel angle θ_t at the current time t, the steering wheel angle expected at the next time t+Δt is calculated as θ_{t+Δt} = θ_t + γ·Δt; the expected steering wheel angle and the expected longitudinal acceleration are input into the vehicle model to generate a trajectory, and the Q value at this time, Q(s_t, a_t), is taken as the Q value of that trajectory; the trajectory generated by the action with the maximum Q value in the current state is taken as the reinforcement learning trajectory; where Δt is the simulation step size.
CN202010236474.1A 2020-03-30 2020-03-30 Automobile automatic driving motion planning method and system based on learning sampling type Active CN111413974B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010236474.1A CN111413974B (en) 2020-03-30 2020-03-30 Automobile automatic driving motion planning method and system based on learning sampling type

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010236474.1A CN111413974B (en) 2020-03-30 2020-03-30 Automobile automatic driving motion planning method and system based on learning sampling type

Publications (2)

Publication Number Publication Date
CN111413974A CN111413974A (en) 2020-07-14
CN111413974B true CN111413974B (en) 2021-03-30

Family

ID=71494691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010236474.1A Active CN111413974B (en) 2020-03-30 2020-03-30 Automobile automatic driving motion planning method and system based on learning sampling type

Country Status (1)

Country Link
CN (1) CN111413974B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113239986B (en) * 2021-04-25 2023-04-18 浙江吉利控股集团有限公司 Training method and device for vehicle track evaluation network model and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139072A (en) * 2015-09-09 2015-12-09 东华大学 Reinforcement learning algorithm applied to non-tracking intelligent trolley barrier-avoiding system
CN106595671A (en) * 2017-02-22 2017-04-26 南方科技大学 Method and apparatus for planning route of unmanned aerial vehicle based on reinforcement learning
CN107194612A (en) * 2017-06-20 2017-09-22 清华大学 A kind of train operation dispatching method learnt based on deeply and system
CN107423761A (en) * 2017-07-24 2017-12-01 清华大学 Feature based selects and the rail locomotive energy saving optimizing method of operating of machine learning
CN108153585A (en) * 2017-12-01 2018-06-12 北京大学 A kind of method and apparatus of the operational efficiency based on locality expression function optimization MapReduce frames
CN109521774A (en) * 2018-12-27 2019-03-26 南京芊玥机器人科技有限公司 A kind of spray robot track optimizing method based on intensified learning
CN109598934A (en) * 2018-12-13 2019-04-09 清华大学 A kind of rule-based method for sailing out of high speed with learning model pilotless automobile
CN109726676A (en) * 2018-12-28 2019-05-07 苏州大学 The planing method of automated driving system
CN109936865A (en) * 2018-06-30 2019-06-25 北京工业大学 A kind of mobile sink paths planning method based on deeply learning algorithm
CN110083160A (en) * 2019-05-16 2019-08-02 哈尔滨工业大学(深圳) A kind of method for planning track of robot based on deep learning
CN110472738A (en) * 2019-08-16 2019-11-19 北京理工大学 A kind of unmanned boat Real Time Obstacle Avoiding algorithm based on deeply study
KR102055141B1 (en) * 2018-12-31 2019-12-12 한국기술교육대학교 산학협력단 System for remote controlling of devices based on reinforcement learning
CN110666793A (en) * 2019-09-11 2020-01-10 大连理工大学 Method for realizing robot square part assembly based on deep reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190331920A1 (en) * 2016-11-18 2019-10-31 Eyedapfei inc. Improved Systems for Augmented Reality Visual Aids and Tools

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139072A (en) * 2015-09-09 2015-12-09 东华大学 Reinforcement learning algorithm applied to non-tracking intelligent trolley barrier-avoiding system
CN106595671A (en) * 2017-02-22 2017-04-26 南方科技大学 Method and apparatus for planning route of unmanned aerial vehicle based on reinforcement learning
CN107194612A (en) * 2017-06-20 2017-09-22 清华大学 A kind of train operation dispatching method learnt based on deeply and system
CN107423761A (en) * 2017-07-24 2017-12-01 清华大学 Feature based selects and the rail locomotive energy saving optimizing method of operating of machine learning
CN108153585A (en) * 2017-12-01 2018-06-12 北京大学 A kind of method and apparatus of the operational efficiency based on locality expression function optimization MapReduce frames
CN109936865A (en) * 2018-06-30 2019-06-25 北京工业大学 A kind of mobile sink paths planning method based on deeply learning algorithm
CN109598934A (en) * 2018-12-13 2019-04-09 清华大学 A kind of rule-based method for sailing out of high speed with learning model pilotless automobile
CN109521774A (en) * 2018-12-27 2019-03-26 南京芊玥机器人科技有限公司 A kind of spray robot track optimizing method based on intensified learning
CN109726676A (en) * 2018-12-28 2019-05-07 苏州大学 The planing method of automated driving system
KR102055141B1 (en) * 2018-12-31 2019-12-12 한국기술교육대학교 산학협력단 System for remote controlling of devices based on reinforcement learning
CN110083160A (en) * 2019-05-16 2019-08-02 哈尔滨工业大学(深圳) A kind of method for planning track of robot based on deep learning
CN110472738A (en) * 2019-08-16 2019-11-19 北京理工大学 A kind of unmanned boat Real Time Obstacle Avoiding algorithm based on deeply study
CN110666793A (en) * 2019-09-11 2020-01-10 大连理工大学 Method for realizing robot square part assembly based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Reinforcement Learning Versus Model Predictive Control: A Comparison on a Power System Problem; Damien Ernst; IEEE; 2009-12-31; pp. 517-529 *
Autonomous driving policy learning method based on deep reinforcement learning; Xia Wei, et al.; Integration Technology; 2017-05-21 (No. 3); pp. 29-35 *

Also Published As

Publication number Publication date
CN111413974A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
Zhang et al. Human-like autonomous vehicle speed control by deep reinforcement learning with double Q-learning
CN110136481B (en) Parking strategy based on deep reinforcement learning
CN107063280B (en) Intelligent vehicle path planning system and method based on control sampling
Rosolia et al. Autonomous racing using learning model predictive control
CN112356830B (en) Intelligent parking method based on model reinforcement learning
CN111098852B (en) Parking path planning method based on reinforcement learning
CN110969848A (en) Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes
EP3864574A1 (en) Autonomous vehicle planning and prediction
Rubies-Royo et al. A classification-based approach for approximate reachability
CN114312830B (en) Intelligent vehicle coupling decision model and method considering dangerous driving conditions
CN112162555A (en) Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet
CN111679660B (en) Unmanned deep reinforcement learning method integrating human-like driving behaviors
CN112286218B (en) Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient
CN114564016A (en) Navigation obstacle avoidance control method, system and model combining path planning and reinforcement learning
CN110861651B (en) Method for estimating longitudinal and lateral motion states of front vehicle
CN115257745A (en) Automatic driving lane change decision control method based on rule fusion reinforcement learning
Firl et al. Probabilistic Maneuver Prediction in Traffic Scenarios.
CN116679719A (en) Unmanned vehicle self-adaptive path planning method based on dynamic window method and near-end strategy
CN113296523A (en) Mobile robot obstacle avoidance path planning method
CN117222915A (en) System and method for tracking an expanded state of a moving object using a composite measurement model
CN113391633A (en) Urban environment-oriented mobile robot fusion path planning method
CN116486356A (en) Narrow scene track generation method based on self-adaptive learning technology
CN111413974B (en) Automobile automatic driving motion planning method and system based on learning sampling type
CN115416024A (en) Moment-controlled mechanical arm autonomous trajectory planning method and system
Gaskett et al. Reinforcement learning for visual servoing of a mobile robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant