CN117141520B - Real-time track planning method, device and equipment - Google Patents
Real-time track planning method, device and equipment
- Publication number
- CN117141520B (application number CN202311426011.1A)
- Authority
- CN
- China
- Prior art keywords
- path
- cost
- target
- path planning
- initial value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- B60W60/001—Planning or execution of driving tasks (drive control systems specially adapted for autonomous road vehicles)
- B60W60/0015—Planning or execution of driving tasks specially adapted for safety
- B60W50/0098—Details of control systems ensuring comfort, safety or stability not otherwise provided for
- B60W2050/0043—Signal treatments, identification of variables or parameters, parameter estimation or state estimation
- B60W2552/50—Barriers (input parameters relating to infrastructure)
- Y02T10/40—Engine management systems (climate change mitigation technologies related to transportation)
Landscapes
- Engineering & Computer Science (AREA)
- Automation & Control Theory (AREA)
- Human Computer Interaction (AREA)
- Transportation (AREA)
- Mechanical Engineering (AREA)
- Traffic Control Systems (AREA)
Abstract
The disclosure belongs to the technical field of automatic driving, and particularly relates to a real-time track planning method, device and equipment. The method comprises the following steps: obtaining basic navigation point information and pose information of a vehicle at the current moment, determining a target path reference line based on the basic navigation point information, and converting the pose information from a Cartesian coordinate system to a Frenet coordinate system based on the target path reference line to obtain a transverse offset-displacement map and a rasterized map; determining a path planning initial value of the current vehicle through cost search and a reinforcement learning model, smoothing the path planning initial value to obtain an initial path plan, and performing multi-objective optimization on the initial path plan to obtain an optimal path plan; and determining an optimal speed curve corresponding to the optimal path plan, combining the optimal path plan and the optimal speed curve to obtain a target track, and performing cost selection on the multi-lane target tracks to obtain a final control track.
Description
Technical Field
The disclosure belongs to the technical field of automatic driving, and particularly relates to a real-time track planning method, device and equipment.
Background
At present, the driving scenarios handled by real-time trajectory planning for automatic driving are becoming increasingly complex and changeable. When an abnormal scenario appears, the traditional decoupled planning or optimization solution is difficult to apply because of problems such as search dimensionality and cost parameters, so that real-time planning fails.
Therefore, in the prior art, reinforcement learning is introduced to directly generate a coarse trajectory solution, which is then refined by back-end optimization to obtain the final trajectory. Although this approach can obtain a relatively reliable and safe trajectory, the DQN reinforcement learning method is often unable to produce a good coarse trajectory solution because of the limitations of its network, which makes the back-end optimization unstable.
Disclosure of Invention
The embodiments of the disclosure provide a real-time track planning scheme to address the insufficient capability of the prior art to handle abnormal scenarios.
A first aspect of an embodiment of the present disclosure provides a real-time trajectory planning method, including:
basic navigation point information and pose information of a vehicle at the current moment are obtained, a target path reference line is determined based on the basic navigation point information, the pose information is converted from a Cartesian coordinate system to a Frenet coordinate system based on the target path reference line, and a transverse offset-displacement map and a rasterized map are obtained, wherein the pose information comprises the state of the vehicle, surrounding traffic participants and local road network information;
determining a path planning initial value of a current vehicle through cost searching and reinforcement learning models, smoothing the path planning initial value to obtain an initial path plan, and performing multi-objective optimization on the initial path plan to obtain an optimal path plan;
and determining an optimal speed curve corresponding to the optimal path planning, obtaining a target track by combining the optimal path planning and the optimal speed curve, and carrying out cost selection on the multi-lane target track to obtain a final control track.
In some embodiments, the determining a target path reference line based on the base navigation point information comprises:
acquiring basic navigation points in a first preset distance in front and a second preset distance in rear in real time according to the positioning information of the current frame of the vehicle;
determining a loss function and a boundary constraint condition of the target path reference line, wherein the loss function comprises a smooth cost, a compact cost and a similar cost, the smooth cost is used for measuring the smoothness of the target path reference line, the compact cost is used for measuring the uniformity degree between target path reference points corresponding to the basic navigation points, and the similar cost is used for measuring the deviation degree of the target path reference line and an original navigation path;
and obtaining the basic navigation points meeting the boundary constraint condition based on a quadratic solver, wherein the path reference points corresponding to the basic navigation points form the target path reference line.
In some embodiments, the determining the path planning initial value of the current vehicle through the cost search and reinforcement learning model comprises:
sampling along the target path reference line, calculating the cost value of the preset cost of each sampling node, and forming a first path planning initial value corresponding to the preset cost by the sampling nodes with the minimum sum of cost values;
according to three consecutive frames (current and past) of the grid map referenced to the target path reference line, utilizing a decision neural network model to decide the action to be executed by the vehicle at the current moment, and obtaining a decision action value of the vehicle at the current moment, wherein the decision action value forms a second path planning initial value;
and calculating the cost value of the preset cost for the second path planning initial value, comparing the cost value with the cost value of the first path planning initial value corresponding to the preset cost, and taking the smaller one of the second path planning initial value and the first path planning initial value as the path planning initial value of the current vehicle.
In some embodiments, the training method of the decision neural network model comprises:
constructing a state space from a path-based three-frame continuous grid map centered on a vehicle, wherein the positions of the vehicle and surrounding obstacles are projected onto the grid map;
constructing an action space formed by transverse continuous coordinate values of a plurality of path points sampled in a preset rectangular range in front of the vehicle, wherein the longitudinal intervals of the path points are kept at fixed intervals, and the transverse coordinate values are determined by a decision neural network model;
designing a reward function based on rewards to reach the target, penalties for collision occurrence, curvature of the output paths, and smooth migration between the output paths;
constructing a training set by taking the state space as input and the action space as output;
designing a simulation environment comprising a static scene and a complex scene, training the neural network model for the training set by adopting a SAC reinforcement learning method based on the reward function in the simulation environment, and obtaining the decision neural network model.
In some embodiments, the smoothing the path plan initial value to obtain an initial path plan comprises:
taking the loss function and the boundary constraint condition of the target path reference line as the loss function and the boundary constraint condition of the path planning initial value;
and optimizing the path planning initial value based on a quadratic solver to obtain the initial path planning.
In some embodiments, the multi-objective optimization of the initial path plan to obtain an optimal path plan includes:
determining a boundary constraint condition of the path based on the transverse offset-displacement diagram and the path planning initial value, and performing multi-objective optimization on the initial path planning based on the boundary constraint condition and a preset optimization objective to obtain an optimal path planning in a target space, wherein the preset optimization objective comprises one or more of a deviation degree of a target path from a target path reference line, a deviation degree of the target path from the path planning initial value and continuity of the target path;
and when the solution for the optimal path planning in the target space is abnormal, taking the initial path planning as the optimal path planning.
In some embodiments, the determining an optimal speed profile corresponding to the optimal path plan includes:
constructing a displacement-time diagram for static obstacles and dynamic obstacles of the vehicle under a Frenet coordinate system;
sampling the displacement-time diagram, calculating the cost value of preset cost of each sampling node, and forming a speed curve initial value corresponding to the preset cost by the sampling nodes with the minimum sum of cost values, wherein the preset cost comprises speed cost, jerk cost and collision cost;
and obtaining a convex space of speed optimization from the initial value of the speed curve, extracting boundary constraint conditions in the convex space, and performing multi-objective optimization on the speed curve based on the boundary constraint conditions and a preset optimization target to obtain an optimal speed curve, wherein the preset optimization target comprises one or more of cruising speed, acceleration and jerk.
In some embodiments, the combining the optimal path plan and the optimal speed profile to obtain a target trajectory includes:
and interpolating the optimal path planning through displacement based on the optimal speed curve to obtain speed and path information at each moment so as to generate a target track.
A second aspect of an embodiment of the present disclosure provides a real-time trajectory planning device, including:
the determining module is used for acquiring basic navigation point information and pose information of the vehicle at the current moment, determining a target path reference line based on the basic navigation point information, converting the pose information from a Cartesian coordinate system to a Frenet coordinate system based on the target path reference line, and acquiring a transverse offset-displacement map and a rasterized map, wherein the pose information comprises the state of the vehicle, surrounding traffic participants and local road network information;
the optimizing module is used for determining a path planning initial value of the current vehicle through cost searching and reinforcement learning models, smoothing the path planning initial value to obtain an initial path plan, and carrying out multi-objective optimization on the initial path plan to obtain an optimal path plan;
the acquisition module is used for determining an optimal speed curve corresponding to the optimal path planning, obtaining a target track by combining the optimal path planning and the optimal speed curve, carrying out cost selection on the multi-lane target track, and determining a final control track.
A third aspect of the disclosed embodiments provides a real-time trajectory planning device comprising a memory and a processor:
the memory is used for storing a computer program;
the processor is configured to implement the method according to the first aspect of the present disclosure when executing the computer program.
In summary, unlike the prior art, in which a coarse path solution is obtained directly through reinforcement learning and then optimized into the final path, the real-time trajectory planning method, device and equipment provided by the embodiments of the present disclosure use reinforcement learning to obtain one coarse path solution, fuse it with the coarse solution obtained by cost search to obtain the target coarse path solution, then optimize it to obtain the target path, and finally combine it with the speed-optimization result to form the target trajectory. The target path reference line is determined through multi-layer cost and smoothing processing of the basic navigation points, which compensates for deviations of the model or the search space and makes the initial path dynamically planned along the reference line more stable and reliable; the combination of cost search and reinforcement learning solves the problem of selecting a solution in the fine-grained space and enhances the accuracy and safety of the path planning initial value; and the multi-objective optimization strategy based on an accurate path planning initial value avoids frequent parameter re-tuning of the planning module across different driving scenarios, giving the system good capability to handle abnormal scenarios.
Drawings
The features and advantages of the present disclosure will be more clearly understood by reference to the accompanying drawings, which are schematic and should not be construed as limiting the disclosure in any way, in which:
FIG. 1 is a schematic diagram of a real-time trajectory planning system to which the present disclosure is applicable;
FIG. 2 is a flow chart of a method of real-time trajectory planning shown in accordance with some embodiments of the present disclosure;
FIG. 3 is a schematic diagram of a real-time trajectory planning device, shown in accordance with some embodiments of the present disclosure;
fig. 4 is a schematic diagram of a real-time trajectory planning device, shown in accordance with some embodiments of the present disclosure.
Detailed Description
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant disclosure. However, it will be apparent to one of ordinary skill in the art that the present disclosure may be practiced without these specific details. It should be appreciated that the use of "system," "apparatus," "unit," and/or "module" terms in this disclosure is one method for distinguishing between different parts, elements, portions, or components at different levels in a sequential arrangement. However, these terms may be replaced with other expressions if the other expressions can achieve the same purpose.
It will be understood that when a device, unit, or module is referred to as being "on," "connected to," or "coupled to" another device, unit, or module, it can be directly on, connected to, or coupled to, or in communication with the other device, unit, or module, or intervening devices, units, or modules may be present unless the context clearly indicates an exception. For example, the term "and/or" as used in this disclosure includes any and all combinations of one or more of the associated listed items.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present disclosure. As used in the specification and the claims, the terms "a," "an," and/or "the" are not specific to the singular, but may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" are intended to cover only those features, integers, steps, operations, elements, and/or components that are explicitly identified, but do not constitute an exclusive list, as other features, integers, steps, operations, elements, and/or components may be included.
These and other features and characteristics of the present disclosure, as well as the methods of operation, functions of the related elements of structure, combinations of parts and economies of manufacture, may be better understood with reference to the following description and the accompanying drawings, all of which form a part of this specification. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the disclosure. It will be understood that the figures are not drawn to scale.
Various block diagrams are used in the present disclosure to illustrate various modifications of the embodiments according to the present disclosure. It should be understood that the foregoing or following structures are not intended to limit the present disclosure. The protection scope of the present disclosure is subject to the claims.
Fig. 1 is a schematic diagram of a real-time trajectory planning system to which the present disclosure is applicable. In the system shown in Fig. 1, the planning module is in data connection with the sensing module and the control module; the planning module performs real-time track planning based on the digital world model sensed by the sensing module under the Cartesian coordinate system and sends the generated control track to the control module, which controls the vehicle to run along the control track.
The sensing module comprises various sensors for sensing the state of the vehicle, such as a pose, a speed, an acceleration sensor and the like; sensors, such as cameras, lidars, etc., for sensing local road networks and surrounding traffic participant information are also included. The sensing module can also acquire vehicle chassis state information such as gear, speed, steering angle and the like through the CANBUS bus.
The planning module may be any of a stand-alone, clustered, or distributed server. In particular, the planning module may be an on-board controller of an autonomous vehicle. The planning module can carry out dynamic planning on the path based on the digital world model to obtain a path rough solution, carry out secondary optimization on the path rough solution to obtain an optimal path, carry out dynamic planning on the speed and then carry out secondary optimization to obtain an optimal speed, and finally combine the optimal path and the optimal speed to generate a control track.
The control module is a mechanical, electronic control unit of an autonomous vehicle, such as a body control unit, a steering wheel and system, a braking system, a stabilizing system, etc.
Fig. 2 is a flow chart of a method of real-time trajectory planning shown in accordance with some embodiments of the present disclosure. In some embodiments, the real-time trajectory planning method is performed by a planning module in the system shown in fig. 1, and the real-time trajectory planning method includes the steps of:
s210: basic navigation point information and pose information of a vehicle at the current moment are obtained, a target path reference line is determined based on the basic navigation point information, the pose information is converted from a Cartesian coordinate system to a Flunar coordinate system based on the target path reference line, and a transverse offset-displacement map and a grid map are obtained, wherein the pose information comprises the state of the vehicle, surrounding traffic participants and local road network information.
Specifically, a digitized world model perceived by the current vehicle in a Cartesian coordinate system is obtained, including the state of the vehicle itself, the local road network, and information of surrounding traffic participants.
Basic navigation points within 50 m ahead and 10 m behind are dynamically intercepted in real time according to the positioning information of the current frame of the vehicle, and the intercepted points are optimized. The basic navigation points are processed mainly in three respects (smoothness, similarity and compactness) to obtain the target path reference line, and the loss function for this optimization solution is constructed as follows:
smoothing cost: J_smooth = Σ_i ||P_{i-1} + P_{i+1} - 2·P_i||²,
compact cost: J_compact = Σ_i ||P_{i+1} - P_i||²,
similar cost: J_similar = Σ_i ||P_i - P_i^ref||²,
wherein P_{i-1}, P_i and P_{i+1} are any three adjacent consecutive target path reference points, and P_i^ref is the original basic navigation point corresponding to the i-th target path reference point. The smooth cost J_smooth measures the smoothness of the path reference line, the compact cost J_compact measures the uniformity of spacing between the path reference points, and the similar cost J_similar measures the deviation of the path reference line from the original navigation path.
The boundary constraints for the optimization solution are as follows:
lb_i ≤ P_i ≤ ub_i,
wherein P_i, P_i^ref, ub_i and lb_i are respectively the i-th path reference point, the original navigation point, and the upper and lower boundary points between the original navigation point and the path reference point.
The quadratic solver OSQP is adopted to solve this optimization problem, which comprises the loss function and the boundary constraints, so as to obtain path reference points that satisfy the constraint conditions; the resulting path reference points are smoother and more stable than the rough original navigation points and provide an accurate reference for the subsequent planning module.
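To make the construction concrete, the following Python sketch poses the discrete-point smoothing problem in the form described above and hands it to OSQP; the weight values, the 0.5 m box bound and the helper name smooth_reference_line are illustrative assumptions rather than values taken from the patent.

```python
import numpy as np
import scipy.sparse as sp
import osqp

def smooth_reference_line(raw_pts, w_smooth=100.0, w_compact=1.0, w_similar=10.0, bound=0.5):
    """Smooth 2D navigation points into a reference line with OSQP (illustrative sketch)."""
    n = raw_pts.shape[0]
    x0 = raw_pts.reshape(-1)                      # stacked [x0, y0, x1, y1, ...]

    # Second-difference operator per coordinate: P[i-1] + P[i+1] - 2*P[i]  (smooth cost)
    D2 = sp.lil_matrix((2 * (n - 2), 2 * n))
    for i in range(n - 2):
        for c in range(2):
            D2[2 * i + c, 2 * i + c] = 1.0
            D2[2 * i + c, 2 * (i + 1) + c] = -2.0
            D2[2 * i + c, 2 * (i + 2) + c] = 1.0

    # First-difference operator per coordinate: P[i+1] - P[i]  (compact cost)
    D1 = sp.lil_matrix((2 * (n - 1), 2 * n))
    for i in range(n - 1):
        for c in range(2):
            D1[2 * i + c, 2 * i + c] = -1.0
            D1[2 * i + c, 2 * (i + 1) + c] = 1.0

    I = sp.eye(2 * n)                             # similar cost keeps points near the raw ones
    P = 2 * (w_smooth * D2.T @ D2 + w_compact * D1.T @ D1 + w_similar * I)
    q = -2 * w_similar * x0

    # Box constraint: each coordinate stays within `bound` of its raw value
    prob = osqp.OSQP()
    prob.setup(P=sp.csc_matrix(P), q=q, A=sp.eye(2 * n, format="csc"),
               l=x0 - bound, u=x0 + bound, verbose=False)
    return prob.solve().x.reshape(n, 2)
```

Because the decision variables are the smoothed points themselves, the problem stays a small sparse QP that can be re-solved every planning cycle.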
In this way, the target path reference line is determined through multi-layer cost and smoothing processing of the basic navigation points, which compensates for deviations of the model or the search space and makes the initial path dynamically planned along the reference line more stable and reliable.
Finally, on the basis of the target reference line, the pose information in the digital world model (including boundaries), such as the vehicle pose Te, the static and dynamic obstacles To, the lane line points Tl and the boundary points Tb of the drivable area, is projected from the Cartesian coordinate system to the Frenet coordinate system to obtain a lateral offset-displacement (SL) diagram; at the same time, the information in the diagram is rasterized to obtain the rasterized map of the current frame.
S220: and determining a path planning initial value of the current vehicle through a cost search and reinforcement learning model, smoothing the path planning initial value to obtain an initial path plan, and performing multi-objective optimization on the initial path plan to obtain an optimal path plan.
Specifically, in the lateral offset-displacement (SL) graph, points are scattered uniformly in the lateral direction L and non-uniformly in the longitudinal direction S (denser near the vehicle and sparser far away), and the path between adjacent sampling points in the sampling space is modeled as follows:
l(s) = a0 + a1·s + a2·s² + a3·s³ + a4·s⁴ + a5·s⁵,
wherein l represents the lateral distance, s represents the longitudinal distance, and a0, a1, ..., a5 are the fifth-degree polynomial coefficients of the lateral distance.
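For illustration only, the six coefficients of such a quintic between two adjacent sampling nodes can be recovered from the boundary conditions (lateral offset and its first two derivatives at both nodes); the helper below is an assumed sketch, not code from the patent.

```python
import numpy as np

def quintic_coeffs(s0, l0, dl0, ddl0, s1, l1, dl1, ddl1):
    """Solve a0..a5 of l(s) = a0 + a1*s + ... + a5*s**5 from boundary conditions at s0 and s1."""
    def rows(s):
        return [
            [1, s, s**2,   s**3,    s**4,     s**5],   # l(s)
            [0, 1, 2*s,  3*s**2,  4*s**3,   5*s**4],   # l'(s)
            [0, 0,   2,    6*s,  12*s**2,  20*s**3],   # l''(s)
        ]
    A = np.array(rows(s0) + rows(s1), dtype=float)
    b = np.array([l0, dl0, ddl0, l1, dl1, ddl1], dtype=float)
    return np.linalg.solve(A, b)
```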
the cost calculation of each sampling node is constructed as follows:
smoothing cost: a weighted sum of the squares of the first, second and third derivatives l'(s), l''(s), l'''(s) of the lateral distance at longitudinal distance s,
offset cost: a term penalizing the lateral distance l(s) of the sampling node from the reference line,
collision cost: for an obstacle at coordinates (s_o, l_o) in the SL diagram, with d_min and d_max the distances from the obstacle at which the cost is respectively maximal and zero, the cost between these distances follows the linear relation k·d + b, where k is the slope and b is the intercept term.
The cost value of each node is calculated recursively according to the above costs, and the path planning initial value I with the minimum total cost in the discrete space is obtained through backtracking.
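A minimal dynamic-programming sketch of this search-and-backtrack step is shown below; the function name and the split into a node cost and an edge cost are assumptions made for illustration, not the patent's exact cost decomposition.

```python
import math

def dp_path_search(s_layers, l_samples, node_cost, edge_cost):
    """Minimum-cost path over an S-L sampling lattice, recovered by backtracking."""
    n_layers, n_l = len(s_layers), len(l_samples)
    cost = [[math.inf] * n_l for _ in range(n_layers)]
    parent = [[-1] * n_l for _ in range(n_layers)]

    for j in range(n_l):                               # first layer: node cost only
        cost[0][j] = node_cost(s_layers[0], l_samples[j])

    for i in range(1, n_layers):                       # forward pass over layers
        for j in range(n_l):
            c_node = node_cost(s_layers[i], l_samples[j])
            for k in range(n_l):
                c = cost[i - 1][k] + c_node + edge_cost(
                    s_layers[i - 1], l_samples[k], s_layers[i], l_samples[j])
                if c < cost[i][j]:
                    cost[i][j], parent[i][j] = c, k

    j = min(range(n_l), key=lambda j: cost[-1][j])     # cheapest end node
    path = []
    for i in range(n_layers - 1, -1, -1):              # backtrack to the start
        path.append((s_layers[i], l_samples[j]))
        if i > 0:
            j = parent[i][j]
    return list(reversed(path))
```

For example, node_cost could bundle the offset and collision terms while edge_cost carries the smoothing term between adjacent nodes.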
Then, according to the state-space information (grid maps) of three consecutive frames, current and past, referenced to the path, the decision neural network model decides the action to be executed by the ego vehicle at the current moment, and the decision action value of the ego vehicle at the current moment (namely path planning initial value II) is obtained.
Here the decision neural network model is obtained by training the neural network using a SAC reinforcement learning method. The decision action value includes: the laterally continuous coordinate values of the plurality of path points. The training process of the decision neural network model comprises the following steps:
1. building a state space: the state space is a vehicle-centric path-based laser grid map of three consecutive frames onto which the positions of the vehicle and surrounding obstacles are projected.
2. An action space is constructed which consists of the continuous lateral coordinate values of a plurality of path points. The path points are sampled within a rectangular range of a certain extent in front of the vehicle; their discretized longitudinal coordinate values are kept at fixed intervals, and their lateral coordinate values are output by the decision neural network model.
3. Based on the generated rough path points, a Bezier curve is used to fit a smoother, continuous curvature navigation path.
4. Designing a reward function: the reward function of the whole algorithm considers four terms: a reward for reaching the target, a penalty for collisions, the curvature of the output path, and smooth migration between successive output paths (a minimal sketch combining these terms is given after this list).
5. Designing diverse and complex simulation environments (including static scenes and complex scenes), training a neural network model in these simulation environments by the SAC reinforcement learning method based on the reward function design, and obtaining the decision neural network model.
6. The whole training process of the decision neural network model adopts a curriculum learning mode: the difficulty of the training environment goes from simple to complex, which accelerates the training of the decision neural network model and improves the robustness of the decision neural network.
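As referenced in item 4 above, a minimal sketch of a reward that combines the four terms (reaching the target, collision, path curvature, and smooth migration between successive output paths) could look as follows; the weights, the averaging and the function name are illustrative assumptions.

```python
def path_reward(reached_goal, collided, curvatures, lateral_offsets, prev_lateral_offsets,
                w_goal=10.0, w_collision=20.0, w_curv=1.0, w_shift=0.5):
    """Scalar reward combining the four terms described in the training procedure."""
    r = 0.0
    if reached_goal:
        r += w_goal                                              # reward for reaching the target
    if collided:
        r -= w_collision                                         # penalty for a collision
    r -= w_curv * sum(abs(k) for k in curvatures) / max(len(curvatures), 1)      # path curvature
    shift = sum(abs(a - b) for a, b in zip(lateral_offsets, prev_lateral_offsets))
    r -= w_shift * shift / max(len(lateral_offsets), 1)          # smooth migration between paths
    return r
```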
The neural network model is trained in the simulation environment with the SAC reinforcement learning method: after the corresponding path action is applied to the vehicle in the simulation environment, the control module ensures that the vehicle accurately follows the policy path, the corresponding reward is calculated at the same time, and the reward is used to adjust the policy output for the corresponding state.
Finally, the same cost calculation is carried out on the path planning initial value I obtained by searching in the discrete space and the path planning initial value II output by reinforcement learning in the continuous space. The cost functions are designed as follows:
fluctuation cost: over the paths of the past three frames, the lateral displacements of the current-frame path and of the i-th historical-frame path are compared at the same longitudinal positions; the fluctuation cost measures the degree of fluctuation of the current path relative to the historical paths,
curvature cost: assuming the sampling points are sufficiently dense and uniform, the local bending is evaluated over any three adjacent consecutive path points P_{i-1}, P_i, P_{i+1}; the curvature cost measures the degree of bending between the path points,
smoothing cost, offset cost, collision cost: computed in the same manner as described above.
By evaluating this cost information, the total costs of the two path planning initial values I and II are compared, and the path with the smaller cost is selected as the final path planning initial value.
Then, in order to further optimize the path planning initial value and reduce the constraint dependence of the subsequent continuous-space optimization, the path planning initial value undergoes a first optimization pass; the optimization problem, loss function construction and boundary constraints are consistent with those in step S210, so that a smooth and stable initial path plan is finally obtained.
Finally, from the lateral offset-displacement (SL) graph and the initial path plan with decision information, the quadratic optimization problem can be constructed as follows:
The target path equality constraints are:

l_{i+1} = l_i + l_i'·Δs + (1/2)·l_i''·Δs² + (1/6)·l_i'''·Δs³,
l_{i+1}' = l_i' + l_i''·Δs + (1/2)·l_i'''·Δs²,
l_{i+1}'' = l_i'' + l_i'''·Δs,

wherein l_i, l_i', l_i'' are the lateral offset corresponding to position s_i and its first and second derivatives, l_{i+1}, l_{i+1}', l_{i+1}'' are likewise the lateral offset corresponding to position s_{i+1} and its first and second derivatives, and Δs is the increment between adjacent longitudinal displacements. The equality constraints enforce piecewise-constant-jerk continuity, ensuring that the third derivative l_i''' is constant within each segment, thereby meeting the comfort requirements of the vehicle.

The convex-space boundary inequality constraint is:

l_i^low ≤ l_i ≤ l_i^up,

wherein l_i^low and l_i^up are respectively the lower and upper limits of the convex space corresponding to position s_i.

The path boundary inequality constraints are:

l_0 = l_init, l_0' = l_init', l_0'' = l_init'',
l'_min ≤ l_i' ≤ l'_max, l''_min ≤ l_i'' ≤ l''_max,

wherein l_init, l_init', l_init'' are the lateral offset corresponding to the start point and its first and second derivatives, and l'_min, l''_min, l'_max, l''_max are respectively the minimum lateral-offset velocity, minimum acceleration, maximum velocity and maximum acceleration of the vehicle.

The objective of optimization is divided into the following three parts:

Reference-line consistency objective:

J_ref = Σ_i (l_i - l_i^ref)²,

wherein l_i^ref is the lateral distance of the reference line corresponding to longitudinal distance s_i; this objective measures the deviation of the target path from the reference line.

Coarse-solution consistency objective:

J_guide = Σ_i (l_i - l_i^guide)²,

wherein l_i^guide is the lateral distance of the coarse path solution at longitudinal distance s_i; this objective measures the deviation of the target path from the coarse solution.

Smooth-continuity objective:

J_smooth = Σ_i (w_1·(l_i')² + w_2·(l_i'')² + w_3·(l_i''')²),

wherein w_1, w_2, w_3 are the weight coefficients corresponding to the first, second and third derivatives of the lateral offset; this objective measures the continuity of the target path.
The optimization problem, which comprises the piecewise-constant-jerk continuity equality constraints on the target path, the convex-space boundary inequality constraints and the path boundary inequality constraints, and which takes reference-line consistency, coarse-solution consistency and smooth continuity as optimization targets, is solved with a quadratic programming solver to obtain the optimal path in the target space. In particular, when the optimization solve returns an abnormal result, the initial path plan is returned as the target path.
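The quadratic program described above can be prototyped, for example, with CVXPY, which passes a problem of this form to a QP back end such as OSQP; the sketch below is an assumed formulation with placeholder weights and is not the patent's exact implementation.

```python
import numpy as np
import cvxpy as cp

def piecewise_jerk_path(l_guide, l_ref, lb, ub, ds, l0=(0.0, 0.0, 0.0),
                        w_ref=1.0, w_guide=5.0, w_d=1.0, w_dd=10.0, w_ddd=50.0,
                        dl_max=2.0, ddl_max=1.0):
    """Piecewise-constant-jerk path optimization over lateral offset l(s) (illustrative sketch)."""
    n = len(l_guide)
    l, dl, ddl = cp.Variable(n), cp.Variable(n), cp.Variable(n)
    jerk = (ddl[1:] - ddl[:-1]) / ds                  # constant third derivative per segment

    cons = [l[0] == l0[0], dl[0] == l0[1], ddl[0] == l0[2],          # start-point constraint
            l >= np.asarray(lb), l <= np.asarray(ub),                # convex-space bounds
            cp.abs(dl) <= dl_max, cp.abs(ddl) <= ddl_max,            # path boundary limits
            dl[1:] == dl[:-1] + ddl[:-1] * ds + 0.5 * jerk * ds ** 2,
            l[1:] == l[:-1] + dl[:-1] * ds + 0.5 * ddl[:-1] * ds ** 2 + jerk * ds ** 3 / 6.0]

    cost = (w_ref * cp.sum_squares(l - np.asarray(l_ref))            # reference-line consistency
            + w_guide * cp.sum_squares(l - np.asarray(l_guide))      # coarse-solution consistency
            + w_d * cp.sum_squares(dl) + w_dd * cp.sum_squares(ddl)
            + w_ddd * cp.sum_squares(jerk))                          # smooth continuity
    prob = cp.Problem(cp.Minimize(cost), cons)
    prob.solve()
    return None if l.value is None else (l.value, dl.value, ddl.value)   # None signals a failed solve
```

Returning None on a failed solve mirrors the fallback above, where the initial path plan is used when the optimization is abnormal.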
Through the combination of cost search and reinforcement learning, the method solves the problem of selecting a solution in the fine-grained space and enhances the accuracy and safety of the path planning initial value; and through a multi-objective optimization strategy based on an accurate path planning initial value, it avoids the planning module frequently re-tuning parameters in different driving scenarios, so that the system has good capability to handle abnormal scenarios.
S230: and determining an optimal speed curve corresponding to the optimal path planning, obtaining a target track by combining the optimal path planning and the optimal speed curve, and carrying out cost selection on the multi-lane target track to obtain a final control track.
Specifically, a displacement-time (ST) diagram is first constructed for the static and dynamic obstacles in the Frenet coordinate system. In the ST diagram, points are scattered uniformly in time T and non-uniformly in longitudinal displacement S (denser near and sparser far), and, taking the in-space boundaries and obstacles into account, an initial speed curve is obtained by node-cost search (the cost comprising speed, acceleration and collision costs).
Similar to S220, a convex space for speed optimization is obtained from the initial speed curve, and boundary constraint information is extracted from the convex space; the constraints comprise a piecewise-constant-jerk continuity equality constraint on the target speed curve, a convex-space boundary inequality constraint and a variable boundary inequality constraint. The optimization objectives include cruise speed, acceleration and jerk, whereby the final optimal speed profile is solved using quadratic programming.
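How the convex space might be extracted from the initial speed curve can be illustrated with the sketch below, which treats each obstacle as a box in the ST diagram and turns it into an upper bound (yield) or a lower bound (overtake) depending on which side the initial curve passes; the data layout and the safety margin are assumptions.

```python
def extract_st_corridor(t_samples, s_init, obstacles, s_max, margin=1.0):
    """Per-time lower/upper s-bounds for the speed QP, decided by the initial speed curve."""
    lower = [0.0] * len(t_samples)
    upper = [s_max] * len(t_samples)
    for i, (t, s) in enumerate(zip(t_samples, s_init)):
        for ob in obstacles:                       # ob: {'t0','t1','s_low','s_high'} occupancy box
            if ob['t0'] <= t <= ob['t1']:
                if s <= ob['s_low']:               # initial curve stays behind: yield
                    upper[i] = min(upper[i], ob['s_low'] - margin)
                elif s >= ob['s_high']:            # initial curve is ahead: overtake
                    lower[i] = max(lower[i], ob['s_high'] + margin)
    return lower, upper
```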
Then, based on the optimal speed curve, the optimal path plan is interpolated by displacement (S) to obtain the speed and path information at each moment T_i, so as to generate the target trajectory. Considering the state limits of the vehicle, including limit information such as speed, acceleration and curvature, it is verified whether the target trajectory is within the range of these state limits.
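A minimal sketch of this displacement interpolation, assuming the optimal path is stored as arc length plus Cartesian coordinates, might be:

```python
import numpy as np

def combine_path_and_speed(path_s, path_x, path_y, t_samples, s_of_t, v_of_t):
    """Stamp the optimal path with the optimal speed profile to form trajectory points (t, x, y, v)."""
    xs = np.interp(s_of_t, path_s, path_x)    # position along the path at each planned displacement
    ys = np.interp(s_of_t, path_s, path_y)
    return [(t, x, y, v) for t, x, y, v in zip(t_samples, xs, ys, v_of_t)]
```

Each resulting point can then be checked against the vehicle's speed, acceleration and curvature limits.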
Finally, the multi-lane target trajectories are selected through a designed cost calculation, and the final control trajectory is obtained. Wherein:
the single lane cost calculation is constructed as follows:
lateral offset cost: a term penalizing the lateral offset of the trajectory,
lateral comfort cost: a term penalizing the lateral acceleration along the trajectory,
longitudinal comfort cost: a term built from ds/dt and d²s/dt², the first and second derivatives of the longitudinal displacement with respect to time, evaluated over the planning cycle T and bounded by the maximum longitudinal acceleration limit a_max of the vehicle.
The cost information of the target trajectories of all lanes is calculated in parallel, and the target trajectory with the minimum cost is selected as the final output trajectory of the current frame.
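For illustration, the per-lane evaluation and minimum-cost selection could be sketched as follows; the field names, weights and the normalization by a_lon_max are assumptions, and true parallel execution is omitted for brevity.

```python
def single_lane_cost(traj, a_lon_max, w_off=1.0, w_lat=1.0, w_lon=1.0):
    """Sum of the lateral offset, lateral comfort and longitudinal comfort terms (assumed fields)."""
    n = max(len(traj), 1)
    c_off = sum(p['l'] ** 2 for p in traj) / n                      # lateral offset cost
    c_lat = sum(p['a_lat'] ** 2 for p in traj) / n                  # lateral comfort cost
    c_lon = sum((p['a_lon'] / a_lon_max) ** 2 for p in traj) / n    # longitudinal comfort cost
    return w_off * c_off + w_lat * c_lat + w_lon * c_lon

def select_control_trajectory(lane_trajectories, a_lon_max=2.5):
    """Pick the candidate lane trajectory with the smallest total cost."""
    costs = {lane: single_lane_cost(traj, a_lon_max) for lane, traj in lane_trajectories.items()}
    best = min(costs, key=costs.get)
    return best, lane_trajectories[best]
```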
Fig. 3 is a schematic diagram of a real-time trajectory planning device, shown in accordance with some embodiments of the present disclosure. As shown in fig. 3, the real-time trajectory planning device 300 includes a determining module 310, an optimizing module 320, and an acquiring module 330. The real-time trajectory planning function may be performed by a planning module in the system shown in fig. 1. Wherein:
a determining module 310, configured to obtain basic navigation point information and pose information of a vehicle at a current moment, determine a target path reference line based on the basic navigation point information, and convert the pose information from a Cartesian coordinate system to a Frenet coordinate system based on the target path reference line to obtain a lateral offset-displacement map and a rasterized map, where the pose information includes the state of the vehicle itself, surrounding traffic participants, and local road network information;
the optimizing module 320 is configured to determine a path planning initial value of the current vehicle through a cost searching and reinforcement learning model, smooth the path planning initial value to obtain an initial path plan, and perform multi-objective optimization on the initial path plan to obtain an optimal path plan;
and the obtaining module 330 is configured to determine an optimal speed curve corresponding to the optimal path plan, combine the optimal path plan and the optimal speed curve to obtain a target track, perform cost selection on the multi-lane target track, and determine a final control track.
One embodiment of the present disclosure provides a real-time trajectory planning device. As shown in fig. 4, the real-time trajectory planning device 400 comprises a memory 420 and a processor 410, the memory 420 being for storing a computer program; the processor 410 is configured to implement the methods described in S210-S230 of fig. 2 when executing the computer program.
In summary, unlike the prior art, in which a coarse path solution is obtained directly through reinforcement learning and then optimized into the final path, the real-time trajectory planning method, device and equipment provided by the embodiments of the present disclosure use reinforcement learning to obtain one coarse path solution, fuse it with the coarse solution obtained by cost search to obtain the target coarse path solution, then optimize it to obtain the target path, and finally combine it with the speed-optimization result to form the target trajectory. The target path reference line is determined through multi-layer cost and smoothing processing of the basic navigation points, which compensates for deviations of the model or the search space and makes the initial path dynamically planned along the reference line more stable and reliable; the combination of cost search and reinforcement learning solves the problem of selecting a solution in the fine-grained space and enhances the accuracy and safety of the path planning initial value; and the multi-objective optimization strategy based on an accurate path planning initial value avoids frequent parameter re-tuning of the planning module across different driving scenarios, giving the system good capability to handle abnormal scenarios.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific operation of the apparatus and modules described above may refer to the corresponding description in the foregoing apparatus embodiments, which is not repeated here.
While the subject matter described herein is provided in the general context of operating systems and application programs that execute in conjunction with the execution of a computer system, those skilled in the art will recognize that other implementations may also be performed in combination with other types of program modules. Generally, program modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types. Those skilled in the art will appreciate that the subject matter described herein may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like, as well as distributed computing environments that have tasks performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
Those of ordinary skill in the art will appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
It is to be understood that the above-described embodiments of the present disclosure are merely illustrative or explanatory of the principles of the disclosure and are not restrictive of the disclosure. Accordingly, any modifications, equivalent substitutions, improvements, or the like, which do not depart from the spirit and scope of the present disclosure, are intended to be included within the scope of the present disclosure. Furthermore, the appended claims of this disclosure are intended to cover all such changes and modifications that fall within the scope and boundary of the appended claims, or the equivalents of such scope and boundary.
Claims (9)
1. A method of real-time trajectory planning, comprising:
basic navigation point information and pose information of a vehicle at the current moment are obtained, a target path reference line is determined based on the basic navigation point information, the pose information is converted from a Cartesian coordinate system to a Frenet coordinate system based on the target path reference line, and a transverse offset-displacement map and a rasterized map are obtained, wherein the pose information comprises the state of the vehicle, surrounding traffic participants and local road network information;
determining a path planning initial value of a current vehicle through cost searching and reinforcement learning models, smoothing the path planning initial value to obtain an initial path plan, and performing multi-objective optimization on the initial path plan to obtain an optimal path plan;
determining an optimal speed curve corresponding to the optimal path planning, combining the optimal path planning and the optimal speed curve to obtain a target track, and carrying out cost selection on the multi-lane target track to obtain a final control track;
wherein, the determining the path planning initial value of the current vehicle through the cost searching and reinforcement learning model comprises:
sampling along the target path reference line, calculating the cost value of the preset cost of each sampling node, and forming a first path planning initial value corresponding to the preset cost by the sampling nodes with the minimum sum of cost values;
according to three consecutive frames (current and past) of the grid map referenced to the target path reference line, utilizing a decision neural network model to decide the action to be executed by the vehicle at the current moment, and obtaining a decision action value of the vehicle at the current moment, wherein the decision action value forms a second path planning initial value;
and calculating the cost value of the preset cost for the second path planning initial value, comparing the cost value with the cost value of the first path planning initial value corresponding to the preset cost, and taking the smaller one of the second path planning initial value and the first path planning initial value as the path planning initial value of the current vehicle.
2. The method of claim 1, wherein the determining a target path reference line based on the base navigation point information comprises:
acquiring basic navigation points in a first preset distance in front and a second preset distance in rear in real time according to the positioning information of the current frame of the vehicle;
determining a loss function and a boundary constraint condition of the target path reference line, wherein the loss function comprises a smooth cost, a compact cost and a similar cost, the smooth cost is used for measuring the smoothness of the target path reference line, the compact cost is used for measuring the uniformity degree between target path reference points corresponding to the basic navigation points, and the similar cost is used for measuring the deviation degree of the target path reference line and an original navigation path;
and obtaining the basic navigation points meeting the boundary constraint condition based on a quadratic solver, wherein the path reference points corresponding to the basic navigation points form the target path reference line.
3. The method of claim 1, wherein the training method of the decision neural network model comprises:
constructing a state space from a path-based three-frame continuous grid map centered on a vehicle, wherein the positions of the vehicle and surrounding obstacles are projected onto the grid map;
constructing an action space formed by transverse continuous coordinate values of a plurality of path points sampled in a preset rectangular range in front of the vehicle, wherein the longitudinal intervals of the path points are kept at fixed intervals, and the transverse coordinate values are determined by a decision neural network model;
designing a reward function based on rewards to reach the target, penalties for collision occurrence, curvature of the output paths, and smooth migration between the output paths;
constructing a training set by taking the state space as input and the action space as output;
designing a simulation environment comprising a static scene and a complex scene, training the neural network model for the training set by adopting a SAC reinforcement learning method based on the reward function in the simulation environment, and obtaining the decision neural network model.
4. The method of claim 1, wherein said smoothing the path plan initial value to obtain an initial path plan comprises:
taking the loss function and the boundary constraint condition of the target path reference line as the loss function and the boundary constraint condition of the path planning initial value;
and optimizing the path planning initial value based on a quadratic solver to obtain the initial path planning.
5. The method of claim 1, wherein said multi-objective optimizing said initial path plan to obtain an optimal path plan comprises:
determining a boundary constraint condition of the path based on the transverse offset-displacement diagram and the path planning initial value, and performing multi-objective optimization on the initial path planning based on the boundary constraint condition and a preset optimization objective to obtain an optimal path planning in a target space, wherein the preset optimization objective comprises one or more of a deviation degree of a target path from a target path reference line, a deviation degree of the target path from the path planning initial value and continuity of the target path;
and when the solution for the optimal path planning in the target space is abnormal, taking the initial path planning as the optimal path planning.
6. The method of claim 1, wherein the determining an optimal speed profile corresponding to the optimal path plan comprises:
constructing a displacement-time diagram for static obstacles and dynamic obstacles of the vehicle under a Frenet coordinate system;
sampling the displacement-time diagram, calculating the cost value of preset cost of each sampling node, and forming an initial speed curve corresponding to the preset cost by the sampling nodes with the minimum sum of cost values, wherein the preset cost comprises speed cost, jerk cost and collision cost;
obtaining a convex space of speed optimization from the initial speed curve, extracting boundary constraint conditions in the convex space, and performing multi-objective optimization on the initial speed curve based on the boundary constraint conditions and a preset optimization target to obtain an optimal speed curve, wherein the preset optimization target comprises one or more of cruising speed, acceleration and jerk.
7. The method of claim 1, wherein the combining the optimal path plan and the optimal speed profile to obtain a target trajectory comprises:
and interpolating the optimal path planning through displacement based on the optimal speed curve to obtain speed and path information at each moment so as to generate a target track.
8. A real-time trajectory planning device, comprising:
the determining module is used for acquiring basic navigation point information and pose information of the vehicle at the current moment, determining a target path reference line based on the basic navigation point information, converting the pose information from a Cartesian coordinate system to a Frenet coordinate system based on the target path reference line, and acquiring a transverse offset-displacement map and a rasterized map, wherein the pose information comprises the state of the vehicle, surrounding traffic participants and local road network information;
the optimizing module is used for determining a path planning initial value of the current vehicle through cost searching and reinforcement learning models, smoothing the path planning initial value to obtain an initial path plan, and carrying out multi-objective optimization on the initial path plan to obtain an optimal path plan;
the acquisition module is used for determining an optimal speed curve corresponding to the optimal path planning, combining the optimal path planning and the optimal speed curve to obtain a target track, and carrying out cost selection on the multi-lane target track to acquire a final control track;
wherein, the determining the path planning initial value of the current vehicle through the cost searching and reinforcement learning model comprises:
sampling along the target path reference line, calculating the cost value of the preset cost of each sampling node, and forming a first path planning initial value corresponding to the preset cost by the sampling nodes with the minimum sum of cost values;
according to three consecutive frames (current and past) of the grid map referenced to the target path reference line, utilizing a decision neural network model to decide the action to be executed by the vehicle at the current moment, and obtaining a decision action value of the vehicle at the current moment, wherein the decision action value forms a second path planning initial value;
and calculating the cost value of the preset cost for the second path planning initial value, comparing the cost value with the cost value of the first path planning initial value corresponding to the preset cost, and taking the smaller one of the second path planning initial value and the first path planning initial value as the path planning initial value of the current vehicle.
9. A real-time trajectory planning device comprising a memory and a processor:
the memory is used for storing a computer program;
the processor being adapted to implement the method according to any of claims 1-7 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311426011.1A CN117141520B (en) | 2023-10-31 | 2023-10-31 | Real-time track planning method, device and equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311426011.1A CN117141520B (en) | 2023-10-31 | 2023-10-31 | Real-time track planning method, device and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117141520A CN117141520A (en) | 2023-12-01 |
CN117141520B true CN117141520B (en) | 2024-01-12 |
Family
ID=88903131
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311426011.1A Active CN117141520B (en) | 2023-10-31 | 2023-10-31 | Real-time track planning method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117141520B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117406756B (en) * | 2023-12-06 | 2024-03-08 | 苏州元脑智能科技有限公司 | Method, device, equipment and storage medium for determining motion trail parameters |
CN118025226A (en) * | 2024-02-22 | 2024-05-14 | 北京集度科技有限公司 | Track generation method, track generation device, computer equipment and storage medium |
CN118089773B (en) * | 2024-04-26 | 2024-08-02 | 中国第一汽车股份有限公司 | Vehicle motion trail calculation method and device, electronic equipment and storage medium |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111061277A (en) * | 2019-12-31 | 2020-04-24 | 歌尔股份有限公司 | Unmanned vehicle global path planning method and device |
CN111273668A (en) * | 2020-02-18 | 2020-06-12 | 福州大学 | Unmanned vehicle motion track planning system and method for structured road |
CN111473794A (en) * | 2020-04-01 | 2020-07-31 | 北京理工大学 | Structural road unmanned decision planning method based on reinforcement learning |
WO2021238303A1 (en) * | 2020-05-29 | 2021-12-02 | 华为技术有限公司 | Motion planning method and apparatus |
CN112918486A (en) * | 2021-02-08 | 2021-06-08 | 北京理工大学 | Space-time behavior decision and trajectory planning system and method |
CN113386766A (en) * | 2021-06-17 | 2021-09-14 | 东风汽车集团股份有限公司 | Continuous and periodic self-adaptive synchronous online trajectory planning system and method |
CN114715193A (en) * | 2022-04-15 | 2022-07-08 | 重庆大学 | Real-time trajectory planning method and system |
CN116161056A (en) * | 2023-03-03 | 2023-05-26 | 湖南大学 | Structured road vehicle track planning method and system based on reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN117141520A (en) | 2023-12-01 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |