CN112061116B - Parking strategy of reinforcement learning method based on potential energy field function approximation - Google Patents


Info

Publication number
CN112061116B
CN112061116B (application number CN202010847538.1A)
Authority
CN
China
Prior art keywords
potential energy
parking
vehicle
energy field
state
Prior art date
Legal status
Active
Application number
CN202010847538.1A
Other languages
Chinese (zh)
Other versions
CN112061116A (en
Inventor
李道飞
刘关明
刘傲
林思远
肖斌
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202010847538.1A
Publication of CN112061116A
Application granted
Publication of CN112061116B
Legal status: Active

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B60: VEHICLES IN GENERAL
    • B60W: CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00: Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/06: Automatic manoeuvring for parking
    • B60W40/00: Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)

Abstract

The invention discloses a parking strategy based on a reinforcement learning method with potential energy field function approximation. A potential energy field is designed to approximate the state value function in the reinforcement learning process; according to the vehicle state value function, an action is selected from a preset executable action space through an epsilon-greedy strategy; the next state of the vehicle is predicted from the current state and the selected action using the vehicle's state transition equation; and this process of selecting an action and predicting the next state is repeated until parking finishes, the full sequence of selected actions forming a real-time planned parking path. By training the potential energy field parameters, the method adapts to various parking areas and plans parking paths in different scenes, giving it universality; the parking path can be planned in real time and tracked accurately.

Description

Parking strategy of reinforcement learning method based on potential energy field function approximation
Technical Field
The invention belongs to the technical field of vehicles, and particularly relates to a parking strategy based on a reinforcement learning method with potential energy field function approximation.
Background
At present, mainstream automatic parking technology is still mostly based on traditional path planning algorithms, which fall into three types: random path generation algorithms, path planning algorithms that generate paths by function fitting, and rule-based path planning algorithms. Random path generation algorithms, such as RRT (Rapidly-exploring Random Tree) and PRM (Probabilistic Roadmap), randomly generate paths in a pre-built scene map, perform collision detection on each randomly generated path (or check whether it lies within the drivable area), and then select an optimal parking path among all qualifying paths according to an optimization objective. Function-fitting methods generate a path using polynomials, Bezier curves, arc-tangent functions, and the like: given the environmental information and a chosen function form, the parameters of that function are solved by optimization under collision constraints, parking geometry constraints, vehicle performance constraints, etc., thereby generating the path. Rule-based path planning mainly plans according to driving experience and the different positional relations between the vehicle and the parking space, thereby generating the whole path.
However, all three parking path planning methods have limitations. Random path generation algorithms place high demands on the sensors, require the whole parking environment to be perceived in advance, and cannot easily guarantee that the generated path is a trajectory the vehicle can actually realize. Paths generated by function fitting impose strict requirements on the parking spot and the initial vehicle pose, have almost no applicability across different scenes, and are difficult to re-plan continuously when path tracking error grows large during actual parking, so they do not easily support real-time planning. Rule-based parking path planning is likewise inflexible: it is difficult to guarantee the completeness of the rule set, and the rules must be manually re-established for each new scene.
The artificial potential energy field method is a classic robot path planning scheme that uses the concept of a potential energy field and generally treats the moving object as a point mass. In a parking scene, however, the vehicle cannot be regarded as a simple point mass or circle because of its shape and steering characteristics, so a path planned directly by an artificial potential energy field cannot actually be realized by vehicle tracking control. Potential energy is the energy an object or system has by virtue of its position or state; it is not owned by an object alone but is shared by interacting objects. During parking, the parking environment (including terrain and other vehicles) can be regarded as a field in which the vehicle has potential energy related to its position and state. Taking this potential energy as the objective pursued during parking, the parking process becomes a process of seeking to increase potential energy.
Disclosure of Invention
The invention aims to provide a parking strategy based on a reinforcement learning method with potential energy field function approximation, addressing the defects of the prior art.
The purpose of the invention is realized by the following technical scheme: a parking strategy based on a reinforcement learning method with potential energy field function approximation, in which a designed potential energy field approximates the state value function in the reinforcement learning process; the potential energy field function quantitatively represents, within the vehicle state value function, factors such as the current vehicle state, the target parking space, the drivable area, and the vehicle parameters. From the current vehicle state and a preset executable action space, the next state corresponding to each executable action is predicted using a state transition equation; the state value function value of each predicted state is then computed from the potential energy field, and one action is selected by an epsilon-greedy strategy, favoring the highest state value function value. The next action is selected from the state corresponding to the chosen action, and this process of predicting states and selecting actions is repeated until parking finishes; the selected action sequence finally yields a real-time planned parking path.
Further, a guide line is designed according to the required path, a potential energy field is generated from the guide line and the parking boundary constraint, and the potential energy field parameters are optimized; the resulting potential energy field function can represent the state value function value of the vehicle in every state during parking. The parking area comprises the drivable area and the target parking space, and the parking boundary is the outer contour of the parking area.
Further, the designed potential energy field is divided into a potential energy field generated by an attractive-force part and one generated by a repulsive-force part. The attractive-force potential energy field is generated by designed virtual guide lines; fields generated by different virtual guide lines have different priorities in different regions, a higher-priority field overriding a lower-priority one. The repulsive-force potential energy field is generated by the parking boundary.
Further, different potential energy fields are designed for different types of parking areas, and the different parts of the field have different action ranges. The attractive-force potential energy is positive: within its action range, the closer to the guide line and the closer to the end point, the larger the attractive potential energy, which is maximal at the end point. The repulsive-force potential energy is negative: at the parking boundary the repulsive potential energy is negative infinity, and within the action range the repulsion grows as the parking boundary is approached.
Furthermore, depending on the vehicle state, the different parts of the potential energy field act on different points of the vehicle, generate different potential energies, and therefore contribute differently to the vehicle state value function: the value generated by the attractive potential energy field acts on the midpoint of the vehicle's rear axle; the value generated by the parking boundary of the repulsive field acts on the four corner points of the vehicle's outer contour; and the potential energy field generated by the contact corner points between the target parking space and the drivable area acts on the edges of the vehicle's outer contour.
Further, the state value function value v0 of the attractive-force potential energy field is:

v0 = f(X)

[the explicit form of f appears only as an equation image in the original; it is parameterized by C0 and C1]

X = [(x - x_target), (y - y_target), (yaw - yaw_target)]

The vehicle state comprises at least the x and y coordinates of the rear-axle midpoint in the parking-space coordinate system and the angle yaw between the vehicle's longitudinal axis and the x axis in that coordinate system, denoted (x, y, yaw); (x_target, y_target, yaw_target) is the end-point state; v0 is the value of the potential energy field caused by the attractive force, and f is the attractive-force potential energy field function. C0 and C1 are parameters to be trained.
The state value function value v1_i of the repulsive-force potential energy field is:

v1_i = -C2 / d_i^2

The vehicle contour has four corner points, and the target parking space and the drivable area have two contact corner points, so i = 1 to 6: d_1 to d_4 are the shortest distances from each vehicle contour corner to the parking boundary, and d_5 to d_6 are the shortest distances from each contact corner point of the target parking space and the drivable area to the vehicle contour edges. If d_i is beyond the repulsive-force action range, v1_i = 0. C2 is a parameter to be trained.
The final vehicle state value function value V is the sum of the attractive contribution and the six repulsive contributions:

V = v0 + Σ_{i=1..6} v1_i
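The value computation above can be sketched in Python. Since the explicit form of f is given only as an equation image in the source, a Gaussian-shaped attractive term is assumed here purely for illustration; the parameter values, the repulsion range, and all function names are hypothetical.

```python
import math

# Hypothetical trained parameters C0, C1, C2 and repulsion action range;
# the patent learns these offline, and the true form of f is not shown.
C0, C1, C2 = 10.0, 0.5, 1.0
REPULSION_RANGE = 1.0  # assumed action range of the repulsive field (m)

def attractive_value(state, target):
    """v0 = f(X): assumed Gaussian form, largest at the target pose."""
    X = [s - t for s, t in zip(state, target)]  # (dx, dy, dyaw)
    return C0 * math.exp(-C1 * sum(e * e for e in X))

def repulsive_value(d):
    """v1_i = -C2 / d_i^2 inside the repulsion range, else 0."""
    if d >= REPULSION_RANGE:
        return 0.0
    return -C2 / (d * d)

def state_value(state, target, distances):
    """V = v0 + sum of the six repulsive terms (4 corners + 2 contact points)."""
    return attractive_value(state, target) + sum(repulsive_value(d) for d in distances)
```

Note how the sign convention matches the text: the attractive term is positive and peaks at the end point, while each repulsive term is negative and diverges as a distance d_i approaches zero.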
Further, the parking success rate is defined as the optimization objective for the potential energy field parameters, as follows: in the same scene, N episodes are run with randomly generated initial vehicle states; if M of them end with the vehicle successfully driven into the parking space, the parking success rate under that set of potential energy field parameters is M/N x 100%. The end flags comprise: the vehicle successfully parks in the space, the vehicle drives out of the parking area, and the parking times out.
Further, an executable action consists of a steering wheel angle and a gear, where the gears comprise a forward gear, a reverse gear, and a neutral gear.
Further, before predicting the next state with the state transition equation, actions that would cause the vehicle to collide with the parking boundary are removed from the preset executable action space, and an executable action is then selected by the epsilon-greedy strategy. After a sequence of actions from the current state to the final predicted state has been obtained via the epsilon-greedy strategy and the state transition equation, the action sequence is pruned to remove circular actions, yielding the final planned parking path.
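This selection step can be sketched as follows; the callables for the transition equation, the value function, and the collision check are hypothetical placeholders for the components defined elsewhere in the patent.

```python
import random

def choose_action(state, actions, transition, value_fn, collides, eps=0.1):
    """Epsilon-greedy choice over the collision-free subset of the action space.

    transition(state, a) -> next state; value_fn(state) -> state value;
    collides(state) -> True if the pose hits the parking boundary.
    """
    # Remove actions whose predicted next state collides with the boundary.
    safe = [a for a in actions if not collides(transition(state, a))]
    if not safe:
        return None  # no collision-free action available
    if random.random() < eps:   # explore with probability eps
        return random.choice(safe)
    # Exploit: pick the action whose predicted state has the highest value.
    return max(safe, key=lambda a: value_fn(transition(state, a)))
```

With eps = 0 the choice is fully greedy; the patent's strategy keeps 0 < eps <= 1 so the vehicle retains some exploration and can escape locally optimal states.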
Further, the vehicle parameters include front suspension, rear suspension, wheel base, vehicle width, minimum turning radius, and transmission ratio.
Compared with the prior art, the invention has the following beneficial effects. Unlike conventional reinforcement learning methods, the method does not define an immediate reward; instead, the state value function is given by potential energy field approximation. During training of the reinforcement learning algorithm, the vehicle observation state, the predicted vehicle state value function, the vehicle state transition matrix, and the predicted vehicle action are the basic components of the algorithm:
1. With this method, a specific vehicle can train the parking scene in a minimal standard parking area by itself, design the potential energy field function, and compute the corresponding potential energy field parameters offline; when another drivable area is encountered that contains the trained area, the trained parameters can be used directly to plan the parking path.
2. Different potential energy fields can be trained for irregular or varied parking areas, and parking paths can be planned in different scenes, giving the method universality.
3. Because the vehicle's observation state is obtained in real time, the vehicle can plan a trajectory under the potential energy field from any state, which improves generalization; this is one of the inventive points.
4. Because the reinforcement learning is built on a vehicle kinematic model and the final output path is deduced from the vehicle's predicted action sequence, the path is guaranteed to be accurately trackable by the vehicle.
5. The potential energy field is constructed by establishing suitable virtual guide lines, and its quality is measured by the introduced parking success rate, making the final state value function more principled; this is also one of the inventive points.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention;
fig. 1 is a schematic view of an application scenario according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a potential energy field construction method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
The invention relates to a reinforcement-learning intelligent parking strategy based on a potential-energy-field approximation of the state value function, combining reinforcement learning with a potential energy field to plan a parking path. A potential energy field is designed to approximate the state value function in the reinforcement learning process; it quantitatively expresses, within the vehicle state value function, factors such as the current vehicle pose, the target parking space, the drivable area, and the vehicle parameters, where the vehicle parameters include the front overhang, rear overhang, wheelbase, vehicle width, minimum turning radius, steering ratio, and the like. An executable action space is predetermined, and a state transition equation is obtained from a vehicle motion model. At each moment, the current vehicle state is obtained; the predicted next state corresponding to every action in the executable action space is computed from the state transition equation; the state value function value of each predicted state is calculated; an action whose state value satisfies the strategy is selected by the epsilon-greedy strategy; and the predicted state of the vehicle under that action is output. Actions continue to be selected by the epsilon-greedy strategy, and this process of predicting states and selecting actions is repeated until parking finishes; finally the predicted vehicle states of the whole process are output, smoothed to some degree, and a real-time planned parking path is generated from the selected action sequence. The method applies to different parking scenes such as perpendicular, parallel, and angled parking; only the corresponding potential energy field design and parameter optimization are needed per scene.
Before the real vehicle runs, a kinematic model is built from the specific vehicle parameters, the required parking area is determined from those parameters and the sensor information, and the corresponding potential energy field parameters are computed offline. In a specific task, the environment parameters are first fed to the algorithm, which generates the specific potential energy field; the current vehicle pose is then input, and the parking algorithm plans a path from that point that parks successfully. The vehicle tracks the planned path, or the predicted action sequence given by the algorithm. During real motion the algorithm can be called in real time, needing only the current pose; regardless of how large the tracking error is, the parking algorithm re-plans a parking path starting from the current point in real time. Using the potential energy field as the estimate of the state value function avoids the difficulty of designing rewards for a parking environment when building a strategy by reinforcement learning. From every safe position in the potential energy field, the exploratory nature of the adopted strategy (acting randomly with a certain probability) lets the vehicle cross locally optimal states and finally reach the target parking space.
The invention specifically comprises the following steps:
(1) The vehicle state value function is determined by the parking environment: at the start of the parking algorithm, the environment information is obtained with sensors, and the vehicle state value function is generated from it, borrowing the idea of the potential energy field. The vehicle observation state comprises at least the x coordinate of the rear-axle midpoint in the parking-space coordinate system, the corresponding y coordinate, and the angle yaw between the vehicle's longitudinal axis and the x axis in that coordinate system, combined as (x, y, yaw).
For example, in the perpendicular parking scenario shown in fig. 1, the drivable region A is 30 m long and 5 m wide (anything wider than 5 m may be treated as 5 m), and the rectangular region B is the target parking space, with length x width of 6 m x 2.2 m. Vehicle parameters: wheelbase 2.7 m, front overhang 0.985 m, rear overhang 0.8 m, vehicle width 1.9 m, minimum turning radius 5.8 m, steering gear ratio 20. Given the specific vehicle parameters and the desired parking area, the potential energy field for the parking process can be determined accordingly.
(2) Constructing a potential energy field:
(2.1) Under the global coordinate system, a virtual guide line is designed according to the required path; the guide line points toward the parking end position.
(2.2) The potential energy field is generated from the guide line and the parking boundary constraint, where the parking boundary is the outer contour of the drivable area and the target parking space. Different potential energy fields are designed for different types of parking areas, and the potentials of different obstacles and guide lines have different action ranges. Combining the collision constraint and the parking end-point constraint in the potential energy field, the designed field divides into a part generated by attraction and a part generated by repulsion:
The attractive-force potential energy field is generated by the designed virtual guide lines; its value is positive and its action range is large. As the vehicle approaches the terminal state, its potential energy increases continuously. Fields generated by different virtual guide lines have different priorities in different regions, and a high-priority field overrides a low-priority one.
The repulsive-force potential energy field is generated by the parking boundary; the repulsive field influences the vehicle only within a small range, which suffices for collision avoidance. At the parking boundary, the repulsive potential energy experienced by the vehicle is set to negative infinity.
Depending on the vehicle state, the different parts of the potential energy field act on different points of the vehicle and generate different potential energies, so their contributions to the vehicle state value function differ:
the value generated by the attractive potential energy field acts on the midpoint of the vehicle's rear axle;
the value generated by the parking boundary of the repulsive field acts on the four corner points of the vehicle's outer contour, and the potential energy field generated by the contact corner points between the parking space and the drivable area acts on the edges of the vehicle's outer contour.
As in the potential energy field design of the embodiment shown in fig. 2, the parking boundary is characterized by repulsion: the closer the vehicle is to it, the more negative the vehicle's estimated state value function, but the repulsive action range of the boundary is limited to a small area near it. Correspondingly, the desired virtual guide line (the dotted-arrow portion in fig. 2) takes the form of attraction: the closer to the guide line, the more positive the vehicle's estimated state value, and the guide line's action range is large, so the vehicle can be planned under the attraction at every position of the parking area. Different virtual guide lines are given different priorities according to the region the vehicle is in, and higher-priority attraction overrides lower-priority attraction. For example, in the drivable area and parking-space area enclosed by the dash-dot frame in fig. 2 (denoted region C), guide line 3 has the higher priority, i.e., in that region the vehicle is not influenced by guide lines 1 and 2; guide lines 1 and 2 have the higher priority in the other drivable regions not enclosed by the dash-dot frame (denoted region D).
(3) Parking: the invention obtains the parking path by reinforcement learning; more specifically, the parking strategy is learned from exploration sequences. The vehicle's pose in the environment is determined from the current state; all currently executable actions are determined; the predicted next state corresponding to each of those actions is computed from the state transition equation; the state value function value of each predicted state is calculated; the (maximum) state value satisfying the strategy is selected by the epsilon-greedy strategy, and the corresponding predicted action is chosen. The state corresponding to the predicted action becomes the current state, and the predicted states and their state value functions are computed for each action from this state so the next action can be selected greedily. This process repeats until the vehicle parks successfully, leaves the parking area, or the parking times out, at which point planning finishes. Step (3) is specifically as follows:
(3.1) From any safe position in the parking area, the vehicle can generate an action sequence to the parking space by this method. The action-selection strategy is epsilon-greedy (with probability epsilon the next action is selected at random, and with probability 1-epsilon the action with the maximum value is selected, where 0 < epsilon <= 1), i.e., there is a certain exploration probability. Meanwhile, during the search, all action elements that would cause the vehicle to collide with the boundary of the parking area are removed from the executable action space; the remaining subspace is the action space searched in the vehicle's current state. An executable action of the vehicle, i.e., an element of the vehicle's action space, is an output that controls the vehicle's motion. In practice the vehicle speed is fixed at the parking speed, and an executable action consists of two dimensions: a steering wheel angle SW and gear information (Gear), where the gears comprise a forward gear, a reverse gear, a neutral gear, and so on.
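The two-dimensional action space described above might be discretized as in the following sketch; the steering sweep, step size, and gear labels are illustrative assumptions, not values from the patent.

```python
# Hypothetical discretization: steering-wheel angles of +/-540 deg in
# 60-deg steps, and three gears (forward, reverse, neutral).
STEER_ANGLES = list(range(-540, 541, 60))   # steering-wheel angle SW, deg
GEARS = ["D", "R", "N"]                     # Gear: forward, reverse, neutral

# Each executable action is a (SW, Gear) pair.
ACTION_SPACE = [(sw, g) for sw in STEER_ANGLES for g in GEARS]
```

At each planning step, the collision-inducing subset of ACTION_SPACE would be filtered out before the epsilon-greedy choice, as the patent describes.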
(3.2) A vehicle state transition matrix is generated from the vehicle's performance constraints and its kinematic model; it represents how the vehicle transitions from one state to the next under a given action. The vehicle kinematic model, established from the vehicle parameters, estimates the next-moment state (x1, y1, yaw1) from the current state (x0, y0, yaw0) and the input action (SW, Gear).
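A minimal rear-axle bicycle-model sketch of this state transition, using the embodiment's wheelbase and steering ratio; the constant parking speed and the integration time step are assumed values not given in the source.

```python
import math

WHEELBASE = 2.7      # m, from the embodiment
STEER_RATIO = 20.0   # steering-wheel to road-wheel ratio, from the embodiment
V_PARK = 1.0         # assumed constant parking speed, m/s
DT = 0.1             # assumed integration step, s

def step(state, sw_deg, gear):
    """One step of a rear-axle bicycle model.

    state = (x, y, yaw) of the rear-axle midpoint; sw_deg is the
    steering-wheel angle SW in degrees; gear in {'D', 'R', 'N'}.
    """
    x, y, yaw = state
    v = {"D": V_PARK, "R": -V_PARK, "N": 0.0}[gear]
    delta = math.radians(sw_deg) / STEER_RATIO   # road-wheel angle, rad
    x += v * math.cos(yaw) * DT
    y += v * math.sin(yaw) * DT
    yaw += v * math.tan(delta) / WHEELBASE * DT
    return (x, y, yaw)
```

Iterating step over a (SW, Gear) sequence reproduces the rollout behavior described here: because every planned pose comes from the same model, the path stays kinematically feasible for the vehicle to track.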
(3.3) for each state of the vehicle, the potential energy field can give a corresponding state cost function value.
The vehicle's guide line gives the vehicle a target end state (x_target, y_target, yaw_target), and the state value function value of this part is given by the difference X:

v0 = f(X)

[the explicit form of f appears only as an equation image in the original; it is parameterized by C0 and C1]

X = [(x - x_target), (y - y_target), (yaw - yaw_target)]

where v0 is the value of the potential energy field caused by the attractive force, and f is the attractive-force potential energy field function; C0 and C1 are parameters to be trained.
For the repulsive-force part of the potential energy field, since every point on the vehicle body contour must satisfy the collision-avoidance requirement, the repulsive force acts on the four corner points and four contour edges of the vehicle outline. The coordinates of the four vehicle corners are computed from the current state (x, y, yaw) and the body parameters, and the shortest distance d_i from each corner to the boundary line is calculated, with corner subscripts i = 1, 2, 3, 4; the shortest distance d_i (i = 5, 6) from each of the two contact corner points of the parking space and the drivable area to the vehicle contour is then calculated. If d_i is within the repulsive-force action range, the state value contribution of this part is

v1_i = -C2 / d_i^2

where v1_i is the value of the potential energy field caused by the repulsive force and C2 is a parameter to be trained. If d_i exceeds the repulsive-force range of the parking boundary, this contribution is v1_i = 0.
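The corner coordinates needed for the distances d_1 to d_4 follow from the rear-axle pose and the body geometry; a sketch using the embodiment's body parameters (the function name and argument layout are illustrative).

```python
import math

def body_corners(state, front_overhang=0.985, rear_overhang=0.8,
                 wheelbase=2.7, width=1.9):
    """Corner coordinates of the vehicle outline from the rear-axle-midpoint
    pose (x, y, yaw), using the embodiment's body parameters (m)."""
    x, y, yaw = state
    c, s = math.cos(yaw), math.sin(yaw)
    front = wheelbase + front_overhang   # rear axle -> front edge
    rear = -rear_overhang                # rear axle -> rear edge
    half = width / 2.0
    corners = []
    # (longitudinal, lateral) offsets in the body frame, rotated to the
    # global frame and translated by the rear-axle position.
    for lon, lat in [(front, half), (front, -half), (rear, half), (rear, -half)]:
        corners.append((x + lon * c - lat * s, y + lon * s + lat * c))
    return corners
```

Each corner's shortest distance to the parking boundary then feeds the repulsive term v1_i = -C2 / d_i^2 above.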
The final vehicle state value function value V is calculated as

V = v0 + Σ_{i=1..6} v1_i
(3.4) Applying the method of the invention, a sequence of actions from the current state to the final predicted state is obtained. The action sequence is then pruned: action subsequences that create a loop in the state trajectory are deleted; a simulation rollout with the state transition equation, the current state, and the pruned action sequence then outputs the final planned parking path.
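The loop-pruning step might look like this sketch, which drops any action subsequence that returns the state trajectory to an earlier state; the tolerance and the exact matching rule are assumptions, since the source does not specify them.

```python
def prune_loops(start, actions, transition, tol=1e-6):
    """Remove action subsequences that bring the trajectory back to an
    earlier state (within tol per component), keeping only the first visit."""
    states = [start]   # states[j+1] is reached from states[j] via kept[j]
    kept = []
    for a in actions:
        nxt = transition(states[-1], a)
        # Does nxt revisit an earlier state on the trajectory?
        hit = next((i for i, s in enumerate(states)
                    if all(abs(u - v) <= tol for u, v in zip(s, nxt))), None)
        if hit is not None:
            # The actions since that first visit (plus this one) form a loop:
            # roll the trajectory back and discard them.
            del kept[hit:]
            del states[hit + 1:]
        else:
            states.append(nxt)
            kept.append(a)
    return kept
```

Replaying the pruned sequence from the start state reaches the same final state as the original sequence, but without the circular detours.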
(4) According to the specific vehicle parameters, the optimal potential energy field function parameters are obtained through optimization training, as follows:

The parking success rate associated with the potential energy field parameters is defined as the optimization target. In the same scene, N rounds are trained; the initial vehicle state of each round is generated randomly, and step (3) is executed according to the observed vehicle state, the executable actions, the state transition matrix and the current potential energy field to carry out the parking process. A round can end with one of three flags: the vehicle is successfully parked in the parking space, the vehicle drives out of the parking area, or a certain maximum number of steps is exceeded; when any one of the three is met, the parking stage of that round is considered finished. If M of the N rounds end with the vehicle successfully driven into the parking space, the parking success rate under this set of potential energy field parameters is M/N × 100%.

Since the parking success rate defined above depends on the potential energy field function, it is taken as the optimization target, the potential energy field is optimized offline, and the optimal potential energy field function for the specific environment is found. Based on the trained parameters C0, C1, C2 and the like, the vehicle can, in real time during reinforcement learning, plan from its current state value toward the place with the highest state value in the whole potential energy field, and the state with the highest value lies inside the parking space. The finally obtained potential energy field function can therefore represent the reinforcement learning state value of the vehicle in every state during the parking process, i.e. a parking path can be successfully planned with this method.
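The offline parameter optimization can be sketched as follows. Here `run_episode` stands in for the full simulation of step (3), and plain random search over (C0, C1, C2) is an illustrative choice: the patent states only that the parameters are trained offline against the success rate, not which optimizer is used.

```python
import random

def parking_success_rate(run_episode, params, n_rounds=100):
    """Estimate the M/N x 100% success rate for one parameter set.

    run_episode(params) -> True when the round ends with the vehicle
    parked in the space (the other end flags being: drove out of the
    parking area, or step limit exceeded).
    """
    m = sum(1 for _ in range(n_rounds) if run_episode(params))
    return 100.0 * m / n_rounds

def random_search(run_episode, n_candidates=50, n_rounds=100, seed=0):
    """Offline search over (C0, C1, C2) maximizing the success rate.

    The sampling ranges below are placeholders, not values from the patent.
    """
    rng = random.Random(seed)
    best_params, best_rate = None, -1.0
    for _ in range(n_candidates):
        params = {"C0": rng.uniform(0.1, 20.0),
                  "C1": rng.uniform(0.01, 5.0),
                  "C2": rng.uniform(0.01, 5.0)}
        rate = parking_success_rate(run_episode, params, n_rounds)
        if rate > best_rate:
            best_params, best_rate = params, rate
    return best_params, best_rate
```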
The present invention is not limited to any particular parking area, and the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A parking strategy based on a potential energy field function approximation reinforcement learning method is characterized in that: the state value function in the reinforcement learning process is approximated by designing a potential energy field, and the function of the potential energy field is embodied as the quantitative representation of different factors of the current state, the target parking space, the drivable area and the vehicle parameters of the vehicle in the vehicle state value function; predicting a next state corresponding to each executable action by using a state transfer equation according to the current state of the vehicle and a preset executable action space, then calculating a state cost function value of each predicted state by combining a potential energy field, and selecting an action with the highest state cost function value from the predicted states by an epsilon-greedy strategy; selecting the next action according to the state corresponding to the action, repeating the process of predicting the state and selecting the execution action until the parking is finished, and finally generating a real-time parking planning path according to the selected action sequence;
the designed potential energy field is divided into a potential energy field generated by an attractive force part and a potential energy field generated by a repulsive force part;
The state cost function value v0 of the gravitational part of the potential energy field is:

v0 = f(X)

(the explicit form of f, with parameters C0 and C1, appears only as an equation image in the original)

X = [(x - x_target), (y - y_target), (yaw - yaw_target)]
wherein the vehicle state is recorded as (x, y, yaw), comprising at least the x coordinate and y coordinate of the midpoint of the vehicle rear axle in the parking space coordinate system and the included angle yaw between the longitudinal central axis of the vehicle and the x axis of the parking space coordinate system; (x_target, y_target, yaw_target) is the end state; v0 represents the potential energy field function value caused by the attractive force, and the function f is the potential energy field function caused by the attractive force; C0 and C1 are parameters to be trained;
The state cost function value v1_i of the repulsive force part of the potential energy field is:

v1_i = -C2 / d_i^2

wherein the vehicle contour has four corner points and the target parking space and the drivable area have two contact corner points, i = 1~6; d_1~d_4 are the shortest distances from each vehicle contour corner point to the parking boundary, and d_5~d_6 are the shortest distances from each contact corner point of the target parking space and the drivable area to the vehicle contour edges; if d_i is beyond the action range of the repulsive force, then v1_i = 0; C2 is a parameter to be trained;
The final vehicle state cost function value V is:

V = v0 + Σ v1_i, i = 1~6
2. The parking strategy based on the reinforcement learning method of the potential energy field function approximation is characterized in that: a guide line is designed according to the required path, a potential energy field is generated by using the guide line and the parking boundary constraint, the potential energy field parameters are optimized, and the finally obtained potential energy field function is used to represent the state value function value of the vehicle in each state during the vehicle parking process; the parking boundary is the outer contour of the parking area, and the parking area comprises a drivable area and a target parking space.
3. The parking strategy based on the reinforcement learning method of the potential energy field function approximation is characterized in that: the gravitational partial potential energy field is generated by designed virtual guide lines, and the fields generated by different virtual guide lines have different priorities in different regions, the field with the higher priority covering the field with the lower priority; the repulsive force component potential energy field is generated by the parking boundary.
4. The parking strategy based on the reinforcement learning method of the potential energy field function approximation is characterized in that: different potential energy fields are designed for different types of parking areas, and the potential energy fields of different parts have different action ranges; the potential energy of the gravitational part is positive, and within its action range it is larger the closer the vehicle is to the guide line and to the end point, the gravitational potential energy at the end point being the largest; the potential energy of the repulsive part is negative, the repulsive potential energy at the parking boundary is negative infinity, and within its action range the repulsion is larger the closer the vehicle is to the parking boundary.
5. The parking strategy based on the reinforcement learning method of the potential energy field function approximation is characterized in that: according to the vehicle state, different parts of the potential energy field have different effects on different positions of the vehicle, and the generated potential energy is different, so that the potential energy contributes to the vehicle state value function differently; the state value function value generated by the gravitational potential energy field part acts on the central point of the rear axle of the vehicle, the state value function generated by the parking boundary of the repulsive field part acts on four angular points of the outer contour of the vehicle, and the potential energy field generated by the contact angular points of the target parking space and the travelable area acts on the outer contour edge of the vehicle.
6. The parking strategy based on the reinforcement learning method of the potential energy field function approximation is characterized in that: the parking success rate is defined as an optimization target to optimize the potential energy field parameters, wherein the parking success rate is defined as follows: in the same scene, N rounds are trained, the initial vehicle state of each round is generated randomly, and if M rounds end with the vehicle successfully driven into the parking space, the parking success rate under this set of potential energy field parameters is M/N × 100%; the end flags comprise the vehicle successfully driving into the parking space, the vehicle driving out of the parking area, and the parking timing out.
7. The parking strategy based on the reinforcement learning method of the potential energy field function approximation is characterized in that: the executing action consists of a steering wheel corner and gears, wherein the gears comprise a forward gear, a reverse gear and a neutral gear.
8. The parking strategy based on the reinforcement learning method of the potential energy field function approximation is characterized in that: before predicting the next state by using a state transition equation, removing the action of colliding the vehicle with a parking boundary from a preset executable action space, and selecting the executable action by using an epsilon-greedy strategy; and after a series of action sequences from the current state to the final prediction state are obtained according to an epsilon-greedy strategy and a state transition equation, trimming the action sequences to remove circular actions, and obtaining a final parking planning path.
9. The parking strategy based on the reinforcement learning method of the potential energy field function approximation is characterized in that: the vehicle parameters include front suspension, rear suspension, wheelbase, vehicle width, minimum turning radius, and transmission ratio.
CN202010847538.1A 2020-08-21 2020-08-21 Parking strategy of reinforcement learning method based on potential energy field function approximation Active CN112061116B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010847538.1A CN112061116B (en) 2020-08-21 2020-08-21 Parking strategy of reinforcement learning method based on potential energy field function approximation

Publications (2)

Publication Number Publication Date
CN112061116A CN112061116A (en) 2020-12-11
CN112061116B true CN112061116B (en) 2021-10-29

Family

ID=73658797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010847538.1A Active CN112061116B (en) 2020-08-21 2020-08-21 Parking strategy of reinforcement learning method based on potential energy field function approximation

Country Status (1)

Country Link
CN (1) CN112061116B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112677983B (en) * 2021-01-07 2022-04-12 浙江大学 System for recognizing driving style of driver
CN113335270B (en) * 2021-07-01 2022-05-03 湖南大学 Parking path planning method and device
CN113705474B (en) * 2021-08-30 2022-04-15 北京易航远智科技有限公司 Parking space detection method and device
CN115472038B (en) * 2022-11-01 2023-02-03 南京杰智易科技有限公司 Automatic parking method and system based on deep reinforcement learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102007055389A1 (en) * 2007-11-20 2009-05-28 Valeo Schalter Und Sensoren Gmbh Method and apparatus for collision prevention when planning a path for parking a vehicle
JP6764170B2 (en) * 2017-05-11 2020-09-30 日野自動車株式会社 Backward parking support device for connected vehicles
CN109318890A (en) * 2018-06-29 2019-02-12 北京理工大学 A kind of unmanned vehicle dynamic obstacle avoidance method based on dynamic window and barrier potential energy field
CN110136481B (en) * 2018-09-20 2021-02-02 初速度(苏州)科技有限公司 Parking strategy based on deep reinforcement learning

Also Published As

Publication number Publication date
CN112061116A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
CN112061116B (en) Parking strategy of reinforcement learning method based on potential energy field function approximation
CN110136481B (en) Parking strategy based on deep reinforcement learning
CN110297494B (en) Decision-making method and system for lane change of automatic driving vehicle based on rolling game
Shi et al. Driving decision and control for automated lane change behavior based on deep reinforcement learning
Li et al. An optimization-based path planning approach for autonomous vehicles using the DynEFWA-artificial potential field
CN109976340B (en) Man-machine cooperation dynamic obstacle avoidance method and system based on deep reinforcement learning
Lin et al. Decision making through occluded intersections for autonomous driving
CN110969848A (en) Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes
CN114153213A (en) Deep reinforcement learning intelligent vehicle behavior decision method based on path planning
CN111645673B (en) Automatic parking method based on deep reinforcement learning
CN113291318B (en) Unmanned vehicle blind area turning planning method based on partially observable Markov model
CN113311828B (en) Unmanned vehicle local path planning method, device, equipment and storage medium
CN114859905A (en) Local path planning method based on artificial potential field method and reinforcement learning
CN113391633A (en) Urban environment-oriented mobile robot fusion path planning method
Li et al. Adaptive sampling-based motion planning with a non-conservatively defensive strategy for autonomous driving
CN116659501A (en) Data processing method and device and vehicle
Yamaguchi et al. Model predictive path planning for autonomous parking based on projected C-space
Garzón et al. Game theoretic decision making based on real sensor data for autonomous vehicles’ maneuvers in high traffic
Huy et al. A practical and optimal path planning for autonomous parking using fast marching algorithm and support vector machine
Wang et al. A hierarchical planning framework of the intersection with blind zone and uncertainty
CN116551703A (en) Motion planning method based on machine learning in complex environment
CN116027788A (en) Intelligent driving behavior decision method and equipment integrating complex network theory and part of observable Markov decision process
CN116009558A (en) Mobile robot path planning method combined with kinematic constraint
CN113829351B (en) Cooperative control method of mobile mechanical arm based on reinforcement learning
Liang et al. Investigations on Speed Planning Algorithm and Trajectory Tracking Control of Intersection Scenarios Without Traffic Signs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant