CN112061116B - A Reinforcement Learning Method Based on Potential Energy Field Function Approximation for Parking Policy - Google Patents


Info

Publication number
CN112061116B
CN112061116B (application CN202010847538.1A)
Authority
CN
China
Prior art keywords
potential energy
parking
vehicle
energy field
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010847538.1A
Other languages
Chinese (zh)
Other versions
CN112061116A (en)
Inventor
李道飞
刘关明
刘傲
林思远
肖斌
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202010847538.1A
Publication of CN112061116A
Application granted
Publication of CN112061116B
Legal status: Active

Classifications

    • B — PERFORMING OPERATIONS; TRANSPORTING
    • B60 — VEHICLES IN GENERAL
    • B60W — CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00 — Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/06 — Automatic manoeuvring for parking
    • B60W40/00 — Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit, e.g. by using mathematical models

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)

Abstract



The invention discloses a parking strategy based on a reinforcement learning method with potential energy field function approximation. A potential energy field is designed to approximate the state value function of the reinforcement learning process. An action is selected from a preset executable action space by an ε-greedy policy according to the vehicle's state value; the vehicle's next state is predicted from its current state and the selected action using a state transition equation; this select-and-predict process repeats until parking is complete, and the full sequence of selected actions constitutes the real-time planned parking path. By training the potential field parameters, the invention applies to a variety of parking areas and plans parking paths in different scenarios, giving it generality; the parking path can be planned in real time and tracked accurately.


Description

Parking strategy based on a reinforcement learning method with potential energy field function approximation
Technical Field
The invention belongs to the technical field of vehicles, and particularly relates to a parking strategy based on a reinforcement learning method with potential energy field function approximation.
Background
At present, mainstream automatic parking technology still mostly relies on traditional path planning algorithms, which fall into three main categories: random path generation algorithms, path planning algorithms that generate paths by function fitting, and rule-based path planning algorithms. Random path generation algorithms, such as RRT (Rapidly-exploring Random Tree) and PRM (Probabilistic Roadmap), randomly generate candidate paths in a pre-built scene map, perform collision detection on each candidate (or check that it lies within the drivable area), and then select an optimal parking path among all feasible candidates according to an optimization objective. Function-fitting methods may use polynomials, Bézier curves, arctangent functions, and the like: given the environmental information, the parameters of the chosen function form are solved by optimization under collision constraints, parking geometry constraints, vehicle performance constraints, and so on, thereby generating the path. Rule-based path planning mainly plans according to driving experience and the relative positions of the vehicle and the parking space, generating the whole path from hand-crafted rules.
However, all three parking path planning approaches have limitations. Random path generation algorithms place high demands on the sensors — information about the whole parking environment must be sensed in advance — and it is difficult to guarantee that a generated path is a trajectory the vehicle can actually realize. Paths generated by function fitting impose strict requirements on the parking spot and the vehicle's initial pose, transfer poorly to different scenarios, and are difficult to re-plan continuously when the tracking error grows large during actual parking, so they are ill-suited to real-time planning. Rule-based parking path planning is likewise inflexible: it is hard to guarantee the completeness of the rule set, and the rules must be re-engineered manually for each new scenario.
The artificial potential field method is a classic robot path planning scheme that uses the concept of a potential energy field and generally treats the moving object as a point mass. In a parking scene, however, the vehicle's shape and steering characteristics mean it cannot be treated as a simple point or circle, so a path planned directly by an artificial potential field cannot actually be realized by vehicle tracking control. Potential energy is the energy an object or system has by virtue of its position or state; it is not owned by an object alone but shared among interacting objects. During parking, the parking environment (including the terrain and other vehicles) can be regarded as a field in which the vehicle has potential energy that depends on its position and state. Taking that potential energy as the objective pursued during parking, the parking process becomes a process of seeking to increase potential energy.
Disclosure of Invention
The invention aims to provide a parking strategy based on a reinforcement learning method with potential energy field function approximation, addressing the shortcomings of the prior art.
The purpose of the invention is realized by the following technical scheme: in a parking strategy based on a reinforcement learning method with potential energy field function approximation, the state value function of the reinforcement learning process is approximated by a designed potential energy field, which quantitatively represents factors such as the vehicle's current state, the target parking space, the drivable area, and the vehicle parameters within the vehicle state value function. From the vehicle's current state and a preset executable action space, the next state corresponding to each executable action is predicted with a state transition equation; the state value of each predicted state is then computed from the potential energy field, and the one action with the highest state value is selected by an ε-greedy policy. The next action is then selected from the state corresponding to the chosen action, and this predict-and-select process is repeated until parking is finished; the real-time planned parking path is finally generated from the selected action sequence.
Further, a guide line is designed according to the required path, and the potential energy field is generated from the guide line and the parking boundary constraint; the potential field parameters are then optimized so that the resulting potential field function represents the vehicle's state value in every state during parking. The parking area comprises the drivable area and the target parking space, and the parking boundary is the outer contour of the parking area.
Further, the designed potential energy field is divided into a part generated by attraction and a part generated by repulsion. The attractive part is generated by designed virtual guide lines; the fields generated by different virtual guide lines have different priorities in different regions, and a field with higher priority covers one with lower priority. The repulsive part is generated by the parking boundary.
Further, different potential energy fields are designed for different types of parking areas, and the different parts of the field have different ranges of action. The attractive potential energy is positive: within its range of action it grows as the vehicle approaches the guide line and as it approaches the end point, and is largest at the end point. The repulsive potential energy is negative: it is negative infinity at the parking boundary, and within its range of action its magnitude grows as the vehicle approaches the boundary.
Furthermore, depending on the vehicle state, the different parts of the potential energy field act on different points of the vehicle and generate different potential energies, and thus contribute differently to the vehicle state value function: the value contributed by the attractive field acts on the midpoint of the vehicle's rear axle; the value generated by the parking boundary of the repulsive field acts on the four corner points of the vehicle's outer contour; and the potential field generated by the contact corner points between the target parking space and the drivable area acts on the edges of the vehicle's outer contour.
Further, the state value v0 contributed by the attractive part of the potential energy field is:
v0 = f(X)
(the form of f appears in the original only as an equation image)
X = [(x − x_target), (y − y_target), (yaw − yaw_target)]
The vehicle state comprises at least the x and y coordinates of the midpoint of the vehicle's rear axle in the parking-space coordinate system and the angle yaw between the vehicle's longitudinal axis and the x axis of that coordinate system, written (x, y, yaw); (x_target, y_target, yaw_target) is the end-point state. v0 is the potential field value due to attraction and f is the attractive potential field function; C0 and C1 are parameters to be trained.
The state value v1_i contributed by the repulsive part of the potential energy field is:
v1_i = −C2 / d_i²
The vehicle contour has four corner points, and the target parking space and the drivable area have two contact corner points, so i = 1–6: d_1–d_4 are the shortest distances from each vehicle contour corner to the parking boundary, and d_5–d_6 are the shortest distances from each contact corner point between the target parking space and the drivable area to the vehicle contour edges. If d_i lies beyond the repulsive range of action, v1_i = 0. C2 is a parameter to be trained.
The final vehicle state value V is:
V = v0 + Σ_{i=1}^{6} v1_i
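The combination of attractive and repulsive values described above can be sketched in Python. This is an illustrative sketch, not the patented implementation: the exact form of f appears in the patent only as an image, so a Gaussian-like form with the trainable parameters C0 and C1 is assumed here, and the repulsive range of action `d_range` is a hypothetical parameter.

```python
import math

def attractive_value(state, target, C0, C1):
    """Attractive part v0 = f(X), with an assumed Gaussian-like form
    f(X) = C0 * exp(-C1 * |X|^2); the patent does not publish f."""
    X = [state[0] - target[0], state[1] - target[1], state[2] - target[2]]
    return C0 * math.exp(-C1 * sum(e * e for e in X))

def repulsive_value(d_i, C2, d_range):
    """Repulsive part: v1_i = -C2 / d_i^2 inside the range of action,
    0 outside it, and -inf at the boundary itself (d_i == 0)."""
    if d_i > d_range:
        return 0.0
    if d_i == 0:
        return float("-inf")
    return -C2 / (d_i * d_i)

def state_value(state, target, distances, C0, C1, C2, d_range):
    """Final value V = v0 plus the six repulsive contributions
    (four contour corners and two contact corner points)."""
    v0 = attractive_value(state, target, C0, C1)
    return v0 + sum(repulsive_value(d, C2, d_range) for d in distances)
```

With all six distances outside the repulsive range, V reduces to the attractive term alone, which matches the claim that repulsion only acts near the boundary.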
further, defining a parking success rate as an optimization target to optimize the potential energy field parameters, wherein the parking success rate is defined as follows: under the same scene, N rounds are trained, the initial state of the vehicle in each round is randomly generated, and if M rounds take the successful driving of the vehicle into the parking space as an end mark, the parking success rate under the group of potential energy field parameters is M/Nx 100%; the end mark comprises the successful driving of the vehicle into the parking space, the driving of the vehicle out of the parking area and the overtime of parking.
Further, an action consists of a steering wheel angle and a gear, where the gears include forward, reverse, and neutral.
Further, before the next state is predicted with the state transition equation, actions that would cause the vehicle to collide with the parking boundary are removed from the preset executable action space, and the executable action is then selected by the ε-greedy policy. After a sequence of actions from the current state to the final predicted state has been obtained via the ε-greedy policy and the state transition equation, the sequence is pruned to remove looping actions, yielding the final planned parking path.
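The collision-pruned ε-greedy selection just described can be sketched as follows; the function names `predict`, `value`, and `collides` are illustrative stand-ins for the patent's state transition equation, potential-field state value, and boundary collision check.

```python
import random

def epsilon_greedy(state, actions, predict, value, collides, eps=0.1, rng=random):
    """Select one action: first prune actions whose predicted next state
    collides with the parking boundary; then, with probability eps, explore
    a random feasible action, otherwise pick the action whose predicted
    next state has the highest potential-field state value."""
    feasible = [a for a in actions if not collides(predict(state, a))]
    if not feasible:
        return None  # no safe action in this state
    if rng.random() < eps:
        return rng.choice(feasible)
    return max(feasible, key=lambda a: value(predict(state, a)))
```

With eps = 0 the selection is purely greedy, which is how the toy check below exercises it deterministically.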
Further, the vehicle parameters include the front overhang, rear overhang, wheelbase, vehicle width, minimum turning radius, and steering ratio.
Compared with the prior art, the invention has the following beneficial effects. Unlike conventional reinforcement learning methods, the method does not define an immediate reward; instead it supplies the state value function by potential-field approximation. During training of the reinforcement learning algorithm, the vehicle observation state, the predicted vehicle state value, the vehicle state transition matrix, and the predicted vehicle action form the basic components of the algorithm:
1. With this method, a specific vehicle can train a parking scenario by itself in a minimal standard parking area, design a potential energy field function, and compute the corresponding potential field parameters offline; when the vehicle encounters other drivable areas, if they contain the trained area, the parking path can be planned directly with the trained parameters.
2. With this method, different potential energy fields can be trained separately for irregular or varied parking areas, enabling parking path planning in different scenarios; the method is therefore general.
3. Because the vehicle's observation state is obtained in real time, the vehicle can plan a trajectory under the action of the potential energy field from any state, which improves the method's generalization ability; this is one of the inventive points.
4. Because the reinforcement learning is built on a vehicle kinematic model and the final output path is derived from the vehicle's predicted action sequence, the resulting path is guaranteed to be accurately trackable by the vehicle.
5. The potential energy field is constructed from suitably chosen virtual guide lines, and its quality is measured by the introduced parking success rate, making the final state value function better founded; this is another of the inventive points.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention;
fig. 1 is a schematic view of an application scenario according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a potential energy field construction method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
The invention relates to a reinforcement-learning intelligent parking strategy based on a potential-field approximation of the state value function, combining reinforcement learning with a potential energy field to plan parking paths. A potential energy field is designed to approximate the state value function of the reinforcement learning process; it quantitatively expresses factors such as the vehicle's current pose, the target parking space, the drivable area, and the vehicle parameters (front overhang, rear overhang, wheelbase, vehicle width, minimum turning radius, steering ratio, etc.) within the vehicle's state value function. An executable action space is predetermined, and a state transition equation is obtained from a vehicle motion model. At each moment the vehicle's current state is obtained; the predicted next state for every action in the executable action space is computed with the state transition equation; the state value of each predicted state is computed; and an action whose state value satisfies the policy is selected by the ε-greedy policy. The predicted state under that action is output and becomes the basis for selecting the next action, and the predict-and-select process repeats until parking finishes. Finally, the sequence of predicted vehicle states over the whole process is output and lightly smoothed, and the real-time planned parking path is generated from the selected action sequence. The method suits different parking scenarios — perpendicular, parallel, and angled parking — requiring only that a corresponding potential energy field be designed and its parameters optimized for each scenario.
Before driving the real vehicle, a kinematic model is built from the specific vehicle parameters, the required parking area is determined from the vehicle parameters and sensor information, and the corresponding potential field parameters are computed offline. In a concrete task, the environment parameters are first input to the algorithm, which generates the specific potential energy field; the vehicle's current pose is then input, and the parking algorithm plans a path from the current point that achieves successful parking. The vehicle tracks either the planned path or the predicted action sequence given by the algorithm. During real vehicle motion, the algorithm can be called in real time with only the current pose: regardless of whether a large tracking error has occurred, the parking algorithm can re-plan a parking path starting from the current point in real time. Using the potential energy field as the estimate of the state value function avoids the difficulty of designing rewards for the parking environment when building a policy by reinforcement learning. From any safe position in the potential energy field, the exploratory nature of the adopted policy (acting randomly with a certain probability) lets the vehicle escape locally optimal states and finally reach the target parking space.
The invention specifically comprises the following steps:
(1) The vehicle state value is determined by the parking environment. At the start of the parking algorithm, the environment information is obtained with sensors, and the vehicle state value function is generated from this information using the potential-energy-field idea. The vehicle observation state comprises at least the x coordinate of the rear-axle midpoint in the parking-space coordinate system, the y coordinate of the rear-axle midpoint, and the angle yaw between the vehicle's longitudinal axis and the x axis, combined as (x, y, yaw).
For example, in the perpendicular parking scenario shown in fig. 1, the drivable region A is 30 m long and 5 m wide (widths of 5 m or more are treated as 5 m), and the rectangular region B is the target parking space, 6 m × 2.2 m (length × width). Vehicle parameters: wheelbase 2.7 m, front overhang 0.985 m, rear overhang 0.8 m, vehicle width 1.9 m, minimum turning radius 5.8 m, steering ratio 20. Given the specific vehicle parameters and the desired parking area, the potential energy field for the parking process can be determined accordingly.
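The embodiment's vehicle parameters can be gathered into a small container; this is merely an illustrative structure (field names are assumptions, not the patent's), and it shows that the overall vehicle length needed later for contour computations follows from the wheelbase and the two overhangs.

```python
from dataclasses import dataclass

@dataclass
class VehicleParams:
    """Vehicle parameters from the fig. 1 embodiment (all lengths in metres)."""
    wheelbase: float = 2.7
    front_overhang: float = 0.985
    rear_overhang: float = 0.8
    width: float = 1.9
    min_turn_radius: float = 5.8
    steering_ratio: float = 20.0

    @property
    def length(self) -> float:
        # total body length = wheelbase + front overhang + rear overhang
        return self.wheelbase + self.front_overhang + self.rear_overhang
```

For this embodiment the derived body length is 4.485 m.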
(2) Constructing a potential energy field:
(2.1) In global coordinates, a virtual guide line is designed according to the required path; the guide line points toward the parking end position.
(2.2) The potential energy field is generated from the guide-line and parking-boundary constraints, where the parking boundary is the outer contour of the drivable area and the target parking space. Different potential energy fields are designed for different types of parking areas, and the potentials of different obstacles and guide lines have different ranges of action. Combining the collision constraints and the parking end-point constraint in the potential energy field, the designed field is divided into a part generated by attraction and a part generated by repulsion:
The attractive part is generated by the designed virtual guide lines; the guide-line potential is positive and this part of the field has a large range of action, so the vehicle's potential energy increases continuously as it approaches the terminal state. Fields generated by different virtual guide lines have different priorities in different regions, and a high-priority field overrides a low-priority one.
The repulsive part is generated by the parking boundary; the repulsive field affects the vehicle only within a small range, which suffices for collision avoidance. At the parking boundary itself, the repulsive potential energy received by the vehicle is set to negative infinity.
Depending on the vehicle state, different parts of the potential energy field act on different points on the vehicle and generate different potential energies, and hence contribute differently to the vehicle state value function:
the value contributed by the attractive part acts on the midpoint of the vehicle's rear axle;
the value generated by the parking boundary of the repulsive part acts on the four corner points of the vehicle's outer contour, and the potential field generated by the contact corner points between the parking space and the drivable area acts on the edges of the vehicle's outer contour.
As shown in the potential-field design of the embodiment of fig. 2, the parking boundary is characterized by repulsion: the closer the vehicle is to it, the more negative the estimated state value, but the boundary's repulsive range of action is limited to a small area near the boundary. Conversely, the desired virtual guide lines (the dashed-arrow portions in fig. 2) take the form of attraction: the closer the vehicle is to a guide line, the more positive its estimated state value, and the guide lines' range of action is large, so the vehicle can be planned under the attractive force at every position in the parking area. Different virtual guide lines are given different priorities according to the region the vehicle occupies, and higher-priority attraction covers lower-priority attraction. For example, in the drivable area and parking-space area enclosed by the dash-dot frame of fig. 2 (region C), guide line 3 has the higher priority — within that region the vehicle is not influenced by guide lines 1 and 2 — whereas guide lines 1 and 2 have higher priority in the remaining drivable region not enclosed by the dash-dot frame (region D).
(3) Parking: the invention obtains the parking path by reinforcement learning — more precisely, the parking policy is learned from exploration sequences. The vehicle's pose in the environment is determined from its current state; all currently executable actions are determined; the predicted next state for each executable action is computed with the state transition equation; the state value of each predicted state is computed; the state value satisfying the policy (the maximum) is selected according to the ε-greedy policy; and the corresponding predicted action is chosen. The state corresponding to that action becomes the current state, from which the predicted states and their values are computed again to select the next action greedily. This repeats until the vehicle parks successfully, leaves the parking area, or times out, at which point planning ends. Step (3) is specifically as follows:
(3.1) From any safe position in the parking area, the vehicle can generate an action sequence to the parking space by this method. The action-selection policy is ε-greedy (with probability ε, 0 < ε ≤ 1, the next action is chosen at random; with probability 1 − ε, the action with the maximum value is chosen), i.e., there is a certain exploration probability. Meanwhile, during the search, all actions that would cause the vehicle to collide with the parking-area boundary are removed from the executable action space; the remaining subspace is the action space searched in the vehicle's current state. The vehicle's executable action — its action space — is the output that controls the vehicle's motion. In practice the vehicle speed is fixed at the parking speed, and an executable action has two dimensions: the steering wheel angle SW and the gear information (Gear), where the gears include forward, reverse, neutral, and so on.
(3.2) A vehicle state transition matrix is generated from the vehicle's performance constraints and its kinematic model; it represents how the vehicle transitions from one state to the next under a given action. The kinematic model, built from the vehicle parameters, estimates the vehicle's next state (x_1, y_1, yaw_1) from its current state (x_0, y_0, yaw_0) and the input action (SW, Gear).
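A minimal transition of this kind can be sketched as below. The patent does not publish its exact transition equation, so this assumes a standard kinematic bicycle model with a fixed parking speed `v` and time step `dt` (both hypothetical parameters), converting the steering wheel angle SW to a front-wheel angle through the steering ratio and letting the gear set the direction of travel.

```python
import math

def step(state, sw_deg, gear, params, v=1.0, dt=0.1):
    """One step of an assumed kinematic bicycle model.
    state: (x, y, yaw) of the rear-axle midpoint; sw_deg: steering
    wheel angle SW in degrees; gear: 'D' forward, 'R' reverse, 'N' neutral."""
    x, y, yaw = state
    # front-wheel angle = steering wheel angle / steering ratio
    delta = math.radians(sw_deg) / params["steering_ratio"]
    direction = {"D": 1.0, "R": -1.0, "N": 0.0}[gear]
    ds = direction * v * dt  # signed arc length travelled this step
    x += ds * math.cos(yaw)
    y += ds * math.sin(yaw)
    yaw += ds * math.tan(delta) / params["wheelbase"]
    return (x, y, yaw)
```

Replaying a chosen action sequence through `step` is exactly how a predicted state trajectory — and hence the output path — would be rolled out under this sketch.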
(3.3) For each vehicle state, the potential energy field gives a corresponding state value.
The vehicle's guide line gives the vehicle a target end state (x_target, y_target, yaw_target), and the state value of this part is given by the difference X:
v0 = f(X)
(the form of f appears in the original only as an equation image)
X = [(x − x_target), (y − y_target), (yaw − yaw_target)]
where v0 is the potential field value due to attraction and f is the attractive potential field function; C0 and C1 are parameters to be trained.
For the repulsive part of the potential energy field, since every point on the vehicle body contour must satisfy the collision-avoidance requirement, the repulsion acts on the four corner points and four contour edges of the vehicle contour. The coordinates of the four vehicle corner points are computed from the current state (x, y, yaw) and the body parameters, and the shortest distance d_i from each corner to the boundary line is computed, with corner subscripts i = 1, 2, 3, 4; then, from the coordinates of the two contact corner points between the parking space and the drivable area, the shortest distance d_i (i = 5, 6) from each such point to the vehicle contour is computed. If d_i is within the repulsive range of action, the state value of this part can be expressed as
v1_i = −C2 / d_i²
where v1_i is the potential field value due to repulsion and C2 is a parameter to be trained. If d_i exceeds the repulsive range of the parking boundary, this part's value is v1_i = 0.
The final vehicle state value function value V is calculated as

V = v0 + Σ v1i (i = 1~6)
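The evaluation of V can be sketched as follows. Since the explicit form of the attractive function f(X) appears only as an image in the source, the inverse-quadratic form below is an assumption chosen so that v0 is largest at the target state; the parameter values C0, C1, C2 and the repulsion range d_range are placeholders for the quantities the patent obtains by training.

```python
def state_value(x, y, yaw, target, corners_d,
                C0=1.0, C1=1.0, C2=0.5, d_range=2.0):
    """State value V = v0 + sum(v1_i), following the structure above.

    target is (xtarget, ytarget, yawtarget); corners_d holds the six
    shortest distances d1..d6 (four vehicle corners to the parking
    boundary, two contact corners to the vehicle contour edges).
    The attractive form f(X) and all parameter values are assumptions.
    """
    xt, yt, yawt = target
    X2 = (x - xt) ** 2 + (y - yt) ** 2 + (yaw - yawt) ** 2
    v0 = C0 / (C1 + X2)                 # assumed f(X): maximal at the target state
    # repulsive terms act only within the repulsion range d_range
    v1 = sum(-C2 / d ** 2 for d in corners_d if d < d_range)
    return v0 + v1
```

With all distances outside the repulsion range, V reduces to the attractive term alone; a corner approaching the boundary drives V strongly negative, which is what steers the ε-greedy selection away from collisions.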
(3.4) Applying the method of the invention, a sequence of actions from the current state to the final predicted state is obtained. The action sequence is then pruned: any action subsequence that causes a loop in the state trajectory is deleted. Finally, a simulation rollout using the state transition equation, the current state, and the pruned action sequence outputs the final planned parking path.
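The loop-pruning step in (3.4) can be sketched as follows. This is a minimal sketch, assuming states have already been discretized into hashable values by the caller (the patent does not specify the comparison used to detect a revisited state): whenever a state reappears, the actions taken between its two visits are dropped.

```python
def prune_loops(states, actions):
    """Remove action subsequences whose states form a loop (a sketch).

    states[k] is the state observed before actions[k]; discretizing
    states so they compare equal on a revisit is assumed done upstream.
    """
    seen = {}                      # state -> index of its first kept occurrence
    kept_states, kept_actions = [], []
    for s, a in zip(states, actions):
        if s in seen:
            k = seen[s]            # roll back to the first visit of s
            kept_states, kept_actions = kept_states[:k], kept_actions[:k]
            # forget states recorded after the rollback point
            seen = {st: i for st, i in seen.items() if i < k}
        seen[s] = len(kept_states)
        kept_states.append(s)
        kept_actions.append(a)
    return kept_states, kept_actions
```

For example, a trajectory A → B → C → B collapses to a single visit of B, keeping only the action taken on the later visit.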
(4) Optimal potential energy field function parameters are obtained by optimization training for the specific vehicle parameters, as follows:
The parking success rate, which depends on the potential energy field parameters, is defined as the optimization target for those parameters. In the same scene, N episodes are trained; the initial state of the vehicle in each episode is generated randomly, and step (3) is executed using the observed vehicle state, the executable actions, the state transition matrix, and the current potential energy field to carry out the parking process. An episode ends on any of three flags: the vehicle parks successfully in the parking space, the vehicle drives out of the parking area, or a set maximum number of steps is exceeded. If M episodes end with the vehicle successfully entering the parking space, the parking success rate under this set of potential energy field parameters is M/N × 100%. Since the success rate so defined depends on the potential energy field function, it is taken as the optimization target; the field is optimized offline to find the optimal potential energy field function for the specific environment. With the trained parameters C0, C1, C2, etc., the vehicle can, in real time and within the reinforcement learning framework, plan from its current state value toward the location with the highest state value in the whole field, and that highest-value state lies inside the parking space. The resulting potential energy field function therefore represents the reinforcement learning state value of the vehicle at every state during parking, i.e., a parking path can be successfully planned with this method.
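The success-rate objective M/N × 100% can be sketched as a small evaluation loop. Here run_episode is a hypothetical callable standing in for step (3): it is assumed to simulate one parking episode from a random initial state under the current field parameters and return one of the three end flags described above.

```python
import random

def parking_success_rate(run_episode, n_episodes=100, seed=0):
    """Estimate the success rate M/N x 100% used as the optimization target.

    run_episode(rng) is an assumed stand-in for one full parking episode;
    it must return "parked", "left_area", or "timeout".
    """
    rng = random.Random(seed)        # reproducible random initial states
    m = sum(run_episode(rng) == "parked" for _ in range(n_episodes))
    return m / n_episodes * 100.0
```

Offline optimization then amounts to searching the parameter space (C0, C1, C2, ...) for the set that maximizes this quantity, e.g. by grid or random search.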
The present invention is not limited to any particular parking area. The above description is only a preferred embodiment of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes to the embodiment. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (9)

1. A parking strategy based on a reinforcement learning method with potential energy field function approximation, characterized in that: a potential energy field is designed to approximate the state value function of the reinforcement learning process, the effect of the field being a quantitative representation, in the vehicle state value function, of the different factors of the vehicle's current state, the target parking space, the drivable area, and the vehicle parameters; according to the current state of the vehicle and a preset executable action space, the state transition equation is used to predict the next state corresponding to each executable action; the state value function value of each predicted state is then calculated with the potential energy field, and the action with the highest state value function value is selected through an ε-greedy strategy; the next action is then selected according to the state corresponding to that action, and this process of predicting states and selecting actions is repeated until parking ends; finally, a real-time planned parking path is generated from the selected action sequence;

the designed potential energy field is divided into a part generated by the attractive force and a part generated by the repulsive force;

the state value function value v0 of the attractive part is:

v0 = f(X)

[the explicit form of f(X) appears only as an image in the original]

X = [(x - xtarget), (y - ytarget), (yaw - yawtarget)]

where the vehicle state includes at least the x, y coordinates of the midpoint of the vehicle's rear axle in the parking-space coordinate system and the angle yaw between the vehicle's longitudinal center axis and the x-axis of that coordinate system, denoted (x, y, yaw); (xtarget, ytarget, yawtarget) is the end state; v0 is the potential energy field value produced by the attractive force, and f is the attractive potential energy field function; C0 and C1 are parameters to be obtained by training;

the state value function value v1i of the repulsive part is:

v1i = -C2 / di²

where the vehicle contour has four corner points and the target parking space has two contact corner points with the drivable area, i = 1~6; d1~d4 are the shortest distances from each vehicle contour corner point to the parking boundary, and d5~d6 are the shortest distances from each contact corner point between the target parking space and the drivable area to the vehicle contour edges; if di exceeds the repulsion range, then v1i = 0; C2 is a parameter to be trained;

the final vehicle state value function value V is:

V = v0 + Σ v1i (i = 1~6)

[this formula appears as an image in the original].

2. The parking strategy based on the reinforcement learning method with potential energy field function approximation according to claim 1, characterized in that: guide lines are designed according to the required path; the potential energy field is generated from the guide lines and the parking boundary constraints, and the potential energy field parameters are optimized; the resulting potential energy field function can be used to represent the state value function value of the vehicle at each state during the parking process; the parking boundary is the outer contour of the parking area, and the parking area includes the drivable area and the target parking space.

3. The parking strategy according to claim 2, characterized in that: the attractive part of the potential energy field is generated by designed virtual guide lines, and the fields generated by different virtual guide lines have different priorities in different regions, a higher-priority field covering a lower-priority one; the repulsive part of the potential energy field is generated by the parking boundary.

4. The parking strategy according to claim 3, characterized in that: different potential energy fields are designed for different types of parking areas, and different parts of the field have different ranges of action; the attractive potential energy is positive, growing larger the closer to the guide line within its range, larger still the closer to the end point, and largest at the end point; the repulsive potential energy is negative, the repulsive potential energy of the parking boundary is negative infinity, and within its range the repulsive potential energy grows larger the closer to the parking boundary.

5. The parking strategy according to claim 3, characterized in that: depending on the vehicle state, different parts of the potential energy field act on different positions of the vehicle and produce different potential energies, and therefore contribute differently to the vehicle state value function; the state value produced by the attractive potential energy field acts on the midpoint of the vehicle's rear axle; the state value produced by the parking boundary in the repulsive field acts on the four corner points of the vehicle's outer contour; and the field generated by the contact corner points between the target parking space and the drivable area acts on the vehicle's outer contour edges.

6. The parking strategy according to claim 2, characterized in that: the parking success rate is defined as the optimization target for the potential energy field parameters, as follows: in the same scene, N episodes are trained, with the initial vehicle state of each episode generated randomly; if M of these episodes end with the vehicle successfully entering the parking space, the parking success rate under this set of potential energy field parameters is M/N × 100%; the end flags include the vehicle successfully entering the parking space, the vehicle leaving the parking area, and a parking timeout.

7. The parking strategy according to claim 1, characterized in that: the executed action consists of a steering wheel angle and a gear, the gears including a forward gear, a reverse gear, and a neutral gear.

8. The parking strategy according to claim 1, characterized in that: before each use of the state transition equation to predict the next state, actions that would cause the vehicle to collide with the parking boundary are first removed from the preset executable action space, and the ε-greedy strategy is then used to select the action to execute; after a sequence of actions from the current state to the final predicted state has been obtained from the ε-greedy strategy and the state transition equation, the sequence is pruned to remove looping actions, giving the final planned parking path.

9. The parking strategy according to claim 1, characterized in that: the vehicle parameters include front overhang, rear overhang, wheelbase, vehicle width, minimum turning radius, and transmission ratio.
CN202010847538.1A 2020-08-21 2020-08-21 A Reinforcement Learning Method Based on Potential Energy Field Function Approximation for Parking Policy Active CN112061116B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010847538.1A CN112061116B (en) 2020-08-21 2020-08-21 A Reinforcement Learning Method Based on Potential Energy Field Function Approximation for Parking Policy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010847538.1A CN112061116B (en) 2020-08-21 2020-08-21 A Reinforcement Learning Method Based on Potential Energy Field Function Approximation for Parking Policy

Publications (2)

Publication Number Publication Date
CN112061116A CN112061116A (en) 2020-12-11
CN112061116B true CN112061116B (en) 2021-10-29

Family

ID=73658797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010847538.1A Active CN112061116B (en) 2020-08-21 2020-08-21 A Reinforcement Learning Method Based on Potential Energy Field Function Approximation for Parking Policy

Country Status (1)

Country Link
CN (1) CN112061116B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112677983B (en) * 2021-01-07 2022-04-12 浙江大学 System for recognizing driving style of driver
CN113335270B (en) * 2021-07-01 2022-05-03 湖南大学 Parking path planning method and device
CN113705474B (en) * 2021-08-30 2022-04-15 北京易航远智科技有限公司 Parking space detection method and device
CN115472038B (en) * 2022-11-01 2023-02-03 南京杰智易科技有限公司 Automatic parking method and system based on deep reinforcement learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102007055389A1 (en) * 2007-11-20 2009-05-28 Valeo Schalter Und Sensoren Gmbh Method and apparatus for collision prevention when planning a path for parking a vehicle
JP6764170B2 (en) * 2017-05-11 2020-09-30 日野自動車株式会社 Backward parking support device for connected vehicles
CN109318890A (en) * 2018-06-29 2019-02-12 北京理工大学 A dynamic obstacle avoidance method for unmanned vehicles based on dynamic windows and potential energy fields of obstacles
CN110136481B (en) * 2018-09-20 2021-02-02 初速度(苏州)科技有限公司 Parking strategy based on deep reinforcement learning

Also Published As

Publication number Publication date
CN112061116A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
CN112061116B (en) A Reinforcement Learning Method Based on Potential Energy Field Function Approximation for Parking Policy
Li et al. An optimization-based path planning approach for autonomous vehicles using the DynEFWA-artificial potential field
CN113359757B (en) A method for path planning and trajectory tracking of unmanned vehicles
Shi et al. Driving decision and control for automated lane change behavior based on deep reinforcement learning
CN110136481B (en) Parking strategy based on deep reinforcement learning
Zhang et al. Trajectory planning and tracking for autonomous vehicle based on state lattice and model predictive control
Lin et al. Decision making through occluded intersections for autonomous driving
CN109976340B (en) Man-machine cooperation dynamic obstacle avoidance method and system based on deep reinforcement learning
CN110969848A (en) Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes
CN114153213A (en) A deep reinforcement learning intelligent vehicle behavior decision-making method based on path planning
CN113247023B (en) Driving planning method and device, computer equipment and storage medium
CN110716562A (en) Decision-making method for multi-lane driving of driverless cars based on reinforcement learning
CN112141091B (en) Secondary parking method and system for solving parking space deviation and positioning deviation and vehicle
Fuji et al. Trajectory planning for automated parking using multi-resolution state roadmap considering non-holonomic constraints
CN117705123B (en) Track planning method, device, equipment and storage medium
CN118348975A (en) Path planning method, amphibious unmanned platform, storage medium and program product
Li et al. Adaptive sampling-based motion planning with a non-conservatively defensive strategy for autonomous driving
CN116659501A (en) Data processing method and device and vehicle
CN117007066A (en) Unmanned trajectory planning method integrated by multiple planning algorithms and related device
Yamaguchi et al. Model predictive path planning for autonomous parking based on projected C-space
Garzón et al. Game theoretic decision making based on real sensor data for autonomous vehicles’ maneuvers in high traffic
Huy et al. A practical and optimal path planning for autonomous parking using fast marching algorithm and support vector machine
CN117302204B (en) Multi-wind-lattice vehicle track tracking collision avoidance control method and device based on reinforcement learning
CN118220214A (en) Automatic driving movement planning method and system for complex parallel parking scene
CN118518105A (en) Based on improve A*Multi-target plant protection robot obstacle avoidance path optimization method of-IWOA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant