CN112061116B - A Reinforcement Learning Method Based on Potential Energy Field Function Approximation for Parking Policy - Google Patents


Info

Publication number
CN112061116B
CN112061116B (application CN202010847538.1A)
Authority
CN
China
Prior art keywords
potential energy
parking
vehicle
energy field
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010847538.1A
Other languages
Chinese (zh)
Other versions
CN112061116A (en)
Inventor
李道飞
刘关明
刘傲
林思远
肖斌
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202010847538.1A
Publication of CN112061116A
Application granted
Publication of CN112061116B
Legal status: Active

Classifications

    • B — PERFORMING OPERATIONS; TRANSPORTING
    • B60 — VEHICLES IN GENERAL
    • B60W — CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/00 — Purposes of road vehicle drive control systems not related to the control of a particular sub-unit, e.g. of systems using conjoint control of vehicle sub-units
    • B60W30/06 — Automatic manoeuvring for parking
    • B60W40/00 — Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub-unit, e.g. by using mathematical models

Landscapes

  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)

Abstract



The invention discloses a parking strategy based on a reinforcement learning method with potential energy field function approximation. A potential energy field is designed to approximate the state value function of the reinforcement learning process. An action is selected from a preset executable action space by an ε-greedy policy according to the vehicle's state value; the vehicle's next state is predicted from its current state and the selected action using a state transition equation; this select-and-predict process repeats until parking is complete, and the full sequence of selected actions constitutes the real-time planned parking path. By training the potential field parameters, the invention applies to a variety of parking areas and plans parking paths in different scenarios, giving it generality; the parking path can be planned in real time and tracked accurately.


Description

Parking strategy based on a reinforcement learning method with potential energy field function approximation
Technical Field
The invention belongs to the technical field of vehicles, and particularly relates to a parking strategy based on a reinforcement learning method with potential energy field function approximation.
Background
At present, mainstream automatic parking technology still mostly relies on traditional path planning algorithms, which fall into three main categories: random path generation algorithms, path planning algorithms that generate paths by function fitting, and rule-based path planning algorithms. Random path generation algorithms, such as RRT (Rapidly-exploring Random Tree) and PRM (Probabilistic Roadmap), randomly generate candidate paths in a pre-built scene map, perform collision detection on each candidate (or check that it lies within the drivable area), and then select an optimal parking path among all feasible candidates according to an optimization objective. Function-fitting methods may use polynomials, Bézier curves, arctangent functions, and the like: given the environmental information, the parameters of the chosen function form are solved by optimization under collision constraints, parking geometry constraints, vehicle performance constraints, and so on, thereby generating the path. Rule-based path planning mainly plans according to driving experience and the relative positions of the vehicle and the parking space, generating the whole path from hand-crafted rules.
However, all three parking path planning approaches have limitations. Random path generation algorithms place high demands on the sensors — information about the whole parking environment must be sensed in advance — and it is difficult to guarantee that a generated path is a trajectory the vehicle can actually realize. Paths generated by function fitting impose strict requirements on the parking spot and the vehicle's initial pose, transfer poorly to different scenarios, and are difficult to re-plan continuously when the tracking error grows large during actual parking, so they are ill-suited to real-time planning. Rule-based parking path planning is likewise inflexible: it is hard to guarantee the completeness of the rule set, and the rules must be re-engineered manually for each new scenario.
The artificial potential field method is a classic robot path planning scheme that uses the concept of a potential energy field and generally treats the moving object as a point mass. In a parking scene, however, the vehicle's shape and steering characteristics mean it cannot be treated as a simple point or circle, so a path planned directly by an artificial potential field cannot actually be realized by vehicle tracking control. Potential energy is the energy an object or system has by virtue of its position or state; it is not owned by an object alone but shared among interacting objects. During parking, the parking environment (including the terrain and other vehicles) can be regarded as a field in which the vehicle has potential energy that depends on its position and state. Taking that potential energy as the objective pursued during parking, the parking process becomes a process of seeking to increase potential energy.
Disclosure of Invention
The invention aims to provide a parking strategy based on a reinforcement learning method with potential energy field function approximation, addressing the shortcomings of the prior art.
The purpose of the invention is realized by the following technical scheme: in a parking strategy based on a reinforcement learning method with potential energy field function approximation, the state value function of the reinforcement learning process is approximated by a designed potential energy field, which quantitatively represents factors such as the vehicle's current state, the target parking space, the drivable area, and the vehicle parameters within the vehicle state value function. From the vehicle's current state and a preset executable action space, the next state corresponding to each executable action is predicted with a state transition equation; the state value of each predicted state is then computed from the potential energy field, and the one action with the highest state value is selected by an ε-greedy policy. The next action is then selected from the state corresponding to the chosen action, and this predict-and-select process is repeated until parking is finished; the real-time planned parking path is finally generated from the selected action sequence.
Further, a guide line is designed according to the required path, and the potential energy field is generated from the guide line and the parking boundary constraint; the potential field parameters are then optimized so that the resulting potential field function represents the vehicle's state value in every state during parking. The parking area comprises the drivable area and the target parking space, and the parking boundary is the outer contour of the parking area.
Further, the designed potential energy field is divided into a part generated by attraction and a part generated by repulsion. The attractive part is generated by designed virtual guide lines; the fields generated by different virtual guide lines have different priorities in different regions, and a field with higher priority covers one with lower priority. The repulsive part is generated by the parking boundary.
Further, different potential energy fields are designed for different types of parking areas, and the different parts of the field have different ranges of action. The attractive potential energy is positive: within its range of action it grows as the vehicle approaches the guide line and as it approaches the end point, and is largest at the end point. The repulsive potential energy is negative: it is negative infinity at the parking boundary, and within its range of action its magnitude grows as the vehicle approaches the boundary.
Furthermore, depending on the vehicle state, the different parts of the potential energy field act on different points of the vehicle and generate different potential energies, and thus contribute differently to the vehicle state value function: the value contributed by the attractive field acts on the midpoint of the vehicle's rear axle; the value generated by the parking boundary of the repulsive field acts on the four corner points of the vehicle's outer contour; and the potential field generated by the contact corner points between the target parking space and the drivable area acts on the edges of the vehicle's outer contour.
Further, the state value v0 contributed by the attractive part of the potential energy field is:
v0 = f(X)
(the form of f appears in the original only as an equation image)
X = [(x − x_target), (y − y_target), (yaw − yaw_target)]
The vehicle state comprises at least the x and y coordinates of the midpoint of the vehicle's rear axle in the parking-space coordinate system and the angle yaw between the vehicle's longitudinal axis and the x axis of that coordinate system, written (x, y, yaw); (x_target, y_target, yaw_target) is the end-point state. v0 is the potential field value due to attraction and f is the attractive potential field function; C0 and C1 are parameters to be trained.
The state value v1_i contributed by the repulsive part of the potential energy field is:
v1_i = −C2 / d_i²
The vehicle contour has four corner points, and the target parking space and the drivable area have two contact corner points, so i = 1–6: d_1–d_4 are the shortest distances from each vehicle contour corner to the parking boundary, and d_5–d_6 are the shortest distances from each contact corner point between the target parking space and the drivable area to the vehicle contour edges. If d_i lies beyond the repulsive range of action, v1_i = 0. C2 is a parameter to be trained.
The final vehicle state value V is:
V = v0 + Σ_{i=1}^{6} v1_i
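The combination of attractive and repulsive values described above can be sketched in Python. This is an illustrative sketch, not the patented implementation: the exact form of f appears in the patent only as an image, so a Gaussian-like form with the trainable parameters C0 and C1 is assumed here, and the repulsive range of action `d_range` is a hypothetical parameter.

```python
import math

def attractive_value(state, target, C0, C1):
    """Attractive part v0 = f(X), with an assumed Gaussian-like form
    f(X) = C0 * exp(-C1 * |X|^2); the patent does not publish f."""
    X = [state[0] - target[0], state[1] - target[1], state[2] - target[2]]
    return C0 * math.exp(-C1 * sum(e * e for e in X))

def repulsive_value(d_i, C2, d_range):
    """Repulsive part: v1_i = -C2 / d_i^2 inside the range of action,
    0 outside it, and -inf at the boundary itself (d_i == 0)."""
    if d_i > d_range:
        return 0.0
    if d_i == 0:
        return float("-inf")
    return -C2 / (d_i * d_i)

def state_value(state, target, distances, C0, C1, C2, d_range):
    """Final value V = v0 plus the six repulsive contributions
    (four contour corners and two contact corner points)."""
    v0 = attractive_value(state, target, C0, C1)
    return v0 + sum(repulsive_value(d, C2, d_range) for d in distances)
```

With all six distances outside the repulsive range, V reduces to the attractive term alone, which matches the claim that repulsion only acts near the boundary.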
further, defining a parking success rate as an optimization target to optimize the potential energy field parameters, wherein the parking success rate is defined as follows: under the same scene, N rounds are trained, the initial state of the vehicle in each round is randomly generated, and if M rounds take the successful driving of the vehicle into the parking space as an end mark, the parking success rate under the group of potential energy field parameters is M/Nx 100%; the end mark comprises the successful driving of the vehicle into the parking space, the driving of the vehicle out of the parking area and the overtime of parking.
Further, an action consists of a steering wheel angle and a gear, where the gears include forward, reverse, and neutral.
Further, before the next state is predicted with the state transition equation, actions that would cause the vehicle to collide with the parking boundary are removed from the preset executable action space, and the executable action is then selected by the ε-greedy policy. After a sequence of actions from the current state to the final predicted state has been obtained via the ε-greedy policy and the state transition equation, the sequence is pruned to remove looping actions, yielding the final planned parking path.
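The collision-pruned ε-greedy selection just described can be sketched as follows; the function names `predict`, `value`, and `collides` are illustrative stand-ins for the patent's state transition equation, potential-field state value, and boundary collision check.

```python
import random

def epsilon_greedy(state, actions, predict, value, collides, eps=0.1, rng=random):
    """Select one action: first prune actions whose predicted next state
    collides with the parking boundary; then, with probability eps, explore
    a random feasible action, otherwise pick the action whose predicted
    next state has the highest potential-field state value."""
    feasible = [a for a in actions if not collides(predict(state, a))]
    if not feasible:
        return None  # no safe action in this state
    if rng.random() < eps:
        return rng.choice(feasible)
    return max(feasible, key=lambda a: value(predict(state, a)))
```

With eps = 0 the selection is purely greedy, which is how the toy check below exercises it deterministically.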
Further, the vehicle parameters include the front overhang, rear overhang, wheelbase, vehicle width, minimum turning radius, and steering ratio.
Compared with the prior art, the invention has the following beneficial effects. Unlike conventional reinforcement learning methods, the method does not define an immediate reward; instead it supplies the state value function by potential-field approximation. During training of the reinforcement learning algorithm, the vehicle observation state, the predicted vehicle state value, the vehicle state transition matrix, and the predicted vehicle action form the basic components of the algorithm:
1. With this method, a specific vehicle can train a parking scenario by itself in a minimal standard parking area, design a potential energy field function, and compute the corresponding potential field parameters offline; when the vehicle encounters other drivable areas, if they contain the trained area, the parking path can be planned directly with the trained parameters.
2. With this method, different potential energy fields can be trained separately for irregular or varied parking areas, enabling parking path planning in different scenarios; the method is therefore general.
3. Because the vehicle's observation state is obtained in real time, the vehicle can plan a trajectory under the action of the potential energy field from any state, which improves the method's generalization ability; this is one of the inventive points.
4. Because the reinforcement learning is built on a vehicle kinematic model and the final output path is derived from the vehicle's predicted action sequence, the resulting path is guaranteed to be accurately trackable by the vehicle.
5. The potential energy field is constructed from suitably chosen virtual guide lines, and its quality is measured by the introduced parking success rate, making the final state value function better founded; this is another of the inventive points.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention;
fig. 1 is a schematic view of an application scenario according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a potential energy field construction method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
The invention relates to a reinforcement-learning intelligent parking strategy based on a potential-field approximation of the state value function, combining reinforcement learning with a potential energy field to plan parking paths. A potential energy field is designed to approximate the state value function of the reinforcement learning process; it quantitatively expresses factors such as the vehicle's current pose, the target parking space, the drivable area, and the vehicle parameters (front overhang, rear overhang, wheelbase, vehicle width, minimum turning radius, steering ratio, etc.) within the vehicle's state value function. An executable action space is predetermined, and a state transition equation is obtained from a vehicle motion model. At each moment the vehicle's current state is obtained; the predicted next state for every action in the executable action space is computed with the state transition equation; the state value of each predicted state is computed; and an action whose state value satisfies the policy is selected by the ε-greedy policy. The predicted state under that action is output and becomes the basis for selecting the next action, and the predict-and-select process repeats until parking finishes. Finally, the sequence of predicted vehicle states over the whole process is output and lightly smoothed, and the real-time planned parking path is generated from the selected action sequence. The method suits different parking scenarios — perpendicular, parallel, and angled parking — requiring only that a corresponding potential energy field be designed and its parameters optimized for each scenario.
Before driving the real vehicle, a kinematic model is built from the specific vehicle parameters, the required parking area is determined from the vehicle parameters and sensor information, and the corresponding potential field parameters are computed offline. In a concrete task, the environment parameters are first input to the algorithm, which generates the specific potential energy field; the vehicle's current pose is then input, and the parking algorithm plans a path from the current point that achieves successful parking. The vehicle tracks either the planned path or the predicted action sequence given by the algorithm. During real vehicle motion, the algorithm can be called in real time with only the current pose: regardless of whether a large tracking error has occurred, the parking algorithm can re-plan a parking path starting from the current point in real time. Using the potential energy field as the estimate of the state value function avoids the difficulty of designing rewards for the parking environment when building a policy by reinforcement learning. From any safe position in the potential energy field, the exploratory nature of the adopted policy (acting randomly with a certain probability) lets the vehicle escape locally optimal states and finally reach the target parking space.
The invention specifically comprises the following steps:
(1) The vehicle state value is determined by the parking environment. At the start of the parking algorithm, the environment information is obtained with sensors, and the vehicle state value function is generated from this information using the potential-energy-field idea. The vehicle observation state comprises at least the x coordinate of the rear-axle midpoint in the parking-space coordinate system, the y coordinate of the rear-axle midpoint, and the angle yaw between the vehicle's longitudinal axis and the x axis, combined as (x, y, yaw).
For example, in the perpendicular parking scenario shown in fig. 1, the drivable region A is 30 m long and 5 m wide (widths of 5 m or more are treated as 5 m), and the rectangular region B is the target parking space, 6 m × 2.2 m (length × width). Vehicle parameters: wheelbase 2.7 m, front overhang 0.985 m, rear overhang 0.8 m, vehicle width 1.9 m, minimum turning radius 5.8 m, steering ratio 20. Given the specific vehicle parameters and the desired parking area, the potential energy field for the parking process can be determined accordingly.
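The embodiment's vehicle parameters can be gathered into a small container; this is merely an illustrative structure (field names are assumptions, not the patent's), and it shows that the overall vehicle length needed later for contour computations follows from the wheelbase and the two overhangs.

```python
from dataclasses import dataclass

@dataclass
class VehicleParams:
    """Vehicle parameters from the fig. 1 embodiment (all lengths in metres)."""
    wheelbase: float = 2.7
    front_overhang: float = 0.985
    rear_overhang: float = 0.8
    width: float = 1.9
    min_turn_radius: float = 5.8
    steering_ratio: float = 20.0

    @property
    def length(self) -> float:
        # total body length = wheelbase + front overhang + rear overhang
        return self.wheelbase + self.front_overhang + self.rear_overhang
```

For this embodiment the derived body length is 4.485 m.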
(2) Constructing a potential energy field:
(2.1) In global coordinates, a virtual guide line is designed according to the required path; the guide line points toward the parking end position.
(2.2) The potential energy field is generated from the guide-line and parking-boundary constraints, where the parking boundary is the outer contour of the drivable area and the target parking space. Different potential energy fields are designed for different types of parking areas, and the potentials of different obstacles and guide lines have different ranges of action. Combining the collision constraints and the parking end-point constraint in the potential energy field, the designed field is divided into a part generated by attraction and a part generated by repulsion:
The attractive part is generated by the designed virtual guide lines; the guide-line potential is positive and this part of the field has a large range of action, so the vehicle's potential energy increases continuously as it approaches the terminal state. Fields generated by different virtual guide lines have different priorities in different regions, and a high-priority field overrides a low-priority one.
The repulsive part is generated by the parking boundary; the repulsive field affects the vehicle only within a small range, which suffices for collision avoidance. At the parking boundary itself, the repulsive potential energy received by the vehicle is set to negative infinity.
Depending on the vehicle state, different parts of the potential energy field act on different points on the vehicle and generate different potential energies, and hence contribute differently to the vehicle state value function:
the value contributed by the attractive part acts on the midpoint of the vehicle's rear axle;
the value generated by the parking boundary of the repulsive part acts on the four corner points of the vehicle's outer contour, and the potential field generated by the contact corner points between the parking space and the drivable area acts on the edges of the vehicle's outer contour.
As shown in the potential-field design of the embodiment of fig. 2, the parking boundary is characterized by repulsion: the closer the vehicle is to it, the more negative the estimated state value, but the boundary's repulsive range of action is limited to a small area near the boundary. Conversely, the desired virtual guide lines (the dashed-arrow portions in fig. 2) take the form of attraction: the closer the vehicle is to a guide line, the more positive its estimated state value, and the guide lines' range of action is large, so the vehicle can be planned under the attractive force at every position in the parking area. Different virtual guide lines are given different priorities according to the region the vehicle occupies, and higher-priority attraction covers lower-priority attraction. For example, in the drivable area and parking-space area enclosed by the dash-dot frame of fig. 2 (region C), guide line 3 has the higher priority — within that region the vehicle is not influenced by guide lines 1 and 2 — whereas guide lines 1 and 2 have higher priority in the remaining drivable region not enclosed by the dash-dot frame (region D).
(3) Parking: the invention obtains the parking path by reinforcement learning — more precisely, the parking policy is learned from exploration sequences. The vehicle's pose in the environment is determined from its current state; all currently executable actions are determined; the predicted next state for each executable action is computed with the state transition equation; the state value of each predicted state is computed; the state value satisfying the policy (the maximum) is selected according to the ε-greedy policy; and the corresponding predicted action is chosen. The state corresponding to that action becomes the current state, from which the predicted states and their values are computed again to select the next action greedily. This repeats until the vehicle parks successfully, leaves the parking area, or times out, at which point planning ends. Step (3) is specifically as follows:
(3.1) From any safe position in the parking area, the vehicle can generate an action sequence to the parking space by this method. The action-selection policy is ε-greedy (with probability ε, 0 < ε ≤ 1, the next action is chosen at random; with probability 1 − ε, the action with the maximum value is chosen), i.e., there is a certain exploration probability. Meanwhile, during the search, all actions that would cause the vehicle to collide with the parking-area boundary are removed from the executable action space; the remaining subspace is the action space searched in the vehicle's current state. The vehicle's executable action — its action space — is the output that controls the vehicle's motion. In practice the vehicle speed is fixed at the parking speed, and an executable action has two dimensions: the steering wheel angle SW and the gear information (Gear), where the gears include forward, reverse, neutral, and so on.
(3.2) A vehicle state transition matrix is generated from the vehicle's performance constraints and its kinematic model; it represents how the vehicle transitions from one state to the next under a given action. The kinematic model, built from the vehicle parameters, estimates the vehicle's next state (x_1, y_1, yaw_1) from its current state (x_0, y_0, yaw_0) and the input action (SW, Gear).
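A minimal transition of this kind can be sketched as below. The patent does not publish its exact transition equation, so this assumes a standard kinematic bicycle model with a fixed parking speed `v` and time step `dt` (both hypothetical parameters), converting the steering wheel angle SW to a front-wheel angle through the steering ratio and letting the gear set the direction of travel.

```python
import math

def step(state, sw_deg, gear, params, v=1.0, dt=0.1):
    """One step of an assumed kinematic bicycle model.
    state: (x, y, yaw) of the rear-axle midpoint; sw_deg: steering
    wheel angle SW in degrees; gear: 'D' forward, 'R' reverse, 'N' neutral."""
    x, y, yaw = state
    # front-wheel angle = steering wheel angle / steering ratio
    delta = math.radians(sw_deg) / params["steering_ratio"]
    direction = {"D": 1.0, "R": -1.0, "N": 0.0}[gear]
    ds = direction * v * dt  # signed arc length travelled this step
    x += ds * math.cos(yaw)
    y += ds * math.sin(yaw)
    yaw += ds * math.tan(delta) / params["wheelbase"]
    return (x, y, yaw)
```

Replaying a chosen action sequence through `step` is exactly how a predicted state trajectory — and hence the output path — would be rolled out under this sketch.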
(3.3) For each vehicle state, the potential energy field gives a corresponding state value.
The vehicle's guide line gives the vehicle a target end state (x_target, y_target, yaw_target), and the state value of this part is given by the difference X:
v0 = f(X)
(the form of f appears in the original only as an equation image)
X = [(x − x_target), (y − y_target), (yaw − yaw_target)]
where v0 is the potential field value due to attraction and f is the attractive potential field function; C0 and C1 are parameters to be trained.
For the repulsive part of the potential energy field, since every point on the vehicle body contour must satisfy the collision-avoidance requirement, the repulsion acts on the four corner points and four contour edges of the vehicle contour. The coordinates of the four vehicle corner points are computed from the current state (x, y, yaw) and the body parameters, and the shortest distance d_i from each corner to the boundary line is computed, with corner subscripts i = 1, 2, 3, 4; then, from the coordinates of the two contact corner points between the parking space and the drivable area, the shortest distance d_i (i = 5, 6) from each such point to the vehicle contour is computed. If d_i is within the repulsive range of action, the state value of this part can be expressed as
v1_i = −C2 / d_i²
where v1_i is the potential field value due to repulsion and C2 is a parameter to be trained. If d_i exceeds the repulsive range of the parking boundary, this part's value is v1_i = 0.
The final vehicle state value function value V is calculated as

V = v0 + Σ v1i (i = 1~6)
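The evaluation of V can be sketched as follows. Since the explicit form of the attractive function f(X) appears only as an image in the source, the inverse-quadratic form below is an assumption chosen so that v0 is largest at the target state; the parameter values C0, C1, C2 and the repulsion range d_range are placeholders for the quantities the patent obtains by training.

```python
def state_value(x, y, yaw, target, corners_d,
                C0=1.0, C1=1.0, C2=0.5, d_range=2.0):
    """State value V = v0 + sum(v1_i), following the structure above.

    target is (xtarget, ytarget, yawtarget); corners_d holds the six
    shortest distances d1..d6 (four vehicle corners to the parking
    boundary, two contact corners to the vehicle contour edges).
    The attractive form f(X) and all parameter values are assumptions.
    """
    xt, yt, yawt = target
    X2 = (x - xt) ** 2 + (y - yt) ** 2 + (yaw - yawt) ** 2
    v0 = C0 / (C1 + X2)                 # assumed f(X): maximal at the target state
    # repulsive terms act only within the repulsion range d_range
    v1 = sum(-C2 / d ** 2 for d in corners_d if d < d_range)
    return v0 + v1
```

With all distances outside the repulsion range, V reduces to the attractive term alone; a corner approaching the boundary drives V strongly negative, which is what steers the ε-greedy selection away from collisions.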
(3.4) Applying the method of the invention, a sequence of actions from the current state to the final predicted state is obtained. The action sequence is then pruned: any action subsequence that causes a loop in the state trajectory is deleted. Finally, a simulation rollout using the state transition equation, the current state, and the pruned action sequence outputs the final planned parking path.
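The loop-pruning step in (3.4) can be sketched as follows. This is a minimal sketch, assuming states have already been discretized into hashable values by the caller (the patent does not specify the comparison used to detect a revisited state): whenever a state reappears, the actions taken between its two visits are dropped.

```python
def prune_loops(states, actions):
    """Remove action subsequences whose states form a loop (a sketch).

    states[k] is the state observed before actions[k]; discretizing
    states so they compare equal on a revisit is assumed done upstream.
    """
    seen = {}                      # state -> index of its first kept occurrence
    kept_states, kept_actions = [], []
    for s, a in zip(states, actions):
        if s in seen:
            k = seen[s]            # roll back to the first visit of s
            kept_states, kept_actions = kept_states[:k], kept_actions[:k]
            # forget states recorded after the rollback point
            seen = {st: i for st, i in seen.items() if i < k}
        seen[s] = len(kept_states)
        kept_states.append(s)
        kept_actions.append(a)
    return kept_states, kept_actions
```

For example, a trajectory A → B → C → B collapses to a single visit of B, keeping only the action taken on the later visit.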
(4) Optimal potential energy field function parameters are obtained by optimization training for the specific vehicle parameters, as follows:
The parking success rate, which depends on the potential energy field parameters, is defined as the optimization target for those parameters. In the same scene, N episodes are trained; the initial state of the vehicle in each episode is generated randomly, and step (3) is executed using the observed vehicle state, the executable actions, the state transition matrix, and the current potential energy field to carry out the parking process. An episode ends on any of three flags: the vehicle parks successfully in the parking space, the vehicle drives out of the parking area, or a set maximum number of steps is exceeded. If M episodes end with the vehicle successfully entering the parking space, the parking success rate under this set of potential energy field parameters is M/N × 100%. Since the success rate so defined depends on the potential energy field function, it is taken as the optimization target; the field is optimized offline to find the optimal potential energy field function for the specific environment. With the trained parameters C0, C1, C2, etc., the vehicle can, in real time and within the reinforcement learning framework, plan from its current state value toward the location with the highest state value in the whole field, and that highest-value state lies inside the parking space. The resulting potential energy field function therefore represents the reinforcement learning state value of the vehicle at every state during parking, i.e., a parking path can be successfully planned with this method.
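The success-rate objective M/N × 100% can be sketched as a small evaluation loop. Here run_episode is a hypothetical callable standing in for step (3): it is assumed to simulate one parking episode from a random initial state under the current field parameters and return one of the three end flags described above.

```python
import random

def parking_success_rate(run_episode, n_episodes=100, seed=0):
    """Estimate the success rate M/N x 100% used as the optimization target.

    run_episode(rng) is an assumed stand-in for one full parking episode;
    it must return "parked", "left_area", or "timeout".
    """
    rng = random.Random(seed)        # reproducible random initial states
    m = sum(run_episode(rng) == "parked" for _ in range(n_episodes))
    return m / n_episodes * 100.0
```

Offline optimization then amounts to searching the parameter space (C0, C1, C2, ...) for the set that maximizes this quantity, e.g. by grid or random search.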
The present invention is not limited to any particular parking area. The above description is only a preferred embodiment of the present invention and is not intended to limit it; those skilled in the art may make various modifications and changes to the embodiment. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.

Claims (9)

1. A parking strategy based on a reinforcement learning method with potential energy field function approximation, characterized in that: a potential energy field is designed to approximate the state value function of the reinforcement learning process, the effect of the field being a quantitative representation, in the vehicle state value function, of the different factors of the vehicle's current state, the target parking space, the drivable area, and the vehicle parameters; according to the current state of the vehicle and a preset executable action space, the state transition equation is used to predict the next state corresponding to each executable action; the state value function value of each predicted state is then calculated with the potential energy field, and the action with the highest state value function value is selected through an ε-greedy strategy; the next action is then selected according to the state corresponding to that action, and this process of predicting states and selecting actions is repeated until parking ends; finally, a real-time planned parking path is generated from the selected action sequence;

the designed potential energy field is divided into a part generated by the attractive force and a part generated by the repulsive force;

the state value function value v0 of the attractive part is:

v0 = f(X)

[the explicit form of f(X) appears only as an image in the original]

X = [(x - xtarget), (y - ytarget), (yaw - yawtarget)]

where the vehicle state includes at least the x, y coordinates of the midpoint of the vehicle's rear axle in the parking-space coordinate system and the angle yaw between the vehicle's longitudinal center axis and the x-axis of that coordinate system, denoted (x, y, yaw); (xtarget, ytarget, yawtarget) is the end state; v0 is the potential energy field value produced by the attractive force, and f is the attractive potential energy field function; C0 and C1 are parameters to be obtained by training;

the state value function value v1i of the repulsive part is:

v1i = -C2 / di²

where the vehicle contour has four corner points and the target parking space has two contact corner points with the drivable area, i = 1~6; d1~d4 are the shortest distances from each vehicle contour corner point to the parking boundary, and d5~d6 are the shortest distances from each contact corner point between the target parking space and the drivable area to the vehicle contour edges; if di exceeds the repulsion range, then v1i = 0; C2 is a parameter to be trained;

the final vehicle state value function value V is:

V = v0 + Σ v1i (i = 1~6)

[this formula appears as an image in the original].

2. The parking strategy based on the reinforcement learning method with potential energy field function approximation according to claim 1, characterized in that: guide lines are designed according to the required path; the potential energy field is generated from the guide lines and the parking boundary constraints, and the potential energy field parameters are optimized; the resulting potential energy field function can be used to represent the state value function value of the vehicle at each state during the parking process; the parking boundary is the outer contour of the parking area, and the parking area includes the drivable area and the target parking space.

3. The parking strategy according to claim 2, characterized in that: the attractive part of the potential energy field is generated by designed virtual guide lines, and the fields generated by different virtual guide lines have different priorities in different regions, a higher-priority field covering a lower-priority one; the repulsive part of the potential energy field is generated by the parking boundary.

4. The parking strategy according to claim 3, characterized in that: different potential energy fields are designed for different types of parking areas, and different parts of the field have different ranges of action; the attractive potential energy is positive, growing larger the closer to the guide line within its range, larger still the closer to the end point, and largest at the end point; the repulsive potential energy is negative, the repulsive potential energy of the parking boundary is negative infinity, and within its range the repulsive potential energy grows larger the closer to the parking boundary.

5. The parking strategy according to claim 3, characterized in that: depending on the vehicle state, different parts of the potential energy field act on different positions of the vehicle and produce different potential energies, and therefore contribute differently to the vehicle state value function; the state value produced by the attractive potential energy field acts on the midpoint of the vehicle's rear axle; the state value produced by the parking boundary in the repulsive field acts on the four corner points of the vehicle's outer contour; and the field generated by the contact corner points between the target parking space and the drivable area acts on the vehicle's outer contour edges.

6. The parking strategy according to claim 2, characterized in that: the parking success rate is defined as the optimization target for the potential energy field parameters, as follows: in the same scene, N episodes are trained, with the initial vehicle state of each episode generated randomly; if M of these episodes end with the vehicle successfully entering the parking space, the parking success rate under this set of potential energy field parameters is M/N × 100%; the end flags include the vehicle successfully entering the parking space, the vehicle leaving the parking area, and a parking timeout.

7. The parking strategy according to claim 1, characterized in that: the executed action consists of a steering wheel angle and a gear, the gears including a forward gear, a reverse gear, and a neutral gear.

8. The parking strategy according to claim 1, characterized in that: before each use of the state transition equation to predict the next state, actions that would cause the vehicle to collide with the parking boundary are first removed from the preset executable action space, and the ε-greedy strategy is then used to select the action to execute; after a sequence of actions from the current state to the final predicted state has been obtained from the ε-greedy strategy and the state transition equation, the sequence is pruned to remove looping actions, giving the final planned parking path.

9. The parking strategy according to claim 1, characterized in that: the vehicle parameters include front overhang, rear overhang, wheelbase, vehicle width, minimum turning radius, and transmission ratio.
CN202010847538.1A 2020-08-21 2020-08-21 A Reinforcement Learning Method Based on Potential Energy Field Function Approximation for Parking Policy Active CN112061116B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010847538.1A CN112061116B (en) 2020-08-21 2020-08-21 A Reinforcement Learning Method Based on Potential Energy Field Function Approximation for Parking Policy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010847538.1A CN112061116B (en) 2020-08-21 2020-08-21 A Reinforcement Learning Method Based on Potential Energy Field Function Approximation for Parking Policy

Publications (2)

Publication Number Publication Date
CN112061116A CN112061116A (en) 2020-12-11
CN112061116B true CN112061116B (en) 2021-10-29

Family

ID=73658797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010847538.1A Active CN112061116B (en) 2020-08-21 2020-08-21 A Reinforcement Learning Method Based on Potential Energy Field Function Approximation for Parking Policy

Country Status (1)

Country Link
CN (1) CN112061116B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112677983B (en) * 2021-01-07 2022-04-12 浙江大学 System for recognizing driving style of driver
CN113335270B (en) * 2021-07-01 2022-05-03 湖南大学 Parking path planning method and device
CN113705474B (en) * 2021-08-30 2022-04-15 北京易航远智科技有限公司 Parking space detection method and device
CN115472038B (en) * 2022-11-01 2023-02-03 南京杰智易科技有限公司 Automatic parking method and system based on deep reinforcement learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102007055389A1 (en) * 2007-11-20 2009-05-28 Valeo Schalter Und Sensoren Gmbh Method and apparatus for collision prevention when planning a path for parking a vehicle
JP6764170B2 (en) * 2017-05-11 2020-09-30 日野自動車株式会社 Backward parking support device for connected vehicles
CN109318890A (en) * 2018-06-29 2019-02-12 北京理工大学 A dynamic obstacle avoidance method for unmanned vehicles based on dynamic windows and potential energy fields of obstacles
CN110136481B (en) * 2018-09-20 2021-02-02 初速度(苏州)科技有限公司 Parking strategy based on deep reinforcement learning

Also Published As

Publication number Publication date
CN112061116A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
CN112061116B (en) A Reinforcement Learning Method Based on Potential Energy Field Function Approximation for Parking Policy
Li et al. An optimization-based path planning approach for autonomous vehicles using the DynEFWA-artificial potential field
CN113359757B (en) A method for path planning and trajectory tracking of unmanned vehicles
Shi et al. Driving decision and control for automated lane change behavior based on deep reinforcement learning
CN110136481B (en) Parking strategy based on deep reinforcement learning
Zhang et al. Trajectory planning and tracking for autonomous vehicle based on state lattice and model predictive control
Lin et al. Decision making through occluded intersections for autonomous driving
CN109976340B (en) Man-machine cooperation dynamic obstacle avoidance method and system based on deep reinforcement learning
CN110969848A (en) Automatic driving overtaking decision method based on reinforcement learning under opposite double lanes
CN114153213A (en) A deep reinforcement learning intelligent vehicle behavior decision-making method based on path planning
CN113247023B (en) Driving planning method and device, computer equipment and storage medium
CN110716562A (en) Decision-making method for multi-lane driving of driverless cars based on reinforcement learning
CN112141091B (en) Secondary parking method and system for solving parking space deviation and positioning deviation and vehicle
Fuji et al. Trajectory planning for automated parking using multi-resolution state roadmap considering non-holonomic constraints
CN117705123B (en) Track planning method, device, equipment and storage medium
CN118348975A (en) Path planning method, amphibious unmanned platform, storage medium and program product
Li et al. Adaptive sampling-based motion planning with a non-conservatively defensive strategy for autonomous driving
CN116659501A (en) Data processing method and device and vehicle
CN117007066A (en) Unmanned trajectory planning method integrated by multiple planning algorithms and related device
Yamaguchi et al. Model predictive path planning for autonomous parking based on projected C-space
Garzón et al. Game theoretic decision making based on real sensor data for autonomous vehicles’ maneuvers in high traffic
Huy et al. A practical and optimal path planning for autonomous parking using fast marching algorithm and support vector machine
CN117302204B (en) Multi-wind-lattice vehicle track tracking collision avoidance control method and device based on reinforcement learning
CN118220214A (en) Automatic driving movement planning method and system for complex parallel parking scene
CN118518105A (en) Based on improve A*Multi-target plant protection robot obstacle avoidance path optimization method of-IWOA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant