CN116533992A - Automatic parking path planning method and system based on deep reinforcement learning algorithm - Google Patents


Info

Publication number
CN116533992A
Authority
CN
China
Prior art keywords
parking
vehicle
reinforcement learning
learning algorithm
deep reinforcement
Prior art date
Legal status
Granted
Application number
CN202310819139.8A
Other languages
Chinese (zh)
Other versions
CN116533992B (en)
Inventor
谭德坤
杨哲
刘旭晖
赵嘉
秦海鸥
付雪峰
李桢桢
周如春
Current Assignee
Nanchang Institute of Technology
Original Assignee
Nanchang Institute of Technology
Priority date
Filing date
Publication date
Application filed by Nanchang Institute of Technology filed Critical Nanchang Institute of Technology
Priority to CN202310819139.8A priority Critical patent/CN116533992B/en
Publication of CN116533992A publication Critical patent/CN116533992A/en
Application granted granted Critical
Publication of CN116533992B publication Critical patent/CN116533992B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W30/06 Automatic manoeuvring for parking
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W2050/0019 Control system elements or transfer functions
    • B60W2050/0028 Mathematical models, e.g. for simulation
    • B60W2050/0043 Signal treatments, identification of variables or parameters, parameter estimation or state estimation
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/40 Engine management systems


Abstract

The present disclosure relates to an automatic parking path planning method and system based on a deep reinforcement learning algorithm. The method comprises the following steps: generating a parking action data set in a parking space model through a deep reinforcement learning algorithm based on a vehicle kinematic model, wherein a neural network in the deep reinforcement learning algorithm generates probability distributions corresponding to different parking actions; constructing a reward function with the deep reinforcement learning algorithm to guide the parking actions and posture of the vehicle, obtaining the highest-quality parking action data and ensuring an accurate posture and safe parking; and updating the network parameters of the deep reinforcement learning algorithm with the highest-quality parking action data, then performing the next iteration with the updated algorithm until the policy evaluation advantage function of the parking actions converges to the optimum, thereby completing parking strategy learning. The method and system improve the exploration capability and learning efficiency of the algorithm, making parking more accurate and efficient and producing better parking paths.

Description

Automatic parking path planning method and system based on deep reinforcement learning algorithm
Technical Field
The disclosure relates to the technical field of intelligent parking, in particular to an automatic parking path planning method and system based on a deep reinforcement learning algorithm.
Background
An automatic parking system generates a parking route from the obstacle information at the edges of the parking space and the relative position of the vehicle's starting point, using data and algorithms preset in the on-board computer, and thereby controls the autonomous vehicle to drive accurately and safely into the designated parking space. Important indicators for evaluating the accuracy of the parking path trajectory and the safety of the parking vehicle include the length of the parking path, the number of gear switches required, the acceleration of the parking motion, and the parking time; the quality of the parking path determines the comfort, safety, and overall user experience of the automatic parking process. High-quality parking motion therefore requires a parking path that is as short as possible, a trajectory that is as smooth as possible, low acceleration values, and a parking time that is as short as possible. Current automatic parking path planning methods for low-speed conditions fall mainly into rule-based methods and machine learning methods.
Rule-based algorithms mainly include straight-line/circular-arc, complex-curve-model, and optimal-control formulations. These algorithms obtain an optimal travel route by traversing the scene in advance, under constraint conditions such as the basic vehicle kinematic model, the start and end poses of the vehicle, the obstacles on the travel route, and the relative safety distance between vehicles. Although rule-based algorithms generate path plans quickly, their parking success rate is low in complex scenes such as narrow lanes; even with multiple gear switches the target point may not be reached and the parking operation cannot be completed, so they struggle to meet practical requirements.
Learning-based algorithms mainly integrate neural networks, fuzzy decision-making, and evolutionary algorithms and apply them to the automatic parking path planning problem. When applied to the practical problems of many parking situations, constructing a path planning algorithm must account for performance evaluation indicators over a higher degree of freedom and a larger solution space; the resulting sharp increase in the dimensionality of vehicle states and actions greatly affects planning speed. Deep neural networks can describe more complex spaces through network approximation, further strengthening solving capability, and automatic parking path planning based on deep reinforcement learning has become a research hotspot for autonomous driving companies and scholars in recent years. Reinforcement learning is a family of decision methods that map state information to actions according to specified learning objectives so as to maximize return. Traditional reinforcement learning algorithms can only handle problems with discrete states. For the automatic parking problem over various parking space types in a continuous state space, a deep neural network is introduced to fit the value function: the agent's state information is fed into a Q network to obtain the action values of all actions in each state, yielding the Deep Q-Network (DQN) algorithm. This deep reinforcement learning algorithm can handle both continuous and discrete environment spaces, and stores the samples generated by learning and exploration through an experience replay mechanism. However, DQN depends heavily on sample quality when learning from samples and updating the Q network, and because more random strategies are adopted early in training to explore a diverse action space, many of the initially obtained samples are invalid or of low quality and useful samples are hard to find, so DQN struggles to learn valuable information from high-quality samples. Meanwhile, DQN's learning mechanism suffers from overestimation and non-uniqueness of state and action values, which can prevent convergence. These mechanistic defects mean that DQN, when applied to an automatic parking system, cannot meet real-time and accuracy requirements and is not practical.
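As a point of reference for the discussion above, the following is a minimal sketch of a Q network and the standard DQN temporal-difference target; the architecture, layer sizes, and hyper-parameters are illustrative assumptions rather than anything specified by this patent.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per discrete parking action."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def dqn_td_target(q_target: QNetwork, r: torch.Tensor, s_next: torch.Tensor,
                  done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Standard DQN target: taking the max over the same network's own
    estimates is the source of the overestimation bias discussed above."""
    with torch.no_grad():
        max_q = q_target(s_next).max(dim=1).values
    return r + gamma * (1.0 - done) * max_q
```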
Disclosure of Invention
The invention provides an automatic parking path planning method and system based on a deep reinforcement learning algorithm, which address the problems of low learning efficiency, weak exploration, and slow convergence that prevent an automatic parking system from meeting real-time and accuracy requirements in practice. The present disclosure provides the following technical solutions:
As one aspect of the embodiments of the present disclosure, an automatic parking path planning method based on a deep reinforcement learning algorithm is provided. The method comprises the following steps:

S10, generating a parking action data set in a parking space model through a deep reinforcement learning algorithm based on a vehicle kinematic model, wherein a neural network in the deep reinforcement learning algorithm generates probability distributions corresponding to different parking actions;

S20, constructing a reward function with the deep reinforcement learning algorithm, evaluating the quality of the parking actions in the parking action data set, and obtaining the highest-quality parking action data;

S30, updating the network parameters of the deep reinforcement learning algorithm with the highest-quality parking action data, and performing the next iteration with the updated algorithm until the policy evaluation advantage function of the parking actions converges to the optimum, thereby completing parking strategy learning.
Optionally, generating the parking action data set in the parking space model through the deep reinforcement learning algorithm includes: taking the parking space environment information of the parking space model as input data, processing it with a long short-term memory network, and passing the resulting parking space environment and vehicle state information of uniform dimension to the input of the neural network in the deep reinforcement learning algorithm.
Optionally, random noise is further introduced at the input of the neural network in the deep reinforcement learning algorithm, which can be expressed as $y = \mathrm{ReLU}(Wx + b)$ with $W = \mu + \sigma \odot \varepsilon$, where ReLU is the activation function of the Noise_D3QN algorithm, $\sigma$ and $\mu$ respectively denote the standard deviation and mean of the parameters in the Gaussian distribution, $\varepsilon$ is random noise drawn from the Gaussian distribution, $W$ is the matrix parameter storing the random-noise information of the noisy network, $x$ is the action information at the input of the neural network, and $b$ is the bias in the noisy-network activation function.
Optionally, constructing the reward function using the deep reinforcement learning algorithm includes: guiding the parking posture and path of the vehicle with a reward function, and setting a constraint reward function for the agent according to the rotation angle of the vehicle during driving, the distance from each collision detection point to the end point, and the final posture of the vehicle in each episode, the reward function combining the following terms:

$r_{dist}$, which depends on the distance of the vehicle from the parking point;

$r_{pose}$, a reward value for assessing the final attitude of the vehicle in the parking space; the larger the deflection angle, the worse the parking posture, so a larger penalty term is set;

$r_{steer}$, a reward value evaluating the steering angle during the parking motion; the larger this reward value, the more stable the curve during driving and the better the planned path;

$\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ and $\lambda_5$, the parameters of the reward function;

$d$, the relative position of the vehicle coordinates and the parking space coordinates when the vehicle stops;

$\psi$, the final parking posture of the vehicle when it stops;

$\delta$, the steering angle during the parking motion.
Optionally, updating the network parameters in the deep reinforcement learning algorithm using the highest-quality parking action data includes: introducing a prioritized experience replay mechanism based on a binary-tree structure model into the deep reinforcement learning algorithm.
Optionally, the policy evaluation advantage function of the parking action is expressed as:

$A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)$

where $V^{\pi}(s)$ is the state value function, $Q^{\pi}(s,a)$ is the action value function, $a$ is the action value, and $s$ is the state value representing the vehicle state information, comprising the coordinates of the eight endpoints of the vehicle, the maximum vehicle speed ratio, the vehicle pose rotation angle, and the distance between the vehicle and the parking point.
Optionally, acquiring the parking space environment information of the parking space model includes: acquiring the vehicle posture and relative position information collected by a satellite navigation system, acquiring the posture, speed, and acceleration of the vehicle with an inertial measurement unit, and obtaining the position, speed, and posture of the vehicle through an INS (Inertial Navigation System) calculation unit; fusing the acquired information with a Kalman filtering system for positioning, correcting and denoising the errors in the vehicle state information; and feeding the filtering result back to the INS calculation unit to correct the reading errors of the inertial measurement unit, obtaining the current vehicle state information through repeated iterative operation.
As another aspect of an embodiment of the present disclosure, there is provided an automatic parking path planning system based on a deep reinforcement learning algorithm, comprising:

a parking action data generation module, configured to generate a parking action data set in a parking space model through a deep reinforcement learning algorithm based on a vehicle kinematic model, wherein a neural network in the deep reinforcement learning algorithm generates probability distributions corresponding to different parking actions;

a parking action quality evaluation module, configured to construct a reward function with the deep reinforcement learning algorithm, evaluate the quality of the parking actions in the parking action data set, and obtain the highest-quality parking action data; and

a parking strategy learning module, configured to update the network parameters of the deep reinforcement learning algorithm with the highest-quality parking action data and perform the next iteration with the updated algorithm until the policy evaluation advantage function of the parking actions converges to the optimum, completing parking strategy learning.
As another aspect of an embodiment of the present disclosure, there is provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the above-described automatic parking path planning method based on a deep reinforcement learning algorithm when executing the computer program.
As another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the above-described automatic parking path planning method based on a deep reinforcement learning algorithm.
Compared with the prior art, the beneficial effects of the present disclosure are:
1. According to the present disclosure, parking operations are pre-trained in a simulation environment based on standard-parameter models of multiple parking space types; in subsequent real-scene parking, the vehicle can invoke the corresponding pre-trained model to park directly according to the space type. Compared with other learning-based methods, this offers better real-time performance and applies to multiple parking environments.

2. The parking path planning algorithm fully considers comprehensive factors such as parking safety and comfort. It uses the 11 state dimensions of the vehicle state to precisely control the vehicle's position, attitude, speed, and acceleration, and controls the minimum safe distance between the vehicle and obstacles and the variation of speed and acceleration through constraint conditions and the reinforcement learning reward function, thereby ensuring an accurate parking posture, safety, and a good user experience, while reducing the computational complexity of the input information through the dimension mapping of the neural network.

3. The method adds a prioritized experience replay mechanism to the deep reinforcement learning algorithm to increase the utilization of effective experience and enhance learning efficiency; a long short-term memory network and a noisy network are added at the state and action inputs, giving the method stronger capabilities for extracting environment and state features and for exploring better data.
Drawings
FIG. 1 is a flow chart of an automatic parking path planning method based on a deep reinforcement learning algorithm in embodiment 1;
FIG. 2 is a flow chart of the path planning of the automatic parking system in embodiment 1;
FIG. 3 is a schematic diagram of a parallel parking garage position model of example 1;
FIG. 4 is a schematic view of a vertical parking garage position model of example 1;
FIG. 5 is a schematic view of the inclined parking space model of example 1;
FIG. 6 is a schematic diagram of a parking garage environment recognition system in accordance with example 1;
FIG. 7 is a flow chart of parking space recognition in embodiment 1;
fig. 8 is a schematic diagram of a vehicle motion model in embodiment 1;
fig. 9 is a schematic diagram of a discretization operation of the vehicle in embodiment 1;
FIG. 10 is a flowchart of the extraction of the input-side long-short-term memory network characteristics in embodiment 1;
FIG. 11 is a schematic diagram of the structure of the SumTree in example 1;
FIG. 12 is a block diagram of a deep reinforcement learning algorithm for parking path planning in embodiment 1;
FIG. 13 is a schematic diagram of a loosely coupled integrated navigation system in accordance with example 1;
FIG. 14 is a flow chart of multi-space automatic parking path planning in embodiment 1;
FIG. 15 is a flow chart of learning based on pre-selected experience in example 1;
fig. 16 is a schematic block diagram of an automatic parking path planning system based on a deep reinforcement learning algorithm in embodiment 2.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
It will be appreciated that the above-mentioned method embodiments of the present disclosure may be combined with each other to form combined embodiments without departing from their principles and logic; for brevity, such combinations are not described in detail in the present disclosure.
In addition, the present disclosure further provides a system, an electronic device, and a computer-readable storage medium, each of which can implement any automatic parking path planning method based on a deep reinforcement learning algorithm provided in the present disclosure; for the corresponding technical solutions and descriptions, refer to the method section, and details are not repeated.
The execution subject of the automatic parking path planning method based on the deep reinforcement learning algorithm may be a computer or any other apparatus capable of implementing the method; for example, the method may be executed by a terminal device, a server, or another processing device, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the method may be implemented by a processor invoking computer-readable instructions stored in a memory.
Example 1
The embodiment provides an automatic parking path planning method based on a deep reinforcement learning algorithm, as shown in fig. 1, comprising the following steps:

S10, generating a parking action data set in a parking space model through a deep reinforcement learning algorithm based on a vehicle kinematic model, wherein a neural network in the deep reinforcement learning algorithm generates probability distributions corresponding to different parking actions;

S20, constructing a reward function with the deep reinforcement learning algorithm, evaluating the quality of the parking actions in the parking action data set, and obtaining the highest-quality parking action data;

S30, updating the network parameters of the deep reinforcement learning algorithm with the highest-quality parking action data, and performing the next iteration with the updated algorithm until the policy evaluation advantage function of the parking actions converges to the optimum, thereby completing parking strategy learning.
The steps of the embodiments of the present disclosure are described in detail below, respectively.
S10, generating a parking action data set in a parking garage position model through a deep reinforcement learning algorithm based on a vehicle kinematic model, wherein a neural network in the deep reinforcement learning algorithm generates probability distribution corresponding to different parking actions;
In this embodiment, as shown in fig. 2, the environment information is the parking space environment discussed in detail below. The perception module is responsible for identifying the parking space type and monitoring the relative position of the vehicle and the parking space; its associated sensor is a lidar. The vehicle state sensors are responsible for monitoring the attitude and positioning information of the vehicle; the associated sensors are an inertial measurement unit and GNSS (Global Navigation Satellite System). The control module is responsible for the speed control and steering of the vehicle based on the vehicle kinematic model. The planning module plans the parking path from the environment information using the improved deep reinforcement learning algorithm based on the pre-trained model.
The parking space model comprises: a parallel parking space model, a vertical parking space model, and an inclined parking space model.
The parking space environment of the parallel parking space model is generally formed by the vehicles in front and behind, or by the front and rear boundary lines of the space, as shown in fig. 3. According to the sizes and positions of the front and rear vehicles and the minimum safe parking distance, the front and rear vehicles are abstracted into rectangular frames with two end points P and O. A parking environment coordinate system is established with the left rear corner of the rear obstacle vehicle as the origin, the direction along the right vehicle body toward the vehicle head as the X axis, and the direction of motion of the rear vehicle body as the Y axis; the size of the parking area is measured by the PO end-point distance and the obstacle vehicle width. The pose information of the vehicle to be parked is given in this coordinate system.

The parking space environment of the vertical parking space model is generally formed by the vehicles on the left and right, or by the left and right boundary lines of the space, as shown in fig. 4. According to the sizes and positions of the left and right vehicles and the minimum safe parking distance, the left and right vehicles are abstracted into rectangular frames with two end points P and O. A parking environment coordinate system is established with the right front corner of the right obstacle vehicle as the origin, the direction along the right vehicle body toward the other end of the body as the X axis, and the direction from the rear body toward the front body as the Y axis; the size of the parking area is measured by the PO end-point distance and the obstacle vehicle width. The pose information of the vehicle to be parked is given in this coordinate system.

The parking space environment of the inclined parking space model is generally formed by two vehicles in the inclined direction or by the left and right boundary lines of the space, as shown in fig. 5. According to the sizes and positions of the vehicles on either side of the inclined edge and the minimum safe parking distance, the left and right vehicles are abstracted into rectangular frames with two end points P and O. A parking environment coordinate system is established with the right rear corner of the right obstacle vehicle as the origin, the longitudinal axis of the vehicle to be parked as the X axis, and the Y axis perpendicular to the X axis; the size of the parking area is measured by the PO end-point distance and the obstacle vehicle width. The pose information of the vehicle to be parked is given in this coordinate system.
In this embodiment, the intelligent laser radar recognition system for recognizing the parking garage environment information is composed of five parts, namely an automatic driving vehicle to be parked, an automatic parking system control module, a point cloud data storage server, a three-dimensional imaging analysis module and a front-end operation interface, and the recognition system for recognizing the parking garage environment information is shown in fig. 6.
The automatic driving vehicle is provided with a SICK laser radar scanner, a vehicle-mounted mobile power supply and other devices, and is used for receiving and responding to a control instruction sent by the automatic driving system control module, feeding back the current running state of the automatic driving vehicle to the automatic driving system control module and sending point cloud data to the point cloud data storage server. The automatic driving system control module is used for sending various control instructions to the automatic driving vehicle, such as starting a scanning task, advancing the automatic driving vehicle, backing the vehicle and the like, and dynamically adjusting the control instructions according to the running state information returned by the automatic driving vehicle. The point cloud data storage server is used for receiving and storing the point cloud data. The three-dimensional imaging analysis module is used for carrying out a series of calibration and screening on the point cloud data, constructing a three-dimensional curved surface of a parking garage position environment, and combining site parameters of a parking lot in a city to obtain a complete three-dimensional model; calculating the model volume to obtain the size of the parking garage and the distance relative to the vehicle; and sending the three-dimensional model and the size data to a front-end operation interface. The front-end operation interface comprises a control interface of the automatic driving vehicle, a three-dimensional model presentation interface and a volume data display interface.
In this embodiment, the vehicle recognizes parking garage environment information according to the lidar and executes a parking operation, as shown in fig. 7. When the automatic driving vehicle control module sends a command of 'start task' to the vehicle, the automatic driving vehicle drives from the current position to the parking start point, the laser radar scanner is started while the vehicle advances, and the vehicle position and posture information acquired through the vehicle-mounted positioning system in the advancing process of the automatic driving vehicle judges whether the vehicle reaches the parking start point. When a vehicle arrives at a parking point, the laser radar is utilized to scan parking garage position environment information, corresponding point cloud data information is acquired, a point cloud map is drawn, and the acquired point cloud map is subjected to map matching with the garage position information in the database by utilizing a preset point cloud registration algorithm, so that the parking garage position model where the current vehicle is judged, and corresponding parking actions are adopted accordingly. And after the vehicle reaches the parking point, confirming whether the vehicle is parked in a correct posture according to the vehicle-mounted fusion positioning system, and stopping running when the parking operation is completed. And generating point cloud data in the working process of the laser radar scanner and uploading the point cloud data to a point cloud data storage server.
In the present embodiment, the turning process of the vehicle is assumed to be circular motion; considering that the vehicle speed is low and no slip occurs during motion along the parking path, the four-wheel Ackermann steering model is simplified to a front-steered two-wheel (bicycle) model. Because the outer contour of the car directly affects the quality of the parking motion, and a maximum safety distance is set to keep the moving vehicle from colliding with obstacles, the vehicle is wrapped in a rectangular frame sized to the vehicle plus the maximum safety distance, and the vehicle motion model is built accordingly, as shown in fig. 8.

In the vehicle motion model, the vehicle center point is taken as the vehicle reference point. In this embodiment, according to the motion geometry of the parallel parking space environment, the coordinates of each vertex of the vehicle outer contour are obtained from the vehicle pose by a rigid-body rotation and translation of the safety rectangle, as sketched below.

In this embodiment, the length of the arc curve is constructed according to the time required for the steering wheel to turn. The curvature of the arc curve is proportional to the distance from a point on the arc to its starting point, i.e. $k = s/C$, where the constant $C$ relates arc length to curvature; the steering wheel rotation time and the arc length follow from this relation.
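To make the geometry above concrete, here is a minimal sketch of one step of the two-wheel kinematic model and of the safety-rectangle vertex computation; the wheelbase, body dimensions, and safety margin are illustrative assumptions, not values from this patent.

```python
import math
from dataclasses import dataclass

@dataclass
class VehicleState:
    x: float    # reference-point position in the parking frame, metres
    y: float
    yaw: float  # heading angle, radians
    v: float    # signed speed: positive forward, negative in reverse

def bicycle_step(s: VehicleState, steer: float, dt: float,
                 wheelbase: float = 2.7) -> VehicleState:
    """One Euler step of the low-speed, no-slip two-wheel model."""
    return VehicleState(
        x=s.x + s.v * math.cos(s.yaw) * dt,
        y=s.y + s.v * math.sin(s.yaw) * dt,
        yaw=s.yaw + s.v / wheelbase * math.tan(steer) * dt,
        v=s.v,
    )

def contour_vertices(s: VehicleState, length: float = 4.6,
                     width: float = 1.8, margin: float = 0.3):
    """Corners of the rectangle (vehicle size plus maximum safety
    distance) that wraps the car, rotated into the parking frame."""
    hl, hw = length / 2 + margin, width / 2 + margin
    c, sn = math.cos(s.yaw), math.sin(s.yaw)
    return [(s.x + dx * c - dy * sn, s.y + dx * sn + dy * c)
            for dx, dy in ((hl, hw), (hl, -hw), (-hl, -hw), (-hl, hw))]
```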
Further, when searching for the parking strategy, the continuous vehicle actions are discretized into nine actions: the lateral and longitudinal directions, the four composite directions, and standstill. Because the parking space model in this embodiment is a parallel parking environment, the lateral and composite-direction actions dominate the control of the vehicle during parking; combined with the directional action strategy, the vehicle speed strategy of the vehicle model is divided into an acceleration phase, a stable phase, a deceleration phase, and a termination phase.

In the acceleration and deceleration phases, the speed is controlled through the acceleration, as shown in fig. 9; the speed-change phases simulate an actual parking scene. During parking, the vehicle first speeds up and then slows down (during the parking maneuver, forward motion is recorded as positive speed and reverse motion as negative, with the absolute value of the acceleration constant). The vehicle speed command follows the piecewise profile sketched below.
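A hedged sketch of the four-phase speed command just described; the acceleration magnitude, speed limit, and time step are illustrative assumptions.

```python
from enum import Enum

class Phase(Enum):
    ACCEL = 0
    STABLE = 1
    DECEL = 2
    STOP = 3

def speed_command(phase: Phase, speed: float, direction: int,
                  a_max: float = 0.5, v_max: float = 1.5,
                  dt: float = 0.1) -> float:
    """Piecewise constant-|a| speed profile: `speed` is the current
    unsigned speed, `direction` is +1 forward / -1 reverse, and the
    returned value is the signed speed command."""
    if phase is Phase.ACCEL:
        speed = min(speed + a_max * dt, v_max)
    elif phase is Phase.DECEL:
        speed = max(speed - a_max * dt, 0.0)
    elif phase is Phase.STOP:
        speed = 0.0
    return direction * speed
```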
In this embodiment, based on the deep reinforcement learning algorithm combined with the vehicle kinematic model, the simulation model is placed in a parallel parking space for simulation training, generating a parking action data set. The neural network generates probability distributions over the different parking actions for the deep reinforcement learning algorithm, and a roulette-wheel strategy guides the subsequent solution search; meanwhile, the algorithm balances the guidance of the neural network against random exploration so as to preserve the diversity of optimal solutions.
Optionally, generating the parking action data set in the parking space model through the deep reinforcement learning algorithm includes: taking the parking space environment information of the parking space model as input data, processing it with a long short-term memory network, and passing the resulting parking space environment and vehicle state information of uniform dimension to the input of the neural network in the deep reinforcement learning algorithm.

Since the current state quantity in a reinforcement learning algorithm is related only to the state at the previous time step, and no valuable learning experience can be drawn from state information further back in time, this embodiment introduces a long short-term memory network (Long Short-Term Memory, LSTM) to enhance the long-term memory and learning capability of the automatic parking system. Fig. 10 shows the feature extraction flow of the long short-term memory network over the state information. The parking space environment information is recorded as an input sequence S; at each decision step, the real-time state information s received while the vehicle runs is input into an LSTM unit, and the forget gate in the LSTM module extracts the key information and stores it in h. When the obstacle information of the autonomous vehicle is input, h retains the important information for parking-environment obstacle edge detection, and relatively minor information is discarded. After all parking-environment state information has been input, the information stored in the LSTM is converted into a state vector of uniform dimension and passed to the subsequent automatic parking system. Thus, after the parking space environment information is processed by the LSTM network as input data, parking space environment states and autonomous-vehicle state information of uniform dimension are obtained and delivered to the input of the neural network in the deep reinforcement learning algorithm.
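As an illustration of this LSTM front end, the sketch below encodes a variable-length sequence of environment observations into one fixed-dimension state vector; all layer sizes (including the 11-dimensional output, chosen to match the eleven-dimensional vehicle state described elsewhere in this document) are assumptions.

```python
import torch
import torch.nn as nn

class EnvEncoder(nn.Module):
    """LSTM front end: compresses a variable-length observation
    sequence into a state vector of uniform dimension."""
    def __init__(self, obs_dim: int = 8, hidden: int = 64,
                 out_dim: int = 11):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, out_dim)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, time, obs_dim). The final hidden state h_T holds
        # the gated summary the forget/input gates have accumulated.
        _, (h_n, _) = self.lstm(seq)
        return self.proj(h_n[-1])
```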
The LSTM uses three gating mechanisms to assign weights to the input information according to its importance. The forget gate determines which secondary information is filtered out and ignored: it reads the output $h_{t-1}$ of the previous LSTM unit together with the current input $x_t$, and passes them through a sigmoid activation to produce $f_t$. The input gate stores the filtered new information: the sigmoid function produces the update signal $i_t$ and a tanh layer produces the candidate vector $\tilde{C}_t$; the new information is added to the historical cell state $C_{t-1}$, completing the cell update $C_t$. The output gate then produces the output $h_t$: the sigmoid output $o_t$ is multiplied by the tanh of the updated cell state.

The working principle of the LSTM gates is as follows:

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \qquad \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad h_t = o_t \odot \tanh(C_t)$

A noisy network is added at the input to strengthen the exploration of path-planning curve solutions, so as to explore solution sets better than those of rule-based methods. Its basic principle is to replace a parameter $w$ of the neural network with a parameter obeying a Gaussian distribution, $w = \mu + \sigma \odot \varepsilon$, where $\sigma$ and $\mu$ respectively denote the standard deviation and mean of the parameters in the Gaussian distribution and are learned from experience data, and each element of $\varepsilon$ is independently sampled from the standard normal distribution N(0, 1); the Q value of the deep reinforcement learning algorithm is therefore expressed as $Q(s, a, \varepsilon)$.

The random noise introduced at the input of the neural network in the deep reinforcement learning algorithm can be expressed as

$y = \mathrm{ReLU}(Wx + b), \qquad W = \mu + \sigma \odot \varepsilon$

where ReLU is the activation function of the Noise_D3QN algorithm, $\sigma$ and $\mu$ respectively denote the standard deviation and mean of the parameters in the Gaussian distribution, $\varepsilon$ is random noise drawn from the Gaussian distribution, $W$ is the matrix parameter storing the random-noise information of the noisy network, $x$ is the action information at the input of the neural network, and $b$ is the bias in the noisy-network activation function.
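A minimal sketch of a noisy layer in the spirit of the formula above: each weight is mu + sigma * eps, with mu and sigma learned and eps resampled from N(0, 1) on every forward pass. The initialisation constants follow the common NoisyNet recipe and are assumptions here.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer whose weight matrix is Gaussian-perturbed:
    y = ReLU(Wx + b) with W = mu + sigma * eps."""
    def __init__(self, in_f: int, out_f: int, sigma0: float = 0.5):
        super().__init__()
        bound = 1.0 / math.sqrt(in_f)
        self.mu_w = nn.Parameter(
            torch.empty(out_f, in_f).uniform_(-bound, bound))
        self.sigma_w = nn.Parameter(
            torch.full((out_f, in_f), sigma0 * bound))
        self.mu_b = nn.Parameter(torch.zeros(out_f))
        self.sigma_b = nn.Parameter(torch.full((out_f,), sigma0 * bound))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.mu_w + self.sigma_w * torch.randn_like(self.sigma_w)
        b = self.mu_b + self.sigma_b * torch.randn_like(self.sigma_b)
        return F.relu(F.linear(x, w, b))  # fresh noise on every call
```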
S20, constructing a reward function by using a deep reinforcement learning algorithm, evaluating the quality of the parking action in the parking action data set, and obtaining data with optimal parking action quality;
In this embodiment, during the reinforcement learning process the agent is guided, through a preset reward function in its interaction with the environment, to learn the optimal parking posture and path, driving it to select the behavior strategy with the maximum reward value through continual trial and error. Because sparse rewards make it difficult for the agent to quickly learn any useful attitude-control strategy, and may even prevent convergence, a guided reward function for the parking system is designed.
Optionally, constructing the reward function using the deep reinforcement learning algorithm includes: guiding the parking posture and path of the vehicle with a reward function, and setting a constraint reward function for the agent according to the rotation angle of the vehicle during driving, the distance from each collision detection point to the end point, and the final posture of the vehicle in each episode, the reward function combining the following terms:

$r_{dist}$, which depends on the distance of the vehicle from the parking point;

$r_{pose}$, a reward value for assessing the final attitude of the vehicle in the parking space; the larger the deflection angle, the worse the parking posture, so a larger penalty term is set;

$r_{steer}$, a reward value evaluating the steering angle during the parking motion; the larger this reward value, the more stable the curve during driving and the better the planned path;

$\lambda_1$, $\lambda_2$, $\lambda_3$, $\lambda_4$ and $\lambda_5$, the parameters of the reward function;

$d$, the relative position of the vehicle coordinates and the parking space coordinates when the vehicle stops;

$\psi$, the final parking posture of the vehicle when it stops;

$\delta$, the steering angle during the parking motion.
The learning indicators guiding the agent toward the optimal parking path direct the vehicle's action learning from three aspects: the Euclidean distance of the vehicle from the parking coordinates, the number of steering-wheel reversals and the steering angle during parking, and the final posture of the vehicle when the parking action is completed. The term $r_{dist}$ depends on the distance between the vehicle and the parking point: the closer the four end points of the vehicle are to the parking area, the more ideal the overall path. Meanwhile, considering the safety and comfort of the parking behavior, frequent steering-wheel reversals and gear switches make the parking path curve less desirable and the parking action quality poor, so a larger penalty value is set. Finally, the reward value $r_{pose}$ evaluates the final posture of the vehicle in the parking space: the larger the deflection angle, the worse the parking posture, so a larger penalty term is set.
The corresponding parameter settings of the reward function are shown in Table 1.
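Because Table 1's parameter values are not reproduced in this text, the following is only a hedged sketch of how the terms just described might be combined; the linear functional forms and the weights lam are illustrative assumptions, not the patent's actual formula.

```python
def parking_reward(dist: float, yaw_err: float, steer_rate: float,
                   gear_switches: int,
                   lam=(1.0, 1.0, 1.0, 1.0)) -> float:
    """Shaped reward built from the guidance terms plus the
    gear-switch penalty described above (all penalties negative)."""
    r_dist = -lam[0] * dist              # closer to the parking point is better
    r_pose = -lam[1] * abs(yaw_err)      # penalise the final deflection angle
    r_steer = -lam[2] * abs(steer_rate)  # smoother steering curve is better
    r_gear = -lam[3] * gear_switches     # frequent gear switching penalised
    return r_dist + r_pose + r_steer + r_gear
```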
S30, updating the network parameters of the deep reinforcement learning algorithm with the highest-quality parking action data, and performing the next iteration with the updated algorithm until the policy evaluation advantage function of the parking actions converges to the optimum, thereby completing parking strategy learning.
Optionally, updating the network parameters in the deep reinforcement learning algorithm using the highest-quality parking action data includes: introducing a prioritized experience replay mechanism based on a binary-tree structure model into the deep reinforcement learning algorithm.

The classic DQN algorithm uses an experience replay mechanism based on uniform sampling to repeatedly draw historical experience for training the agent, which breaks the correlation between training samples and alleviates the instability of the Q-learning algorithm and its tendency to fall into local optima. However, the uniform sampling of the DQN algorithm makes it difficult to preferentially select higher-value information, and the low utilization of historical information by plain experience replay makes fast convergence of the classic DQN algorithm difficult. In this embodiment, a prioritized experience replay mechanism based on a binary-tree structure model is therefore introduced into the deep reinforcement learning algorithm.
The SumTree adopts the binary tree structure shown in fig. 11. The data stored in the nodes of each layer are sample priorities, with the stored value positively correlated with the node's priority; a SumTree-based prioritized experience replay mechanism improves the utilization efficiency of high-value samples. The priority of each node in the SumTree structure is determined by the temporal-difference error (TD-error), i.e. the difference between the target network's value estimate and the training network's estimate when the deep reinforcement learning algorithm is updated: the larger the difference, the more room the training network's prediction accuracy has to improve, the higher the training value, and the higher the priority assigned. The values beneath the leaf nodes represent the value intervals in which samples fall, and samples in larger intervals are selected with higher probability during sampling.
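A minimal sketch of the SumTree structure of Fig. 11: leaves store sample priorities, internal nodes store the sum of their children, and sampling a uniform value in [0, total) descends to a leaf with probability proportional to its priority. The array layout and method names are assumptions.

```python
import numpy as np

class SumTree:
    """Binary tree over `capacity` leaf priorities; tree[0] is the total."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.tree = np.zeros(2 * capacity - 1)

    def update(self, leaf: int, priority: float) -> None:
        """Set a leaf's priority (e.g. from its TD-error) and
        propagate the change up to the root."""
        idx = leaf + self.capacity - 1
        delta = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx != 0:
            idx = (idx - 1) // 2
            self.tree[idx] += delta

    def sample(self, value: float) -> int:
        """Descend from the root; `value` must lie in [0, tree[0])."""
        idx = 0
        while idx < self.capacity - 1:    # still an internal node
            left = 2 * idx + 1
            if value <= self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx - (self.capacity - 1)  # leaf index
```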
The D3QN (Dueling Double Deep Q Network) combines the Double DQN and Dueling DQN networks. The introduction of the Double DQN network improves the overestimation problem in the DQN algorithm by decoupling the algorithm's action selection from its TD-target update mechanism:

$y_t = r_t + \gamma\, Q\big(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta);\, \theta^{-}\big)$

where $\theta$ denotes the online-network parameters and $\theta^{-}$ the target-network parameters. By using two network models with different parameters, the overestimation problem of the DQN algorithm is effectively mitigated.
On the other hand, the introduction of the dueling_ DQN (Dueling Deep Q Network) network has the inherent defect of non-uniqueness, namely the same, for improving the DQN (Deep Q Network) network mechanismQ(s,a) Possibly from different state valuessAnd action valueaIs generated by the joint distribution of (a) and thus the system cannot be based onQAccurate training of values how an agent targets different state valuessSelecting the best actionaIn addition, in obstacle avoidance issues such as parking and path planning, the vehicle will generate the same if left or right turn is selectedQValue, thus action valueaThe final result cannot be determined in specific situationsQValues. A deep reinforcement learning algorithm architecture for parking path planning is shown in fig. 12. The parameters of the DuelingDQN (Dueling Deep Q Network) algorithm output layer are Value and Advantage respectively, and are used for evaluating state Value and action Advantage, and are expressed as follows:
Wherein, ,sthe state information of the vehicle to be parked is respectively the coordinates of eight endpoints of the vehicle, the maximum vehicle speed ratio, the vehicle pose rotation angle and the distance between the vehicle and the parking point.
For the strategy $\pi$ determined at each iteration of the parking system, the state value and action value obtained by policy evaluation are, respectively,

$V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t} r_{t} \mid s_{0}=s\right], \qquad Q^{\pi}(s,a) = \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t} r_{t} \mid s_{0}=s,\, a_{0}=a\right]$

and from the state value function and the action value function, the policy evaluation advantage function of the parking action is obtained as

$A^{\pi}(s,a) = Q^{\pi}(s,a) - V^{\pi}(s)$

where $V^{\pi}(s)$ is the state value function, $Q^{\pi}(s,a)$ is the action value function, $a$ is the action value, and $s$ is the state value representing the vehicle state information, comprising the coordinates of the eight endpoints of the vehicle, the maximum vehicle speed ratio, the vehicle pose rotation angle, and the distance between the vehicle and the parking point.
According to the defined state value function, action value function and strategy evaluation dominance function, an optimal state value function, action value function, and an optimal strategy evaluation dominance function and an optimal parking action corresponding to the optimal state value function, action value function and strategy evaluation dominance function can be defined as follows:
$V^{*}(s) = \max_{\pi} V^{\pi}(s), \qquad Q^{*}(s,a) = \max_{\pi} Q^{\pi}(s,a), \qquad A^{*}(s,a) = Q^{*}(s,a) - V^{*}(s), \qquad a^{*} = \arg\max_{a} Q^{*}(s,a)$
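A sketch of the Value/Advantage output head follows. Subtracting the mean advantage is the common way of pinning down the otherwise non-unique V/A decomposition discussed above; the text does not spell out that convention, so it is an assumption here, as are the layer sizes (nine actions matching the discretized action set).

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Splits shared features into V(s) and A(s, a) and recombines
    them as Q = V + (A - mean A)."""
    def __init__(self, feat: int = 128, n_actions: int = 9):
        super().__init__()
        self.value = nn.Linear(feat, 1)
        self.adv = nn.Linear(feat, n_actions)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        v = self.value(h)  # (batch, 1)
        a = self.adv(h)    # (batch, n_actions)
        return v + a - a.mean(dim=1, keepdim=True)
```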
This embodiment provides an improved GNSS/INS (Global Navigation Satellite System/Inertial Navigation System) loosely coupled integrated navigation model, whose basic architecture is shown in fig. 13. The parking space environment information is obtained as follows: the satellite navigation receiver of the GNSS module acquires the attitude and relative position of the moving vehicle, while the inertial navigation system of the IMU (Inertial Measurement Unit) module acquires the vehicle's attitude, speed, and acceleration, from which the INS calculation obtains position, speed, and attitude. An improved unscented Kalman filter fuses these data for positioning, correcting and denoising the errors in the vehicle state information; the filtering result is fed back to the INS calculation module to correct the reading errors of the IMU module, and the current vehicle state information is obtained through repeated iterative operation of the system. The vehicle's position information and the surrounding parking space environment information are fed back to the vehicle's path-planning decision system, which judges, against the parking standard, whether the parking operation has been completed in the correct posture.
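For orientation, one predict/update cycle of a linear Kalman filter fusing the INS-propagated state with a GNSS measurement is sketched below. The patent uses an improved unscented variant, so this linear form is a simplified stand-in; the matrix names follow textbook convention.

```python
import numpy as np

def kf_step(x, P, z, F, H, Q, R):
    """x: state estimate; P: covariance; z: GNSS measurement;
    F: motion model; H: measurement model; Q, R: noise covariances."""
    # Predict: propagate the INS state and its uncertainty.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: blend in the GNSS measurement via the Kalman gain.
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    # The correction x_new - x_pred is what gets fed back to the INS
    # calculation to trim the IMU reading errors.
    return x_new, P_new
```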
In some embodiments, the flowchart of multi-space automatic parking path planning is shown in fig. 14; the behavior of the optimal parking path planning module is divided into two phases, early pre-training and later real-time execution in the scene.
Early pre-training stage: first, the vehicle kinematic model and the simulated vehicle and environment models for parallel, vertical, and inclined parking spaces are constructed according to national vehicle and parking space standards. The environment information is passed to the deep reinforcement learning network, and the agent initializes the vehicle state from the environment information and the vehicle's initial position. The vehicle state has eleven dimensions, comprising the distance information of obstacles in eight directions, the maximum speed ratio of the vehicle, the steering angle of the vehicle, and the distance of the vehicle from the parking point. After receiving the vehicle state, the deep neural network of the vehicle control module explores the various vehicle actions according to a preset random probability and controls the vehicle to park. During parking, an on-board lidar is mounted at each of the four vertices of the vehicle rectangle and, together with the on-board fusion positioning system, detects the distance between the vehicle and obstacles in real time. When the vehicle touches an obstacle, or the detected distance to an obstacle is smaller than the minimum safe distance, the collision detection algorithm of the decision module judges that parking has failed; the reinforcement learning penalty function then evaluates and scores the episode's behavior, and the return value guides the vehicle to learn the valuable experience of the sample in further iterations, so that the optimized parking path trajectory is learned continually over each complete iteration cycle. It follows that efficient learning of parking behavior by the agent is inseparable from efficient use of sample experience; this embodiment's improvement to the reuse of existing experience in deep reinforcement learning is as follows.
The flow of learning from pre-selected experience is shown in fig. 15. From the start of iteration, the agent explores the optimal action with a certain probability and learns from existing experience, including unsuccessful experience, with the complementary probability, so early in training the algorithm accumulates a large amount of unqualified parking behavior. The algorithm stores the model trained in each round, containing for every parking action the vehicle state (s), the action (a), the reward value (r) assigned by the penalty function, and the next state (s') resulting from the current behavior. Once the experience pool is full and iterative training is complete, the behavior quality of all current experience is compared against a fixed standard. From observation of the parking actions, samples with a reward value greater than 5500 correspond to valid parking in practice, so all models with reward values above 5500 are automatically retained by a preset screening algorithm and loaded into the experience pool used for executing parking. The optimal experience stored in the parking experience pool, all of which meets the parking standard, can then guide the agent in real-time parking and in exploring better behaviors in other parking environments.
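A minimal sketch of the reward-threshold screening just described; the transition layout is an assumption, while the 5500 threshold comes from the text.

```python
def screen_experience(episodes, threshold: float = 5500.0):
    """Keep only episodes whose total reward clears the validity
    threshold; each episode is a list of (s, a, r, s_next) tuples."""
    return [ep for ep in episodes
            if sum(r for _, _, r, _ in ep) > threshold]
```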
Pre-training is complete once the parking experience pool is loaded with a pre-training model that meets the real-world standard. The vehicle is then guided directly by the pre-training model to perform simulated parking operations: the agent's random exploration behavior is stopped in the parking execution stage, and the optimal model performs the parking operation with full-probability exploitation. Because the pre-training model consists entirely of qualified samples and the agent no longer explores unknown behaviors spontaneously, the convergence of the algorithm and the stability and reliability of the parking system are ensured.
The execution stage: after the learning of intelligent parking behavior on the simulation model is completed, the vehicle is guided to park in real time according to the pre-training model in a real scene. Because the vehicle dimensions and parking-space sizes in the simulation environment are set according to the national general standard, the experience collected in simulation can be applied directly to guiding real-time parking. During real-time parking, the vehicle obtains its current real-time positioning from the vehicle-mounted GNSS (Global Navigation Satellite System) positioning system and is controlled to reach the parking start position corresponding to the one used in the simulation environment. A 3D environment point-cloud map around the parking point is constructed with the laser radar and matched against the parking environment information in the database of the vehicle-mounted path-planning system to determine which parking experience sample should control the vehicle; once the type of parking place and the corresponding parking action are determined, motion instructions are issued to the vehicle according to the parking actions of the optimal pre-training model, thereby completing the parking.
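The dispatch step, choosing a pre-training model by garage type after point-cloud matching, might look like the following sketch; the registry, paths, and function are assumptions for illustration.

```python
# Hypothetical registry mapping garage types to pre-trained parking policies
PRETRAINED_MODELS = {
    "parallel": "models/parallel_parking.pt",
    "vertical": "models/vertical_parking.pt",
    "oblique":  "models/oblique_parking.pt",
}

def select_policy(garage_type):
    """Return the pre-training model matching the garage type identified by
    point-cloud map matching; names and paths are illustrative assumptions."""
    return PRETRAINED_MODELS[garage_type]

policy_path = select_policy("parallel")   # e.g. after matching a parallel slot
```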
Example 2
As another aspect of the embodiments of the present disclosure, there is also provided an automatic parking path planning system 100 based on a deep reinforcement learning algorithm, as shown in fig. 16, comprising:
the parking action data generation module 1 is used for generating a parking action data set in a parking garage position model through a deep reinforcement learning algorithm based on a vehicle kinematic model, wherein a neural network in the deep reinforcement learning algorithm is used for generating probability distribution corresponding to different parking actions;
the parking action quality evaluation module 2 is used for constructing a reward function by utilizing a deep reinforcement learning algorithm, evaluating the quality of the parking action in the parking action data set and obtaining the data with the best parking action quality;
the parking strategy learning module 3 updates network parameters in the deep reinforcement learning algorithm by using the data with the best parking action quality, and performs the next iterative operation by using the updated deep reinforcement learning algorithm until the strategy evaluation dominance function of the parking action converges to the optimal value, thereby completing the parking strategy learning.
Based on the above modules, the embodiment of the disclosure implements an automatic parking path planning system 100 based on a deep reinforcement learning algorithm: based on the vehicle kinematic model, a parking action data set is generated in the parking garage position model through the deep reinforcement learning algorithm, with the neural network in the algorithm generating the probability distributions corresponding to different parking actions; a reward function constructed with the deep reinforcement learning algorithm guides the parking action and posture of the vehicle and yields the data with the best parking-action quality, ensuring accurate posture and safe parking; and the network parameters in the deep reinforcement learning algorithm are updated with the best-quality parking-action data, with the updated algorithm performing the next iteration until the policy-evaluation advantage function of the parking action converges to the optimum, completing the parking-strategy learning.
The following describes each module of the embodiments of the present disclosure in detail.
The parking action data generation module 1 is used for generating a parking action data set in a parking garage position model through a deep reinforcement learning algorithm based on a vehicle kinematic model, wherein a neural network in the deep reinforcement learning algorithm is used for generating probability distribution corresponding to different parking actions;
In this embodiment, the environment information is the parking-space environment discussed in detail below. The sensing module is responsible for identifying the space type for the vehicle and monitoring the relative position of the vehicle and the parking space; the associated sensor is a laser radar. The vehicle state sensor is responsible for monitoring the attitude and positioning information of the vehicle; the associated sensors are an inertial measurement unit and GNSS (Global Navigation Satellite System). The control module is responsible for speed control and steering of the vehicle based on the vehicle kinematic model. The planning module plans the vehicle's parking path from the environment information using the improved deep reinforcement learning algorithm based on the pre-training model.
The parking space model comprises: parallel parking garage position model, perpendicular parking garage position model and inclined parking garage position model.
The parking-space environment of the parallel parking space model is generally formed by the front and rear vehicles or by the front and rear boundary lines of the parking space. According to the size, position, and minimum parking safety distance of the front and rear vehicles, the two vehicles are abstracted into rectangular frames with two end points P and O. A parking environment coordinate system is established with the left rear corner of the rear obstacle vehicle as the coordinate origin, the direction along the right vehicle body toward the vehicle head as the X axis, and the direction of motion of the rear vehicle body as the Y axis; the size of the parking area is measured by the P-O end-point distance and the obstacle vehicle width. The pose information of the vehicle to be parked is expressed in this coordinate system.
The parking-space environment of the vertical parking space model is generally formed by the left and right vehicles or by the left and right boundary lines of the parking space. According to the size, position, and minimum parking safety distance of the left and right vehicles, the two vehicles are abstracted into rectangular frames with two end points P and O. A parking environment coordinate system is established with the right front corner of the right obstacle vehicle as the coordinate origin, the direction along the right-side vehicle body toward the other end of the vehicle body as the X axis, and the direction from the rear-side vehicle body toward the front-side vehicle body as the Y axis; the size of the parking area is measured by the P-O end-point distance and the obstacle vehicle width. The pose information of the vehicle to be parked is expressed in this coordinate system.
The parking-space environment of the inclined parking space model is generally formed by two vehicles along the inclined direction or by the left and right boundary lines of the parking space. According to the size, position, and minimum parking safety distance of the vehicles on either side of the inclined edge, the two vehicles are abstracted into rectangular frames with two end points P and O. A parking environment coordinate system is established with the right rear corner of the right obstacle vehicle as the coordinate origin, the longitudinal axis direction of the vehicle to be parked as the X axis, and the Y axis perpendicular to the X axis; the size of the parking area is measured by the P-O end-point distance and the obstacle vehicle width. The pose information of the vehicle to be parked is expressed in this coordinate system.
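For all three garage models, the parking-area size is measured from the P-O end-point distance and the obstacle vehicle width; a minimal sketch under that description follows, with the function name and sample coordinates assumed.

```python
import math

def parking_area_size(P, O, obstacle_width):
    """Measure the parking area from the P-O end-point distance and the
    obstacle vehicle width, as the three garage models describe.
    P and O are (x, y) end points of the abstracted rectangular obstacles."""
    length = math.dist(P, O)     # slot length along the P-O direction
    depth = obstacle_width       # slot depth taken from the obstacle vehicle width
    return length, depth

# Parallel slot: origin at the rear obstacle's left rear corner per the embodiment
length, depth = parking_area_size(P=(6.8, 0.0), O=(0.0, 0.0), obstacle_width=1.8)
```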
In this embodiment, the intelligent laser radar recognition system for recognizing the parking garage environment information is composed of five parts, namely an automatic driving vehicle to be parked, an automatic parking system control module, a point cloud data storage server, a three-dimensional imaging analysis module and a front-end operation interface.
The automatic driving vehicle is provided with a SICK laser radar scanner, a vehicle-mounted mobile power supply and other devices, and is used for receiving and responding to a control instruction sent by the automatic driving system control module, feeding back the current running state of the automatic driving vehicle to the automatic driving system control module and sending point cloud data to the point cloud data storage server. The automatic driving system control module is used for sending various control instructions to the automatic driving vehicle, such as starting a scanning task, advancing the automatic driving vehicle, backing the vehicle and the like, and dynamically adjusting the control instructions according to the running state information returned by the automatic driving vehicle. The point cloud data storage server is used for receiving and storing the point cloud data. The three-dimensional imaging analysis module is used for carrying out a series of calibration and screening on the point cloud data, constructing a three-dimensional curved surface of a parking garage position environment, and combining site parameters of a parking lot in a city to obtain a complete three-dimensional model; calculating the model volume to obtain the size of the parking garage and the distance relative to the vehicle; and sending the three-dimensional model and the size data to a front-end operation interface. The front-end operation interface comprises a control interface of the automatic driving vehicle, a three-dimensional model presentation interface and a volume data display interface.
In this embodiment, the vehicle identifies the parking-garage environment information with the laser radar and executes the parking operation flow. When the autonomous-driving control module sends a "start task" command to the vehicle, the autonomous vehicle drives from its current position to the parking start point while the laser radar scanner is started; point-cloud data are collected in real time as the vehicle advances, and whether the vehicle has reached the parking start point is judged by measuring the relative distance between the vehicle and the parking point from the surrounding environment information. When the vehicle arrives at the parking point, the laser radar scans the parking-garage environment, a corresponding point-cloud map is constructed and compared with the garage map information in the database, the parking-garage model of the current vehicle is judged, and the corresponding parking action is taken accordingly. After the vehicle reaches the parking point, the vehicle-mounted fusion positioning system confirms whether the vehicle has parked in the correct posture, and operation stops when the parking is complete. The point-cloud data generated during operation of the laser radar scanner are uploaded to the point-cloud data storage server.
In the present embodiment, the turning process of the vehicle is assumed to be circular motion; considering that the vehicle speed is low and no lateral slip occurs while the vehicle moves along the parking path, the four-wheel Ackermann steering model is simplified to a front-drive two-wheel model. Because the outer contour of the automobile directly influences the quality of the parking motion, and a maximum safety distance is set to prevent the moving vehicle from colliding with obstacles, the vehicle is enclosed in a rectangular frame sized by the vehicle dimensions plus the maximum safety distance, and the vehicle motion model is built on this basis.
In the vehicle motion model, the vehicle center point is taken as the vehicle reference point. In this embodiment, according to the motion geometry of the parallel parking garage environment, the coordinate values of each vertex of the vehicle outer contour are calculated as follows:
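The vertex expressions themselves are reproduced as formula images in the original publication. A standard geometric reconstruction, assuming a rectangular contour of length L and width W (each including the maximum safety distance), reference point (x, y) at the vehicle center, and heading angle θ, is:

$$
\begin{pmatrix} x_i \\ y_i \end{pmatrix}
=
\begin{pmatrix} x \\ y \end{pmatrix}
+
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\begin{pmatrix} \pm L/2 \\ \pm W/2 \end{pmatrix},
\qquad i = 1, \dots, 4.
$$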
In this embodiment, the length of the arc curve is constructed from the time required for the steering wheel to rotate. The curvature of the arc curve is proportional to the distance from a point on the arc to the starting point, i.e. the curvature k = s/C, where C is the constant relating the traveled distance to the curvature; the steering-wheel rotation time and the length of the arc curve are then given by the corresponding expressions (reproduced as formula images in the original publication).
Further, when searching for the parking strategy, the continuous vehicle actions are discretized into nine actions: the four transverse and longitudinal directions, the four composite directions, and standstill. Because the garage model in this embodiment is a parallel parking-space environment, the transverse and composite-direction actions dominate the control of the vehicle during parking. Combined with the directional action strategy, the vehicle-speed strategy of the vehicle model in this embodiment is divided into an acceleration stage, a stabilization stage, a deceleration stage, and a termination stage.
The acceleration and deceleration stages of the vehicle realize speed control through acceleration. The speed-change stages simulate an actual parking scene: during parking, the acceleration of the vehicle first increases and then decreases (in the warehousing stage, forward motion is recorded as positive speed and reverse motion as negative speed, with the magnitude of the acceleration held constant). The vehicle-speed command formulas are reproduced as formula images in the original publication.
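A sketch of the four-stage speed strategy named above, written as a trapezoidal profile; the stage durations, acceleration magnitude, and function are hypothetical stand-ins for the embodiment's speed-instruction formulas.

```python
def speed_command(t, t_acc, t_cruise, t_dec, a_mag, direction=1):
    """Trapezoidal speed profile over the four stages in the text: acceleration,
    stabilization (cruise), deceleration, termination. A hypothetical stand-in
    for the embodiment's speed-instruction formulas."""
    v_cruise = a_mag * t_acc
    if t < t_acc:                        # acceleration stage
        v = a_mag * t
    elif t < t_acc + t_cruise:           # stabilization stage
        v = v_cruise
    elif t < t_acc + t_cruise + t_dec:   # deceleration stage
        v = v_cruise - a_mag * (t - t_acc - t_cruise)
    else:                                # termination stage
        v = 0.0
    return direction * v                 # reverse motion is recorded as negative
```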
In this embodiment, based on the deep reinforcement learning algorithm combined with the vehicle kinematic model, the simulation model is placed in a parallel parking garage for simulation training and a parking action data set is generated. The neural network generates the probability distributions corresponding to different parking actions for the deep reinforcement learning algorithm, and a roulette-wheel strategy guides the subsequent solution search; meanwhile, the algorithm balances the guidance of the neural network with random exploration to preserve diversity in the search for the optimal solution.
Optionally, generating the parking action data set in the parking garage position model through the deep reinforcement learning algorithm includes: taking the parking-space environment information of the garage model as input information data, introducing a long short-term memory network for processing, and conveying the resulting parking-space environment and vehicle state information of uniform dimension to the input end of the neural network in the deep reinforcement learning algorithm.
Since the current state quantity in the reinforcement learning algorithm is related only to the state at the previous time, and no valuable learning experience can be obtained from state information at more distant times, this embodiment introduces the long short-term memory network LSTM (Long Short Term Memory) to enhance the long-term memory and learning capability of the automatic parking system. The parking-garage environment information is recorded as an input sequence S; at each decision step the real-time state information s received while the vehicle runs is input into an LSTM unit, the forget gate in the LSTM module extracts the key information, and the key information is stored in the hidden state h. When the obstacle information s of the autonomous vehicle is input, the important obstacle-edge-detection information of the parking environment is stored in h and relatively minor information is discarded. After the state information of all parking environments has been input, the information stored in the LSTM is converted into a state vector s of uniform dimension and passed to the subsequent automatic parking system. Thus, after the parking-space environment information is processed by the LSTM network as input data, parking-space environment states and autonomous-vehicle state information of uniform dimension are obtained and delivered to the input end of the neural network in the deep reinforcement learning algorithm.
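A minimal PyTorch sketch of the LSTM dimension-unification step described above; the layer sizes and class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    """Compress a sequence of parking-environment observations into a
    fixed-dimension state vector via an LSTM, as the embodiment describes.
    Sizes are illustrative assumptions."""
    def __init__(self, obs_dim=11, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)

    def forward(self, obs_seq):
        # obs_seq: (batch, time, obs_dim); h_n holds the retained key information
        _, (h_n, _) = self.lstm(obs_seq)
        return h_n[-1]                   # uniform-dimension state vector s

encoder = StateEncoder()
s = encoder(torch.randn(1, 20, 11))      # 20 decision steps -> (1, 64) state
```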
The LSTM uses three gating mechanisms to assign weights to the input information according to its importance. The filtering and discarding of secondary information is decided by the forget gate: it reads the output h_{t-1} of the previous LSTM unit together with the input x_t of the current LSTM unit and outputs the filtering vector f_t through a sigmoid activation function. The input gate stores the filtered new information: a sigmoid function produces the update vector i_t, a tanh function outputs the candidate vector C̃_t, and the new information is added to the historical cell state C_{t-1} to complete the cell update C_t. The output gate produces the output h_t: the sigmoid function first outputs o_t, which is then multiplied by the tanh of the updated cell state.

The working principle of each LSTM gate, written in the standard form consistent with the description above, is:

$$
\begin{aligned}
f_t &= \sigma\left(W_f [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
$$
A noise network is added at the input end to enhance the exploration of path-planning curve solutions and to search for a solution set superior to rule-based methods. Its basic principle is to replace each parameter w in the neural network with a parameter obeying a Gaussian distribution, w = μ + σ ⊙ ε, where σ and μ respectively denote the standard deviation and the mean of the parameters in the Gaussian distribution, both learned from empirical data, and each element of ε is sampled independently from the standard normal distribution N(0, 1); the Q value of the deep reinforcement learning algorithm is therefore written Q(s, a, ε). The random noise introduced at the input end of the neural network can be expressed, in the standard noisy-linear form consistent with the symbol definitions, as

y = ReLU((μ_W + σ_W ⊙ ε_W) x + b)

where σ_W and μ_W respectively represent the standard deviation and the mean of the parameters in the Gaussian distribution, ε_W is the random noise in the Gaussian distribution, W denotes the matrix parameters used for storing the random-noise information in the noise network, x is the action information at the input end of the neural network, and b is the bias in the noise-network activation function.
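A compact PyTorch sketch of such a noisy linear layer, following the standard noisy-network formulation reconstructed above; the initialization constants are assumptions.

```python
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Standard noisy linear layer (a sketch of the noise network described
    above): effective weights are mu + sigma * eps with eps ~ N(0, 1)."""
    def __init__(self, in_dim, out_dim, sigma0=0.5):
        super().__init__()
        scale = sigma0 / in_dim ** 0.5
        self.mu_w = nn.Parameter(torch.empty(out_dim, in_dim).uniform_(-0.1, 0.1))
        self.sigma_w = nn.Parameter(torch.full((out_dim, in_dim), scale))
        self.mu_b = nn.Parameter(torch.zeros(out_dim))
        self.sigma_b = nn.Parameter(torch.full((out_dim,), scale))

    def forward(self, x):
        eps_w = torch.randn_like(self.sigma_w)   # noise resampled on each call
        eps_b = torch.randn_like(self.sigma_b)
        weight = self.mu_w + self.sigma_w * eps_w
        bias = self.mu_b + self.sigma_b * eps_b
        return torch.relu(x @ weight.t() + bias)
```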
The parking action quality evaluation module 2 is used for constructing a reward function by utilizing a deep reinforcement learning algorithm, evaluating the quality of the parking action in the parking action data set and obtaining the data with the best parking action quality;
In this embodiment, during the reinforcement learning process the agent, while interacting with the environment, is guided by a preset reward function to learn the optimal parking posture and path, driving the agent to select the behavior strategy corresponding to the maximum reward value through continuous trial and error. A guided reward function for the parking system is designed to address the problem that sparse rewards make it difficult for the agent to quickly learn any useful attitude-control strategy, and may even prevent convergence.
Optionally, constructing the reward function using the deep reinforcement learning algorithm includes: guiding the parking posture and path of the vehicle with a reward function, and setting a constrained reward function for the agent according to the steering angle of the vehicle during driving, the distance from each collision detection point to the end point, and the final posture of each vehicle. The reward expressions themselves are reproduced as formula images in the original publication; the quantities they combine are as follows (symbols assigned here for readability):

r_d: a reward term that depends on the distance of the vehicle from the parking point;

r_ψ: a reward term evaluating the final posture of the vehicle in the garage slot; the larger the deflection angle, the worse the parking posture, so a larger penalty term is set;

r_δ: a reward term evaluating the steering angle during the parking movement; the larger the value, the smoother the curve during driving and the better the path-planning curve;

λ1, λ2, λ3, λ4, λ5: the parameters of the reward function;

(x, y): the relative position of the vehicle coordinates and the garage-slot coordinates when the vehicle stops parking;

ψ: the final parking posture of the vehicle when it stops parking;

δ: the steering angle during the parking movement of the vehicle.
The learning index guiding the agent toward the optimal parking path directs the action-learning process of the vehicle from three aspects: the Euclidean distance of the vehicle from the parking coordinate, the number of steering-wheel and gear switches and the steering angle during parking, and the final posture of the vehicle when the parking action is completed.

Specifically, according to the distance between the vehicle and the parking point, the closer the four end points of the vehicle are to the parking area, the more ideal the overall path. At the same time, considering the safety and comfort of the parking behavior, the more frequently the steering wheel and the gear are switched, the less ideal the parking-path curve and the poorer the parking-action quality, so a larger penalty value is set. Finally, the reward value r_ψ evaluates the final posture of the vehicle in the garage slot: the larger the deflection angle, the worse the parking posture, so a larger penalty term is set.
The corresponding parameter settings of the reward function are shown in Table 1.
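Since Table 1 is not reproduced here, the following sketch illustrates how the three guidance aspects could combine into a single reward; the coefficients and functional form are hypothetical.

```python
def parking_reward(dist_to_slot, yaw_error, steer_changes, gear_changes,
                   lam=(1.0, 1.0, 1.0, 0.5)):
    """Hypothetical guided reward combining the three aspects in the text:
    distance to the slot, final-posture deflection, and maneuver smoothness.
    The lam coefficients stand in for the Table 1 parameters."""
    r_d = -lam[0] * dist_to_slot                  # closer to the slot is better
    r_psi = -lam[1] * abs(yaw_error)              # larger deflection, larger penalty
    r_smooth = -lam[2] * steer_changes - lam[3] * gear_changes  # comfort/safety
    return r_d + r_psi + r_smooth
```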
The parking strategy learning module 3 updates network parameters in the deep reinforcement learning algorithm by using the data with the best parking action quality, and performs the next iterative operation by using the updated deep reinforcement learning algorithm until the strategy evaluation dominance function of the parking action converges to the optimal value, thereby completing the parking strategy learning.
Optionally, updating the network parameters in the deep reinforcement learning algorithm using the data with the best parking-action quality includes: introducing a prioritized experience replay mechanism based on a binary tree structure model into the deep reinforcement learning algorithm.
The classical DQN (Deep Q Network) algorithm uses a uniformly sampled experience replay mechanism to repeatedly draw historical experience for training the agent, which breaks the correlation between training samples and alleviates the instability of Q-Learning and its tendency to fall into locally optimal solutions. However, uniform sampling makes it difficult for the DQN algorithm to preferentially select higher-value information, and the low utilization of historical information by this replay mechanism makes it difficult for the classical DQN algorithm to converge quickly.
In this embodiment, a priority experience playback mechanism based on a binary tree structure model is introduced into the deep reinforcement learning algorithm.
The SumTree adopts a binary tree structure in which each node stores a sample priority; the stored value is positively correlated with the node priority, and a SumTree-based prioritized experience replay mechanism improves the utilization efficiency of high-value samples. The priority of each node in the SumTree is determined by the temporal-difference error (TD-error), i.e. the difference between the target-network value function and the training-network value function at each update of the deep reinforcement learning algorithm: the larger the difference, the greater the remaining headroom of the training network's prediction accuracy and the higher the training value, so a higher priority is set. The value ranges below the leaf nodes represent the intervals in which samples fall, and the larger the interval, the higher the probability of being selected during sampling.
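A minimal SumTree sketch matching the mechanism described above; the 1-based array layout and API are implementation assumptions.

```python
import random

class SumTree:
    """Minimal SumTree for prioritized experience replay: leaves hold
    |TD-error|-based priorities, internal nodes hold subtree sums, so a
    sample is drawn with probability proportional to its priority."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)   # 1-based heap layout; tree[1] = total
        self.data = [None] * capacity
        self.ptr = 0

    def add(self, priority, transition):
        idx = self.ptr + self.capacity       # leaf index
        self.data[self.ptr] = transition
        self.update(idx, priority)
        self.ptr = (self.ptr + 1) % self.capacity

    def update(self, idx, priority):
        delta = priority - self.tree[idx]
        while idx >= 1:                      # propagate the change up to the root
            self.tree[idx] += delta
            idx //= 2

    def sample(self):
        s = random.uniform(0.0, self.tree[1])
        idx = 1
        while idx < self.capacity:           # descend to a leaf
            left = 2 * idx
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return self.data[idx - self.capacity]
```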
The D3QN (Dueling Double Deep Q Network) network is a combination of the Double DQN (Double Deep Q Network) network and the Dueling DQN (Dueling Deep Q Network) network. The introduction of the Double DQN network alleviates the overestimation problem in the DQN (Deep Q Network) algorithm by decoupling the algorithm's action selection from its TD-target update mechanism: the training network selects the next action and the target network evaluates it, which in the standard Double DQN form gives the target

y = r + γ Q(s', argmax_{a'} Q(s', a'; θ); θ⁻)

where θ denotes the training-network parameters and θ⁻ the target-network parameters. By using two network models with different parameters, the overestimation problem of the DQN algorithm is effectively alleviated.
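The decoupled target computation, in a short PyTorch sketch of the standard Double DQN form given above; the network and tensor names are assumptions.

```python
import torch

@torch.no_grad()
def double_dqn_target(reward, next_state, done, q_net, target_net, gamma=0.99):
    """Standard Double DQN TD-target (sketch): the training network picks the
    action, the target network scores it, curbing overestimation."""
    best_action = q_net(next_state).argmax(dim=1, keepdim=True)
    next_q = target_net(next_state).gather(1, best_action).squeeze(1)
    return reward + gamma * (1.0 - done) * next_q
```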
On the other hand, the Dueling DQN (Dueling Deep Q Network) network is introduced to improve the DQN (Deep Q Network) network mechanism, but it has an inherent non-uniqueness defect: the same Q(s, a) may be generated by different joint distributions of the state value s and the action value a, so the system cannot, on the basis of the Q value alone, accurately train how the agent should select the best action a for different state values s. Moreover, in obstacle-avoidance problems such as parking and path planning, choosing a left turn or a right turn may produce the same Q value, so the action value a cannot be determined from the Q value alone in specific situations. A deep reinforcement learning algorithm architecture for parking path planning is shown in fig. 12.
The output layer of the Dueling DQN (Dueling Deep Q Network) algorithm has two heads, Value and Advantage, used to evaluate the state value and the action advantage respectively. In the standard dueling aggregation, consistent with the description above, they are combined as

Q(s, a; θ, α, β) = V(s; θ, β) + ( A(s, a; θ, α) − (1/|A|) Σ_{a'} A(s, a'; θ, α) )

where θ are the shared network parameters, β and α the parameters of the Value and Advantage streams, and the mean of the advantages is subtracted to make the decomposition unique.
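A PyTorch sketch of the two-headed output layer with the mean-subtracted aggregation shown above; the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling output head (sketch): a Value stream and an Advantage stream,
    combined with the mean-subtracted aggregation shown above."""
    def __init__(self, feat_dim, n_actions):
        super().__init__()
        self.value = nn.Linear(feat_dim, 1)
        self.advantage = nn.Linear(feat_dim, n_actions)

    def forward(self, features):
        v = self.value(features)                       # V(s)
        a = self.advantage(features)                   # A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)     # Q(s, a)
```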
Here s is the state information of the vehicle to be parked, comprising the coordinates of the eight end points of the vehicle, the maximum vehicle speed ratio, the vehicle pose rotation angle, and the distance between the vehicle and the parking point.
For the strategy π determined in each iteration of the parking system, the state value and the action value obtained by policy evaluation are, written in the standard form consistent with the surrounding definitions,

V^π(s) = E_π[ Σ_{t≥0} γ^t r_t | s_0 = s ],  Q^π(s, a) = E_π[ Σ_{t≥0} γ^t r_t | s_0 = s, a_0 = a ].

From the state value function and the action value function, the policy-evaluation advantage function of the parking action is obtained as

A^π(s, a) = Q^π(s, a) − V^π(s)

where V^π(s) is the state value function, Q^π(s, a) is the action value function, a is the action value, and s is the state value representing the vehicle state information, comprising the coordinates of the eight end points of the vehicle, the maximum vehicle speed ratio, the vehicle pose rotation angle, and the distance between the vehicle and the parking point.

From the state value function, action value function, and policy-evaluation advantage function defined above, the optimal state value function, optimal action value function, optimal policy-evaluation advantage function, and the corresponding optimal parking action can be defined as

V*(s) = max_π V^π(s),  Q*(s, a) = max_π Q^π(s, a),  A*(s, a) = Q*(s, a) − V*(s),  a* = argmax_a Q*(s, a).
in this embodiment, an improved GNSS/INS (Global Navigation Satellite System/Inertial Navigation System) loosely coupled integrated navigation model is provided. The parking space environment information acquisition process of the parking space model comprises the following steps: the satellite navigation receiver corresponding to the GNSS (Global Navigation Satellite System) module acquires the posture and the relative position information of the running vehicle, the posture, the speed and the acceleration information of the running vehicle, which are acquired by the inertial navigation system corresponding to the IMU (Inertial Measurement Unit) module and are obtained through INS (Inertial Navigation System) calculation, the data fusion positioning is carried out by the improved unscented Kalman filter, the error of the vehicle state information is corrected and noise reduced, the obtained filtering result is used for feeding back to the INS (Inertial Navigation System) calculation module to correct the reading error of the IMU (Inertial Measurement Unit) module, the current vehicle state information is obtained through repeated iterative operation of the system, the position information of the vehicle and the parking space environment information of the vehicle are fed back to the path planning decision system of the vehicle, and whether the parking operation of the vehicle is completed in the correct posture or not is judged according to the parking standard.
In some embodiments, the above-described system 100 operates in use in the following manner:
s1: the method comprises the steps of running a parking action data generation module 1, generating a parking action data set in a parking garage position model through a deep reinforcement learning algorithm based on a vehicle kinematic model, wherein a neural network in the deep reinforcement learning algorithm generates probability distribution corresponding to different parking actions;
s2: running a parking action quality evaluation module 2, constructing a reward function by using a deep reinforcement learning algorithm, evaluating the quality of the parking action in the parking action data set, and obtaining data with optimal parking action quality;
s3: and running a parking strategy learning module 3, updating network parameters in the deep reinforcement learning algorithm by using data with optimal parking action quality, and performing next iterative operation by using the updated deep reinforcement learning algorithm until the strategy evaluation dominance function of the parking action converges to the optimal value, so as to complete the parking strategy learning.
Based on the description of the above embodiments, the embodiments of the present disclosure can achieve the following technical effects:
(1) The present disclosure pre-trains the parking operation in a simulation environment based on multiple garage models with standard parameters; in subsequent real-scene parking operations, the vehicle can call the corresponding pre-training model according to the garage type and park directly. Compared with learning-based methods that train online, this offers better real-time performance and applies to multiple parking environments.
(2) The parking path planning algorithm fully considers comprehensive factors such as parking safety and comfort. It uses the 11 state dimensions of the vehicle state to accurately control the vehicle's azimuth, attitude, speed, and acceleration, and controls the minimum safe distance between the vehicle and obstacles, as well as the variation of speed and acceleration, through constraint conditions and the reinforcement-learning reward function, thereby ensuring accurate posture, safety, and good user experience during parking; it also reduces the computational complexity of the input information through the dimension mapping of the neural network.
(3) The method adds a prioritized experience replay mechanism to the deep reinforcement learning algorithm to enhance the utilization of effective experience and the learning efficiency of the algorithm; the long short-term memory network and the noise network added at the state and action input ends provide more efficient extraction of environment and state features and exploration of better data.
Example 3
An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the automated parking path planning method of embodiment 1 based on a deep reinforcement learning algorithm when the computer program is executed.
Embodiment 3 of the present disclosure is merely an example, and should not be construed as limiting the functionality and scope of use of the embodiments of the present disclosure.
The electronic device may be in the form of a general purpose computing device, which may be a server device, for example. Components of an electronic device may include, but are not limited to: at least one processor, at least one memory, a bus connecting different system components, including the memory and the processor.
The buses include a data bus, an address bus, and a control bus.
The memory may include volatile memory such as Random Access Memory (RAM) and/or cache memory, and may further include Read Only Memory (ROM).
The memory may also include program means having a set (at least one) of program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The processor executes various functional applications and data processing by running computer programs stored in the memory.
The electronic device may also communicate with one or more external devices (e.g., keyboard, pointing device, etc.). Such communication may be through an input/output (I/O) interface. And, the electronic device may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through a network adapter. The network adapter communicates with other modules of the electronic device via a bus. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with an electronic device, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the detailed description above, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the present application, the features and functionality of two or more units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided into and embodied by a plurality of units/modules.
Example 4
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the automated parking path planning method based on the deep reinforcement learning algorithm in embodiment 1.
More specifically, among others, readable storage media may be employed including, but not limited to: portable disk, hard disk, random access memory, read only memory, erasable programmable read only memory, optical storage device, magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation manner, the present disclosure may also be implemented in the form of a program product comprising program code; when the program product runs on a terminal device, the program code causes the terminal device to carry out the steps of the deep reinforcement learning algorithm-based automatic parking path planning method described in embodiment 1.
Wherein the program code for carrying out the present disclosure may be written in any combination of one or more programming languages, which program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device, partly on the remote device or entirely on the remote device.
Although embodiments of the present disclosure have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the disclosure, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. The automatic parking path planning method based on the deep reinforcement learning algorithm is characterized by comprising the following steps of:
s10, generating a parking action data set in a parking garage position model through a deep reinforcement learning algorithm based on a vehicle kinematic model, wherein a neural network in the deep reinforcement learning algorithm generates probability distribution corresponding to different parking actions;
s20, constructing a reward function by using a deep reinforcement learning algorithm, and evaluating the quality of the parking action in the parking action data set to obtain data with optimal parking action quality;
And S30, updating network parameters in the deep reinforcement learning algorithm by using data with optimal parking motion quality, and performing next iterative operation by using the updated deep reinforcement learning algorithm until the strategy evaluation dominance function of the parking motion converges to the optimal value, so as to complete parking strategy learning.
2. The method for planning an automatic parking path based on a deep reinforcement learning algorithm according to claim 1, wherein the generating a parking action data set in a parking garage model by the deep reinforcement learning algorithm comprises: and taking the parking space environment information of the parking space model as input information data, introducing a long-term memory network to process, and conveying the obtained parking space environment and vehicle state information with uniform dimension to the input end of the neural network in the deep reinforcement learning algorithm.
3. The method for planning an automatic parking path based on a deep reinforcement learning algorithm according to claim 2, wherein random noise is further introduced into the input end of the neural network in the deep reinforcement learning algorithm, the random noise being expressible in the standard noisy-linear form as:

y = ReLU((μ_W + σ_W ⊙ ε_W) x + b)

wherein ReLU is an activation function of the noise_d3qn algorithm, σ_W and μ_W respectively represent the standard deviation and the mean of the parameters in the Gaussian distribution, ε_W is the random noise in the Gaussian distribution, W denotes the matrix parameters used for storing the random-noise information in the noise network, x is the action information at the input end of the neural network, and b is the bias in the noise-network activation function.
4. The automatic parking path planning method based on a deep reinforcement learning algorithm according to claim 3, wherein constructing the reward function using the deep reinforcement learning algorithm includes: guiding the parking posture and path of the vehicle with a reward function, and setting a constrained reward function for the agent according to the steering angle of the vehicle during driving, the distance from each collision detection point to the end point, and the final posture of each vehicle;

the reward expressions are reproduced as formula images in the original publication; the quantities they combine are as follows (symbols assigned here for readability):

r_d: a reward term that depends on the distance of the vehicle from the parking point;

r_ψ: a reward term evaluating the final posture of the vehicle in the garage slot, where the larger the deflection angle, the worse the parking posture and the larger the penalty term set;

r_δ: a reward term evaluating the steering angle during the parking movement, where the larger the value, the smoother the curve during driving and the better the path-planning curve;

λ1, λ2, λ3, λ4, λ5: the parameters of the reward function;

(x, y): the relative position of the vehicle coordinates and the garage-slot coordinates when the vehicle stops parking;

ψ: the final parking posture of the vehicle when it stops parking;

δ: the steering angle during the parking movement of the vehicle.
5. The method for planning an automatic parking path based on a deep reinforcement learning algorithm according to claim 4, wherein updating network parameters in the deep reinforcement learning algorithm using data having the best quality of the parking action comprises: and introducing a preferential experience playback mechanism based on a binary tree structure model into the deep reinforcement learning algorithm.
6. The deep reinforcement learning algorithm-based automatic parking path planning method according to any one of claims 1 to 5, wherein the policy-evaluation advantage function of the parking action is expressed in the standard form as:

A^π(s, a) = Q^π(s, a) − V^π(s)

wherein V^π(s) is the state value function, Q^π(s, a) is the action value function, a is the action value, and s is the state value representing the vehicle state information, comprising the coordinates of the eight end points of the vehicle, the maximum vehicle speed ratio, the vehicle pose rotation angle, and the distance between the vehicle and the parking point.
7. The automatic parking path planning method based on the deep reinforcement learning algorithm according to claim 2, wherein the process of acquiring the parking-space environment information of the garage model includes: the satellite navigation system acquires the attitude and relative position information of the autonomous vehicle, and the inertial measurement unit acquires the attitude, speed, and acceleration information of the vehicle through an INS (Inertial Navigation System) solution unit; a Kalman filtering system corrects and denoises the acquired attitude, speed, and acceleration information of the vehicle, and the obtained filtering result is fed back to the INS solution unit to correct the reading error of the inertial measurement unit.
8. An automatic parking path planning system based on a deep reinforcement learning algorithm, comprising:
the parking action data generation module is used for generating a parking action data set in a parking garage position model through a deep reinforcement learning algorithm based on a vehicle kinematic model, wherein a neural network in the deep reinforcement learning algorithm is used for generating probability distribution corresponding to different parking actions;
the parking action quality evaluation module is used for constructing a reward function by using a deep reinforcement learning algorithm, evaluating the quality of the parking action in the parking action data set and obtaining the data with the best parking action quality;
and the parking strategy learning module is used for updating network parameters in the deep reinforcement learning algorithm by utilizing data with optimal parking action quality, and performing next iterative operation by utilizing the updated deep reinforcement learning algorithm until the strategy evaluation dominance function of the parking action is converged to be optimal, so that the parking strategy learning is completed.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the deep reinforcement learning algorithm-based automatic parking path planning method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the automatic parking path planning method based on the deep reinforcement learning algorithm as claimed in any one of claims 1 to 7.
CN202310819139.8A 2023-07-05 2023-07-05 Automatic parking path planning method and system based on deep reinforcement learning algorithm Active CN116533992B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310819139.8A CN116533992B (en) 2023-07-05 2023-07-05 Automatic parking path planning method and system based on deep reinforcement learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310819139.8A CN116533992B (en) 2023-07-05 2023-07-05 Automatic parking path planning method and system based on deep reinforcement learning algorithm

Publications (2)

Publication Number Publication Date
CN116533992A true CN116533992A (en) 2023-08-04
CN116533992B CN116533992B (en) 2023-09-22

Family

ID=87458205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310819139.8A Active CN116533992B (en) 2023-07-05 2023-07-05 Automatic parking path planning method and system based on deep reinforcement learning algorithm

Country Status (1)

Country Link
CN (1) CN116533992B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110555584A (en) * 2019-07-17 2019-12-10 浙江工业大学 automatic parking lot scheduling method based on deep reinforcement learning
DE102018114628A1 (en) * 2018-06-19 2019-12-19 Valeo Schalter Und Sensoren Gmbh Automatic or semi-automatic parking procedures based on deep learning
WO2020056875A1 (en) * 2018-09-20 2020-03-26 初速度(苏州)科技有限公司 Parking strategy based on deep reinforcement learning
CN111126598A (en) * 2019-12-19 2020-05-08 深圳南方德尔汽车电子有限公司 Automatic parking method, automatic parking device, computer equipment and storage medium
CN112356830A (en) * 2020-11-25 2021-02-12 同济大学 Intelligent parking method based on model reinforcement learning
CN112435503A (en) * 2020-10-30 2021-03-02 江苏大学 Intelligent automobile active collision avoidance method for identifying intention of high-risk pedestrians
CN113859226A (en) * 2021-11-04 2021-12-31 赵奕帆 Movement planning and automatic parking method based on reinforcement learning
CN114633741A (en) * 2022-03-28 2022-06-17 智己汽车科技有限公司 Artificial auxiliary parking method and system based on deep learning
CN115657548A (en) * 2022-10-25 2023-01-31 重庆邮电大学 Automatic parking decision method based on model prediction control and reinforcement learning fusion

Also Published As

Publication number Publication date
CN116533992B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
CN108983781B (en) Environment detection method in unmanned vehicle target search system
JP7086111B2 (en) Feature extraction method based on deep learning used for LIDAR positioning of autonomous vehicles
CN111771141B (en) LIDAR positioning for solution inference using 3D CNN network in autonomous vehicles
CN112356830B (en) Intelligent parking method based on model reinforcement learning
CN111771135B (en) LIDAR positioning using RNN and LSTM for time smoothing in autonomous vehicles
Min et al. RNN-based path prediction of obstacle vehicles with deep ensemble
CN108089572A (en) For the algorithm and infrastructure of steady and effective vehicle location
Chen et al. Driving maneuvers prediction based autonomous driving control by deep Monte Carlo tree search
CN113110455A (en) Multi-robot collaborative exploration method, device and system for unknown initial state
CN117222915A (en) System and method for tracking an expanded state of a moving object using a composite measurement model
CN116533992B (en) Automatic parking path planning method and system based on deep reinforcement learning algorithm
CN116774247A (en) SLAM front-end strategy based on multi-source information fusion of EKF
Nikdel et al. Recognizing and tracking high-level, human-meaningful navigation features of occupancy grid maps
Neloy et al. Alpha-N-V2: Shortest path finder automated delivery robot with obstacle detection and avoiding system
CN116069023A (en) Multi-unmanned vehicle formation control method and system based on deep reinforcement learning
CN115657664A (en) Path planning method, system, equipment and medium based on human teaching learning
CN115373383A (en) Autonomous obstacle avoidance method and device for garbage recovery unmanned boat and related equipment
Arana-Daniel et al. Reinforced-SLAM for path planing and mapping in dynamic environments
Akai et al. Teaching-Playback Navigation Without a Consistent Map
Eraqi et al. Efficient occupancy grid mapping and camera-lidar fusion for conditional imitation learning driving
Gao et al. Efficient hierarchical reinforcement learning for mapless navigation with predictive neighbouring space scoring
CN117036966B (en) Learning method, device, equipment and storage medium for point feature in map
Arıtürk et al. Artificial Intelligence and Autonomous Car
Omama et al. Ladfn: Learning actions for drift-free navigation in highly dynamic scenes
EP3839830A1 (en) Trajectory estimation for vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant