CN115472038B - Automatic parking method and system based on deep reinforcement learning - Google Patents

Automatic parking method and system based on deep reinforcement learning

Info

Publication number
CN115472038B
CN115472038B (application CN202211353517.XA)
Authority
CN
China
Prior art keywords
network
initial
action
vehicle
executor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211353517.XA
Other languages
Chinese (zh)
Other versions
CN115472038A (en)
Inventor
邱思杰
黄忠虎
贾鹏
马豪
伍坪
谢华
刘春明
纪联南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Jiezhiyi Technology Co ltd
Sanming University
Original Assignee
Nanjing Jiezhiyi Technology Co ltd
Sanming University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Jiezhiyi Technology Co ltd, Sanming University filed Critical Nanjing Jiezhiyi Technology Co ltd
Priority to CN202211353517.XA priority Critical patent/CN115472038B/en
Publication of CN115472038A publication Critical patent/CN115472038A/en
Application granted granted Critical
Publication of CN115472038B publication Critical patent/CN115472038B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/14 Traffic control systems for road vehicles indicating individual free spaces in parking areas
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/09 Arrangements for giving variable traffic instructions
    • G08G1/0962 Arrangements for giving variable traffic instructions having an indicator mounted inside the vehicle, e.g. giving voice messages
    • G08G1/0967 Systems involving transmission of highway information, e.g. weather, speed limits
    • G08G1/096708 Systems involving transmission of highway information, e.g. weather, speed limits where the received information might be used to generate an automatic action on the vehicle control
    • G08G1/096725 Systems involving transmission of highway information, e.g. weather, speed limits where the received information generates an automatic action on the vehicle control
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/09 Arrangements for giving variable traffic instructions
    • G08G1/0962 Arrangements for giving variable traffic instructions having an indicator mounted inside the vehicle, e.g. giving voice messages
    • G08G1/0968 Systems involving transmission of navigation instructions to the vehicle
    • G08G1/096805 Systems involving transmission of navigation instructions to the vehicle where the transmitted instructions are used to compute a route
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Atmospheric Sciences (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an automatic parking method and system based on deep reinforcement learning. The method comprises: constructing an initial evaluator network and an initial executor network; training the initial evaluator network and the initial executor network based on a state value baseline to obtain an executor network; acquiring a current image of the vehicle; acquiring the current vehicle position and the parking space position; inputting the current image, the current vehicle position and the parking space position into the executor network, which outputs a current action execution strategy; and having the vehicle execute an action based on the current action execution strategy and acquire the next action execution strategy from the next image, the next vehicle position and the parking space position obtained after the action, until the vehicle completes the automatic parking task. The control instructions of the vehicle are generated by a deep neural network, and the network is trained with an evaluator-executor (actor-critic) algorithm, so that automatic parking can be realized more effectively.

Description

Automatic parking method and system based on deep reinforcement learning
Technical Field
The invention relates to the technical field of automatic driving, in particular to an automatic parking method and system based on deep reinforcement learning.
Background
The parking task is frequently encountered in daily life. In particular, when the feasible driving space around the target parking space is small, parking demands considerable driving experience and skill, and inexperienced drivers cannot be guaranteed to complete the corresponding task. Traditional schemes mostly rely on multiple cameras and vehicle-mounted radar as the environment sensing means, which raises system cost and increases the complexity of feature extraction; moreover, vehicle path planning and motion control are handled separately, making the parking system modules complex to design.
In view of the above, the present invention provides an automatic parking method and system based on deep reinforcement learning, so as to offer an end-to-end automatic parking solution that meets the requirements of everyday parking tasks. The invention uses a camera as the environment sensing means, generates the vehicle control instructions with a deep neural network, trains the network through an evaluator-executor algorithm, and finally realizes the automatic parking function.
Disclosure of Invention
The invention aims to provide an automatic parking method based on deep reinforcement learning, which comprises: constructing an initial evaluator network and an initial executor network; and training the initial evaluator network and the initial executor network based on a state value baseline to obtain an executor network. The training comprises constructing a profit gradient of the initial executor network based on the value of an action execution strategy and the state value baseline, wherein the formula for constructing the profit gradient is:

∇_θ J(θ) = E[(r_t + γ·V(s_{t+1}) - V(s_t)) · ∇_θ ln π_θ(a_t | s_t)]

where ∇_θ J(θ) represents the profit gradient; J(θ) represents the accumulated revenue; r_t represents the action reward; γ represents the discount rate of the action reward; V(s_{t+1}) represents the state value baseline of the vehicle at time t+1; V(s_t) represents the state value baseline of the vehicle at time t; and π_θ(a_t | s_t) represents the sample action execution strategy of performing action a_t in state s_t. The network parameters of the initial executor network are updated based on the profit gradient until the profit gradient reaches a maximum value, and the initial executor network at the maximum profit gradient is taken as the trained executor network. The method further comprises: acquiring a current image of the vehicle, the current image including the state of the vehicle in the current environment; acquiring the current vehicle position and the parking space position; inputting the current image, the current vehicle position and the parking space position into the executor network, which outputs a current action execution strategy; and having the vehicle execute an action based on the current action execution strategy and acquire the next action execution strategy from the next image, the next vehicle position and the parking space position obtained after the action, until the vehicle completes the automatic parking task.
Further, the initial evaluator network and the initial executor network are obtained by constructing a multi-layer data structure: the first layer of the data structure employs a 7 × 7 convolution operation and a max pooling operation; the second, third, fourth and fifth layers each employ a residual module for feature extraction; and the sixth layer employs an average pooling operation.
Further, the training of the executor network comprises: inputting a sample image, a sample vehicle position and a sample parking space position into the initial executor network, which outputs a sample action execution strategy; having the vehicle execute an action based on the sample action execution strategy; obtaining the action reward for executing the sample action execution strategy; storing the sample image, the execution action, the action reward and the next sample image in an experience pool as a training sample, the next sample image being an image of the vehicle environment obtained after the action is executed; randomly extracting training samples from the experience pool; inputting the sample image and the next sample image of the extracted training sample into the initial executor network to obtain the value of the action execution strategy and the state value baseline; updating the network parameters of the initial executor network and the initial evaluator network based on the value of the action execution strategy and the state value baseline; and, when the vehicle has not collided and the training of the initial executor network and the initial evaluator network is finished, obtaining the trained executor network and the trained evaluator network.
Further, the formula for updating the network parameters of the initial executor network is:

θ' = θ + α_a · γ · (Q - V) · ∇_θ ln π_θ(a_t | s_t)

where θ' represents the updated network parameters of the initial executor network; θ represents the network parameters of the initial executor network; α_a represents the learning rate of the initial executor network; γ represents the discount rate of the action reward; Q represents the value of the action execution strategy; V represents the state value baseline; and π_θ(a_t | s_t) represents the sample action execution strategy of the extracted training sample. The formula for updating the network parameters of the initial evaluator network is:

w' = w + α_c · (Q - V) · ∇_w V_w(s_t)

where w' represents the updated network parameters of the initial evaluator network; w represents the network parameters of the initial evaluator network; α_c represents the learning rate of the initial evaluator network; Q represents the value of the action execution strategy; V represents the state value baseline; and V_w(s_t) represents the state value baseline of the selected training sample.
Further, completing the evaluator network training comprises: constructing a loss function of the initial evaluator network based on the state value baseline; updating the network parameters of the initial evaluator network based on the loss function until the loss function reaches a minimum value; and taking the initial evaluator network at the minimum loss function as the trained evaluator network.
Further, the formula for constructing the loss function is:

L(w) = (r_t + γ·V_w(s_{t+1}) - V_w(s_t))^2

where L(w) represents the loss function of the initial evaluator network when its network parameters are w; r_t represents the action reward; γ represents the discount rate of the action reward; V_w(s_{t+1}) represents the state value baseline of the vehicle at time t+1; and V_w(s_t) represents the state value baseline of the vehicle at time t.
Further, the formula of the action execution strategy is:

a_t = (a_dir, a_steer)

where a_t represents the selected action; a_dir represents the driving direction of the vehicle; and a_steer represents the steering of the steering wheel.
The invention also aims to provide an automatic parking system based on deep reinforcement learning, which comprises a deep neural network construction module, a deep neural network training module, an image acquisition module, a position acquisition module, a determination module and a circulation module. The deep neural network construction module is used for constructing an initial evaluator network and an initial executor network. The deep neural network training module is used for training the initial evaluator network and the initial executor network based on a state value baseline to obtain an executor network; the training comprises constructing a profit gradient of the initial executor network based on the value of an action execution strategy and the state value baseline, wherein the formula for constructing the profit gradient is:

∇_θ J(θ) = E[(r_t + γ·V(s_{t+1}) - V(s_t)) · ∇_θ ln π_θ(a_t | s_t)]

where ∇_θ J(θ) represents the profit gradient; J(θ) represents the accumulated revenue; r_t represents the action reward; γ represents the discount rate of the action reward; V(s_{t+1}) represents the state value baseline of the vehicle at time t+1; V(s_t) represents the state value baseline of the vehicle at time t; and π_θ(a_t | s_t) represents the sample action execution strategy of performing action a_t in state s_t. The network parameters of the initial executor network are updated based on the profit gradient until the profit gradient reaches a maximum value, and the initial executor network at the maximum profit gradient is taken as the trained executor network. The image acquisition module is used for acquiring a current image of the vehicle, the current image including the state of the vehicle in the current environment. The position acquisition module is used for acquiring the current vehicle position and the parking space position. The determination module is used for inputting the current image, the current vehicle position and the parking space position into the executor network, which outputs a current action execution strategy. The circulation module is used for having the vehicle execute an action based on the current action execution strategy and acquire the next action execution strategy from the next image, the next vehicle position and the parking space position obtained after the action, until the vehicle completes the automatic parking task.
The technical scheme of the embodiment of the invention at least has the following advantages and beneficial effects:
some embodiments of the invention can greatly improve the convergence rate of network training by constructing a profit gradient and updating the actor network based on the maximum value of the profit gradient.
Some embodiments of the invention train the network by adopting an evaluator executor algorithm based on a state baseline, train the evaluator network while training the executor network, so that the updated executor network can update the parameters of the network based on the evaluation of the updated evaluator network, improve the accuracy of parameter update, and the state value baseline is an evaluation reference obtained by the evaluator network according to historical actions and values, so that the change of the evaluation value can be within a certain range, and reduce errors.
Some embodiments of the present invention enable further increasing the efficiency of vehicle exploration for the environment space in the deep reinforcement learning environment by using a reward function based on a potential function difference form.
Drawings
FIG. 1 is an exemplary flowchart of an automatic parking method based on deep reinforcement learning according to some embodiments of the present invention;
FIG. 2 is an exemplary flowchart of training to obtain an executor network according to some embodiments of the present invention;
FIG. 3 is a block diagram of an automatic parking system based on deep reinforcement learning according to some embodiments of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Fig. 1 is an exemplary flowchart of an automatic parking method based on deep reinforcement learning according to some embodiments of the present invention. In some embodiments, process 100 may be performed by system 300. The process 100 shown in fig. 1 may include the following steps:
step 110, an initial evaluator network and an initial executor network are constructed. In some embodiments, step 110 may be performed by deep neural network building module 310.
The initial executor network may refer to the deep neural network used to train the final executor (actor) network. The executor network is used to determine an action execution strategy based on the input current image of the vehicle. The current image refers to an image of the vehicle in its current environment; in some embodiments, a camera disposed on the vehicle acquires an image of the environment in which the vehicle is located. An action execution strategy refers to the action to be performed given the environment in which the vehicle is currently located. For example, a current image of dimension 3 × 224 × 224 is input to the executor network, and the executor network outputs a 10 × 1 action execution strategy, where 10 represents the ten execution actions that may be taken.
The execution action is taken from a discrete action space. In some embodiments, the formula of the action execution strategy is:

a_t = (a_dir, a_steer)

where a_t represents the selected action; a_dir represents the driving direction of the vehicle, for example forward or backward; and a_steer represents the steering of the steering wheel, for example one of five steering-wheel angles: 90 degrees left, 45 degrees left, neutral, 45 degrees right, and 90 degrees right.
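To make this discrete action space concrete, the following is a minimal sketch that enumerates the 10 execution actions as the product of two driving directions and five steering-wheel angles; the names ACTION_TABLE and decode_action are illustrative, not taken from the patent.

from itertools import product

DIRECTIONS = ("forward", "backward")        # a_dir
STEER_ANGLES = (-90, -45, 0, 45, 90)        # a_steer in degrees; negative = left, 0 = neutral

ACTION_TABLE = tuple(product(DIRECTIONS, STEER_ANGLES))   # 10 discrete actions

def decode_action(index):
    # Map the index chosen from the executor network's 10 x 1 output to (a_dir, a_steer).
    return ACTION_TABLE[index]

assert len(ACTION_TABLE) == 10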
The initial evaluator network may refer to a deep neural network used to train the resulting evaluator network. The evaluator network may be configured to determine a value of an action execution policy based on the action execution policy.
In some embodiments, the deep neural network construction module 310 may construct the executor network and the evaluator network as deep neural networks in various ways.
Step 120, training the initial evaluator network and the initial executor network based on the state value baseline to obtain an executor network. In some embodiments, step 120 may be performed by deep neural network training module 320.
The state value baseline may reflect the current state of the vehicle. In some embodiments, the state value baseline may be obtained through an evaluator network. For example, a current image of the vehicle, a next image after the execution of the action, and a corresponding action execution policy may be input to an evaluator network, which outputs a state value baseline of the vehicle in a current state and a state value baseline of the vehicle in a next state.
In some embodiments, the deep neural network training module 320 may train the initial executor network using various methods for training deep machine learning models, resulting in an executor network. For more on training the executor network, see Fig. 2 and its associated description.
Step 130, acquiring a current image of the vehicle; the current image includes a state in which the vehicle is in the current environment. In some embodiments, step 130 may be performed by image acquisition module 330.
For more on the current image and how it is acquired, see step 110 and its associated description.
Step 140, acquiring the current vehicle position and the parking space position. In some embodiments, step 140 may be performed by location acquisition module 340.
In some embodiments, the position obtaining module 340 may obtain the current vehicle position and the parking space position in various feasible manners. For example, the current position of the vehicle may be acquired by an onboard GPS; and the position information of the idle parking spaces in the garage is acquired through communication connection with the garage.
Step 150, inputting the current image, the current vehicle position and the parking space position into the executor network, and the executor network outputting a current action execution strategy. In some embodiments, step 150 may be performed by determination module 350.
For example, the environmental image of the vehicle, the position of the vehicle and the position of the parking space at time t are input into the executor network, and the executor network outputs the actions the vehicle can select at the current position together with their selection probabilities. In some embodiments, the action with the highest probability may be taken as the execution action.
Step 160, the vehicle executes the action based on the current action execution strategy, and acquires the next action execution strategy based on the next image, the next vehicle position and the parking space position obtained after execution, until the vehicle completes the automatic parking task. In some embodiments, step 160 may be performed by loop module 360.
The next image may refer to an image of the environment around the vehicle acquired when the vehicle reaches the next state after performing the motion. The next image is acquired in the same manner as the current image. The next vehicle position may refer to a position that the vehicle reaches after performing the action. The next vehicle position is acquired in the same manner as the current vehicle position is acquired.
In some embodiments, the executor network may determine an execution action for the vehicle at each of a number of moments, based on the environment the vehicle is in at that moment, and the vehicle performs these actions in turn until it reaches the parking position. In some embodiments, a GNSS sensor may be provided on the vehicle, and whether the vehicle has reached the parking position may be determined from the difference between the longitude, latitude and vehicle attitude acquired by the GNSS sensor and the longitude, latitude and required attitude of the parking space. In some embodiments, a potential function representation of the state may be constructed from the distance between the latitude and longitude coordinates of the GNSS fix at the vehicle's current position and the GNSS latitude and longitude coordinates of the vehicle at the destination position (parking space):
Φ_t = sqrt((x_t^i - x_end^i)^2 + (y_t^i - y_end^i)^2)

where x_t^i represents the latitude displayed by the i-th GNSS sensor at time t; y_t^i represents the longitude displayed by the i-th GNSS sensor at time t; x_end^i represents the latitude coordinate of the i-th sensor at the end position; and y_end^i represents the longitude coordinate of the i-th sensor at the end position.
The greater the difference between the vehicle position and the end position, the greater the corresponding state-based potential function. When the value of the current potential function Φ_t is smaller than a preset difference threshold, the vehicle is determined to have completed automatic parking. The preset difference threshold is the maximum value of the potential function at which the vehicle is considered to have finished parking, and may be set empirically.
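As an illustration of the potential function and the completion test described above, the sketch below treats the GNSS latitude/longitude pair as planar coordinates and uses a hypothetical threshold value; both simplifications are assumptions rather than details fixed by the patent.

import math

def potential(lat_t, lon_t, lat_end, lon_end):
    # Phi_t: distance between the current GNSS fix and the GNSS fix of the parking space.
    return math.sqrt((lat_t - lat_end) ** 2 + (lon_t - lon_end) ** 2)

PARKING_DONE_THRESHOLD = 1e-5   # preset difference threshold, set empirically (hypothetical value)

def parking_finished(lat_t, lon_t, lat_end, lon_end):
    # The parking task is considered complete when the potential falls below the threshold.
    return potential(lat_t, lon_t, lat_end, lon_end) < PARKING_DONE_THRESHOLD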
According to some embodiments of the invention, the change of the vehicle state is represented through the change of the image information, and the optimal action execution strategy is then determined for each vehicle state, so that the parking path is planned dynamically and the motion control of the vehicle is completed through the action execution strategy. The control strategy of the vehicle is output by passing the image from the vehicle's front camera through the deep neural network, realizing an end-to-end automatic parking function.
The architecture of the initial evaluator network and the initial executor network may include an input layer, an output layer, and six data structure layers. The initial evaluator network and the initial executor network are obtained by constructing a multi-layer data structure: the first layer of the data structure employs a 7 × 7 convolution operation and a max pooling operation; the second, third, fourth and fifth layers each employ a residual module for feature extraction; and the sixth layer employs an average pooling operation.
The input layer is used to input image data of size 3 × 224 × 224; in some embodiments, the input is an RGB color picture of size 224 × 224. The output layer outputs the feature vector obtained after the fully connected layer. For the executor network, the dimension of the output feature vector is 10 × 1, where 10 corresponds to the 10 actions of the vehicle. For the evaluator network, the dimension of the output feature vector is 1 × 1, where 1 corresponds to the value of the action.
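A minimal PyTorch-style sketch of this six-layer trunk is given below for illustration: a 7 × 7 convolution with max pooling, four residual stages, average pooling, and a fully connected head whose output size is 10 for the executor network or 1 for the evaluator network. The channel widths, strides and class names are assumptions, not values specified by the patent.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Two 3x3 convolutions with an identity (or 1x1 projection) shortcut.
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.shortcut = (nn.Identity() if stride == 1 and in_ch == out_ch else
                         nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride, bias=False),
                                       nn.BatchNorm2d(out_ch)))

    def forward(self, x):
        y = torch.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return torch.relu(y + self.shortcut(x))

class ParkingNet(nn.Module):
    # out_dim = 10 for the executor network, 1 for the evaluator network.
    def __init__(self, out_dim):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),  # layer 1: 7x7 convolution
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(3, stride=2, padding=1),                  # layer 1: max pooling
            ResidualBlock(64, 64),                                  # layer 2
            ResidualBlock(64, 128, stride=2),                       # layer 3
            ResidualBlock(128, 256, stride=2),                      # layer 4
            ResidualBlock(256, 512, stride=2),                      # layer 5
            nn.AdaptiveAvgPool2d(1),                                # layer 6: average pooling
            nn.Flatten(),
        )
        self.head = nn.Linear(512, out_dim)                         # fully connected output layer

    def forward(self, x):      # x: (batch, 3, 224, 224) RGB image
        return self.head(self.trunk(x))

executor = ParkingNet(out_dim=10)
evaluator = ParkingNet(out_dim=1)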
Fig. 2 is an exemplary flowchart of training to obtain an executor network according to some embodiments of the present invention. In some embodiments, the process 200 may be performed by the deep neural network training module 320. As shown in Fig. 2, the process 200 may include the following steps:
and step 210, inputting the sample image, the sample vehicle position and the sample parking space position into an initial executor network, and outputting a sample action execution strategy by the initial executor network. Wherein the sample image includes the current state of the vehicle, which may be recorded as
Figure 444335DEST_PATH_IMAGE042
(ii) a The sample action execution policy may be noted as
Figure 677870DEST_PATH_IMAGE043
The sample image refers to an image of the current environment of the vehicle used to train the executor network. The sample vehicle position refers to the current position of that vehicle. The sample parking space position refers to the position of the parking space in which the vehicle is required to park during training. In some embodiments, the sample image, sample vehicle position and sample parking space position may be obtained through automatic parking runs of the vehicle: a sample parking space position and an initial vehicle position are preset, and the actual working scene of the vehicle is simulated to obtain the sample image and the sample vehicle position. The initial position may be set based on the design of the environment.
Step 220, the vehicle executes an action based on the sample action execution strategy. The execution action may be recorded as a_t. The execution action refers to the action by which the vehicle goes from the current state to the next state; for example, the vehicle moves according to the action with the highest selection probability in the execution strategy.
Step 230, obtaining the action reward for executing the sample action execution strategy. The action reward may be recorded as r_t. In some embodiments, the formula for calculating the action reward r_t is:

r_t = k · F_t + r_collision + r_success

where k is a proportionality coefficient whose role is to scale F_t to a reasonable interval and which may be determined according to requirements; F_t is the component of the reward function at time t based on the potential function difference; r_collision is the collision penalty, which is 0 when no collision occurs and -2 when a collision occurs; and r_success is the reward after completing automatic parking, with a reward of +5 given when the task is completed.
In some embodiments, the component F_t of the reward function at time t based on the potential function difference is calculated as:

F_t = Φ_{t-1} - Φ_t

that is, the difference between the potential functions of the vehicle at two successive moments; for Φ_t, see Fig. 1 and its associated description.
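For illustration only, the composite reward above can be written as the following sketch; the function name and the default scale factor k are assumptions, while the -2 collision penalty, +5 completion reward and the potential-difference term follow the text.

def step_reward(phi_prev, phi_curr, collided, parked, k=1.0):
    # r_t = k * F_t + r_collision + r_success, with F_t = Phi_{t-1} - Phi_t
    f_t = phi_prev - phi_curr
    r_collision = -2.0 if collided else 0.0
    r_success = 5.0 if parked else 0.0
    return k * f_t + r_collision + r_success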
Step 240, taking the sample image, the execution action, the action reward and the next sample image as a training sample and storing it in an experience pool; the next sample image is the image of the vehicle environment obtained after the action is performed, and can be used to represent the next state of the vehicle, recorded as s_{t+1}. In some embodiments, the training samples stored in the experience pool may be in the format (s_t, a_t, r_t, s_{t+1}).
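The experience pool of step 240 and the random extraction of step 250 can be sketched as a plain replay buffer over (s_t, a_t, r_t, s_{t+1}) tuples; the class name and default capacity below are illustrative assumptions.

import random
from collections import deque

class ExperiencePool:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s_t, a_t, r_t, s_next):
        # One training sample: sample image, execution action, action reward, next sample image.
        self.buffer.append((s_t, a_t, r_t, s_next))

    def sample(self, batch_size=32):
        # Randomly extract training samples from the experience pool (step 250).
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))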
Step 250, training samples are randomly drawn from the experience pool. The extracted training sample may be recorded as (s_t, a_t, r_t, s_{t+1}).
Step 260, inputting the sample image and the next sample image of the extracted training sample into the initial executor network to obtain the value of the action execution strategy and the state value baseline.
Step 270, updating the network parameters of the initial actor network and the initial evaluator network based on the value of the action execution policy and the state value baseline.
In some embodiments, the network parameters of the initial actor network and the initial evaluator network may be updated in an iterative manner.
In some embodiments, the formula for updating the network parameters of the initial executor network is:

θ' = θ + α_a · γ · (Q - V) · ∇_θ ln π_θ(a_t | s_t)

where θ' represents the updated network parameters of the initial executor network; θ represents the network parameters of the initial executor network, the initial values being obtained from the initialized model; α_a represents the learning rate of the initial executor network, determined when the simulated parking environment is initialized; γ represents the discount rate of the action reward; Q represents the value of the action execution strategy; V represents the state value baseline; and π_θ(a_t | s_t) represents the sample action execution strategy of the extracted training sample.
The formula for updating the network parameters of the initial evaluator network is:

w' = w + α_c · (Q - V) · ∇_w V_w(s_t)

where w' represents the updated network parameters of the initial evaluator network; w represents the network parameters of the initial evaluator network, the initial values being obtained from the initialized model; α_c represents the learning rate of the initial evaluator network, determined when the simulated parking environment is initialized; Q represents the value of the action execution strategy; V represents the state value baseline; and V_w(s_t) represents the state value baseline of the selected training sample.
In some embodiments,

Q = r_t + γ · V(s_{t+1}),    V = V(s_t)

where r_t represents the action reward; γ represents the discount rate of the action reward, determined when the simulated parking environment is initialized; V(s_{t+1}) represents the state value baseline of the vehicle at time t+1; and V(s_t) represents the state value baseline of the vehicle at time t.
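One possible reading of these update rules is the single PyTorch training step sketched below: the advantage Q - V = r_t + γ·V(s_{t+1}) - V(s_t) weights the log-probability gradient of the executor, and the evaluator is regressed toward the same target. The optimizer interface, tensor shapes and detach() placement are assumptions rather than details fixed by the patent.

import torch

def actor_critic_step(executor, evaluator, exec_opt, eval_opt,
                      s_t, a_t, r_t, s_next, gamma=0.99):
    # s_t, s_next: image batches; a_t: LongTensor of action indices; r_t: reward tensor.
    v_t = evaluator(s_t).squeeze(-1)                             # V(s_t), state value baseline
    with torch.no_grad():
        q = r_t + gamma * evaluator(s_next).squeeze(-1)          # Q = r_t + gamma * V(s_{t+1})
    advantage = q - v_t                                          # Q - V

    log_pi = torch.log_softmax(executor(s_t), dim=-1)            # ln pi_theta(. | s_t)
    log_pi_a = log_pi.gather(-1, a_t.unsqueeze(-1)).squeeze(-1)  # ln pi_theta(a_t | s_t)

    exec_loss = -(advantage.detach() * log_pi_a).mean()          # ascend the profit gradient
    exec_opt.zero_grad(); exec_loss.backward(); exec_opt.step()

    eval_loss = (q - v_t).pow(2).mean()   # L(w) = (r_t + gamma*V(s_{t+1}) - V(s_t))^2
    eval_opt.zero_grad(); eval_loss.backward(); eval_opt.step()
    return exec_loss.item(), eval_loss.item()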
Step 280, when the vehicle has not collided and the training of the initial executor network and the initial evaluator network is finished, obtaining the trained executor network and the trained evaluator network.
The non-collision means that the vehicle does not collide during the completion of parking.
In some embodiments, completing the executor network training includes constructing a profit gradient of the initial executor network based on the value of the action execution strategy and the state value baseline; updating the network parameters of the initial executor network based on the profit gradient until the profit gradient reaches a maximum value; and taking the initial executor network obtained at the maximum profit gradient as the trained executor network.
In some embodiments, the formula for constructing the profit gradient is:

∇_θ J(θ) = E[(r_t + γ·V(s_{t+1}) - V(s_t)) · ∇_θ ln π_θ(a_t | s_t)]

where ∇_θ J(θ) represents the profit gradient; J(θ) represents the accumulated revenue; r_t represents the action reward; γ represents the discount rate of the action reward; V(s_{t+1}) represents the state value baseline of the vehicle at time t+1; V(s_t) represents the state value baseline of the vehicle at time t; and π_θ(a_t | s_t) represents the sample action execution strategy of performing action a_t in state s_t.
In some embodiments, completing the evaluator network training includes constructing a loss function of the initial evaluator network based on the state value baseline; updating the network parameters of the initial evaluator network based on the loss function until the loss function reaches a minimum value; and taking the initial evaluator network obtained at the minimum loss function as the trained evaluator network.
In some embodiments, the formula for constructing the loss function is:

L(w) = (r_t + γ·V_w(s_{t+1}) - V_w(s_t))^2

where L(w) represents the loss function of the initial evaluator network when its network parameters are w; r_t represents the action reward; γ represents the discount rate of the action reward; V_w(s_{t+1}) represents the state value baseline of the vehicle at time t+1; and V_w(s_t) represents the state value baseline of the vehicle at time t.
If the vehicle collides or the automatic parking task finishes, the system environment of the vehicle is re-initialized and training is resumed, until the action strategy output by the network meets the automatic parking requirement.
Some embodiments herein employ two deep neural networks: an executor, which generates the vehicle action execution strategy, and an evaluator, which estimates the vehicle state value. A value-function-based reinforcement learning algorithm is used to estimate the vehicle action value function within the policy gradient procedure, overcoming the shortcoming that a pure policy gradient algorithm cannot accurately obtain the vehicle state value in an unknown environment.
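Putting the pieces together, the outer loop described here (re-initializing the simulated parking environment after a collision or after the task is finished and continuing to collect samples and update the networks) might be organized as in the sketch below. The env object and its reset()/step() interface follow common reinforcement learning simulators and are assumptions, not an API defined by the patent; tensor conversion of states and actions is omitted for brevity.

import torch

def select_action(executor, s_t):
    # Take the execution action with the highest probability output by the executor network.
    with torch.no_grad():
        return int(torch.argmax(executor(s_t), dim=-1))

def train(env, executor, evaluator, exec_opt, eval_opt, pool, update_step,
          episodes=1000, batch_size=32):
    # update_step: a function such as the actor_critic_step sketch shown earlier.
    for _ in range(episodes):
        s_t = env.reset()                      # re-initialize the simulated parking environment
        done = False
        while not done:
            a_t = select_action(executor, s_t)
            s_next, r_t, done, info = env.step(a_t)   # done on collision or parking completion
            pool.store(s_t, a_t, r_t, s_next)
            for sample in pool.sample(batch_size):    # random draw from the experience pool
                update_step(executor, evaluator, exec_opt, eval_opt, *sample)
            s_t = s_next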
Fig. 3 is a block diagram of an automatic parking system based on deep reinforcement learning according to some embodiments of the present invention. As shown in fig. 3, the system 300 may include a deep neural network construction module 310, a deep neural network training module 320, an image acquisition module 330, a location acquisition module 340, a determination module 350, and a loop module 360.
The deep neural network building module 310 is used to build an initial evaluator network and an initial performer network. For more on the deep neural network building block 310, refer to fig. 1 and its associated description.
The deep neural network training module 320 is configured to train the initial evaluator network and the initial executor network based on the state value baseline to obtain an executor network. For more on the deep neural network training module 320, refer to fig. 1 and its associated description.
The image acquisition module 330 is used for acquiring a current image of the vehicle; the current image includes a state in which the vehicle is in the current environment. For more on the image acquisition module 330, refer to fig. 1 and its associated description.
The position obtaining module 340 is used for obtaining the current vehicle position and the parking space position. For more on the location acquisition module 340, refer to fig. 1 and its related description.
The determination module 350 is configured to input the current image, the current vehicle position, and the parking space position into the executor network, where the executor network outputs a current action execution policy. For more of the determination module 350, refer to fig. 1 and its associated description.
The loop module 360 is used for the vehicle to execute the action based on the current action execution strategy, and to obtain the next action execution strategy based on the executed next image, the next vehicle position and the parking space position until the vehicle completes the automatic parking task. For more of the loop module 360, see FIG. 1 and its associated description.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. An automatic parking method based on deep reinforcement learning is characterized by comprising the following steps:
constructing an initial evaluator network and an initial executor network; obtaining the initial evaluator network and the initial executor network by constructing a multi-layer data structure, including:
a first layer of the data structure employs a convolution operation of 7 × 7 and a max pooling operation;
a second layer of the data structure employs a residual module for feature extraction;
a third layer of the data structure employs a residual module for feature extraction;
a fourth layer of the data structure employs a residual module for feature extraction;
a fifth layer of the data structure employs a residual module for feature extraction;
a sixth layer of the data structure employs an average pooling operation;
training the initial evaluator network and the initial executor network based on a state value baseline to obtain an executor network; wherein training the executor network comprises:
constructing a profit gradient of the initial executor network based on the value of an action execution strategy and the state value baseline; wherein the formula for constructing the profit gradient is:

∇_θ J(θ) = E[(r_t + γ·V(s_{t+1}) - V(s_t)) · ∇_θ ln π_θ(a_t | s_t)]

wherein ∇_θ J(θ) represents the profit gradient; J(θ) represents the accumulated revenue; r_t represents the action reward; γ represents the discount rate of the action reward; V(s_{t+1}) represents the state value baseline of the vehicle at time t+1; V(s_t) represents the state value baseline of the vehicle at time t; and π_θ(a_t | s_t) represents the sample action execution strategy of performing action a_t in state s_t;
updating the network parameters of the initial executor network based on the profit gradient until the profit gradient reaches a maximum value;
taking the initial executor network at the maximum profit gradient as the trained executor network;
acquiring a current image of a vehicle; the current image includes a state of the vehicle in a current environment;
acquiring the current vehicle position and the parking space position;
inputting the current image, the current vehicle position and the parking space position into the executor network, and outputting a current action execution strategy by the executor network; the action execution strategy referring to the execution action made based on the environment in which the vehicle is currently located; the formula of the action execution strategy being:

a_t = (a_dir, a_steer)

wherein a_t represents the selected action; a_dir represents the driving direction of the vehicle; and a_steer represents the steering of the steering wheel;
wherein training the executor network further comprises:
inputting a sample image, a sample vehicle position and a sample parking space position into the initial executor network, and outputting a sample action execution strategy by the initial executor network;
the vehicle executes an action based on the sample action execution strategy;
obtaining an action reward for executing the sample action execution strategy;
taking the sample image, the execution action, the action reward and the next sample image as training samples and storing the training samples in an experience pool; the next sample image is an image of the vehicle environment obtained after the action is executed;
randomly extracting training samples from the experience pool;
inputting a sample image in the extracted training sample and a next sample image into the initial executor network to obtain the value of an action execution strategy and the state value baseline;
updating network parameters of the initial actor network and the initial evaluator network based on the value of the action execution policy and the state value baseline;
when the vehicle has not collided and the training of the initial executor network and the initial evaluator network is finished, obtaining the trained executor network and the trained evaluator network;
and the vehicle executes an action based on the current action execution strategy, and acquires a next action execution strategy based on the next image, the next vehicle position and the parking space position obtained after execution, until the vehicle completes the automatic parking task.
2. The deep reinforcement learning-based automatic parking method according to claim 1, wherein the formula for updating the network parameters of the initial executor network is:

θ' = θ + α_a · γ · (Q - V) · ∇_θ ln π_θ(a_t | s_t)

wherein θ' represents the updated network parameters of the initial executor network; θ represents the network parameters of the initial executor network; α_a represents the learning rate of the initial executor network; γ represents the discount rate of the action reward; Q represents the value of the action execution strategy; V represents the state value baseline; and π_θ(a_t | s_t) represents the sample action execution strategy of the extracted training sample;
the formula for updating the network parameters of the initial evaluator network is:

w' = w + α_c · (Q - V) · ∇_w V_w(s_t)

wherein w' represents the updated network parameters of the initial evaluator network; w represents the network parameters of the initial evaluator network; α_c represents the learning rate of the initial evaluator network; Q represents the value of the action execution strategy; V represents the state value baseline; and V_w(s_t) represents the state value baseline of the selected training sample.
3. The automatic parking method based on deep reinforcement learning of claim 1, wherein completing the evaluator network training comprises:
constructing a loss function of the initial evaluator network based on the state value baseline;
updating network parameters of the initial evaluator network based on the loss function until the loss function reaches a minimum value;
and taking the initial evaluator network when the minimum loss function is obtained as the trained evaluator network.
4. The automatic parking method based on deep reinforcement learning according to claim 3, wherein the formula for constructing the loss function is:

L(w) = (r_t + γ·V_w(s_{t+1}) - V_w(s_t))^2

wherein L(w) represents the loss function of the initial evaluator network when its network parameters are w; r_t represents the action reward; γ represents the discount rate of the action reward; V_w(s_{t+1}) represents the state value baseline of the vehicle at time t+1; and V_w(s_t) represents the state value baseline of the vehicle at time t.
5. An automatic parking system based on deep reinforcement learning is characterized by comprising a deep neural network construction module, a deep neural network training module, an image acquisition module, a position acquisition module, a determination module and a circulation module;
the deep neural network construction module is used for constructing an initial evaluator network and an initial executor network; obtaining the initial evaluator network and the initial executor network by constructing a multi-layer data structure, including:
a first layer of the data structure employs a convolution operation of 7 × 7 and a max pooling operation;
a second layer of the data structure employs a residual module for feature extraction;
a third layer of the data structure employs a residual module for feature extraction;
a fourth layer of the data structure employs a residual module for feature extraction;
a fifth layer of the data structure employs a residual module for feature extraction;
a sixth layer of the data structure employs an average pooling operation;
the deep neural network training module is used for training the initial evaluator network and the initial executor network based on a state value baseline to obtain an executor network; the training comprises constructing a profit gradient of the initial executor network based on the value of an action execution strategy and the state value baseline; wherein the formula for constructing the profit gradient is:

∇_θ J(θ) = E[(r_t + γ·V(s_{t+1}) - V(s_t)) · ∇_θ ln π_θ(a_t | s_t)]

wherein ∇_θ J(θ) represents the profit gradient; J(θ) represents the accumulated revenue; r_t represents the action reward; γ represents the discount rate of the action reward; V(s_{t+1}) represents the state value baseline of the vehicle at time t+1; V(s_t) represents the state value baseline of the vehicle at time t; and π_θ(a_t | s_t) represents the sample action execution strategy of performing action a_t in state s_t; updating the network parameters of the initial executor network based on the profit gradient until the profit gradient reaches a maximum value; and taking the initial executor network at the maximum profit gradient as the trained executor network;
the image acquisition module is used for acquiring a current image of the vehicle; the current image includes a state of the vehicle in a current environment;
the position acquisition module is used for acquiring the current vehicle position and the parking space position;
the determination module is used for inputting the current image, the current vehicle position and the parking space position into the executor network, and the executor network outputs a current action execution strategy; the action execution strategy referring to the execution action made based on the environment in which the vehicle is currently located; the formula of the action execution strategy being:

a_t = (a_dir, a_steer)

wherein a_t represents the selected action; a_dir represents the driving direction of the vehicle; and a_steer represents the steering of the steering wheel;
wherein training the executor network further comprises:
inputting a sample image, a sample vehicle position and a sample parking space position into the initial executor network, and outputting a sample action execution strategy by the initial executor network;
the vehicle executes an action based on the sample action execution strategy;
obtaining an action reward for executing the sample action execution strategy;
taking the sample image, the execution action, the action reward and the next sample image as training samples and storing the training samples in an experience pool; the next sample image is an image of the vehicle environment obtained after the action is executed;
randomly extracting training samples from the experience pool;
inputting a sample image in the extracted training sample and a next sample image into the initial executor network to obtain the value of an action execution strategy and the state value baseline;
updating network parameters of the initial actor network and the initial evaluator network based on the value of the action execution policy and the state value baseline;
when the vehicle has not collided and the training of the initial executor network and the initial evaluator network is completed, obtaining the trained executor network and the trained evaluator network;
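For illustration only, the experience-pool training loop described in the steps above might be sketched as follows (assumptions: a Gym-style environment interface; the names env, select_action and update are hypothetical placeholders; only the data flow mirrors the listed steps):

# Sketch of the experience-pool (replay) training loop; all dependencies are
# injected as arguments because their concrete forms are assumptions.
import random
from collections import deque

def train(env, actor, critic, actor_opt, critic_opt, select_action, update,
          num_episodes=500, batch_size=64):
    experience_pool = deque(maxlen=100_000)          # the experience pool
    for _ in range(num_episodes):
        state = env.reset()                          # sample image + positions
        done = False
        while not done:
            action = select_action(actor, state)     # sample action execution strategy
            # env.step is assumed to return (next image, action reward, done flag)
            next_state, reward, done = env.step(action)
            # store (sample image, executed action, action reward, next sample image)
            experience_pool.append((state, action, reward, next_state))
            if len(experience_pool) >= batch_size:
                batch = random.sample(experience_pool, batch_size)   # random extraction
                states, actions, rewards, next_states = map(list, zip(*batch))
                update(actor, critic, actor_opt, critic_opt,
                       states, actions, rewards, next_states)
            state = next_state
    return actor, critic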
and the circulation module is used for making the vehicle execute the action given by the current action execution strategy, and for acquiring the next action execution strategy based on the next image, the next vehicle position and the parking space position obtained after execution, until the vehicle completes the automatic parking task.
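For illustration only, the closed-loop behaviour of the circulation module might be sketched as follows (assumptions: the camera, localiser and vehicle interfaces and their method names are hypothetical placeholders; only the loop structure follows the description):

# Sketch of the closed-loop parking procedure performed by the circulation module.
def auto_park(actor, camera, localiser, vehicle, slot_position, max_steps=500):
    for _ in range(max_steps):
        image = camera.capture()                     # current image of the environment
        vehicle_position = localiser.current_pose()  # current vehicle position
        # the trained executor network outputs the current action execution strategy
        direction, steering = actor.act(image, vehicle_position, slot_position)
        vehicle.apply(direction, steering)           # execute the action
        if vehicle.is_parked(slot_position):         # automatic parking task finished
            return True
    return False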
CN202211353517.XA 2022-11-01 2022-11-01 Automatic parking method and system based on deep reinforcement learning Active CN115472038B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211353517.XA CN115472038B (en) 2022-11-01 2022-11-01 Automatic parking method and system based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN115472038A CN115472038A (en) 2022-12-13
CN115472038B true CN115472038B (en) 2023-02-03

Family

ID=84337502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211353517.XA Active CN115472038B (en) 2022-11-01 2022-11-01 Automatic parking method and system based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN115472038B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110136481A (en) * 2018-09-20 2019-08-16 初速度(苏州)科技有限公司 A kind of parking strategy based on deeply study
CN111645673A (en) * 2020-06-17 2020-09-11 西南科技大学 Automatic parking method based on deep reinforcement learning
CN112061116A (en) * 2020-08-21 2020-12-11 浙江大学 Parking strategy of reinforcement learning method based on potential energy field function approximation
CN112356830A (en) * 2020-11-25 2021-02-12 同济大学 Intelligent parking method based on model reinforcement learning
CN113859226A (en) * 2021-11-04 2021-12-31 赵奕帆 Movement planning and automatic parking method based on reinforcement learning
CN114454875A (en) * 2022-02-25 2022-05-10 深圳信息职业技术学院 Urban road automatic parking method and system based on reinforcement learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190220737A1 (en) * 2018-01-17 2019-07-18 Hengshuai Yao Method of generating training data for training a neural network, method of training a neural network and using neural network for autonomous operations


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant