Summary of the Invention
To address the poor stability, low reliability and poor flexibility of existing parking methods, the present invention provides a parking method and device. The technical solution is as follows:
In a first aspect, a parking method is provided, the method comprising:
determining a first action control parameter according to state information of the host vehicle, the state information including position information of the host vehicle in the parking environment and position information of a target parking space in the parking environment, and the first action control parameter including a first initial force value of the throttle or brake and a first initial rotation angle of the steering wheel;
evaluating the first action control parameter to determine a second action control parameter, the second action control parameter including a reference force value of the throttle or brake and a reference rotation angle of the steering wheel;
adjusting the position state of the host vehicle according to the second action control parameter, the position state of the host vehicle after adjustment being a first position state;
judging whether the first position state is a preset state;
if the first position state is the preset state, using the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of an adaptive dynamic programming algorithm, and obtaining output parameters of the adaptive dynamic programming algorithm;
taking the path corresponding to the output parameters of the adaptive dynamic programming algorithm as a target travel path; and
controlling the host vehicle to complete the parking maneuver along the target travel path.
Optionally, determining the first action control parameter according to the state information of the host vehicle comprises:
inputting the state information into an action network and determining the parameters output by the action network as the first action control parameter, the action network being a multiple-input multiple-output nonlinear neural network with a hidden layer;
and evaluating the first action control parameter to determine the second action control parameter comprises:
evaluating the first action control parameter using an evaluation network to determine the second action control parameter output by the action network, the evaluation network being a multiple-input multiple-output nonlinear neural network with a hidden layer.
Optionally, judging whether the first position state is the preset state comprises:
judging whether the host vehicle collides with an obstacle;
if the host vehicle collides with the obstacle, determining that the first position state is the preset state;
if the host vehicle does not collide with the obstacle, determining that the first position state is not the preset state.
Optionally, the parking environment is divided into at least two grid cells of equal area, each grid cell corresponding to one position state, and before controlling the host vehicle to complete the parking maneuver along the target travel path, the method further comprises:
if the host vehicle does not collide with the obstacle, detecting whether the host vehicle has reached the target parking space;
if the host vehicle has reached the target parking space, using the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtaining output parameters of the adaptive dynamic programming algorithm, and taking the path corresponding to those output parameters as the target travel path;
if the host vehicle has not reached the target parking space, detecting whether the number of moving steps of the host vehicle is greater than a preset step number, the number of moving steps being the number of grid cells the host vehicle passes through in one movement;
if the number of moving steps of the host vehicle is greater than the preset step number, using the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtaining output parameters of the adaptive dynamic programming algorithm, and taking the path corresponding to those output parameters as the target travel path;
if the number of moving steps of the host vehicle is not greater than the preset step number, determining a third action control parameter according to the current state of the host vehicle, the third action control parameter including a second initial force value of the throttle or brake and a second initial rotation angle of the steering wheel.
Optionally, using the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm and obtaining the output parameters of the adaptive dynamic programming algorithm comprises:
using the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of the action network, and obtaining a first output parameter of the action network;
according to a reinforcement signal group determined by a reinforcement learning algorithm, evaluating the first output parameter of the action network using the evaluation network to obtain a second output parameter of the action network, the reinforcement signal group including the reinforcement signal corresponding to each position state of the host vehicle;
and taking the path corresponding to the output parameters of the adaptive dynamic programming algorithm as the target travel path comprises:
taking the path corresponding to the second output parameter of the action network as the target travel path.
In a second aspect, a parking device is provided, the device comprising:
a first determination unit, configured to determine a first action control parameter according to state information of the host vehicle, the state information including position information of the host vehicle in the parking environment and position information of a target parking space in the parking environment, and the first action control parameter including a first initial force value of the throttle or brake and a first initial rotation angle of the steering wheel;
a second determination unit, configured to evaluate the first action control parameter and determine a second action control parameter, the second action control parameter including a reference force value of the throttle or brake and a reference rotation angle of the steering wheel;
an adjustment unit, configured to adjust the position state of the host vehicle according to the second action control parameter, the position state of the host vehicle after adjustment being a first position state;
a judging unit, configured to judge whether the first position state is a preset state;
a first processing unit, configured to, when the first position state is the preset state, use the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of an adaptive dynamic programming algorithm, and obtain output parameters of the adaptive dynamic programming algorithm;
a second processing unit, configured to take the path corresponding to the output parameters of the adaptive dynamic programming algorithm as a target travel path; and
a control unit, configured to control the host vehicle to complete the parking maneuver along the target travel path.
Optionally, the first determination unit comprises:
a first determining module, configured to input the state information into an action network and determine the parameters output by the action network as the first action control parameter, the action network being a multiple-input multiple-output nonlinear neural network with a hidden layer;
and the second determination unit comprises:
a second determining module, configured to evaluate the first action control parameter using an evaluation network and determine the second action control parameter output by the action network, the evaluation network being a multiple-input multiple-output nonlinear neural network with a hidden layer.
Optionally, the judging unit comprises:
a judgment module, configured to judge whether the host vehicle collides with an obstacle;
a third determining module, configured to determine that the first position state is the preset state when the host vehicle collides with the obstacle;
a fourth determining module, configured to determine that the first position state is not the preset state when the host vehicle does not collide with the obstacle.
Optionally, the parking environment is divided into at least two grid cells of equal area, each grid cell corresponding to one position state, and the device further comprises:
a first detection unit, configured to detect whether the host vehicle has reached the target parking space when the host vehicle does not collide with the obstacle;
a third processing unit, configured to, when the host vehicle has reached the target parking space, use the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtain output parameters of the adaptive dynamic programming algorithm, and take the path corresponding to those output parameters as the target travel path;
a second detection unit, configured to detect, when the host vehicle has not reached the target parking space, whether the number of moving steps of the host vehicle is greater than a preset step number, the number of moving steps being the number of grid cells the host vehicle passes through in one movement;
a fourth processing unit, configured to, when the number of moving steps of the host vehicle is greater than the preset step number, use the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtain output parameters of the adaptive dynamic programming algorithm, and take the path corresponding to those output parameters as the target travel path;
a third determination unit, configured to determine, when the number of moving steps of the host vehicle is not greater than the preset step number, a third action control parameter according to the current state of the host vehicle, the third action control parameter including a second initial force value of the throttle or brake and a second initial rotation angle of the steering wheel.
Optionally, the first processing unit comprises:
a first processing module, configured to use the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of the action network, and obtain a first output parameter of the action network;
a second processing module, configured to evaluate, according to a reinforcement signal group determined by a reinforcement learning algorithm, the first output parameter of the action network using the evaluation network to obtain a second output parameter of the action network, the reinforcement signal group including the reinforcement signal corresponding to each position state of the host vehicle;
and the second processing unit comprises:
a third processing module, configured to take the path corresponding to the second output parameter of the action network as the target travel path.
The present invention provides a parking method and device that evaluate the first action control parameter to determine the second action control parameter and then adjust the position state of the host vehicle according to the second action control parameter. If the position state is the preset state, the target travel path is determined according to the adaptive dynamic programming algorithm, and the host vehicle is controlled to complete the parking maneuver. Compared with the prior art, this improves the autonomous learning capability of the vehicle and therefore the stability, reliability and flexibility of parking.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
An embodiment of the present invention provides a parking method. As shown in Fig. 1, the parking method may include:
Step 101: determine a first action control parameter according to state information of the host vehicle. The state information includes position information of the host vehicle in the parking environment and position information of the target parking space in the parking environment. The first action control parameter includes a first initial force value of the throttle or brake and a first initial rotation angle of the steering wheel.
Step 102: evaluate the first action control parameter and determine a second action control parameter. The second action control parameter includes a reference force value of the throttle or brake and a reference rotation angle of the steering wheel.
Step 103: adjust the position state of the host vehicle according to the second action control parameter. The position state of the host vehicle after adjustment is a first position state.
Step 104: judge whether the first position state is a preset state.
Step 105: if the first position state is the preset state, use the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of an adaptive dynamic programming algorithm, and obtain output parameters of the adaptive dynamic programming algorithm.
Step 106: take the path corresponding to the output parameters of the adaptive dynamic programming algorithm as the target travel path.
Step 107: control the host vehicle to complete the parking maneuver along the target travel path.
In conclusion parking storage method provided in an embodiment of the present invention, can comment the first action control parameter
Estimate, determine the second action control parameter, further according to the location status of second action control parameter adjustment this vehicle, if the location status
For preset state, then target travel path is determined according to adaptive dynamic programming algorithm, so as to which this vehicle be controlled to complete parking storage
Action, compared to the prior art, improves the independent learning ability of vehicle, this improves stabilizations during vehicle parking storage
Property, reliability and flexibility.
Optionally, step 101 includes: inputting the state information into an action network and determining the parameters output by the action network as the first action control parameter, the action network being a multiple-input multiple-output nonlinear neural network with a hidden layer.
Step 102 includes: evaluating the first action control parameter using an evaluation network and determining the second action control parameter output by the action network, the evaluation network being a multiple-input multiple-output nonlinear neural network with a hidden layer.
Step 104 includes: judging whether the host vehicle collides with an obstacle; if the host vehicle collides with the obstacle, determining that the first position state is the preset state; if the host vehicle does not collide with the obstacle, determining that the first position state is not the preset state.
The parking environment is divided into at least two grid cells of equal area, and each grid cell corresponds to one position state. Before step 107, the method may further include:
if the host vehicle does not collide with the obstacle, detecting whether the host vehicle has reached the target parking space;
if the host vehicle has reached the target parking space, using the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtaining output parameters of the adaptive dynamic programming algorithm, and taking the path corresponding to those output parameters as the target travel path;
if the host vehicle has not reached the target parking space, detecting whether the number of moving steps of the host vehicle is greater than the preset step number, the number of moving steps being the number of grid cells the host vehicle passes through in one movement;
if the number of moving steps of the host vehicle is greater than the preset step number, using the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtaining output parameters of the adaptive dynamic programming algorithm, and taking the path corresponding to those output parameters as the target travel path;
if the number of moving steps of the host vehicle is not greater than the preset step number, determining a third action control parameter according to the current state of the host vehicle, the third action control parameter including a second initial force value of the throttle or brake and a second initial rotation angle of the steering wheel.
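The branching described above (collision, target reached, step limit exceeded, otherwise continue) can be sketched in Python as follows. This is only an illustrative sketch: the names `Outcome` and `classify_state` are hypothetical and do not appear in the embodiment, and the collision and arrival flags are assumed to be supplied by the vehicle's sensing.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical labels for the four branches described in the text."""
    COLLIDED = "collided"              # first position state is the preset state
    REACHED_TARGET = "reached_target"  # host vehicle is in the target parking space
    STEP_LIMIT = "step_limit"          # moving steps exceed the preset step number
    CONTINUE = "continue"              # determine a third action control parameter


def classify_state(collided: bool, at_target: bool,
                   step: int, max_step: int) -> Outcome:
    """Apply the checks in the order the method lists them."""
    if collided:
        return Outcome.COLLIDED
    if at_target:
        return Outcome.REACHED_TARGET
    if step > max_step:  # "greater than", so step == max_step still continues
        return Outcome.STEP_LIMIT
    return Outcome.CONTINUE


def feeds_adaptive_dynamic_programming(outcome: Outcome) -> bool:
    """In the first three branches the current positions are passed to the
    adaptive dynamic programming algorithm; only CONTINUE keeps moving."""
    return outcome is not Outcome.CONTINUE
```

For example, `classify_state(False, False, 8, 7)` yields `Outcome.STEP_LIMIT`, whereas a step count equal to the preset step number still continues.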
Optionally, step 105 includes: using the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of the action network, and obtaining a first output parameter of the action network; and, according to a reinforcement signal group determined by a reinforcement learning algorithm, evaluating the first output parameter of the action network using the evaluation network to obtain a second output parameter of the action network, the reinforcement signal group including the reinforcement signal corresponding to each position state of the host vehicle.
Correspondingly, step 106 includes: taking the path corresponding to the second output parameter of the action network as the target travel path.
In summary, the parking method provided by this embodiment evaluates the first action control parameter to determine the second action control parameter and then adjusts the position state of the host vehicle according to the second action control parameter. If the position state is the preset state, the target travel path is determined according to the adaptive dynamic programming algorithm, and the host vehicle is controlled to complete the parking maneuver. Compared with the prior art, this improves the autonomous learning capability of the vehicle and therefore the stability, reliability and flexibility of parking.
An embodiment of the present invention provides a parking method. As shown in Fig. 2-1, the parking method may include:
Step 201: determine a first action control parameter according to state information of the host vehicle. Then perform step 202.
The state information of the host vehicle includes position information of the host vehicle in the parking environment and position information of the target parking space in the parking environment. The first action control parameter includes a first initial force value of the throttle or brake and a first initial rotation angle of the steering wheel.
The parking method provided by this embodiment determines the target travel path using an adaptive dynamic programming algorithm. Parking based on the adaptive dynamic programming algorithm is a learning process: the vehicle (i.e., the host vehicle) learns from its successes and failures how to complete the parking maneuver along the shortest travel path. Because failures occur during learning, tests can first be run on a computer; after the vehicle (i.e., a virtual vehicle) has completed the learning process, the relevant parts of the learning algorithm are ported to the actual vehicle. Two initial parameters may be set for the computer tests, for example a maximum number of trials MaxTrail=1000 and a maximum number of moving steps, i.e., the preset step number, MaxStep=7. The parking environment can be divided into at least two grid cells of equal area, each grid cell corresponding to one position state, and the number of moving steps is the number of grid cells the vehicle passes through in one movement.
Fig. 2-2 shows the system structure corresponding to the adaptive dynamic programming algorithm. As shown in Fig. 2-2, the system consists of two neural networks, an action network and an evaluation network. Both are multiple-input multiple-output nonlinear neural networks with a hidden layer, and both use a feed-forward network with a nonlinear multilayer perceptron structure. The system lets the vehicle learn autonomously as follows. The action network generates a decision action U(t) from the current state quantity X(t) of the vehicle, where X(t) includes the position of the vehicle in the parking environment and the position of the target parking space in the parking environment. The decision action U(t) corresponds to a set of action control parameters, which include the first initial force value of the throttle or brake and the first initial rotation angle of the steering wheel. The decision action U(t) changes the current position state of the vehicle, so that the vehicle moves from the current position state to a new position state and a new state quantity X(t+1) is obtained. At the same time, the parking environment feeds back to the vehicle a reinforcement signal r(t), which represents the immediate return for the decision action U(t). In this embodiment the reinforcement signal represents the reward or punishment the vehicle receives. The reinforcement signal is usually a numerical value whose magnitude indicates how good or bad the decision action is. The reinforcement signal is input to the evaluation network, which outputs a cost function J(t); the evaluation network then evaluates in real time the action control parameters corresponding to the decision action generated by the action network.
After the two neural networks generate their outputs, the system applies feedback adjustment to both outputs. The feedback adjustment strategy of the evaluation network is to use the value of the cost function J(t) to approximate the infinite discounted cumulative sum of the reinforcement signals. The feedback adjustment strategy of the action network is to compare the expected value of the utility function Uc(t) with the cost function J(t) to obtain an action error; according to this action error, the weights of the two neural networks are adjusted by gradient descent, so that the action control parameters output by the action network, and hence the decision action, tend toward the optimum. The cost function J(t) represents the cost the vehicle pays when driving according to the decision action output by the action network, and the utility function Uc(t) represents the relationship between the input parameters of the adaptive dynamic programming system and the decision action U(t). In Fig. 2-2, X(t) and X(t+1) are the inputs of the system, R(t) is the cumulative sum of the reinforcement signals, Uc(t) is the utility function, α is the discount factor, which represents the degree of influence of the later state on the earlier state, and J(t-1) is the cost function of the previous state.
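As a rough illustration of the two-network structure in Fig. 2-2, the following Python sketch builds two small feed-forward networks with one hidden layer each. Everything beyond that structure is an assumption made for illustration only: the class name `TwoLayerNet`, the layer sizes, the tanh activation, the random initial weights, and the choice to feed the evaluation network the state together with the action are not specified by the embodiment, and no training (gradient descent on the action error) is shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """One weight row per output neuron (weights are random placeholders)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def forward(layer, inputs):
    """Weighted sum followed by tanh, the assumed nonlinearity."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in layer]


class TwoLayerNet:
    """Hypothetical multi-input multi-output perceptron with one hidden layer."""

    def __init__(self, n_in, n_hidden, n_out, seed):
        rng = random.Random(seed)
        self.hidden = init_layer(n_in, n_hidden, rng)
        self.output = init_layer(n_hidden, n_out, rng)

    def __call__(self, inputs):
        return forward(self.output, forward(self.hidden, inputs))


# X(t): assumed encoding (vehicle x, vehicle y, target x, target y), normalised.
x_t = [0.2, 0.5, 0.9, 0.1]

# Action network: X(t) -> U(t) = (pedal force value, steering-wheel angle).
action_net = TwoLayerNet(n_in=4, n_hidden=6, n_out=2, seed=0)
u_t = action_net(x_t)

# Evaluation network: here (X(t), U(t)) -> J(t), a scalar cost estimate.
evaluation_net = TwoLayerNet(n_in=6, n_hidden=6, n_out=1, seed=1)
j_t = evaluation_net(x_t + u_t)[0]
```

Because of the tanh output layer, each component of `u_t` lies in [-1, 1] and would need to be scaled to physical pedal and steering ranges in any real use.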
Likewise, for the new state quantity X(t+1), the vehicle makes a new decision action U(t+1) and obtains a new reinforcement signal r(t+1) from the parking environment. In this way the vehicle interacts with the parking environment at every moment and adjusts its action policy online according to the reinforcement signal values fed back by the parking environment, so as to obtain the maximum return in subsequent decision actions.
With reference to Fig. 2-2, step 201 specifically includes: inputting the state information into the action network and determining the parameters output by the action network as the first action control parameter.
A state of the vehicle is selected at random and used as the initial state of the vehicle. When the vehicle is in the initial state, the trial count trail=0. At the start of each trial, the number of moving steps step=0. The state of the vehicle refers to the position of the vehicle in the parking environment and the position of the target parking space in the parking environment. This embodiment discretizes the parking environment; for example, the parking environment can be divided into multiple grid cells, each corresponding to one position state. The parking environment shown in Fig. 2-3 is divided into 11*9 grid cells. In general, the vehicle has to pass through multiple grid cells to travel from its current position to the target parking space. In Fig. 2-3, 231 and 232 denote other vehicles, 233 denotes the host vehicle, and 234 denotes the target parking space. It should be added that, in practical applications, the parking environment is divided into far more grid cells than are shown in Fig. 2-3; this embodiment does not limit the number.
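The discretization described above can be sketched in Python as a mapping from a continuous position to a grid cell and then to a position-state index. The 0.5 m cell size, the reading of 11*9 as 11 columns by 9 rows, and the function names are assumptions for illustration; the embodiment does not specify them.

```python
def position_to_cell(x_m, y_m, cell_size_m=0.5, n_cols=11, n_rows=9):
    """Map a position in metres to (row, col) in an equal-area grid.

    The origin is assumed to be one corner of the parking environment.
    """
    col = int(x_m // cell_size_m)
    row = int(y_m // cell_size_m)
    if not (0 <= col < n_cols and 0 <= row < n_rows):
        raise ValueError("position lies outside the parking environment")
    return row, col


def cell_to_state_index(row, col, n_cols=11):
    """Number the grid cells row by row so each one is a distinct position state."""
    return row * n_cols + col
```

With these assumed values, a vehicle at (5.4 m, 4.4 m) falls in cell (8, 10), the last of the 99 position states (index 98).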
For example, a radar and a camera can be installed on the vehicle to detect the state of the vehicle in real time and obtain its state information: the radar detects the boundary of the target parking space and other vehicles, and the camera detects the parking space lines. The other vehicles are obstacle vehicles, and both the obstacle vehicles and the boundary of the target parking space are obstacles. When the vehicle is in the initial state, the action network randomly generates a decision action, from which the first action control parameter is obtained; the first action control parameter can move the vehicle to a new position state.
Step 202: evaluate the first action control parameter and determine a second action control parameter. Then perform step 203.
The second action control parameter includes a reference force value of the throttle or brake and a reference rotation angle of the steering wheel. When the vehicle reaches a new position state, the parking environment gives the vehicle a reinforcement signal. The evaluation network generates a cost function from the reinforcement signal given by the parking environment and evaluates in real time the action control parameters corresponding to the decision action generated by the action network, obtaining the second action control parameter.
With reference to Fig. 2-2, step 202 specifically includes: evaluating the first action control parameter using the evaluation network and determining the second action control parameter output by the action network.
Step 203: adjust the position state of the host vehicle according to the second action control parameter. The position state of the host vehicle after adjustment is a first position state. Then perform step 204.
The position state of the host vehicle is adjusted according to the second action control parameter, and the position state after adjustment is the first position state. At this point the number of moving steps is step=step'+1, where step' is the number of moving steps when the vehicle was in the previous position state.
Step 204: judge whether the host vehicle collides with an obstacle. If the host vehicle collides with the obstacle, perform step 205; if the host vehicle does not collide with the obstacle, perform step 208.
To judge whether the vehicle collides with an obstacle, it can be judged whether the vehicle collides with an obstacle vehicle or whether the vehicle has driven onto the boundary of the target parking space. If the host vehicle collides with the obstacle, the reinforcement signal values are updated by the reinforcement learning algorithm according to step 205, and the next trial is run with a new randomly selected initial state of the vehicle; the output parameters of the reinforcement learning algorithm are finally obtained, and the target travel path of the vehicle is determined from them. If the host vehicle does not collide with the obstacle, it is detected whether the host vehicle has reached the target parking space.
Step 205: use the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, and obtain output parameters of the adaptive dynamic programming algorithm. Then perform step 206.
Specifically, as shown in Fig. 2-4, step 205 may include:
Step 2051: use the position information of the host vehicle in the parking environment when it is in the first position state and the position information of the target parking space in the parking environment as input parameters of the action network, and obtain a first output parameter of the action network.
Step 2052: according to the reinforcement signal group determined by the reinforcement learning algorithm, evaluate the first output parameter of the action network using the evaluation network, and obtain a second output parameter of the action network.
In this embodiment, the initial reinforcement signals of the vehicle can be set by the following rule: the vehicle obtains a reinforcement signal r=+1 when it reaches the target parking space, a reinforcement signal r=-0.2 when it collides with an obstacle, and a reinforcement signal r=0 in all other states. Fig. 2-5 is a schematic diagram of this setting rule. In Fig. 2-5, 281 denotes the host vehicle, 282 denotes other vehicles, 283 denotes the target parking space, and the number beside each arrow is the reinforcement signal for the vehicle moving from one position state to another position state under a given decision action.
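The setting rule above amounts to a three-way reward function, sketched below in Python. The function name and the two boolean arguments are hypothetical; the three values are the ones given in the text.

```python
def reinforcement_signal(reached_target: bool, collided: bool) -> float:
    """Initial reinforcement signal r for one state transition."""
    if collided:
        return -0.2  # collision with an obstacle is penalised
    if reached_target:
        return 1.0   # arriving at the target parking space is rewarded
    return 0.0       # every other transition is neutral
```

The collision check is placed first only to mirror the order of steps in the method, where a collision is tested before arrival; the text treats the two events as separate cases.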
The reinforcement signal group includes the reinforcement signal corresponding to each position state of the host vehicle. The process of determining the reinforcement signal group by the reinforcement learning (Q-learning) algorithm is illustrated in Fig. 2-6 to Fig. 2-8. The 9 grid cells in Fig. 2-6 to Fig. 2-8 represent 9 position states that the vehicle may occupy in the parking environment, denoted S1 to S9. This embodiment assumes that S3 is the target parking space. As shown in Fig. 2-6, each arrow indicates a transition of the vehicle from one position state to another after it selects a decision action, and the number beside the arrow is the preset reinforcement signal. For example, according to the setting rule for reinforcement signals, the reinforcement signal r12 (not marked in Fig. 2-5) for the vehicle moving from position state S1 to position state S2 is 0, and the reinforcement signal r23 (not marked in Fig. 2-5) for the vehicle moving from position state S2 to position state S3 is 1.
The reinforcement signals in Fig. 2-6 are updated using the reinforcement update formula to obtain the reinforcement signals in Fig. 2-7. The reinforcement update formula is:
$$Q(x, u) = r + \alpha \max_{u'} Q(x', u')$$
where r is the reinforcement signal in Fig. 2-5, representing the reinforcement signal obtained when the vehicle moves from the current position state to the next position state; α is the discount factor, and, for example, α may be 0.8; x denotes the current position state; x' denotes the next position state; u' denotes a decision action available in the next position state; max Q(x', u') denotes the maximum reinforcement signal that the vehicle can obtain by selecting a decision action in the next position state; and Q(x, u) is the reinforcement signal in Fig. 2-7, representing the reinforcement signal obtained through the action control parameter corresponding to the decision action selected by the vehicle in the current position state. Assuming that the reinforcement signal r12 in Fig. 2-6 is updated, applying the reinforcement update formula gives:
$$Q(S1, u) = r_{12} + \alpha \max_{u'} Q(S2, u') = 0 + 0.8 \times 1 = 0.8$$
where u is the decision action that moves the vehicle from S1 to S2, and the maximum at S2 is the signal r23 = 1 for moving on to S3. The updated value of r12 is therefore 0.8. As shown in Fig. 2-7, each arrow indicates that the vehicle moves from one position state to another after selecting a decision action, and the number beside the arrow is the Q value, which represents the maximum cumulative reinforcement signal obtained through the action control parameter corresponding to a decision action selected by the vehicle in a position state.
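The single update of r12 described above can be checked numerically. The sketch below assumes the update takes the form "immediate signal plus α times the best value available in the next position state", which is consistent with the definitions of r, α and max Q(x', u') and with the stated result of 0.8. The function name and the dictionary of S2 values are hypothetical.

```python
ALPHA = 0.8  # discount factor alpha, the example value given in the text


def q_update(r, next_q_values, alpha=ALPHA):
    """One update: immediate signal plus the discounted best next value.

    next_q_values holds the current signals of every decision action
    available in the next position state; an empty collection is
    treated as 0 (assumption for terminal states).
    """
    return r + alpha * max(next_q_values, default=0.0)


# Preset signals of the actions leaving S2 on the 3 x 3 grid
# (S3 is the target parking stall, so moving S2 -> S3 is worth 1).
q_from_s2 = {"S1": 0.0, "S3": 1.0, "S5": 0.0}

# Update the signal for moving S1 -> S2, whose preset value r12 is 0.
q_s1_to_s2 = q_update(0.0, q_from_s2.values())

if __name__ == "__main__":
    print(q_s1_to_s2)
```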
Then, on the basis of Fig. 2-7, the reinforcement signal corresponding to each position state of the vehicle is determined according to the maximum value formula, and the reinforcement signal group is obtained. The maximum value formula is:
$$V^{*}(x) = \max_{u} Q(x, u)$$
where x denotes the current position state, u denotes a decision action available in the current position state, and V*(x) denotes the reinforcement signal corresponding to the current position state of the vehicle. Fig. 2-8 is a schematic diagram of the reinforcement signal group. In Fig. 2-8, each arrow indicates that the vehicle moves from one position state to another after selecting a decision action, and the number beside the arrow is the V value, which represents the reinforcement signal corresponding to each position state of the vehicle, that is, the maximum cumulative reinforcement signal that the vehicle can obtain in that position state.
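The whole procedure from preset signals to the signal group can be sketched on the 3 x 3 grid. This is an illustration under several assumptions that the text does not state: the grid is numbered row by row (S1 to S3 in the top row), there are no obstacles, S3 is terminal, the update is applied in repeated sweeps until it settles, and V*(x) is taken as the largest Q value over the actions available in x. The resulting numbers are not claimed to match Fig. 2-8.

```python
ALPHA = 0.8   # discount factor alpha from the text
TARGET = 3    # S3 is the target parking stall


def neighbours(s):
    """Position states reachable from s by one move (up, down, left, right)."""
    row, col = divmod(s - 1, 3)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            result.append(r * 3 + c + 1)
    return result


def solve_q(alpha=ALPHA, sweeps=50):
    """Sweep the update Q(x,u) = r + alpha * max Q(x',u') over all moves."""
    q = {(s, n): 0.0
         for s in range(1, 10) if s != TARGET
         for n in neighbours(s)}
    for _ in range(sweeps):
        for (s, n) in q:
            r = 1.0 if n == TARGET else 0.0
            # No moves are stored for the terminal state, so its best value is 0.
            best_next = max((q[(n, m)] for m in neighbours(n) if (n, m) in q),
                            default=0.0)
            q[(s, n)] = r + alpha * best_next
    return q


def state_values(q):
    """V*(x): the largest Q value over the actions available in x."""
    v = {}
    for (s, _), value in q.items():
        v[s] = max(v.get(s, float("-inf")), value)
    return v


if __name__ == "__main__":
    values = state_values(solve_q())
    for s in sorted(values):
        print(f"S{s}: {values[s]:.3f}")
```

Under these assumptions the value of a state decays by a factor of α for each additional move needed to reach S3, so states closer to the target parking stall carry larger values.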
After the reinforcement signal group has been determined by the above process, the evaluation network in the adaptive dynamic programming system can be used to assess the first output parameter of the action network and obtain the second output parameter of the action network. The second output parameter is the action control parameter; for the assessment process, refer to the related description in step 201. The reinforcement signal group based on the adaptive dynamic programming algorithm is obtained by using the cost function generated by the evaluation network to fit the reinforcement signal group based on the reinforcement learning algorithm.
Step 206: the path corresponding to the output parameter of the adaptive dynamic programming algorithm is used as the target travel path. Step 207 is then performed.
Step 206 specifically includes: using the path corresponding to the second output parameter of the action network as the target travel path. The second output parameter is the parameter obtained in step 2052.
Fig. 2-9 is a schematic diagram of the paths corresponding to the second output parameter of the action network. Assuming that the vehicle is initially located in position state S7 in Fig. 2-8, it can be seen from Fig. 2-9 that there may be several shortest travel paths, that is, target travel paths, by which the vehicle reaches the target parking stall, namely position state S3. The target travel path may be S7-S4-S1-S2-S3, S7-S8-S9-S6-S3, S7-S4-S5-S6-S3, and so on. Because the parking environment is divided into a large number of grid cells, the travel route of the vehicle approximates a smooth curve, such as the travel route shown in Fig. 2-3.
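The shortest paths from S7 to S3 can be enumerated directly. The sketch below assumes a row-by-row numbering of an obstacle-free 3 x 3 grid, which is consistent with the three example paths given in the text; it uses a plain breadth-first search for the distances, not the adaptive dynamic programming algorithm of the embodiment, and the function names are hypothetical.

```python
from collections import deque


def neighbours(s):
    """Position states reachable from s by one move (up, down, left, right)."""
    row, col = divmod(s - 1, 3)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            result.append(r * 3 + c + 1)
    return result


def shortest_paths(start, goal):
    """Return every shortest path from start to goal as a list of states."""
    # Breadth-first search from the goal gives the distance of each state.
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        s = queue.popleft()
        for n in neighbours(s):
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)

    paths = []

    def extend(path):
        s = path[-1]
        if s == goal:
            paths.append(path)
            return
        # Only moves that reduce the remaining distance stay on a shortest path.
        for n in neighbours(s):
            if dist[n] == dist[s] - 1:
                extend(path + [n])

    extend([start])
    return paths


if __name__ == "__main__":
    for p in shortest_paths(7, 3):
        print("-".join(f"S{s}" for s in p))
```

On this grid every shortest path from S7 to S3 consists of four moves, two upward and two to the right in some order, which gives six paths in total, including the three listed in the text.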
Step 207: this vehicle is controlled according to the target travel path to complete the parking storage action.
Once the target travel path has been obtained, this vehicle can be controlled according to the target travel path to complete the parking storage action.
Step 208: it is detected whether this vehicle has reached the target parking stall. If this vehicle has reached the target parking stall, step 205 is performed; if this vehicle has not reached the target parking stall, step 209 is performed.
If this vehicle has reached the target parking stall, the reinforcement signal values are updated according to the adaptive dynamic programming algorithm, the next trial is started, and the initial state of the vehicle is randomly selected again. If this vehicle has not reached the target parking stall, it is detected whether the number of moving steps of this vehicle exceeds the preset number of steps.
Step 209: it is detected whether the number of moving steps of this vehicle exceeds the preset number of steps. If the number of moving steps exceeds the preset number of steps, step 205 is performed; if the number of moving steps does not exceed the preset number of steps, step 201 is performed.
If the number of moving steps of this vehicle exceeds the preset number of steps, the reinforcement signal values are updated according to the adaptive dynamic programming algorithm, the next trial is started, and the initial state of the vehicle is randomly selected again. If the number of moving steps of this vehicle does not exceed the preset number of steps, step 201 is performed, and the first action control parameter is determined according to the status information of the vehicle.
Step 205 is repeated to update the reinforcement signal values until the number of trials exceeds the maximum number of trials MaxTrail = 1000. The most recently updated reinforcement signal values, that is, the learned values, are then transplanted to an actual vehicle, so that the actual vehicle completes the parking storage process in a real parking environment according to the learned values.
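The trial structure described in steps 208 and 209 can be sketched as a loop. This is only an outline of the control flow: a random move on a 3 x 3 grid stands in for steps 201 to 203, the collision check is omitted, the update of step 205 is left as a comment, and the preset number of steps (20) is a hypothetical value not given in the text.

```python
import random

MAX_TRIAL = 1000   # maximum number of trials (MaxTrail in the text)
MAX_STEPS = 20     # hypothetical preset number of steps
TARGET = 3         # S3 is the target parking stall


def neighbours(s):
    """Position states reachable from s by one move on the 3 x 3 grid."""
    row, col = divmod(s - 1, 3)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            result.append(r * 3 + c + 1)
    return result


def run_trials(max_trials=MAX_TRIAL, max_steps=MAX_STEPS, seed=0):
    """Run the trial loop and count how each trial ended."""
    rng = random.Random(seed)
    outcomes = {"reached": 0, "step_limit": 0}
    for _ in range(max_trials):
        # Each trial starts from a randomly selected initial state.
        state = rng.choice([s for s in range(1, 10) if s != TARGET])
        steps = 0
        while True:
            # Stand-in for steps 201-203: choose and apply an action.
            state = rng.choice(neighbours(state))
            steps += 1
            if state == TARGET:        # step 208: target parking stall reached
                outcomes["reached"] += 1
                break
            if steps > max_steps:      # step 209: preset step number exceeded
                outcomes["step_limit"] += 1
                break
        # Step 205 would update the reinforcement signal values here.
    return outcomes


if __name__ == "__main__":
    print(run_trials())
```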
It should be noted that, if the preset reinforcement signal is set to a relatively large value, for example a reinforcement signal r = 100 obtained when the vehicle reaches the target parking stall, then after the entire experiment has ended the output parameters of the reinforcement learning algorithm may be normalized for ease of calculation, that is, the most recently updated reinforcement signal values are brought into the range [0, 1]. The normalized result is then used as the final learned value for parking storage and transplanted to the actual vehicle, so that the actual vehicle can complete the parking storage process in a real parking environment according to the learned value.
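One way to bring the learned values into the range [0, 1] is min-max scaling. The text only states the target range, so the choice of min-max scaling, the function name and the sample values below are all assumptions made for illustration.

```python
def normalize(values):
    """Scale a mapping of learned values linearly into the range [0, 1].

    Assumption: min-max scaling; the text only requires the range [0, 1].
    """
    lo = min(values.values())
    hi = max(values.values())
    if hi == lo:
        # All values equal: map everything to 0 to avoid dividing by zero.
        return {key: 0.0 for key in values}
    return {key: (value - lo) / (hi - lo) for key, value in values.items()}


if __name__ == "__main__":
    # Hypothetical learned values obtained with r = 100 at the target.
    learned = {"S1": 80.0, "S2": 100.0, "S7": 0.0}
    print(normalize(learned))
```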
The parking storage method provided in this embodiment of the present invention addresses the design of an autonomous parking control algorithm for intelligent vehicles. Using the adaptive dynamic programming algorithm, the intelligent vehicle interacts autonomously with the parking environment, obtains the corresponding reinforcement signals, and uses the reinforcement signal values to learn and store parking storage experience autonomously. As a result, the intelligent vehicle has good stability and adaptivity even when the parking stall width, the positions of obstacle vehicles, and the initial position of the vehicle are not fixed. The parking storage method enables the intelligent vehicle to learn autonomously and to realize the optimal parking strategy in different parking environments, so that the travel path of the intelligent vehicle during parking storage is the shortest, and autonomous parking of the intelligent vehicle has better stability, adaptivity, maneuverability and flexibility.
In conclusion parking storage method provided in an embodiment of the present invention, can comment the first action control parameter
Estimate, determine the second action control parameter, further according to the location status of second action control parameter adjustment this vehicle, if the location status
For preset state, then target travel path is determined according to adaptive dynamic programming algorithm, so as to which this vehicle be controlled to complete parking storage
Action, compared to the prior art, improves the independent learning ability of vehicle, this improves stabilizations during vehicle parking storage
Property, reliability, mobility and flexibility.
An embodiment of the present invention provides a parking storage device. As shown in Fig. 3-1, the parking storage device may include:
A first determination unit 301, configured to determine the first action control parameter according to the status information of this vehicle, where the status information includes the location information of this vehicle in the parking environment and the location information of the target parking stall in the parking environment, and the first action control parameter includes a first initial force value of the throttle or brake and a first initial angle of steering wheel rotation.
A second determination unit 302, configured to assess the first action control parameter and determine the second action control parameter, where the second action control parameter includes a reference force value of the throttle or brake and a reference angle of steering wheel rotation.
An adjustment unit 303, configured to adjust the position state of this vehicle according to the second action control parameter, where the position state of this vehicle after adjustment is the first position state.
A judging unit 304, configured to judge whether the first position state is the preset state.
A first processing unit 305, configured to, when the first position state is the preset state, use the location information of this vehicle in the parking environment when this vehicle is in the first position state and the location information of the target parking stall in the parking environment as the input parameters of the adaptive dynamic programming algorithm, and obtain the output parameter of the adaptive dynamic programming algorithm.
A second processing unit 306, configured to use the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target travel path.
A control unit 307, configured to control this vehicle according to the target travel path to complete the parking storage action.
In conclusion parking loading device provided in an embodiment of the present invention, can comment the first action control parameter
Estimate, determine the second action control parameter, further according to the location status of second action control parameter adjustment this vehicle, if the location status
For preset state, then target travel path is determined according to adaptive dynamic programming algorithm, so as to which this vehicle be controlled to complete parking storage
Action, compared to the prior art, improves the independent learning ability of vehicle, this improves stabilizations during vehicle parking storage
Property, reliability and flexibility.
Optionally, as shown in Fig. 3-2, the first determination unit 301 includes:
A first determining module 3011, configured to input the status information into the action network and determine the parameter output by the action network as the first action control parameter, where the action network is a multiple-input multiple-output nonlinear neural network containing a hidden layer.
As shown in Fig. 3-3, the second determination unit 302 includes:
A second determining module 3021, configured to assess the first action control parameter using the evaluation network and determine the second action control parameter output by the action network, where the evaluation network is a multiple-input multiple-output nonlinear neural network containing a hidden layer.
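A multiple-input multiple-output nonlinear network with one hidden layer can be sketched in a few lines. The layer sizes, the tanh activation, the random weights and the input layout (vehicle position and target position, two coordinates each) are all assumptions made for illustration; the text does not specify the structure or training of either network beyond the presence of a hidden layer.

```python
import math
import random


def make_network(n_in, n_hidden, n_out, seed=0):
    """Create random weights for a network with one hidden layer."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                for _ in range(n_hidden)]
    w_out = [[rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
             for _ in range(n_out)]
    return w_hidden, w_out


def forward(network, inputs):
    """Forward pass: tanh hidden layer followed by a tanh output layer."""
    w_hidden, w_out = network
    hidden = [math.tanh(sum(w * x for w, x in zip(row, inputs)))
              for row in w_hidden]
    return [math.tanh(sum(w * h for w, h in zip(row, hidden)))
            for row in w_out]


if __name__ == "__main__":
    # Hypothetical layout: 4 inputs (vehicle x, y and target x, y) and
    # 2 outputs (throttle or brake force, steering wheel angle), scaled to [-1, 1].
    action_network = make_network(n_in=4, n_hidden=6, n_out=2)
    print(forward(action_network, [0.1, 0.2, 0.9, 0.8]))
```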
As shown in Fig. 3-4, the judging unit 304 includes:
A judgment module 3041, configured to judge whether this vehicle collides with an obstacle.
A third determining module 3042, configured to determine that the first position state is the preset state when this vehicle collides with an obstacle.
A fourth determining module 3043, configured to determine that the first position state is not the preset state when this vehicle does not collide with an obstacle.
As shown in Fig. 3-5, the parking storage device may further include:
A first detection unit 308, configured to detect whether this vehicle has reached the target parking stall when this vehicle does not collide with an obstacle.
A third processing unit 309, configured to, when this vehicle has reached the target parking stall, use the location information of this vehicle in the parking environment when this vehicle is in the first position state and the location information of the target parking stall in the parking environment as the input parameters of the adaptive dynamic programming algorithm, obtain the output parameter of the adaptive dynamic programming algorithm, and use the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target travel path.
A second detection unit 310, configured to detect, when this vehicle has not reached the target parking stall, whether the number of moving steps of this vehicle exceeds the preset number of steps, where the number of moving steps is the number of grid cells that this vehicle passes through in one movement.
A fourth processing unit 311, configured to, when the number of moving steps of this vehicle exceeds the preset number of steps, use the location information of this vehicle in the parking environment when this vehicle is in the first position state and the location information of the target parking stall in the parking environment as the input parameters of the adaptive dynamic programming algorithm, obtain the output parameter of the adaptive dynamic programming algorithm, and use the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target travel path.
A third determination unit 312, configured to determine a third action control parameter according to the current state of this vehicle when the number of moving steps of this vehicle does not exceed the preset number of steps, where the third action control parameter includes a second initial force value of the throttle or brake and a second initial angle of steering wheel rotation.
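The order in which the judging, detection and processing units hand over to one another can be summarized as a single decision function. This is a sketch of the branching only; the enum, the function name and the argument names are hypothetical, and the actual units operate on position information rather than on ready-made flags.

```python
from enum import Enum


class Decision(Enum):
    RUN_ADP = "run the adaptive dynamic programming algorithm"
    NEW_ACTION = "determine the third action control parameter"


def decide(collided, reached_target, moving_steps, preset_steps):
    """Mirror the branching of units 304 and 308 to 312."""
    if collided:
        # Judging unit 304: a collision means the preset state was reached.
        return Decision.RUN_ADP
    if reached_target:
        # First detection unit 308 and third processing unit 309.
        return Decision.RUN_ADP
    if moving_steps > preset_steps:
        # Second detection unit 310 and fourth processing unit 311.
        return Decision.RUN_ADP
    # Third determination unit 312: keep moving with a new action.
    return Decision.NEW_ACTION


if __name__ == "__main__":
    print(decide(collided=False, reached_target=False,
                 moving_steps=3, preset_steps=20))
```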
Optionally, as shown in Fig. 3-6, the first processing unit 305 includes:
A first processing module 3051, configured to use the location information of this vehicle in the parking environment when this vehicle is in the first position state and the location information of the target parking stall in the parking environment as the input parameters of the action network, and obtain the first output parameter of the action network.
A second processing module 3052, configured to assess the first output parameter of the action network using the evaluation network according to the reinforcement signal group determined by the reinforcement learning algorithm, and obtain the second output parameter of the action network, where the reinforcement signal group includes the reinforcement signal corresponding to each position state of this vehicle.
Correspondingly, as shown in Fig. 3-7, the second processing unit 306 includes:
A third processing module 3061, configured to use the path corresponding to the second output parameter of the action network as the target travel path.
In conclusion parking loading device provided in an embodiment of the present invention, can comment the first action control parameter
Estimate, determine the second action control parameter, further according to the location status of second action control parameter adjustment this vehicle, if the location status
For preset state, then target travel path is determined according to adaptive dynamic programming algorithm, so as to which this vehicle be controlled to complete parking storage
Action, compared to the prior art, improves the independent learning ability of vehicle, this improves stabilizations during vehicle parking storage
Property, reliability, mobility and flexibility.
It is apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the device, units and modules described above can refer to the corresponding processes in the foregoing method embodiment, and details are not repeated here.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.