Summary of the Invention
To solve the problems that existing parking methods are less stable, less reliable, and lack flexibility, the present invention provides an automatic parking method and device. The technical solution is as follows:
In a first aspect, an automatic parking method is provided, the method comprising:
determining a first action control parameter according to status information of the host vehicle, where the status information includes location information of the host vehicle in a parking environment and location information of a target parking space in the parking environment, and the first action control parameter includes a first initial force value of the throttle or brake and a first initial angle of steering-wheel rotation;
assessing the first action control parameter to determine a second action control parameter, where the second action control parameter includes a reference force value of the throttle or brake and a reference angle of steering-wheel rotation;
adjusting the position state of the host vehicle according to the second action control parameter, the position state of the host vehicle after adjustment being a first position state;
judging whether the first position state is a preset state;
if the first position state is the preset state, taking the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of an adaptive dynamic programming algorithm, and obtaining an output parameter of the adaptive dynamic programming algorithm;
taking the path corresponding to the output parameter of the adaptive dynamic programming algorithm as a target driving path;
controlling the host vehicle to travel according to the target driving path to complete the parking operation.
Optionally, determining the first action control parameter according to the status information of the host vehicle comprises:
after inputting the status information into an action network, determining the parameter output by the action network as the first action control parameter, where the action network is a multi-input multi-output nonlinear neural network that includes a hidden layer;
and assessing the first action control parameter to determine the second action control parameter comprises:
assessing the first action control parameter by means of an evaluation network to determine the second action control parameter output by the action network, where the evaluation network is a multi-input multi-output nonlinear neural network that includes a hidden layer.
Optionally, judging whether the first position state is the preset state comprises:
judging whether the host vehicle collides with an obstacle;
if the host vehicle collides with the obstacle, determining that the first position state is the preset state;
if the host vehicle does not collide with the obstacle, determining that the first position state is not the preset state.
Optionally, the parking environment is divided into at least two grids of equal area, each grid corresponding to one position state, and before controlling the host vehicle to travel according to the target driving path to complete the parking operation, the method further comprises:
if the host vehicle does not collide with the obstacle, detecting whether the host vehicle has arrived at the target parking space;
if the host vehicle has arrived at the target parking space, taking the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtaining the output parameter of the adaptive dynamic programming algorithm, and taking the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target driving path;
if the host vehicle has not arrived at the target parking space, detecting whether the number of movement steps of the host vehicle is greater than a preset step number, the number of movement steps being the number of grids the host vehicle passes through in one move;
if the number of movement steps of the host vehicle is greater than the preset step number, taking the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtaining the output parameter of the adaptive dynamic programming algorithm, and taking the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target driving path;
if the number of movement steps of the host vehicle is not greater than the preset step number, determining a third action control parameter according to the current state of the host vehicle, where the third action control parameter includes a second initial force value of the throttle or brake and a second initial angle of steering-wheel rotation.
Optionally, taking the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm and obtaining the output parameter of the adaptive dynamic programming algorithm comprises:
taking the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of the action network, and obtaining a first output parameter of the action network;
assessing the first output parameter of the action network by means of the evaluation network according to a reinforcement signal group determined by a reinforcement learning algorithm, and obtaining a second output parameter of the action network, where the reinforcement signal group includes the reinforcement signal corresponding to the host vehicle at each position state;
and taking the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target driving path comprises:
taking the path corresponding to the second output parameter of the action network as the target driving path.
In a second aspect, an automatic parking device is provided, the device comprising:
a first determining unit, configured to determine a first action control parameter according to status information of the host vehicle, where the status information includes location information of the host vehicle in a parking environment and location information of a target parking space in the parking environment, and the first action control parameter includes a first initial force value of the throttle or brake and a first initial angle of steering-wheel rotation;
a second determining unit, configured to assess the first action control parameter and determine a second action control parameter, where the second action control parameter includes a reference force value of the throttle or brake and a reference angle of steering-wheel rotation;
an adjustment unit, configured to adjust the position state of the host vehicle according to the second action control parameter, the position state of the host vehicle after adjustment being a first position state;
a judging unit, configured to judge whether the first position state is a preset state;
a first processing unit, configured to, when the first position state is the preset state, take the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of an adaptive dynamic programming algorithm, and obtain an output parameter of the adaptive dynamic programming algorithm;
a second processing unit, configured to take the path corresponding to the output parameter of the adaptive dynamic programming algorithm as a target driving path;
a control unit, configured to control the host vehicle to travel according to the target driving path to complete the parking operation.
Optionally, the first determining unit comprises:
a first determining module, configured to, after the status information is input into an action network, determine the parameter output by the action network as the first action control parameter, where the action network is a multi-input multi-output nonlinear neural network that includes a hidden layer;
and the second determining unit comprises:
a second determining module, configured to assess the first action control parameter by means of an evaluation network and determine the second action control parameter output by the action network, where the evaluation network is a multi-input multi-output nonlinear neural network that includes a hidden layer.
Optionally, the judging unit comprises:
a judging module, configured to judge whether the host vehicle collides with an obstacle;
a third determining module, configured to determine that the first position state is the preset state when the host vehicle collides with the obstacle;
a fourth determining module, configured to determine that the first position state is not the preset state when the host vehicle does not collide with the obstacle.
Optionally, the parking environment is divided into at least two grids of equal area, each grid corresponding to one position state, and the device further comprises:
a first detecting unit, configured to detect whether the host vehicle has arrived at the target parking space when the host vehicle does not collide with the obstacle;
a third processing unit, configured to, when the host vehicle has arrived at the target parking space, take the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtain the output parameter of the adaptive dynamic programming algorithm, and take the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target driving path;
a second detecting unit, configured to, when the host vehicle has not arrived at the target parking space, detect whether the number of movement steps of the host vehicle is greater than a preset step number, the number of movement steps being the number of grids the host vehicle passes through in one move;
a fourth processing unit, configured to, when the number of movement steps of the host vehicle is greater than the preset step number, take the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtain the output parameter of the adaptive dynamic programming algorithm, and take the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target driving path;
a third determining unit, configured to, when the number of movement steps of the host vehicle is not greater than the preset step number, determine a third action control parameter according to the current state of the host vehicle, where the third action control parameter includes a second initial force value of the throttle or brake and a second initial angle of steering-wheel rotation.
Optionally, the first processing unit comprises:
a first processing module, configured to take the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of the action network, and obtain a first output parameter of the action network;
a second processing module, configured to assess the first output parameter of the action network by means of the evaluation network according to a reinforcement signal group determined by a reinforcement learning algorithm, and obtain a second output parameter of the action network, where the reinforcement signal group includes the reinforcement signal corresponding to the host vehicle at each position state;
and the second processing unit comprises:
a third processing module, configured to take the path corresponding to the second output parameter of the action network as the target driving path.
The present invention provides an automatic parking method and device that assess a first action control parameter to determine a second action control parameter, then adjust the position state of the host vehicle according to the second action control parameter; if this position state is a preset state, a target driving path is determined according to an adaptive dynamic programming algorithm, and the host vehicle is controlled to complete the parking operation. Compared with the prior art, the autonomous learning ability of the vehicle is improved, which in turn improves the stability, reliability, and flexibility of the vehicle during automatic parking.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present invention.
Detailed Description of the Invention
To make the objectives, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
An embodiment of the present invention provides an automatic parking method. As shown in Figure 1, the method may comprise:
Step 101: determine a first action control parameter according to the status information of the host vehicle. The status information includes the location information of the host vehicle in the parking environment and the location information of the target parking space in the parking environment; the first action control parameter includes a first initial force value of the throttle or brake and a first initial angle of steering-wheel rotation.
Step 102: assess the first action control parameter to determine a second action control parameter. The second action control parameter includes a reference force value of the throttle or brake and a reference angle of steering-wheel rotation.
Step 103: adjust the position state of the host vehicle according to the second action control parameter; the position state of the host vehicle after adjustment is a first position state.
Step 104: judge whether the first position state is a preset state.
Step 105: if the first position state is the preset state, take the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of an adaptive dynamic programming algorithm, and obtain an output parameter of the adaptive dynamic programming algorithm.
Step 106: take the path corresponding to the output parameter of the adaptive dynamic programming algorithm as a target driving path.
Step 107: control the host vehicle to travel according to the target driving path to complete the parking operation.
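Steps 101 to 107 can be sketched as a single control loop. The sketch below is illustrative only: the network functions, the vehicle dynamics (one grid per step on a 1-D line), and the termination condition are toy stand-ins assumed for the example, not the implementation disclosed by the invention.

```python
# Minimal sketch of the step 101-107 flow; all names and dynamics are toy
# assumptions, not the patent's implementation.

def action_network(state):
    # Step 101: map status info to a first action control parameter
    # (throttle/brake force, steering angle) -- here a fixed toy output.
    return {"force": 0.5, "steer": 10.0}

def evaluation_network(state, params):
    # Step 102: assess the first parameter and return a second (reference) one.
    return {"force": params["force"] * 0.8, "steer": params["steer"] * 0.8}

def park(position, target):
    """Steps 103-107 on a toy 1-D grid: adjust the position state each step
    and return the resulting driving path (steps 105-106 simplified)."""
    path = [position]
    while position != target:                  # step 104 check (toy condition)
        state = (position, target)
        p1 = action_network(state)             # step 101
        p2 = evaluation_network(state, p1)     # step 102
        # step 103: toy dynamics ignore p2's magnitudes and move one grid.
        position += 1 if target > position else -1
        path.append(position)
    return path                                # steps 105-107: path followed

print(park(0, 3))  # -> [0, 1, 2, 3]
```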
In summary, the automatic parking method provided by the embodiment of the present invention assesses the first action control parameter to determine the second action control parameter, then adjusts the position state of the host vehicle according to the second action control parameter; if this position state is the preset state, the target driving path is determined according to the adaptive dynamic programming algorithm, and the host vehicle is controlled to complete the parking operation. Compared with the prior art, the autonomous learning ability of the vehicle is improved, and therefore the stability, reliability, and flexibility of the vehicle during automatic parking are improved.
Optionally, step 101 comprises: after inputting the status information into an action network, determining the parameter output by the action network as the first action control parameter, where the action network is a multi-input multi-output nonlinear neural network that includes a hidden layer.
Step 102 comprises: assessing the first action control parameter by means of an evaluation network to determine the second action control parameter output by the action network, where the evaluation network is a multi-input multi-output nonlinear neural network that includes a hidden layer.
Step 104 comprises: judging whether the host vehicle collides with an obstacle; if the host vehicle collides with the obstacle, determining that the first position state is the preset state; if the host vehicle does not collide with the obstacle, determining that the first position state is not the preset state.
The parking environment is divided into at least two grids of equal area, each grid corresponding to one position state, and before step 107, the method may further comprise:
if the host vehicle does not collide with the obstacle, detecting whether the host vehicle has arrived at the target parking space;
if the host vehicle has arrived at the target parking space, taking the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtaining the output parameter of the adaptive dynamic programming algorithm, and taking the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target driving path;
if the host vehicle has not arrived at the target parking space, detecting whether the number of movement steps of the host vehicle is greater than a preset step number, the number of movement steps being the number of grids the host vehicle passes through in one move;
if the number of movement steps of the host vehicle is greater than the preset step number, taking the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtaining the output parameter of the adaptive dynamic programming algorithm, and taking the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target driving path;
if the number of movement steps of the host vehicle is not greater than the preset step number, determining a third action control parameter according to the current state of the host vehicle, where the third action control parameter includes a second initial force value of the throttle or brake and a second initial angle of steering-wheel rotation.
Optionally, step 105 comprises: taking the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of the action network, and obtaining a first output parameter of the action network; and assessing the first output parameter of the action network by means of the evaluation network according to a reinforcement signal group determined by a reinforcement learning algorithm, and obtaining a second output parameter of the action network, where the reinforcement signal group includes the reinforcement signal corresponding to the host vehicle at each position state.
Accordingly, step 106 comprises: taking the path corresponding to the second output parameter of the action network as the target driving path.
In summary, the automatic parking method provided by the embodiment of the present invention assesses the first action control parameter to determine the second action control parameter, then adjusts the position state of the host vehicle according to the second action control parameter; if this position state is the preset state, the target driving path is determined according to the adaptive dynamic programming algorithm, and the host vehicle is controlled to complete the parking operation. Compared with the prior art, the autonomous learning ability of the vehicle is improved, and therefore the stability, reliability, and flexibility of the vehicle during automatic parking are improved.
An embodiment of the present invention provides an automatic parking method. As shown in Figure 2-1, the method may comprise:
Step 201: determine a first action control parameter according to the status information of the host vehicle. Proceed to step 202.
The status information of the host vehicle includes the location information of the host vehicle in the parking environment and the location information of the target parking space in the parking environment; the first action control parameter includes a first initial force value of the throttle or brake and a first initial angle of steering-wheel rotation.
The automatic parking method provided by the embodiment of the present invention uses an adaptive dynamic programming algorithm to determine the target driving path. The parking process based on the adaptive dynamic programming algorithm is a learning process: the vehicle (i.e., the host vehicle) learns from successful and failed experiences how to complete the parking operation along the shortest driving path. Because the vehicle fails repeatedly during the learning process, the tests can first be run on a computer; after the vehicle (i.e., a virtual vehicle) completes the learning process, the relevant parameters of the learning algorithm are transplanted into the actual vehicle. Before testing, the computer can set two initial parameters, for example a maximum number of trials MaxTrail = 1000 and a maximum number of movement steps, namely the preset step number MaxStep = 7. The parking environment can be divided into at least two grids of equal area, each grid corresponding to one position state, and the number of movement steps refers to the number of grids the vehicle passes through in one move.
Figure 2-2 shows the system structure corresponding to the adaptive dynamic programming algorithm. As shown in Figure 2-2, the system consists of two neural networks, an action network and an evaluation network. Both are multi-input multi-output nonlinear neural networks that include a hidden layer, and both adopt a feed-forward network with a nonlinear multilayer-perceptron structure. The process by which this system makes the vehicle learn autonomously can be as follows: the action network produces a decision action U(t) according to the current state quantity X(t) of the vehicle, where X(t) includes the position of the vehicle in the parking environment and the position of the target parking space in the parking environment. The decision action U(t) corresponds to a set of action control parameters, which include the first initial force value of the throttle or brake and the first initial angle of steering-wheel rotation. The decision action U(t) changes the current position state of the vehicle, transferring the vehicle from the current position state to a new position state and correspondingly yielding a new state quantity X(t+1). Meanwhile, the parking environment feeds back to the vehicle a reinforcement signal r(t), which represents the immediate return of taking the decision action U(t). The embodiment of the present invention uses the reinforcement signal to represent the reward or punishment received by the vehicle. Typically the reinforcement signal is a numerical value whose magnitude represents how "good" or "bad" the decision action is. The reinforcement signal is fed into the evaluation network, which outputs a cost function J(t), so that the evaluation network assesses in real time the action control parameters corresponding to the decision actions produced by the action network. After the two neural networks produce their outputs, the system performs feedback adjustment on both outputs. The feedback-adjustment strategy of the evaluation network is to use the value of the cost function J(t) to approximate the infinite cumulative sum of the reinforcement signal. The feedback-adjustment strategy of the action network is to compare the expected value of the utility function Uc(t) with the cost function J(t) to obtain an adjustment error, and then, according to this error, to adjust the weights of the two neural networks by gradient descent, so that the action control parameters output by the action network tend toward the optimum, achieving the goal of making the decision actions tend toward the optimum. Here the cost function J(t) represents the cost paid when the vehicle travels according to the decision actions output by the action network, and the utility function Uc(t) represents the relation between the input parameters of the adaptive dynamic programming system and the decision action U(t). In Figure 2-2, X(t) and X(t+1) are inputs of the system, R(t) is the cumulative sum of the reinforcement signal, Uc(t) is the utility function, α is the discount factor, which represents the degree of influence of a later state on an earlier state, and J(t-1) represents the cost function of the previous state.
Similarly, for the new state quantity X(t+1), the vehicle makes a new decision action U(t+1) and obtains a new reinforcement signal r(t+1) from the parking environment. By analogy, the vehicle can interact with the parking environment at every moment and, according to the reinforcement-signal values fed back by the parking environment, adjust its action policy online so as to obtain the maximum return in subsequent decision actions.
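The interaction just described can be sketched in code: at each moment the action network picks a decision action U(t), the environment returns a reinforcement signal r(t), and the evaluation network's estimate J is nudged toward r(t) + α·J(X(t+1)). The sketch below is a deliberate simplification, assuming a 1-D chain of position states, a fixed toy action policy, and a tabular J in place of the two neural networks.

```python
# Toy actor-critic interaction loop; the tabular critic J stands in for the
# evaluation network (an illustrative assumption, not the patent's networks).

ALPHA = 0.8      # discount factor (alpha in Figure 2-2)
LR = 0.5         # learning rate for the critic's feedback adjustment
TARGET = 3       # terminal position state (the target parking space)

J = [0.0] * (TARGET + 1)          # critic estimate per position state

def reinforcement(x):
    # Immediate return r(t): +1 on reaching the target, 0 otherwise (toy rule).
    return 1.0 if x == TARGET else 0.0

for episode in range(50):
    x = 0                             # initial state each trial
    while x != TARGET:
        u = +1                        # decision action U(t) (toy: move right)
        x_next = x + u                # new state quantity X(t+1)
        r = reinforcement(x_next)     # reinforcement signal r(t)
        # Feedback adjustment: move J(x) toward r + alpha * J(x_next),
        # approximating the discounted cumulative reinforcement signal.
        J[x] += LR * (r + ALPHA * J[x_next] - J[x])
        x = x_next

print([round(v, 2) for v in J])   # approaches [0.64, 0.8, 1.0, 0.0]
```

With α = 0.8 the estimates settle at 1.0 one step before the target, 0.8 two steps before, and 0.64 three steps before, showing how the discount factor propagates the return backward through the position states.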
With reference to Figure 2-2, step 201 specifically comprises: after inputting the status information into the action network, determining the parameter output by the action network as the first action control parameter.
A state of the vehicle is selected at random and taken as the initial state of the vehicle. When the vehicle is in the initial state, the trial count trail = 0; at the start of each trial, the movement step count step = 0. The state of the vehicle refers to the position of the vehicle in the parking environment and the position of the target parking space in the parking environment. The embodiment of the present invention discretizes the parking environment, for example by dividing the parking environment into multiple grids, each grid corresponding to one position state; in Figure 2-3 the parking environment is divided into an 11×9 grid. Usually, the vehicle must pass through multiple grids when traveling from its current position to the target parking space. In Figure 2-3, 231 and 232 represent other vehicles, 233 represents the host vehicle, and 234 represents the target parking space. It should be added that, in practical applications, the number of grids into which the parking environment is divided is much larger than the number of grids in Figure 2-3, which the embodiment of the present invention does not limit.
For example, the vehicle may be equipped with a radar and a camera that detect the state of the vehicle in real time to obtain the status information of the vehicle: the radar detects the boundary of the target parking space and the other vehicles, and the camera detects the parking-space lines. The other vehicles are obstacle vehicles, and the obstacle vehicles and the boundary of the target parking space are the obstacles. When the vehicle is in the initial state, the action network produces a decision action at random and thereby obtains the first action control parameter, according to which the vehicle can reach a new position state.
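The discretization can be sketched as a mapping from a continuous position to its grid cell, one position state per cell. The grid dimensions 11×9 come from Figure 2-3; the physical extents of the environment below are assumed purely for illustration.

```python
# Mapping a continuous position to its position state (grid cell), as in the
# 11 x 9 discretization of Figure 2-3. WIDTH and HEIGHT are assumed values.

COLS, ROWS = 11, 9            # grid dimensions from Figure 2-3
WIDTH, HEIGHT = 22.0, 18.0    # assumed environment size in meters

def position_state(x, y):
    """Return the grid cell (col, row) containing the point (x, y) in meters,
    clamping points on the far boundary into the last cell."""
    col = min(int(x / (WIDTH / COLS)), COLS - 1)
    row = min(int(y / (HEIGHT / ROWS)), ROWS - 1)
    return col, row

print(position_state(0.0, 0.0))    # -> (0, 0)
print(position_state(21.9, 17.9))  # -> (10, 8)
```

A path through the environment is then a sequence of such cells, and the number of cells traversed in one move is the "number of movement steps" used in step 204 onward.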
Step 202: assess the first action control parameter to determine a second action control parameter. Proceed to step 203.
The second action control parameter includes the reference force value of the throttle or brake and the reference angle of steering-wheel rotation. When the vehicle reaches a new position state, the parking environment gives the vehicle a reinforcement signal; the evaluation network produces a cost function according to the reinforcement signal given by the parking environment, assesses in real time the action control parameters corresponding to the decision action produced by the action network, and obtains the second action control parameter.
With reference to Figure 2-2, step 202 specifically comprises: assessing the first action control parameter by means of the evaluation network to determine the second action control parameter output by the action network.
Step 203: adjust the position state of the host vehicle according to the second action control parameter; the position state of the host vehicle after adjustment is a first position state. Proceed to step 204.
After the position state of the host vehicle is adjusted according to the second action control parameter, the position state of the host vehicle is the first position state, and the movement step count becomes step = step' + 1, where step' represents the movement step count of the vehicle in the previous position state.
Step 204: judge whether the host vehicle collides with an obstacle. If the host vehicle collides with the obstacle, proceed to step 205; if the host vehicle does not collide with the obstacle, proceed to step 208.
Judging whether the vehicle collides with an obstacle can specifically be judging whether the vehicle collides with an obstacle vehicle, or whether the vehicle drives onto the boundary of the target parking space. If the host vehicle collides with the obstacle, the reinforcement-signal values are updated according to the reinforcement learning algorithm as in step 205, the initial state of the vehicle is selected at random again, and the next trial is run, until the output parameter of the reinforcement learning algorithm is finally obtained and the target driving path of the vehicle is determined. If the host vehicle does not collide with the obstacle, whether the host vehicle has arrived at the target parking space is detected.
Step 205: take the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, and obtain the output parameter of the adaptive dynamic programming algorithm. Proceed to step 206.
Specifically, as shown in Figure 2-4, step 205 can comprise:
Step 2051: take the location information of the host vehicle in the parking environment when in the first position state and the location information of the target parking space in the parking environment as input parameters of the action network, and obtain a first output parameter of the action network.
Step 2052: according to a reinforcement signal group determined by the reinforcement learning algorithm, assess the first output parameter of the action network by means of the evaluation network, and obtain a second output parameter of the action network.
In the embodiment of the present invention, the rule for setting the reinforcement signal when the vehicle is in the initial state can be: the vehicle obtains a reinforcement signal r = +1 upon arriving at the target parking space, obtains a reinforcement signal r = -0.2 upon colliding with an obstacle, and obtains a reinforcement signal r = 0 in any other state. Figure 2-5 shows a schematic diagram of this reinforcement-signal setting rule. In Figure 2-5, 281 represents the host vehicle, 282 represents other vehicles, and 283 represents the target parking space; the number beside each arrow is the reinforcement signal with which the vehicle transfers from one position state to another according to a certain decision action.
The reinforcement signal group includes the reinforcement signal corresponding to the host vehicle at each position state. The process of determining the reinforcement signal group according to the reinforcement learning (Q-learning) algorithm can be understood with reference to Figures 2-6 to 2-8. In Figures 2-6 to 2-8, the nine grids represent nine position states, S1 to S9, in which the vehicle may be located in the parking environment. The embodiment of the present invention assumes that S3 is the target parking space. As shown in Figure 2-6, each arrow indicates that the vehicle transfers from one position state to another after selecting a certain decision action, and the number beside the arrow is the preset reinforcement signal. For example, according to the reinforcement-signal setting rule, the reinforcement signal r12 (not labeled in Figure 2-5) with which the vehicle transfers from position state S1 to position state S2 is 0, and the reinforcement signal r23 (not labeled in Figure 2-5) with which the vehicle transfers from position state S2 to position state S3 is 1.
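The reinforcement-signal setting rule above is a simple three-case function, sketched here directly from the values in the text (+1 on arrival, -0.2 on collision, 0 otherwise); the function name and boolean inputs are illustrative.

```python
# The reinforcement-signal setting rule of the embodiment: +1 on reaching the
# target parking space, -0.2 on colliding with an obstacle, 0 in other states.

def reinforcement_signal(reached_target, collided):
    if reached_target:
        return 1.0
    if collided:
        return -0.2
    return 0.0

print(reinforcement_signal(True, False))   # -> 1.0
print(reinforcement_signal(False, True))   # -> -0.2
print(reinforcement_signal(False, False))  # -> 0.0
```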
The reinforcement signals in Fig. 2-6 are updated using the reinforcement update formula, giving the reinforcement signals in Fig. 2-7. The reinforcement update formula is:
Q(x, u) = r + α · max Q(x′, u′)
where r is the reinforcement signal in Fig. 2-5 and represents the reinforcement signal obtained when the vehicle transfers from the current position state to the next position state; α is the discount factor, which may be, for example, 0.8; x denotes the current position state, x′ denotes the next position state, u′ denotes a decision action corresponding to the next position state, and max Q(x′, u′) denotes the maximum reinforcement signal the vehicle can obtain when selecting a decision action in the next position state.
Q(x, u) is the reinforcement signal in Fig. 2-7 and represents the reinforcement signal obtained for the action control parameter when the vehicle selects a certain decision action in the current position state. Suppose the reinforcement signal r12 in Fig. 2-6 is updated; applying the reinforcement update formula gives:
Q = r12 + α · max Q(S2, u′) = 0 + 0.8 × 1 = 0.8
Thus the reinforcement signal obtained after updating r12 is 0.8. As shown in Fig. 2-7, an arrow indicates that the vehicle transfers from one position state to another after selecting a certain decision action; the number beside the arrow is a Q value, which represents the maximum cumulative reinforcement signal obtained for the action control parameter when the vehicle selects the corresponding decision action from a position state.
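The r12 update above can be reproduced with a minimal sketch of the reinforcement update formula Q(x, u) = r + α · max Q(x′, u′). The two-entry Q table and the action name "right" are assumptions made only for this illustration:

```python
ALPHA = 0.8  # discount factor, per the example above

# Q values before the update (Fig. 2-6 style): (state, action) -> value
Q = {("S1", "right"): 0.0,   # S1 -> S2, immediate signal r12 = 0
     ("S2", "right"): 1.0}   # S2 -> S3, immediate signal r23 = 1

def q_update(q, state, action, reward, next_state, actions):
    # Q(x, u) = r + alpha * max over u' of Q(x', u')
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    q[(state, action)] = reward + ALPHA * best_next

q_update(Q, "S1", "right", 0.0, "S2", ["right"])
print(Q[("S1", "right")])  # 0 + 0.8 * 1 = 0.8
```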
Then, on the basis of Fig. 2-7, the reinforcement signal corresponding to the vehicle at each position state is determined according to the maximum formula, giving the reinforcement signal group. The maximum formula is:
V*(x) = max over u of Q(x, u)
where x denotes the current position state, u denotes a decision action corresponding to the current position state, and V*(x) denotes the reinforcement signal corresponding to the vehicle at the current position state. Fig. 2-8 shows a schematic diagram of this reinforcement signal group. In Fig. 2-8, an arrow indicates that the vehicle transfers from one position state to another after selecting a certain decision action; the number beside the arrow is a V value, which represents the reinforcement signal corresponding to the vehicle at each position state, i.e. the maximum cumulative reinforcement signal the vehicle can obtain in a position state.
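The maximum formula can be sketched directly; the two-action Q table below is a hypothetical example for state S2:

```python
# V*(x) = max over u of Q(x, u): the reinforcement signal of a position state
# is the largest Q value over the decision actions available in that state.
# The Q table is a hypothetical two-action example, not from the patent.
Q = {("S2", "left"): 0.8, ("S2", "right"): 1.0}

def state_value(q, state):
    return max(v for (s, _), v in q.items() if s == state)

print(state_value(Q, "S2"))  # 1.0
```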
After the reinforcement signal group is determined by the above process, the evaluation network in the adaptive dynamic programming system can be used to assess the first output parameter of the action network, obtaining the second output parameter of the action network. The second output parameter is an action control parameter; for the evaluation process, reference may be made to the related description in step 201. The reinforcement signal group of the adaptive dynamic programming algorithm is obtained by fitting the reinforcement signal group of the reinforcement learning algorithm with the cost function produced by the evaluation network.
Step 206, the path corresponding to the output parameter of the adaptive dynamic programming algorithm is taken as the target driving path. Step 207 is then performed.
Specifically, step 206 comprises: taking the path corresponding to the second output parameter of the action network as the target driving path, the second output parameter being the parameter obtained in step 2052.
Fig. 2-9 shows a schematic diagram of the path corresponding to the second output parameter of the action network. Assuming the vehicle starts from position state S7 in Fig. 2-8, it can be seen from Fig. 2-9 that there may be multiple shortest driving paths, i.e. target driving paths, by which the vehicle reaches the target parking space at position state S3: the target driving path may be S7-S4-S1-S2-S3, S7-S8-S9-S6-S3, or S7-S4-S5-S6-S3, among others. Because the number of grids corresponding to a real parking environment is large, the route travelled by the vehicle approximates a smooth curve, such as the driving route shown in Fig. 2-3.
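The selection of a target driving path from the learned values can be sketched as a greedy walk over the 3×3 grid example. The adjacency table and V values below are assumptions consistent with the S1..S9 example (α = 0.8, target S3); Python's `max` breaks ties on the first maximal element, so only one of the equally short paths is returned:

```python
ALPHA = 0.8
TARGET = "S3"

# Assumed 3x3 grid adjacency (rows S1 S2 S3 / S4 S5 S6 / S7 S8 S9).
NEIGHBOURS = {
    "S1": ["S2", "S4"], "S2": ["S1", "S3", "S5"], "S3": ["S2", "S6"],
    "S4": ["S1", "S5", "S7"], "S5": ["S2", "S4", "S6", "S8"],
    "S6": ["S3", "S5", "S9"], "S7": ["S4", "S8"],
    "S8": ["S5", "S7", "S9"], "S9": ["S6", "S8"],
}
# Hypothetical converged V values (maximum cumulative signal per state).
V = {"S2": 1.0, "S6": 1.0, "S1": 0.8, "S5": 0.8, "S9": 0.8,
     "S4": 0.64, "S8": 0.64, "S7": 0.512, "S3": 0.0}

def one_step_value(next_state):
    # immediate reinforcement signal plus discounted value of the next state
    r = 1.0 if next_state == TARGET else 0.0
    return r + ALPHA * V[next_state]

def extract_path(start):
    path, state = [start], start
    while state != TARGET:
        state = max(NEIGHBOURS[state], key=one_step_value)
        path.append(state)
    return path

print(extract_path("S7"))  # ['S7', 'S4', 'S1', 'S2', 'S3']
```

This returns the first of the equally short target driving paths listed above; a different tie-break would yield one of the other paths.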
Step 207, the ego vehicle is controlled according to the target driving path to complete the parking-in action.
Once the target driving path is obtained, the ego vehicle can be controlled according to it to complete the parking-in action.
Step 208, detect whether the ego vehicle has arrived at the target parking space. If the ego vehicle has arrived at the target parking space, step 205 is performed; if it has not, step 209 is performed.
If the ego vehicle has arrived at the target parking space, the reinforcement signal values are updated according to the adaptive dynamic programming algorithm and the next test is performed, with the initial state of the vehicle again selected at random. If the ego vehicle has not arrived at the target parking space, it is detected whether the number of moving steps of the ego vehicle is greater than a preset step number.
Step 209, detect whether the number of moving steps of the ego vehicle is greater than the preset step number. If the number of moving steps is greater than the preset step number, step 205 is performed; if it is not, step 201 is performed.
If the number of moving steps of the ego vehicle is greater than the preset step number, the reinforcement signal values are updated according to the adaptive dynamic programming algorithm and the next test is performed, with the initial state of the vehicle again selected at random. If the number of moving steps is not greater than the preset step number, step 201 is performed, determining the first action control parameter according to the status information of the vehicle.
Step 205 is repeatedly performed to update the reinforcement signal values until the number of tests exceeds the maximum test number MaxTrail = 1000. The reinforcement signal values and learning values from the final update are then transplanted into an actual vehicle, so that the actual vehicle completes the parking-in process in a real parking environment according to these learning values.
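The repeated-test procedure of steps 205 to 209 can be sketched as a tabular loop over the 3×3 grid example. The step limit, the random-walk exploration, and the helper names are assumptions; the patent's action and evaluation networks are replaced here by a plain Q table purely for brevity:

```python
import random

random.seed(0)
ALPHA = 0.8           # discount factor
TARGET = "S3"
MAX_TRIALS = 1000     # maximum test number, MaxTrail in the text
MAX_STEPS = 20        # assumed preset step number

# Assumed 3x3 grid adjacency for states S1..S9.
NEIGHBOURS = {
    "S1": ["S2", "S4"], "S2": ["S1", "S3", "S5"], "S3": ["S2", "S6"],
    "S4": ["S1", "S5", "S7"], "S5": ["S2", "S4", "S6", "S8"],
    "S6": ["S3", "S5", "S9"], "S7": ["S4", "S8"],
    "S8": ["S5", "S7", "S9"], "S9": ["S6", "S8"],
}
Q = {}  # (state, next_state) -> value; a decision action is "move to next_state"

def update(state, nxt):
    # reinforcement update: Q(x, u) = r + alpha * max over u' of Q(x', u')
    r = 1.0 if nxt == TARGET else 0.0
    best_next = max(Q.get((nxt, n), 0.0) for n in NEIGHBOURS[nxt])
    Q[(state, nxt)] = r + ALPHA * best_next

for trial in range(MAX_TRIALS):
    # the initial state of the vehicle is selected at random for each test
    state = random.choice([s for s in NEIGHBOURS if s != TARGET])
    for step in range(MAX_STEPS):
        nxt = random.choice(NEIGHBOURS[state])  # exploratory decision action
        update(state, nxt)
        if nxt == TARGET:
            break  # target parking space reached: this test ends
        state = nxt  # otherwise continue until the preset step number

print(Q[("S2", "S3")], Q[("S1", "S2")])
```

After 1000 tests the Q values settle at their fixed point, e.g. Q(S2, S3) = 1.0 and Q(S1, S2) = 0.8, matching the r12 example earlier.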
It should be noted that if the preset reinforcement signals are set to large values, e.g. a reinforcement signal r = 100 obtained when the vehicle arrives at the target parking space, then after all tests end, the output parameters of the reinforcement learning algorithm may be normalized for ease of calculation, i.e. the reinforcement signal values from the final update are scaled into the interval [0, 1]. The normalized result is taken as the final parking-in strategy and transplanted into the actual vehicle, so that the actual vehicle can complete the parking-in process in a real parking environment according to these learning values.
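This normalization can be sketched as a simple min-max rescaling; the sample values below, including a large target signal of 100, are hypothetical:

```python
# Rescale the final updated reinforcement signal values into [0, 1] by
# min-max normalization before transplanting them to the actual vehicle.
def normalize(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# e.g. values learned with a large target signal r = 100
print(normalize([0.0, 64.0, 80.0, 100.0]))  # [0.0, 0.64, 0.8, 1.0]
```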
The parking storage method provided by the embodiment of the present invention solves the design problem of an autonomous parking control algorithm for an intelligent vehicle. Using the adaptive dynamic programming algorithm, the intelligent vehicle autonomously interacts with the parking environment, obtains the corresponding reinforcement signals, and autonomously learns and stores parking-in experience through the reinforcement signal values, so that the intelligent vehicle has good stability and adaptivity even when the parking space width, the positions of obstacle vehicles, and the initial position of the vehicle are not fixed. The parking storage method enables the intelligent vehicle to learn autonomously and realize an optimal parking strategy in different parking environments, minimizing the driving path during parking-in, so that autonomous parking of the intelligent vehicle has better stability, adaptivity, maneuverability, and flexibility.
In summary, the parking storage method provided by the embodiment of the present invention can assess the first action control parameter to determine the second action control parameter, and then adjust the position state of the ego vehicle according to the second action control parameter. If this position state is a preset state, the target driving path is determined according to the adaptive dynamic programming algorithm, and the ego vehicle is thereby controlled to complete the parking-in action. Compared with the prior art, this improves the autonomous learning ability of the vehicle and therefore improves the stability, reliability, maneuverability, and flexibility of the vehicle during parking-in.
An embodiment of the present invention provides a parking storage device. As shown in Fig. 3-1, the parking storage device may comprise:
A first determining unit 301, configured to determine the first action control parameter according to the status information of the ego vehicle, the status information comprising the location information of the ego vehicle in the parking environment and the location information of the target parking space in the parking environment, the first action control parameter comprising a first initial force value of the throttle or brake and a first initial angle of rotation of the steering wheel.
A second determining unit 302, configured to assess the first action control parameter and determine the second action control parameter, the second action control parameter comprising a reference force value of the throttle or brake and a reference angle of rotation of the steering wheel.
An adjustment unit 303, configured to adjust the position state of the ego vehicle according to the second action control parameter, the position state of the ego vehicle after adjustment being the first position state.
A judging unit 304, configured to judge whether the first position state is a preset state.
A first processing unit 305, configured to, when the first position state is a preset state, use the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, to obtain the output parameter of the adaptive dynamic programming algorithm.
A second processing unit 306, configured to take the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target driving path.
A control unit 307, configured to control the ego vehicle according to the target driving path to complete the parking-in action.
In summary, the parking storage device provided by the embodiment of the present invention can assess the first action control parameter to determine the second action control parameter, and then adjust the position state of the ego vehicle according to the second action control parameter. If this position state is a preset state, the target driving path is determined according to the adaptive dynamic programming algorithm, and the ego vehicle is thereby controlled to complete the parking-in action. Compared with the prior art, this improves the autonomous learning ability of the vehicle and therefore improves the stability, reliability, and flexibility of the vehicle during parking-in.
Optionally, as shown in Fig. 3-2, the first determining unit 301 comprises:
A first determination module 3011, configured to determine the parameter output by the action network after the status information is input into the action network as the first action control parameter, the action network being a multi-input multi-output nonlinear neural network comprising a hidden layer.
As shown in Fig. 3-3, the second determining unit 302 comprises:
A second determination module 3021, configured to assess the first action control parameter using the evaluation network and determine the second action control parameter output by the action network, the evaluation network being a multi-input multi-output nonlinear neural network comprising a hidden layer.
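The network structure named above, a multi-input multi-output nonlinear neural network with one hidden layer, can be sketched as follows. The input/output dimensions, the random initialization, and the tanh activation are assumptions made for illustration; the patent does not specify them:

```python
import math
import random

random.seed(0)

class OneHiddenLayerNet:
    """Multi-input multi-output nonlinear network with one hidden layer."""
    def __init__(self, n_in, n_hidden, n_out):
        self.w1 = [[random.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[random.uniform(-0.1, 0.1) for _ in range(n_hidden)]
                   for _ in range(n_out)]

    def forward(self, x):
        # nonlinear hidden layer, then a linear multi-output layer
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                  for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]

# e.g. inputs could be ego-vehicle and target-space coordinates, and outputs
# a force value and a steering angle (dimensions assumed for illustration)
net = OneHiddenLayerNet(n_in=4, n_hidden=8, n_out=2)
out = net.forward([0.2, 0.5, 0.8, 0.1])
print(len(out))  # 2
```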
As shown in Fig. 3-4, the judging unit 304 comprises:
A judging module 3041, configured to judge whether the ego vehicle collides with an obstacle.
A third determination module 3042, configured to determine that the first position state is a preset state when the ego vehicle collides with an obstacle.
A fourth determination module 3043, configured to determine that the first position state is not a preset state when the ego vehicle does not collide with an obstacle.
As shown in Fig. 3-5, the parking storage device may further comprise:
A first detecting unit 308, configured to detect whether the ego vehicle has arrived at the target parking space when the ego vehicle does not collide with an obstacle.
A third processing unit 309, configured to, when the ego vehicle arrives at the target parking space, use the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtain the output parameter of the adaptive dynamic programming algorithm, and take the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target driving path.
A second detecting unit 310, configured to detect, when the ego vehicle has not arrived at the target parking space, whether the number of moving steps of the ego vehicle is greater than the preset step number, the number of moving steps being the number of grids the ego vehicle passes through in one move.
A fourth processing unit 311, configured to, when the number of moving steps of the ego vehicle is greater than the preset step number, use the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtain the output parameter of the adaptive dynamic programming algorithm, and take the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target driving path.
A third determining unit 312, configured to determine a third action control parameter according to the current state of the ego vehicle when the number of moving steps of the ego vehicle is not greater than the preset step number, the third action control parameter comprising a second initial force value of the throttle or brake and a second initial angle of rotation of the steering wheel.
Optionally, as shown in Fig. 3-6, the first processing unit 305 comprises:
A first processing module 3051, configured to use the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as input parameters of the action network, to obtain the first output parameter of the action network.
A second processing module 3052, configured to assess, according to the reinforcement signal group determined by the reinforcement learning algorithm, the first output parameter of the action network using the evaluation network, to obtain the second output parameter of the action network, the reinforcement signal group comprising the reinforcement signal corresponding to the ego vehicle at each position state.
Accordingly, as shown in Fig. 3-7, the second processing unit 306 comprises:
A third processing module 3061, configured to take the path corresponding to the second output parameter of the action network as the target driving path.
In summary, the parking storage device provided by the embodiment of the present invention can assess the first action control parameter to determine the second action control parameter, and then adjust the position state of the ego vehicle according to the second action control parameter. If this position state is a preset state, the target driving path is determined according to the adaptive dynamic programming algorithm, and the ego vehicle is thereby controlled to complete the parking-in action. Compared with the prior art, this improves the autonomous learning ability of the vehicle and therefore improves the stability, reliability, maneuverability, and flexibility of the vehicle during parking-in.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices, units, and modules described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.