CN105128856B

CN105128856B - Parking-in method and device

Info

Publication number: CN105128856B
Application number: CN201510528810.9A
Authority: CN
Inventors: 方啸; 陈效华
Original assignee: Chery Automobile Co Ltd
Current assignee: Dazhuo Intelligent Technology Co ltd; Dazhuo Quxing Intelligent Technology Shanghai Co ltd
Priority date: 2015-08-24
Filing date: 2015-08-24
Publication date: 2018-06-26
Anticipated expiration: 2035-08-24
Also published as: CN105128856A

Abstract

The invention discloses a parking-in method and device in the field of active automotive safety. The method includes: determining a first action control parameter from the state information of the ego vehicle; evaluating the first action control parameter to determine a second action control parameter; adjusting the position state of the ego vehicle according to the second action control parameter, the adjusted position state being a first position state; if the first position state is a preset state, taking the location of the ego vehicle in the parking environment in the first position state and the location of the target parking space in the parking environment as input parameters of an adaptive dynamic programming algorithm to obtain output parameters; taking the path corresponding to the output parameters as the target driving path; and controlling the ego vehicle along the target driving path to complete the parking-in action. The invention addresses the poor stability, low reliability and poor flexibility of existing parking-in methods, improves all three, and is used to control a vehicle as it parks into a space.

Description

Parking-in method and device

Technical field

The present invention relates to the field of active automotive safety, and in particular to a parking-in method and device.

Background

With the development of science and technology, and especially the rapid progress of intelligent computing, intelligent vehicles have become a research focus for automakers. Parking into a space is one of the essential functions of intelligent driving, and the choice of driving path during parking largely determines the quality of a parking method.

In the prior art, a parking-in method uses radar to detect the parking environment and a camera to detect the parking-space lines. A control algorithm regulates the distance between the ego vehicle and obstacles (such as obstacle vehicles) and the distance between the ego vehicle and the space lines, and the vehicle then adjusts these distances by following a manually set driving path.

Because the driving path in this method is set manually, the method is a supervised-learning process. In practice, however, the width of the parking space, the positions of obstacle vehicles and the initial position of the vehicle are usually not fixed, so a parking-in method based on a manually set driving path has poor stability, low reliability and poor flexibility.

Summary of the invention

To solve the poor stability, low reliability and poor flexibility of existing parking-in methods, the present invention provides a parking-in method and device. The technical solution is as follows:

In a first aspect, a parking-in method is provided, the method comprising:

Determining a first action control parameter according to state information of the ego vehicle, where the state information includes the location of the ego vehicle in the parking environment and the location of a target parking space in the parking environment, and the first action control parameter includes a first initial force value of the throttle or brake and a first initial steering-wheel rotation angle;

Evaluating the first action control parameter to determine a second action control parameter, where the second action control parameter includes a reference force value of the throttle or brake and a reference steering-wheel rotation angle;

Adjusting the position state of the ego vehicle according to the second action control parameter, the adjusted position state being a first position state;

Judging whether the first position state is a preset state;

If the first position state is the preset state, taking the location of the ego vehicle in the parking environment in the first position state and the location of the target parking space in the parking environment as input parameters of an adaptive dynamic programming algorithm, and obtaining the output parameters of the adaptive dynamic programming algorithm;

Taking the path corresponding to the output parameters of the adaptive dynamic programming algorithm as a target driving path;

Controlling the ego vehicle, according to the target driving path, to complete the parking-in action.
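As an illustration only, the seven steps of the first aspect can be read as one control routine. The Python sketch below assumes hypothetical callables (`actor`, `critic`, `apply_control`, `is_preset`, `adp_plan`, `follow`) and a hypothetical `Control` record standing in for the throttle/brake force value and the steering-wheel angle; none of these names come from the patent. The claim does not say what happens when the first position state is not the preset state, so the sketch simply returns `None` in that case.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Position = Tuple[int, int]            # hypothetical grid position (row, col)
State = Tuple[Position, Position]     # (ego position, target-space position)


@dataclass
class Control:
    pedal: float   # throttle (>0) or brake (<0) force value; sign convention assumed
    steer: float   # steering-wheel rotation angle; unit assumed


def park_once(
    state: State,
    actor: Callable[[State], Control],
    critic: Callable[[State, Control], Control],
    apply_control: Callable[[State, Control], State],
    is_preset: Callable[[State], bool],
    adp_plan: Callable[[State], List[Position]],
    follow: Callable[[List[Position]], None],
) -> Optional[List[Position]]:
    first = actor(state)                            # step 1: first action control parameter
    second = critic(state, first)                   # step 2: evaluated (second) parameter
    first_pos_state = apply_control(state, second)  # step 3: first position state
    if not is_preset(first_pos_state):              # step 4: preset-state check
        return None                                 # branch not specified by the claim
    path = adp_plan(first_pos_state)                # steps 5-6: ADP output -> target path
    follow(path)                                    # step 7: drive along the path
    return path
```

Keeping every component as an injected callable mirrors the claim, which fixes the order of the steps but not how each component is built.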

Optionally, determining the first action control parameter according to the state information of the ego vehicle includes:

Inputting the state information into an action network and taking the parameters output by the action network as the first action control parameter, where the action network is a multiple-input multiple-output nonlinear neural network with a hidden layer;

Evaluating the first action control parameter to determine the second action control parameter includes:

Evaluating the first action control parameter with an evaluation network to determine the second action control parameter output by the action network, where the evaluation network is a multiple-input multiple-output nonlinear neural network with a hidden layer.
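A minimal sketch of the kind of network described here: a multiple-input multiple-output feed-forward network with one nonlinear (tanh) hidden layer. The layer sizes, the tanh activation, the weight initialisation, the omitted biases and the input/output layout (ego and target coordinates in, force value and steering angle out for the action network; state plus action in, a single cost value out for the evaluation network) are all assumptions for illustration, not values given in the patent.

```python
import math
import random
from typing import List


class MimoNet:
    """Feed-forward MIMO network: one tanh hidden layer, linear output layer."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small uniform initial weights in [-0.5, 0.5] (an assumption); biases omitted.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                      for _ in range(n_out)]

    def forward(self, x: List[float]) -> List[float]:
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w_hidden]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]


# Action network: (ego x, ego y, target x, target y) -> (force value, steering angle)
action_net = MimoNet(n_in=4, n_hidden=8, n_out=2, seed=1)
# Evaluation network: state plus proposed action -> cost J (one output in this sketch)
evaluation_net = MimoNet(n_in=6, n_hidden=8, n_out=1, seed=2)

state = [2.0, 1.0, 8.0, 6.0]
first_control = action_net.forward(state)
cost = evaluation_net.forward(state + first_control)
```

Because each hidden unit is bounded by tanh and each output weight lies in [-0.5, 0.5], every output of an 8-unit network stays within [-4, 4], which keeps the untrained controls in a finite range.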

Optionally, judging whether the first position state is the preset state includes:

Judging whether the ego vehicle collides with an obstacle;

If the ego vehicle collides with the obstacle, determining that the first position state is the preset state;

If the ego vehicle does not collide with the obstacle, determining that the first position state is not the preset state.

Optionally, the parking environment is divided into at least two grid cells of equal area, each corresponding to one position state, and before the ego vehicle is controlled according to the target driving path to complete the parking-in action, the method further includes:

If the ego vehicle does not collide with the obstacle, detecting whether the ego vehicle has reached the target parking space;

If the ego vehicle has reached the target parking space, taking the location of the ego vehicle in the parking environment in the first position state and the location of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtaining the output parameters of the adaptive dynamic programming algorithm, and taking the path corresponding to those output parameters as the target driving path;

If the ego vehicle has not reached the target parking space, detecting whether the number of movement steps of the ego vehicle exceeds a preset number of steps, where the number of movement steps is the number of grid cells the ego vehicle passes through as it moves;

If the number of movement steps of the ego vehicle exceeds the preset number of steps, taking the location of the ego vehicle in the parking environment in the first position state and the location of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtaining the output parameters of the adaptive dynamic programming algorithm, and taking the path corresponding to those output parameters as the target driving path;

If the number of movement steps of the ego vehicle does not exceed the preset number of steps, determining a third action control parameter according to the current state of the ego vehicle, where the third action control parameter includes a second initial force value of the throttle or brake and a second initial steering-wheel rotation angle.
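The branching in these optional steps can be summarised as a small decision function. This sketch assumes grid-cell positions, a set of obstacle cells, and a step counter compared with the preset number of steps; the names `decide` and `Next` are hypothetical, and "ADP" is used in the comments as shorthand for the adaptive dynamic programming algorithm.

```python
from enum import Enum
from typing import Set, Tuple

Cell = Tuple[int, int]  # hypothetical grid-cell position (row, col)


class Next(Enum):
    PLAN_WITH_ADP = "plan"       # feed current and target positions to the ADP algorithm
    NEW_ACTION = "new_action"    # determine a third action control parameter and move on


def decide(ego: Cell, target: Cell, obstacles: Set[Cell],
           steps: int, max_steps: int) -> Tuple[Next, bool]:
    """Return the next branch and whether the first position state is the preset state."""
    if ego in obstacles:             # collision: the first position state is the preset state
        return Next.PLAN_WITH_ADP, True
    if ego == target:                # no collision, target parking space reached
        return Next.PLAN_WITH_ADP, False
    if steps > max_steps:            # no collision, target not reached, step budget exceeded
        return Next.PLAN_WITH_ADP, False
    return Next.NEW_ACTION, False    # keep moving with a newly determined action
```

Three of the four branches end in the same call to the adaptive dynamic programming algorithm; only the "no collision, not arrived, steps left" case continues with a new action.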

Optionally, taking the location of the ego vehicle in the parking environment in the first position state and the location of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, and obtaining the output parameters of the adaptive dynamic programming algorithm, includes:

Taking the location of the ego vehicle in the parking environment in the first position state and the location of the target parking space in the parking environment as input parameters of the action network, and obtaining a first output parameter of the action network;

Evaluating the first output parameter of the action network with the evaluation network according to a reinforcement signal group determined by a reinforcement learning algorithm, and obtaining a second output parameter of the action network, where the reinforcement signal group contains the reinforcement signal corresponding to each position state of the ego vehicle;

Taking the path corresponding to the output parameters of the adaptive dynamic programming algorithm as the target driving path includes:

Taking the path corresponding to the second output parameter of the action network as the target driving path.
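The text does not spell out how a path is read from the second output parameter of the action network. One plausible reading, sketched here purely as an assumption, is to roll the trained policy forward cell by cell from the current position and record the visited cells. `policy` is a hypothetical stand-in for the trained action network that returns a one-cell move, and `max_moves` is a hypothetical safety limit.

```python
from typing import Callable, List, Tuple

Cell = Tuple[int, int]   # hypothetical grid-cell position (row, col)
Move = Tuple[int, int]   # hypothetical one-cell move (d_row, d_col)


def rollout_path(start: Cell, target: Cell, policy: Callable[[Cell, Cell], Move],
                 max_moves: int = 100) -> List[Cell]:
    """Follow the policy's decisions cell by cell and return the visited cells."""
    path = [start]
    cell = start
    while cell != target and len(path) - 1 < max_moves:
        d_row, d_col = policy(cell, target)
        cell = (cell[0] + d_row, cell[1] + d_col)
        path.append(cell)
    return path


def toward(cell: Cell, target: Cell) -> Move:
    """Toy greedy policy used only for the usage example below."""
    return ((target[0] > cell[0]) - (target[0] < cell[0]),
            (target[1] > cell[1]) - (target[1] < cell[1]))


example_path = rollout_path((0, 0), (2, 3), toward)
```

With the toy greedy policy, `example_path` is `[(0, 0), (1, 1), (2, 2), (2, 3)]`; a trained action network would replace `toward`.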

In a second aspect, a parking-in device is provided, the device comprising:

A first determining unit, configured to determine a first action control parameter according to state information of the ego vehicle, where the state information includes the location of the ego vehicle in the parking environment and the location of a target parking space in the parking environment, and the first action control parameter includes a first initial force value of the throttle or brake and a first initial steering-wheel rotation angle;

A second determining unit, configured to evaluate the first action control parameter and determine a second action control parameter, where the second action control parameter includes a reference force value of the throttle or brake and a reference steering-wheel rotation angle;

An adjusting unit, configured to adjust the position state of the ego vehicle according to the second action control parameter, the adjusted position state being a first position state;

A judging unit, configured to judge whether the first position state is a preset state;

A first processing unit, configured to, when the first position state is the preset state, take the location of the ego vehicle in the parking environment in the first position state and the location of the target parking space in the parking environment as input parameters of an adaptive dynamic programming algorithm, and obtain the output parameters of the adaptive dynamic programming algorithm;

A second processing unit, configured to take the path corresponding to the output parameters of the adaptive dynamic programming algorithm as a target driving path;

A control unit, configured to control the ego vehicle, according to the target driving path, to complete the parking-in action.

Optionally, the first determining unit includes:

A first determining module, configured to input the state information into an action network and take the parameters output by the action network as the first action control parameter, where the action network is a multiple-input multiple-output nonlinear neural network with a hidden layer;

The second determining unit includes:

A second determining module, configured to evaluate the first action control parameter with an evaluation network and determine the second action control parameter output by the action network, where the evaluation network is a multiple-input multiple-output nonlinear neural network with a hidden layer.

Optionally, the judging unit includes:

A judging module, configured to judge whether the ego vehicle collides with an obstacle;

A third determining module, configured to determine that the first position state is the preset state when the ego vehicle collides with the obstacle;

A fourth determining module, configured to determine that the first position state is not the preset state when the ego vehicle does not collide with the obstacle.

Optionally, the parking environment is divided into at least two grid cells of equal area, each corresponding to one position state, and the device further includes:

A first detection unit, configured to detect, when the ego vehicle does not collide with the obstacle, whether the ego vehicle has reached the target parking space;

A third processing unit, configured to, when the ego vehicle has reached the target parking space, take the location of the ego vehicle in the parking environment in the first position state and the location of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtain the output parameters of the adaptive dynamic programming algorithm, and take the path corresponding to those output parameters as the target driving path;

A second detection unit, configured to detect, when the ego vehicle has not reached the target parking space, whether the number of movement steps of the ego vehicle exceeds a preset number of steps, where the number of movement steps is the number of grid cells the ego vehicle passes through as it moves;

A fourth processing unit, configured to, when the number of movement steps of the ego vehicle exceeds the preset number of steps, take the location of the ego vehicle in the parking environment in the first position state and the location of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtain the output parameters of the adaptive dynamic programming algorithm, and take the path corresponding to those output parameters as the target driving path;

A third determining unit, configured to determine, when the number of movement steps of the ego vehicle does not exceed the preset number of steps, a third action control parameter according to the current state of the ego vehicle, where the third action control parameter includes a second initial force value of the throttle or brake and a second initial steering-wheel rotation angle.

Optionally, the first processing unit includes:

A first processing module, configured to take the location of the ego vehicle in the parking environment in the first position state and the location of the target parking space in the parking environment as input parameters of the action network, and obtain a first output parameter of the action network;

A second processing module, configured to evaluate the first output parameter of the action network with the evaluation network according to a reinforcement signal group determined by a reinforcement learning algorithm, and obtain a second output parameter of the action network, where the reinforcement signal group contains the reinforcement signal corresponding to each position state of the ego vehicle;

The second processing unit includes:

A third processing module, configured to take the path corresponding to the second output parameter of the action network as the target driving path.

The present invention provides a parking-in method and device that evaluate the first action control parameter to determine a second action control parameter and then adjust the position state of the ego vehicle according to it. If the resulting position state is the preset state, a target driving path is determined with the adaptive dynamic programming algorithm and the ego vehicle is controlled to complete the parking-in action. Compared with the prior art, this improves the vehicle's ability to learn autonomously and thereby the stability, reliability and flexibility of parking.

It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit the present invention.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a parking-in method according to an embodiment of the present invention;

Fig. 2-1 is a flowchart of another parking-in method according to an embodiment of the present invention;

Fig. 2-2 is a structural diagram of an adaptive dynamic programming system according to an embodiment of the present invention;

Fig. 2-3 is a schematic diagram of a parking environment according to an embodiment of the present invention;

Fig. 2-4 is a flowchart of obtaining the output parameters of the adaptive dynamic programming algorithm according to an embodiment of the present invention;

Fig. 2-5 is a schematic diagram of the rule for setting reinforcement signals according to an embodiment of the present invention;

Fig. 2-6 is a schematic diagram of preset reinforcement signals according to an embodiment of the present invention;

Fig. 2-7 is a schematic diagram of the reinforcement signals after the preset reinforcement signals in Fig. 2-6 are updated;

Fig. 2-8 is a schematic diagram of a reinforcement signal group according to an embodiment of the present invention;

Fig. 2-9 is a schematic diagram of the path corresponding to the second output parameter of the action network according to an embodiment of the present invention;

Fig. 3-1 is a block diagram of a parking-in device according to an embodiment of the present invention;

Fig. 3-2 is a block diagram of a first determining unit according to an embodiment of the present invention;

Fig. 3-3 is a block diagram of a second determining unit according to an embodiment of the present invention;

Fig. 3-4 is a block diagram of a judging unit according to an embodiment of the present invention;

Fig. 3-5 is a block diagram of another parking-in device according to an embodiment of the present invention;

Fig. 3-6 is a block diagram of a first processing unit according to an embodiment of the present invention;

Fig. 3-7 is a block diagram of a second processing unit according to an embodiment of the present invention.

The above drawings show specific embodiments of the present invention, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concept in any way; they illustrate the concept for those skilled in the art by reference to specific embodiments.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.

An embodiment of the present invention provides a parking-in method. As shown in Fig. 1, the method may include:

Step 101: determine a first action control parameter according to the state information of the ego vehicle, where the state information includes the location of the ego vehicle in the parking environment and the location of the target parking space in the parking environment, and the first action control parameter includes a first initial force value of the throttle or brake and a first initial steering-wheel rotation angle.

Step 102: evaluate the first action control parameter and determine a second action control parameter, where the second action control parameter includes a reference force value of the throttle or brake and a reference steering-wheel rotation angle.

Step 103: adjust the position state of the ego vehicle according to the second action control parameter; the adjusted position state is the first position state.

Step 104: judge whether the first position state is a preset state.

Step 105: if the first position state is the preset state, take the location of the ego vehicle in the parking environment in the first position state and the location of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, and obtain the output parameters of the adaptive dynamic programming algorithm.

Step 106: take the path corresponding to the output parameters of the adaptive dynamic programming algorithm as the target driving path.

Step 107: control the ego vehicle, according to the target driving path, to complete the parking-in action.

In summary, the parking-in method provided by this embodiment evaluates the first action control parameter to determine a second action control parameter and adjusts the position state of the ego vehicle according to it. If the resulting position state is the preset state, a target driving path is determined with the adaptive dynamic programming algorithm and the ego vehicle is controlled to complete the parking-in action. Compared with the prior art, this improves the vehicle's ability to learn autonomously and thereby the stability, reliability and flexibility of parking.

Optionally, step 101 includes: inputting the state information into the action network and taking the parameters output by the action network as the first action control parameter, where the action network is a multiple-input multiple-output nonlinear neural network with a hidden layer.

Step 102 includes: evaluating the first action control parameter with the evaluation network and determining the second action control parameter output by the action network, where the evaluation network is a multiple-input multiple-output nonlinear neural network with a hidden layer.

Step 104 includes: judging whether the ego vehicle collides with an obstacle; if it does, determining that the first position state is the preset state; if it does not, determining that the first position state is not the preset state.

The parking environment is divided into at least two grid cells of equal area, each corresponding to one position state. Before step 107, the parking-in method may further include:

If the ego vehicle does not collide with the obstacle, detecting whether the ego vehicle has reached the target parking space;

If the ego vehicle has reached the target parking space, taking the location of the ego vehicle in the parking environment in the first position state and the location of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtaining the output parameters of the adaptive dynamic programming algorithm, and taking the path corresponding to those output parameters as the target driving path;

If the ego vehicle has not reached the target parking space, detecting whether the number of movement steps of the ego vehicle exceeds the preset number of steps, where the number of movement steps is the number of grid cells the ego vehicle passes through as it moves;

If the number of movement steps of the ego vehicle exceeds the preset number of steps, taking the location of the ego vehicle in the parking environment in the first position state and the location of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtaining the output parameters of the adaptive dynamic programming algorithm, and taking the path corresponding to those output parameters as the target driving path;

If the number of movement steps of the ego vehicle does not exceed the preset number of steps, determining a third action control parameter according to the current state of the ego vehicle, where the third action control parameter includes a second initial force value of the throttle or brake and a second initial steering-wheel rotation angle.

Optionally, step 105 includes: taking the location of the ego vehicle in the parking environment in the first position state and the location of the target parking space in the parking environment as input parameters of the action network to obtain a first output parameter of the action network; and evaluating the first output parameter of the action network with the evaluation network, according to a reinforcement signal group determined by a reinforcement learning algorithm, to obtain a second output parameter of the action network, where the reinforcement signal group contains the reinforcement signal corresponding to each position state of the ego vehicle.

Correspondingly, step 106 includes: taking the path corresponding to the second output parameter of the action network as the target driving path.

An embodiment of the present invention provides another parking-in method. As shown in Fig. 2-1, the method may include:

Step 201: determine a first action control parameter according to the state information of the ego vehicle, then go to step 202.

The state information of the ego vehicle includes the location of the ego vehicle in the parking environment and the location of the target parking space in the parking environment, and the first action control parameter includes a first initial force value of the throttle or brake and a first initial steering-wheel rotation angle.

The parking-in method provided by this embodiment determines the target driving path with an adaptive dynamic programming algorithm. Parking based on adaptive dynamic programming is a learning process: through successes and failures, the vehicle (the ego vehicle) learns how to complete the parking-in action along the shortest driving path. Because failures occur during learning, the method can first be tested on a computer; once the virtual vehicle has completed the learning process, the relevant learned quantities of the algorithm are transplanted to a real vehicle. Before testing, two initial parameters can be set on the computer, for example a maximum number of trials MaxTrail = 1000 and a maximum number of movement steps (the preset number of steps) MaxStep = 7. The parking environment can be divided into at least two grid cells of equal area, each corresponding to one position state, and the number of movement steps is the number of grid cells the vehicle passes through as it moves.

Fig. 2-2 shows the structure of the system corresponding to the adaptive dynamic programming algorithm. As shown in Fig. 2-2, the system consists of two neural networks, an action network and an evaluation network. Both are multiple-input multiple-output nonlinear neural networks with a hidden layer, and both are feed-forward networks with a nonlinear multilayer perceptron structure. The system lets the vehicle learn autonomously as follows. The action network generates a decision action U(t) from the current state quantity X(t) of the vehicle, where X(t) includes the position of the vehicle in the parking environment and the position of the target parking space in the parking environment. The decision action U(t) corresponds to a set of action control parameters, which include the first initial force value of the throttle or brake and the first initial steering-wheel rotation angle. The decision action U(t) changes the current position state of the vehicle, moving it to a new position state with a new state quantity X(t+1).

At the same time, the parking environment feeds back a reinforcement signal r(t) to the vehicle, which represents the immediate return of the decision action U(t). In this embodiment the reinforcement signal expresses the reward or punishment the vehicle receives. It is usually a number whose magnitude indicates how good or bad the decision action is. The reinforcement signal is input to the evaluation network, which outputs a cost function J(t) and uses it to evaluate, in real time, the action control parameters corresponding to the decision action generated by the action network. After both neural networks produce their outputs, the system applies feedback regulation to each. The feedback strategy of the evaluation network is to use the value of J(t) to approximate the infinite discounted cumulative sum of the reinforcement signal. The feedback strategy of the action network is to compare the desired value of the utility function U_c(t) with J(t) to obtain an action error; based on this error, the weights of the two neural networks are adjusted by gradient descent, so that the action control parameters output by the action network, and hence the decision action, tend toward the optimum. Here J(t) represents the cost the vehicle pays when driving according to the decision actions output by the action network, and U_c(t) represents the relationship between the input parameters of the adaptive dynamic programming system and the decision action U(t). In Fig. 2-2, X(t) and X(t+1) are inputs to the system, R(t) is the cumulative sum of the reinforcement signal, U_c(t) is the utility function, α is the discount factor, which expresses how strongly a later state influences the preceding one, and J(t-1) is the cost function of the previous state.
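The feedback regulation described above resembles action-dependent heuristic dynamic programming (ADHDP) as it is commonly written, with evaluation error e_c(t) = α·J(t) - [J(t-1) - r(t)] and action error e_a(t) = J(t) - U_c(t), each minimised as a squared error by gradient descent. The patent gives no equations, so these formulas, the learning rates, the value U_c = 0 and the use of linear stand-ins for the two hidden-layer networks (chosen so the gradients fit in a few lines) are all assumptions in this sketch; `LinearADHDP` is a hypothetical name.

```python
from typing import Sequence, Tuple

ALPHA = 0.95  # discount factor alpha (assumed value)
LR_C = 0.05   # evaluation-network learning rate (assumed)
LR_A = 0.05   # action-network learning rate (assumed)
U_C = 0.0     # desired utility U_c(t) (assumed to be 0)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class LinearADHDP:
    """Linear stand-ins: action u = theta . x, evaluation J = w . [x, u]."""

    def __init__(self, n_state: int) -> None:
        self.theta = [0.0] * n_state      # action-network weights (one scalar action)
        self.w = [0.0] * (n_state + 1)    # evaluation-network weights over [x, u]

    def act(self, x: Sequence[float]) -> float:
        return dot(self.theta, x)

    def cost(self, x: Sequence[float], u: float) -> float:
        return dot(self.w, list(x) + [u])

    def update(self, x_prev: Sequence[float], u_prev: float,
               x: Sequence[float], r: float) -> Tuple[float, float]:
        """One learning step from X(t-1), U(t-1), X(t) and reinforcement r(t)."""
        u = self.act(x)
        j_prev = self.cost(x_prev, u_prev)
        j = self.cost(x, u)
        e_c = ALPHA * j - (j_prev - r)    # evaluation error
        e_a = j - U_C                     # action error
        w_u = self.w[-1]                  # dJ/du of the linear evaluation network
        features = list(x) + [u]
        # Gradient descent on E_c = e_c^2 / 2, with J(t-1) held fixed
        self.w = [wi - LR_C * e_c * ALPHA * fi for wi, fi in zip(self.w, features)]
        # Gradient descent on E_a = e_a^2 / 2, back-propagated through J -> u -> theta
        self.theta = [ti - LR_A * e_a * w_u * xi for ti, xi in zip(self.theta, x)]
        return e_c, e_a
```

For example, with w = [0.1, 0.2, 0.3], theta = [0.5, -0.5], X(t-1) = [1, 0], U(t-1) = 0.5, X(t) = [0, 1] and r(t) = -1, one update returns e_c = -1.2025 and e_a = 0.05. In the full method both networks would have hidden layers and the gradients would be obtained by back-propagation instead of the closed forms used here.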

Likewise, for the new state quantity X(t+1), the vehicle makes a new decision action U(t+1) and obtains a new reinforcement signal r(t+1) from the parking environment. In this way the vehicle interacts with the parking environment at every moment and adjusts its action policy online according to the reinforcement signals fed back by the environment, so as to obtain the maximum return over subsequent decision actions.

Referring to Fig. 2-2, step 201 specifically includes: inputting the state information into the action network and taking the parameters output by the action network as the first action control parameter.

A state of the vehicle is chosen at random and used as its initial state. When the vehicle is in the initial state, the trial counter is trail = 0, and at the start of each trial the movement step count is step = 0. The state of the vehicle refers to the position of the vehicle in the parking environment and the position of the target parking space in the parking environment. This embodiment discretizes the parking environment, for example by dividing it into multiple grid cells, each corresponding to one position state. As shown in Fig. 2-3, the parking environment is divided into 11 × 9 grid cells; in general, the vehicle passes through several cells on its way from its current location to the target parking space. In Fig. 2-3, 231 and 232 denote other vehicles, 233 denotes the ego vehicle, and 234 denotes the target parking space. Note that in practical applications the parking environment is divided into far more cells than shown in Fig. 2-3; this embodiment does not limit the number.
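The discretization can be sketched as a mapping from a continuous position to a grid cell and a location-state index, plus a random choice of initial cell. This sketch assumes an 11-column by 9-row grid (the text says 11 × 9 without fixing the orientation), a rectangular parking area of known width and height, and row-major numbering of states; `to_cell`, `state_index` and `random_initial_cell` are hypothetical helpers.

```python
import random
from typing import Tuple

COLS, ROWS = 11, 9  # 11 x 9 grid as in Fig. 2-3; orientation assumed


def to_cell(x: float, y: float, width: float, height: float) -> Tuple[int, int]:
    """Map a position inside a width x height parking area to a (row, col) cell."""
    col = min(int(x / width * COLS), COLS - 1)   # clamp the far edge into the last cell
    row = min(int(y / height * ROWS), ROWS - 1)
    return row, col


def state_index(cell: Tuple[int, int]) -> int:
    """Each cell is one location state; number the states row by row."""
    row, col = cell
    return row * COLS + col


def random_initial_cell(rng: random.Random) -> Tuple[int, int]:
    """Pick a random cell to serve as the vehicle's initial state."""
    return rng.randrange(ROWS), rng.randrange(COLS)
```

For a 22 m × 18 m area, each cell is 2 m square, so position (3.0, 5.0) maps to cell (2, 1) and the far corner maps to cell (8, 10), state 98 of 0 to 98.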

For example, a radar and a camera may be installed on the vehicle to detect its state in real time and obtain its status information: the radar detects the boundaries of the target parking space and of other vehicles, and the camera detects the parking-space lines. Here the other vehicles are obstacle vehicles, and both the obstacle vehicles and the boundary of the target parking space count as obstacles. When the vehicle is in the initial state, the action network randomly generates a decision action, which yields the first action control parameter; applying this parameter brings the vehicle to a new position state.

Step 202: evaluate the first action control parameter and determine the second action control parameter. Then go to step 203.

The second action control parameter includes a reference force value for the throttle or brake and a reference rotation angle for the steering wheel. When the vehicle reaches a new position state, the parking environment gives it a reinforcement signal. The evaluation network generates a cost function from this signal and evaluates, in real time, the action control parameter corresponding to the decision action generated by the action network, yielding the second action control parameter.

Referring to Fig. 2-2, step 202 specifically includes: evaluating the first action control parameter with the evaluation network, and determining the second action control parameter output by the action network.
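The text does not specify how the evaluation network turns its assessment into the second action control parameter. One common pattern, assumed here, is to move the action a small step up the gradient of the critic's value J(state, action), treating a larger J as better (consistent with r = +1 at the target). The sketch estimates that gradient by central finite differences so that any callable critic can be plugged in, and clips the result to the normalized range [-1, 1]; refine_action, the step size, and the iteration count are hypothetical.

```python
import numpy as np


def refine_action(critic, state, action, step=0.05, n_iters=20, eps=1e-4):
    """Nudge the first action control parameter toward a higher critic value.

    critic(state, action) -> float; the gradient is estimated numerically.
    """
    a = np.asarray(action, dtype=float).copy()
    for _ in range(n_iters):
        grad = np.zeros_like(a)
        for i in range(a.size):
            d = np.zeros_like(a)
            d[i] = eps
            grad[i] = (critic(state, a + d) - critic(state, a - d)) / (2 * eps)
        a = np.clip(a + step * grad, -1.0, 1.0)  # keep normalized pedal/steer limits
    return a  # candidate second action control parameter
```

With a toy critic peaked at [0.3, -0.2], for example critic = lambda s, a: -np.sum((a - [0.3, -0.2]) ** 2), starting from [0, 0] the refined action ends up much closer to the peak. A trained evaluation network would replace the toy critic.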

Step 203: adjust the position state of the ego vehicle according to the second action control parameter; the adjusted position state is the first position state. Then go to step 204.

After the position state of the ego vehicle has been adjusted according to the second action control parameter (the adjusted state being the first position state), the step count is updated as step = step' + 1, where step' is the step count when the vehicle was at its previous position state.

Step 204: determine whether the ego vehicle collides with an obstacle. If it does, go to step 205; if it does not, go to step 208.

Specifically, this can be determined by checking whether the vehicle collides with an obstacle vehicle, or whether it has driven onto the boundary of the target parking space. If the ego vehicle collides with an obstacle, the reinforcement signal value is updated according to the reinforcement learning algorithm in step 205, and the next trial begins with a newly randomized initial state; the output parameter of the reinforcement learning algorithm is finally obtained, which determines the target travel path of the vehicle. If the ego vehicle does not collide with an obstacle, the method checks whether it has reached the target parking space.

Step 205: use the location information of the ego vehicle in the parking environment when it is in the first position state, together with the location information of the target parking space in the parking environment, as the input parameters of the adaptive dynamic programming algorithm, and obtain the output parameter of the algorithm. Then go to step 206.

Specifically, as shown in Fig. 2-4, step 205 may include:

Step 2051: use the location information of the ego vehicle in the parking environment when it is in the first position state and the location information of the target parking space in the parking environment as the input parameters of the action network, and obtain the first output parameter of the action network.

Step 2052: according to the reinforcement signal group determined by the reinforcement learning algorithm, evaluate the first output parameter of the action network with the evaluation network, and obtain the second output parameter of the action network.

In this embodiment, the reinforcement signal for the vehicle starting from its initial state may be set as follows: the vehicle receives r = +1 when it reaches the target parking space, r = -0.2 when it collides with an obstacle, and r = 0 in all other states. Fig. 2-5 illustrates this rule: 281 denotes the ego vehicle, 282 denotes other vehicles, 283 denotes the target parking space, and the number beside each arrow is the reinforcement signal received when the vehicle moves from one position state to another under a given decision action.
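The reward rule above written as a function. Cells are plain identifiers, and modelling a collision as "entering a cell in obstacle_cells" (which may also hold the boundary cells of the target parking space) is an assumption about the grid model; reinforcement_signal is a hypothetical name.

```python
def reinforcement_signal(next_cell, target_cell, obstacle_cells):
    """+1 for reaching the target space, -0.2 for a collision, 0 otherwise."""
    if next_cell == target_cell:
        return 1.0
    if next_cell in obstacle_cells:
        return -0.2
    return 0.0
```

Usage: reinforcement_signal(3, 3, {5}) returns 1.0, while moving into cell 5 returns -0.2.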

The reinforcement signal group contains the reinforcement signal corresponding to each position state of the ego vehicle. The process of determining this group with the reinforcement learning (Q-learning) algorithm is illustrated in Figs. 2-6 to 2-8. The 9 cells in these figures represent 9 position states, S1 to S9, that the vehicle may occupy in the parking environment, and this embodiment assumes that S3 is the target parking space. In Fig. 2-6, each arrow indicates a transition from one position state to another after the vehicle selects a decision action, and the number beside the arrow is the preset reinforcement signal. For example, according to the setting rule, the reinforcement signal r12 for moving from S1 to S2 is 0, and the reinforcement signal r23 for moving from S2 to S3 is 1 (neither r12 nor r23 is labeled in Fig. 2-5).

Using the following update formula, the reinforcement signals in Fig. 2-6 are updated to obtain the reinforcement signals in Fig. 2-7:

$$Q(x, u) = r + \alpha \max_{u'} Q(x', u')$$

where r is the reinforcement signal in Fig. 2-5, i.e. the signal obtained when the vehicle moves from the current position state to the next position state; α is the discount factor, for example α = 0.8; x is the current position state and x' is the next position state; u' is a decision action available in the next position state; max_{u'} Q(x', u') is the largest reinforcement signal the vehicle can obtain by choosing a decision action in the next position state; and Q(x, u) is the reinforcement signal in Fig. 2-7, i.e. the signal obtained for the action control parameter corresponding to the decision action u chosen in the current position state. For example, updating the reinforcement signal r12 in Fig. 2-6 with this formula gives:

$$Q(S_1, S_1 \to S_2) = r_{12} + \alpha \max_{u'} Q(S_2, u') = 0 + 0.8 \times 1 = 0.8$$

so the updated value of r12 is 0.8. In Fig. 2-7, each arrow indicates a transition from one position state to another after the vehicle selects a decision action, and the number beside the arrow is the Q value: the largest cumulative reinforcement signal the vehicle can obtain by choosing that decision action, with its action control parameter, in the originating position state.

Then, on the basis of Fig. 2-7, the reinforcement signal corresponding to each position state of the vehicle is determined with the maximum-value formula, yielding the reinforcement signal group. The maximum-value formula is:

$$V^*(x) = \max_{u} Q(x, u)$$

where x is the current position state, u is a decision action available in the current position state, and V^*(x) is the reinforcement signal of the vehicle in the current position state. Fig. 2-8 shows the reinforcement signal group: each arrow indicates a transition from one position state to another after the vehicle selects a decision action, and the number beside it is the V value, i.e. the reinforcement signal of each position state, which is also the largest cumulative reinforcement signal the vehicle can obtain from that position state.

Once the reinforcement signal group has been determined by the above process, the evaluation network of the adaptive dynamic programming system can evaluate the first output parameter of the action network and obtain its second output parameter. The second output parameter is an action control parameter; for the evaluation process, refer to the related description in step 201. The reinforcement signal group used by the adaptive dynamic programming algorithm is obtained by having the cost function generated by the evaluation network fit the reinforcement signal group produced by the reinforcement learning algorithm.
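The statement that the evaluation network's cost function fits the reinforcement signal group can be illustrated by fitting a one-hidden-layer network J(x) to the V values of the 3 × 3 example. The inputs are the (row, column) coordinates of S1 to S9 centred on S5, and the targets are V*(x) = 0.8^(d-1), with d the number of moves to S3 and V*(S3) = 0; both encodings are assumptions. An ADP critic is normally trained by gradient descent on its temporal-difference error; to keep the example short, this sketch swaps in a random-feature shortcut that freezes the hidden layer and solves only the output weights with numpy.linalg.lstsq. fit_critic is a hypothetical name.

```python
import numpy as np


def fit_critic(states, targets, n_hidden=30, seed=0):
    """Fit J(s) = tanh(s @ W + b) @ w_out to tabular targets.

    W and b are random and frozen; only w_out is solved by least squares.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(states, dtype=float)
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)  # hidden-layer activations, one row per state
    w_out, *_ = np.linalg.lstsq(H, np.asarray(targets, dtype=float), rcond=None)
    return lambda s: np.tanh(np.asarray(s, dtype=float) @ W + b) @ w_out
```

With 9 states and 30 hidden units the least-squares system is underdetermined, so the fitted J reproduces the 9 target values almost exactly; with many more cells, as in a real parking environment, the fit becomes a genuine approximation.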

Step 206: take the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target travel path. Then go to step 207.

Step 206 specifically includes: taking the path corresponding to the second output parameter of the action network as the target travel path. The second output parameter is the parameter obtained in step 2052.

Fig. 2-9 shows the path corresponding to the second output parameter of the action network. Assume the vehicle starts at position state S7 in Fig. 2-8. As Fig. 2-9 shows, there can be several shortest travel paths to the target parking space at position state S3, i.e. several target travel paths: for example S7-S4-S1-S2-S3, S7-S8-S9-S6-S3, or S7-S4-S5-S6-S3. Because the parking environment is divided into a large number of cells, the vehicle's actual route approximates a smooth curve, like the travel route shown in Fig. 2-3.
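A short sketch of how one of these shortest paths can be read off a table of V values by one-step lookahead. Assumptions not taken from the patent: the obstacle-free 3 × 3 layout S1 S2 S3 / S4 S5 S6 / S7 S8 S9 implied by the listed paths, moves to the four neighbours only, α = 0.8, r = 1 on entering S3, and V*(x) = α^(d-1) where d is the number of moves from x to S3 (what the maximum-value formula yields on such a grid). The embodiment takes the path from the action network's second output parameter; greedy_path is a hypothetical tabular stand-in.

```python
ALPHA = 0.8
TARGET = 3


def cell(s):
    """(row, col) of S1..S9, numbered row by row."""
    return divmod(s - 1, 3)


def value(s):
    """V*(s) on an obstacle-free 3x3 grid: alpha ** (moves_to_target - 1), 0 at the target."""
    if s == TARGET:
        return 0.0
    (r, c), (tr, tc) = cell(s), cell(TARGET)
    return ALPHA ** (abs(r - tr) + abs(c - tc) - 1)


def neighbors(s):
    r, c = cell(s)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # up, down, left, right
        if 0 <= r + dr < 3 and 0 <= c + dc < 3:
            yield s + 3 * dr + dc


def greedy_path(start, max_len=20):
    """Follow the move with the largest r + alpha * V(next) until S3 is reached."""
    path = [start]
    while path[-1] != TARGET and len(path) < max_len:
        s = path[-1]
        # max() keeps the first of several equally good moves
        path.append(max(neighbors(s),
                        key=lambda n: (1.0 if n == TARGET else 0.0) + ALPHA * value(n)))
    return path
```

greedy_path(7) returns [7, 4, 1, 2, 3]. Because S4 and S8 tie, the result depends on the order in which neighbours are tried, which mirrors the remark that several target travel paths exist.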

Step 207: control the ego vehicle along the target travel path to complete the parking-in action.

Once the target travel path has been obtained, the ego vehicle can be controlled along it to complete the parking-in action.

Step 208: detect whether the ego vehicle has reached the target parking space. If it has, go to step 205; if it has not, go to step 209.

If the ego vehicle has reached the target parking space, the reinforcement signal value is updated according to the adaptive dynamic programming algorithm, and the next trial begins with a newly randomized initial state. If it has not, the method checks whether the step count of the ego vehicle exceeds the preset number of steps.

Step 209: detect whether the step count of the ego vehicle exceeds the preset number of steps. If it does, go to step 205; if it does not, go to step 201.

If the step count of the ego vehicle exceeds the preset number of steps, the reinforcement signal value is updated according to the adaptive dynamic programming algorithm, and the next trial begins with a newly randomized initial state. If it does not, step 201 is performed: the first action control parameter is determined from the status information of the vehicle.

Step 205 (updating the reinforcement signal value) is repeated until the trial count exceeds the maximum trial count MaxTrail = 1000. The reinforcement signal values from the last update, i.e. the learned values, are then transplanted to the real vehicle, so that it can complete the parking-in process in a real parking environment according to these learned values.
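The trial loop of steps 201 to 209 can be sketched with tabular Q-learning on the 3 × 3 example of Figs. 2-6 to 2-8. Assumptions: an action is identified with the neighbouring cell the vehicle moves to, actions are chosen uniformly at random for exploration, the update uses a learning rate of 1 exactly as the update formula is written, MAX_STEPS = 20 is a hypothetical preset step count, and OBSTACLES is empty as in that example. MAX_TRAIL mirrors MaxTrail = 1000. A lookup table replaces the action and evaluation networks, so this illustrates the control flow rather than the embodiment's ADP implementation.

```python
import random

ALPHA = 0.8        # discount factor
MAX_TRAIL = 1000   # MaxTrail: maximum number of trials
MAX_STEPS = 20     # hypothetical preset step count
TARGET = 3         # S3 is the target parking space
OBSTACLES = set()  # cells of other vehicles; empty in the 3x3 example


def neighbors(s):
    """Cells reachable in one move from s (S1..S9, row by row)."""
    r, c = divmod(s - 1, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= r + dr < 3 and 0 <= c + dc < 3:
            yield s + 3 * dr + dc


def train(seed=0):
    rng = random.Random(seed)
    # An action is identified with the cell it moves the vehicle to.
    Q = {(s, n): 0.0 for s in range(1, 10) for n in neighbors(s)}
    starts = [s for s in range(1, 10) if s != TARGET and s not in OBSTACLES]
    for _ in range(MAX_TRAIL):
        s, step = rng.choice(starts), 0          # random initial state, step = 0
        while True:
            n = rng.choice(list(neighbors(s)))   # exploratory decision action
            step += 1                            # step = step' + 1
            if n in OBSTACLES:                   # step 204: collision, r = -0.2
                Q[(s, n)] = -0.2
                break                            # start the next trial
            if n == TARGET:                      # step 208: reached the space, r = +1
                Q[(s, n)] = 1.0
                break                            # start the next trial
            Q[(s, n)] = 0.0 + ALPHA * max(Q[(n, m)] for m in neighbors(n))
            if step > MAX_STEPS:                 # step 209: step limit exceeded
                break                            # start the next trial
            s = n                                # otherwise continue from step 201
    return Q
```

After the 1000 trials, Q[(1, 2)] has converged to 0.8 and the best action from S7 is worth 0.512, the same values the two formulas give when applied directly.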

Note that if the preset reinforcement signals are large, for example r = 100 when the vehicle reaches the target parking space, then after all trials have finished the output parameters of the reinforcement learning algorithm can be normalized for ease of computation, i.e. the last-updated reinforcement signal values are scaled into the range [0, 1]. The normalized results are used as the final parking-in values and transplanted to the real vehicle, so that it can complete the parking-in process in a real parking environment according to these learned values.
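The text does not say which normalization is used. A minimal sketch assuming min-max scaling, which maps the smallest learned value to 0 and the largest to 1; normalize is a hypothetical helper name.

```python
def normalize(values):
    """Min-max scale a dict of learned values into [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}  # all values equal: nothing to rank
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}
```

Usage: normalize({"a": -20.0, "b": 100.0, "c": 40.0}) returns {"a": 0.0, "b": 1.0, "c": 0.5}. Min-max scaling preserves the ordering of the values, so the greedy choice of action at each position state is unchanged.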

The parking-in method provided by this embodiment addresses the design of an autonomous parking control algorithm for intelligent vehicles. Using the adaptive dynamic programming algorithm, the intelligent vehicle interacts autonomously with the parking environment, obtains the corresponding reinforcement signals, learns parking-in experience autonomously from the reinforcement signal values, and stores that experience. As a result, the vehicle has good stability and adaptivity even when the width of the parking space, the positions of the obstacle vehicles, and its own initial position are not fixed. The method lets the intelligent vehicle learn autonomously and find the optimal parking strategy for different parking environments, minimizing its travel path during parking-in, so that autonomous parking has better stability, adaptivity, maneuverability, and flexibility.

In summary, the parking-in method provided by this embodiment evaluates the first action control parameter to determine the second action control parameter, and adjusts the position state of the ego vehicle according to the second action control parameter. If this position state is the preset state, the target travel path is determined according to the adaptive dynamic programming algorithm, and the ego vehicle is controlled to complete the parking-in action. Compared with the prior art, this improves the vehicle's autonomous learning ability and thereby the stability, reliability, maneuverability, and flexibility of the parking-in process.

An embodiment of the present invention provides a parking-in device. As shown in Fig. 3-1, the device may include:

A first determination unit 301, configured to determine the first action control parameter according to the status information of the ego vehicle. The status information includes the location information of the ego vehicle in the parking environment and the location information of the target parking space in the parking environment; the first action control parameter includes a first initial force value for the throttle or brake and a first initial rotation angle for the steering wheel.

A second determination unit 302, configured to evaluate the first action control parameter and determine the second action control parameter, which includes a reference force value for the throttle or brake and a reference rotation angle for the steering wheel.

An adjustment unit 303, configured to adjust the position state of the ego vehicle according to the second action control parameter; the adjusted position state of the ego vehicle is the first position state.

A judging unit 304, configured to determine whether the first position state is the preset state.

A first processing unit 305, configured to, when the first position state is the preset state, use the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as the input parameters of the adaptive dynamic programming algorithm, and obtain the output parameter of the algorithm.

A second processing unit 306, configured to take the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target travel path.

A control unit 307, configured to control the ego vehicle along the target travel path to complete the parking-in action.

In conclusion parking loading device provided in an embodiment of the present invention, can comment the first action control parameter Estimate, determine the second action control parameter, further according to the location status of second action control parameter adjustment this vehicle, if the location status For preset state, then target travel path is determined according to adaptive dynamic programming algorithm, so as to which this vehicle be controlled to complete parking storage Action, compared to the prior art, improves the independent learning ability of vehicle, this improves stabilizations during vehicle parking storage Property, reliability and flexibility.

Optionally, as shown in Fig. 3-2, the first determination unit 301 includes:

A first determining module 3011, configured to input the status information into the action network and take the parameters output by the action network as the first action control parameter. The action network is a multiple-input multiple-output nonlinear neural network with a hidden layer.

As shown in Fig. 3-3, the second determination unit 302 includes:

A second determining module 3021, configured to evaluate the first action control parameter with the evaluation network and determine the second action control parameter output by the action network. The evaluation network is a multiple-input multiple-output nonlinear neural network with a hidden layer.

As shown in Fig. 3-4, the judging unit 304 includes:

A judgment module 3041, configured to determine whether the ego vehicle collides with an obstacle.

A third determining module 3042, configured to determine that the first position state is the preset state when the ego vehicle collides with an obstacle.

A fourth determining module 3043, configured to determine that the first position state is not the preset state when the ego vehicle does not collide with an obstacle.

As shown in Fig. 3-5, the device may further include:

A first detection unit 308, configured to detect whether the ego vehicle has reached the target parking space when it does not collide with an obstacle.

A third processing unit 309, configured to, when the ego vehicle reaches the target parking space, use the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as the input parameters of the adaptive dynamic programming algorithm, obtain the output parameter of the algorithm, and take the path corresponding to that output parameter as the target travel path.

A second detection unit 310, configured to detect, when the ego vehicle has not reached the target parking space, whether its step count exceeds the preset number of steps, the step count being the number of grid cells the ego vehicle has moved through.

A fourth processing unit 311, configured to, when the step count of the ego vehicle exceeds the preset number of steps, use the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as the input parameters of the adaptive dynamic programming algorithm, obtain the output parameter of the algorithm, and take the path corresponding to that output parameter as the target travel path.

A third determination unit 312, configured to determine a third action control parameter according to the current state of the ego vehicle when its step count does not exceed the preset number of steps. The third action control parameter includes a second initial force value for the throttle or brake and a second initial rotation angle for the steering wheel.

Optionally, as shown in Fig. 3-6, the first processing unit 305 includes:

A first processing module 3051, configured to use the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as the input parameters of the action network, and obtain the first output parameter of the action network.

A second processing module 3052, configured to evaluate the first output parameter of the action network with the evaluation network, according to the reinforcement signal group determined by the reinforcement learning algorithm, and obtain the second output parameter of the action network. The reinforcement signal group contains the reinforcement signal corresponding to each position state of the ego vehicle.

Correspondingly, as shown in Fig. 3-7, the second processing unit 306 includes:

A third processing module 3061, configured to take the path corresponding to the second output parameter of the action network as the target travel path.

In conclusion parking loading device provided in an embodiment of the present invention, can comment the first action control parameter Estimate, determine the second action control parameter, further according to the location status of second action control parameter adjustment this vehicle, if the location status For preset state, then target travel path is determined according to adaptive dynamic programming algorithm, so as to which this vehicle be controlled to complete parking storage Action, compared to the prior art, improves the independent learning ability of vehicle, this improves stabilizations during vehicle parking storage Property, reliability, mobility and flexibility.

It is apparent to those skilled in the art that for convenience and simplicity of description, the device of foregoing description, The specific work process of unit and module can refer to the corresponding process in preceding method embodiment, and details are not described herein.

The foregoing is merely presently preferred embodiments of the present invention, is not intended to limit the invention, it is all the present invention spirit and Within principle, any modification, equivalent replacement, improvement and so on should all be included in the protection scope of the present invention.

Claims

1. A parking-in method, characterized in that the method comprises:

inputting status information into an action network and determining the parameters output by the action network as a first action control parameter, wherein the action network is a multiple-input multiple-output nonlinear neural network comprising a hidden layer, the status information comprises location information of the ego vehicle in a parking environment and location information of a target parking space in the parking environment, and the first action control parameter comprises a first initial force value of a throttle or brake and a first initial rotation angle of a steering wheel;

evaluating the first action control parameter with an evaluation network and determining a second action control parameter output by the action network, wherein the evaluation network is a multiple-input multiple-output nonlinear neural network comprising a hidden layer, and the second action control parameter comprises a reference force value of the throttle or brake and a reference rotation angle of the steering wheel;

adjusting a position state of the ego vehicle according to the second action control parameter, the adjusted position state of the ego vehicle being a first position state;

determining whether the first position state is a preset state;

if the first position state is the preset state, using the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as input parameters of an adaptive dynamic programming algorithm, and obtaining an output parameter of the adaptive dynamic programming algorithm;

2. The method according to claim 1, characterized in that

the determining whether the first position state is a preset state comprises:

determining whether the ego vehicle collides with an obstacle;

3. The method according to claim 2, characterized in that the parking environment is divided into at least two grid cells of equal area, each grid cell corresponding to one position state, and before the ego vehicle is controlled along the target travel path to complete the parking-in action, the method further comprises:

if the ego vehicle reaches the target parking space, using the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtaining the output parameter of the adaptive dynamic programming algorithm, and taking the path corresponding to that output parameter as the target travel path;

if the ego vehicle does not reach the target parking space, detecting whether a step count of the ego vehicle exceeds a preset number of steps, the step count being the number of grid cells the ego vehicle has moved through;

if the step count of the ego vehicle exceeds the preset number of steps, using the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtaining the output parameter of the adaptive dynamic programming algorithm, and taking the path corresponding to that output parameter as the target travel path;

if the step count of the ego vehicle does not exceed the preset number of steps, determining a third action control parameter according to the current state of the ego vehicle, the third action control parameter comprising a second initial force value of the throttle or brake and a second initial rotation angle of the steering wheel.

4. The method according to claim 1, characterized in that the using the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as input parameters of an adaptive dynamic programming algorithm and obtaining the output parameter of the adaptive dynamic programming algorithm comprises:

using the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as input parameters of the action network, and obtaining a first output parameter of the action network;

evaluating the first output parameter of the action network with the evaluation network according to a reinforcement signal group determined by a reinforcement learning algorithm, and obtaining a second output parameter of the action network, the reinforcement signal group comprising the reinforcement signal corresponding to each position state of the ego vehicle;

the taking the path corresponding to the output parameter of the adaptive dynamic programming algorithm as the target travel path comprises:

5. A parking-in device, characterized in that the device comprises:

a first determination unit, configured to input status information into an action network and determine the parameters output by the action network as a first action control parameter, wherein the action network is a multiple-input multiple-output nonlinear neural network comprising a hidden layer, the status information comprises location information of the ego vehicle in a parking environment and location information of a target parking space in the parking environment, and the first action control parameter comprises a first initial force value of a throttle or brake and a first initial rotation angle of a steering wheel;

a second determination unit, configured to evaluate the first action control parameter with an evaluation network and determine a second action control parameter output by the action network, wherein the evaluation network is a multiple-input multiple-output nonlinear neural network comprising a hidden layer, and the second action control parameter comprises a reference force value of the throttle or brake and a reference rotation angle of the steering wheel;

an adjustment unit, configured to adjust a position state of the ego vehicle according to the second action control parameter, the adjusted position state of the ego vehicle being a first position state;

a judging unit, configured to determine whether the first position state is a preset state;

a first processing unit, configured to, when the first position state is the preset state, use the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as input parameters of an adaptive dynamic programming algorithm, and obtain an output parameter of the adaptive dynamic programming algorithm;

a second processing unit, configured to take the path corresponding to the output parameter of the adaptive dynamic programming algorithm as a target travel path;

6. The device according to claim 5, characterized in that

the judging unit comprises:

a judgment module, configured to determine whether the ego vehicle collides with an obstacle;

a third determining module, configured to determine that the first position state is the preset state when the ego vehicle collides with the obstacle;

a fourth determining module, configured to determine that the first position state is not the preset state when the ego vehicle does not collide with the obstacle.

7. The device according to claim 6, characterized in that the parking environment is divided into at least two grid cells of equal area, each grid cell corresponding to one position state, and the device further comprises:

a first detection unit, configured to detect whether the ego vehicle reaches the target parking space when the ego vehicle does not collide with the obstacle;

a third processing unit, configured to, when the ego vehicle reaches the target parking space, use the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtain the output parameter of the adaptive dynamic programming algorithm, and take the path corresponding to that output parameter as the target travel path;

a second detection unit, configured to detect, when the ego vehicle does not reach the target parking space, whether a step count of the ego vehicle exceeds a preset number of steps, the step count being the number of grid cells the ego vehicle has moved through;

a fourth processing unit, configured to, when the step count of the ego vehicle exceeds the preset number of steps, use the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as input parameters of the adaptive dynamic programming algorithm, obtain the output parameter of the adaptive dynamic programming algorithm, and take the path corresponding to that output parameter as the target travel path;

a third determination unit, configured to determine a third action control parameter according to the current state of the ego vehicle when the step count of the ego vehicle does not exceed the preset number of steps, the third action control parameter comprising a second initial force value of the throttle or brake and a second initial rotation angle of the steering wheel.

8. The device according to claim 5, characterized in that the first processing unit comprises:

a first processing module, configured to use the location information of the ego vehicle in the parking environment in the first position state and the location information of the target parking space in the parking environment as input parameters of the action network, and obtain a first output parameter of the action network;

a second processing module, configured to evaluate the first output parameter of the action network with the evaluation network according to a reinforcement signal group determined by a reinforcement learning algorithm, and obtain a second output parameter of the action network, the reinforcement signal group comprising the reinforcement signal corresponding to each position state of the ego vehicle;

the second processing unit comprises:

a third processing module, configured to take the path corresponding to the second output parameter of the action network as the target travel path.