CN110083168A - Altitude-hold control method for a small unmanned helicopter based on reinforcement learning - Google Patents
Altitude-hold control method for a small unmanned helicopter based on reinforcement learning
- Publication number
- CN110083168A (application CN201910369215.3A)
- Authority
- CN
- China
- Prior art keywords
- control
- state
- formula
- helicopter
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/0088—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/04—Control of altitude or depth
- G05D1/042—Control of altitude or depth specially adapted for aircraft
- G05D1/046—Control of altitude or depth specially adapted for aircraft to counteract a perturbation, e.g. gust of wind
Abstract
The present invention relates to an intelligent altitude control method for an indoor small unmanned helicopter, and proposes a continuous control method based on offline reinforcement learning that enables a small unmanned helicopter to hold altitude in an indoor positioning environment in the presence of external disturbances. To this end, the technical solution adopted by the present invention is an altitude-hold control method for a small unmanned helicopter based on reinforcement learning: first, a helicopter model based on a Markov sequence and a reward function are constructed; then the optimized controller parameters are obtained by iterative training with a stochastic approximation method; finally, the trained controller is applied to the real helicopter system to implement control. The present invention is mainly applicable to the design and manufacture of indoor small unmanned aerial vehicles.
Description
Technical field
The present invention relates to an intelligent altitude control method for an indoor small unmanned helicopter, specifically a reinforcement learning control method based on offline flight data.
Background art
When a small unmanned helicopter performs tasks such as hovering, low-speed forward flight and cruising flight, it must be able to stabilize its own altitude; altitude-hold control is also the foundation of stable helicopter flight and of many other complex control tasks. The design of an altitude controller, however, faces several severe problems. First, altitude hold cannot be achieved through pitch-angle control alone: when a constant disturbance torque acts on the vertical channel, the velocity vector gradually deviates from its original direction and induces altitude drift. Second, during the transient in which the pitch angle settles, a nonzero mean change of the flight-path inclination angle also changes the flight altitude. Third, in the altitude-hold state the main rotor itself produces a strong airflow, which causes large drift disturbances to the steady state of the fuselage. Fourth, factors such as ground effect and battery level also cause the helicopter to vary in the vertical direction. Fifth, from the standpoint of control authority, although flight altitude can be regulated through the elevator or through the magnitude of motor thrust, altitude control by means of thrust has large inertia and slow response.
Consequently, observers and controllers designed from a nominal model deviate considerably from the real system in actual control, and the control performance is poor. Characterizing the behavior of the system by directly collecting and analyzing flight data, and designing the controller on that basis, is a comparatively effective idea, and data-driven control is a widely adopted method of this kind. Hou Zhongsheng et al. gave a detailed theoretical explanation and experimental verification of the method (journal: Acta Automatica Sinica; authors: Xu Jianxin, Hou Zhongsheng; publication date: 2009; article title: A survey of data-driven system methods; pages: 668-675). Although this method can stabilize the system, it places no constraint on the speed and precision of error convergence, so the control performance is mediocre. Reinforcement learning policy control, which is likewise based on system data, stabilizes the system in a globally optimal manner through a reward function defined on the observed system, and can therefore achieve more precise control.
In recent years, reinforcement learning control techniques have found certain applications in the field of unmanned aerial vehicle control. Andrew Ng et al. collected offline flight data of a helicopter's attitude and control inputs and designed an offline reinforcement learning control method. They constructed a Markov model of the system with a local linear method and, through gradient-descent iterations, converged to the optimized control parameters of the altitude controller (journal: Communications of the ACM; authors: Coates A, Abbeel P, Ng A Y; publication date: 2009; article title: Apprenticeship learning for helicopter control; pages: 97-105). This method requires no prior model information, which makes it an attractive choice of control method for a complicated small unmanned helicopter system.
Summary of the invention
In order to overcome the deficiencies of the prior art, the present invention aims to propose a continuous control method based on offline reinforcement learning that enables a small unmanned helicopter to hold altitude in an indoor positioning environment in the presence of external disturbances. To this end, the present invention adopts the following technical solution: an altitude-hold control method for a small unmanned helicopter based on reinforcement learning, in which a helicopter model based on a Markov sequence and a reward function are constructed first, the optimized controller parameters are then obtained by iterative training with a stochastic approximation method, and the trained controller is finally applied to the real helicopter system to implement control.
The specific steps are as follows:
Step 1) defines the Markov decision process:
The Markov decision process is expressed as a six-tuple (S, J, A, {P_sa(·)}, γ, R), where S denotes the set of all possible states of the environment; J is the objective function of the decision; A denotes the action set of the action space; P_sa: S × A → S' denotes the state transition function, meaning that for the state s(t) ∈ S at each moment and action a(t) ∈ A, taking that action in that state yields a probability distribution over the next state s(t+1); γ is the discount factor, ranging between 0 and 1; and R is the reward function obtained by taking the corresponding action. Once the environment model P_sa(·) and R are known for every state, the optimal policy can be obtained by the method of dynamic programming;
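As an illustration only, the six-tuple and the discounted return can be sketched in a few lines of Python; the scalar dynamics, the reward and the two policies below are toy stand-ins chosen for the sketch, not the helicopter model of the invention:

```python
# Toy sketch of the six-tuple (S, J, A, {P_sa}, gamma, R): scalar state,
# deterministic stand-in transition, quadratic reward, discounted total return.

gamma = 0.95  # discount factor, between 0 and 1

def step(s, a):
    """Stand-in transition P_sa: the next state contracts toward zero under the action."""
    return 0.9 * s + 0.1 * a

def reward(s):
    """Quadratic reward: zero at the reference state (here 0), negative elsewhere."""
    return -(s ** 2)

def discounted_return(s0, policy, horizon=50):
    """Objective J: the discounted sum of rewards along one rollout."""
    s, total = s0, 0.0
    for t in range(horizon):
        total += (gamma ** t) * reward(s)
        s = step(s, policy(s))
    return total

# A proportional feedback policy accumulates less penalty than doing nothing
ret_fb = discounted_return(1.0, lambda s: -s)
ret_zero = discounted_return(1.0, lambda s: 0.0)
```

Dynamic programming would search over such policies; the sketch only shows that the return J orders policies, which is all the iteration of step 3) relies on.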
Step 2) data processing and modeling:
In the reinforcement-learning altitude controller, helicopter control is treated as a Markov random sequence. The state of the system is chosen as s(t) = [r_z(t), ṙ_z(t), r̈_z(t)]^T, where r_z(t), ṙ_z(t) and r̈_z(t) denote in turn the altitude, the vertical velocity and the vertical acceleration, and the subscript z denotes the vertical direction. The offline data are collected by manually emulating an altitude-hold control process: three groups of flight data, each two minutes long at a desired altitude, are recorded, comprising the three state variables and the corresponding vertical servo control input. After collection, the data are processed: each pair of current state s(t) and control input u(t) is stored in the data matrix X ∈ R^(m×(n_s+n_u+1)), where t denotes the current discrete-time instant, m is the number of training samples, n_s is the state dimension, n_u is the control dimension, and the added 1 represents the intercept coefficient of the fitting formula; the system output, i.e. the next state s(t+1), is stored in the matrix Y ∈ R^(m×n_s). The specific expressions are given in formulas (1) and (2):
Locally weighted linear regression (LWLR) is then carried out to map the current state s(t) and control input u(t) to the next state s(t+1). A weight matrix W ∈ R^(m×m) is defined as a diagonal matrix; the most common Gaussian kernel is used here, so W is diagonal with entries given by formula (3):
ω_ii = exp(-(x^(i) - x)(x^(i) - x)^T / (2τ²))   (3)
where x^(i) is the i-th row of the prior-data matrix X, x is the state feature vector [1 s(t)^T u(t)^T] at the current moment, and the parameter τ is the bandwidth of the Gaussian kernel, which determines the range over which the prior data influence the weight ω_ii. After this processing, following the derivation of the least-squares regression-fit error, the data-driven system model is established as in formula (4):
s(t+1) = x θ + v, with θ = (X^T W X)^(-1) X^T W Y   (4)
Here v is the noise compensation term relative to the real system, i.e. it accounts for other factors not considered in the modeling process; v = 0 is taken;
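A minimal sketch of the LWLR model of step 2), under assumptions: the flight data are replaced by synthetic scalar state/control samples, and the function and variable names (lwlr_predict, tau, and so on) are illustrative, not from the patent:

```python
import numpy as np

# Rows of X are [1, s(t), u(t)]; rows of Y are s(t+1). A Gaussian kernel
# (formula (3)) weights each stored sample by its distance to the query point,
# and the locally weighted least-squares fit (formula (4)) predicts the next state.

rng = np.random.default_rng(0)

# Synthetic "offline flight data": the next state follows a linear rule,
# unknown to the model, plus small noise.
m = 200
s = rng.uniform(-1, 1, m)
u = rng.uniform(-1, 1, m)
s_next = 0.8 * s + 0.3 * u + 0.01 * rng.standard_normal(m)

X = np.column_stack([np.ones(m), s, u])   # m x (1 + n_s + n_u)
Y = s_next.reshape(m, 1)                  # m x n_s

def lwlr_predict(x, X, Y, tau=0.5):
    """Predict s(t+1) at query row x = [1, s(t), u(t)] by weighted least squares."""
    d2 = np.sum((X - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * tau ** 2))    # formula (3): Gaussian kernel weights
    W = np.diag(w)                        # m x m diagonal weight matrix
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)
    return float(x @ theta)

pred = lwlr_predict(np.array([1.0, 0.5, 0.2]), X, Y)
true_val = 0.8 * 0.5 + 0.3 * 0.2          # 0.46, the noiseless next state
```

Because the fit is re-weighted around every query point, the same code tracks locally linear but globally nonlinear dynamics, which is why the patent prefers it over a single global regression.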
Step 3) continuous robust control law design:
The reward function at the current moment is determined first. A quadratic form of the state variables is adopted to construct the reward function R(t) as in formula (5), where Q is a positive-definite parameter matrix and the tracking error s̃(t), whose expression is given in formula (6), is
s̃(t) = s(t) - [r_z,ref(t), ṙ_z,ref(t), r̈_z,ref(t)]^T   (6)
in which r_z,ref(t), ṙ_z,ref(t) and r̈_z,ref(t) denote the desired altitude, desired velocity and desired acceleration of the helicopter in the vertical direction;
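The quadratic reward of formulas (5)-(6) can be sketched as follows. The specification speaks both of maximizing the total return and of minimizing the reward function; one consistent reading, assumed here, is a negative quadratic reward, so that maximizing the return minimizes the tracking error. The Q entries and the reference values are illustrative assumptions:

```python
import numpy as np

# Assumed weighting: altitude error penalized most, then velocity, then
# acceleration. Any positive-definite Q fits formulas (5)-(6).
Q = np.diag([10.0, 1.0, 0.1])

def reward(s, s_ref, Q=Q):
    """R(t) = -(s - s_ref)^T Q (s - s_ref): zero at the reference, negative elsewhere."""
    e = np.asarray(s, float) - np.asarray(s_ref, float)
    return float(-e @ Q @ e)

s_ref = np.array([1.2, 0.0, 0.0])        # hover at 1.2 m, zero velocity/acceleration
r_on  = reward([1.2, 0.0, 0.0], s_ref)   # exactly on the reference
r_off = reward([1.0, 0.1, 0.0], s_ref)   # 0.2 m low, with a slight climb rate
```

The sign convention makes the two phrasings in the text equivalent: the largest achievable reward is zero, attained exactly on the reference trajectory.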
After the reward function has been determined, the control iteration algorithm is designed. The control input u(t) is computed from the reference input and the current state, i.e. u(t) = π(s(t), ω), realized as in formula (7):
The ultimate purpose of applying reinforcement learning is to find the optimal control policy π(s(t), ω) such that the total return R_total over a period of time attains its maximum; the control-law weight at that point is the desired optimal weight vector ω_best. The stochastic approximation strategy is used here for the parameter iteration.
The concrete steps of the stochastic approximation strategy are as follows: first initialize the state, the action and the initial control weights; obtain from the LWLR model fitted to the offline data the next state and the reward value at that moment; compute the next action from the next state and the control weights; repeat this cycle, recording the return at each state moment and summing the returns of the sequence. The control weights are then iteratively updated accordingly, and the search continues until the optimal cumulative reward and the corresponding control weights are found; these weights are the result of training.
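The steps above can be sketched as a perturb-and-keep search. Everything concrete here is an assumption for illustration: a scalar linear policy u = -ω s stands in for π(s(t), ω), a known toy model replaces the LWLR model, and the exact update rule (which the patent does not spell out) is taken to be random perturbation with greedy acceptance:

```python
import numpy as np

# Roll the policy through the model, score the trajectory by its total
# (negative quadratic) reward, perturb the weight, keep improvements.

rng = np.random.default_rng(1)

def rollout_return(w, s0=1.0, horizon=40):
    """Total reward of policy u = -w*s on the toy scalar model s' = 0.9 s + 0.2 u."""
    s, total = s0, 0.0
    for _ in range(horizon):
        u = -w * s
        total += -s ** 2          # quadratic reward: penalize altitude error
        s = 0.9 * s + 0.2 * u
    return total

w_best = 0.0                      # initial control weight
r_best = rollout_return(w_best)
for _ in range(300):              # perturb-and-keep stochastic search
    w_try = w_best + 0.1 * rng.standard_normal()
    r_try = rollout_return(w_try)
    if r_try > r_best:            # keep weights that raise the total return
        w_best, r_best = w_try, r_try
```

In the patent's scheme the rollout would query the LWLR model of step 2) instead of the hand-written toy dynamics, and the reward would be formula (5).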
The features and beneficial effects of the present invention are:
1. The present invention uses offline flight data generated by the helicopter system and realizes a reinforcement learning update algorithm for the control-law weights through a mathematical model designed on a Markov sequence. It effectively compensates for the influence of the uncertainty of the true helicopter dynamics and of external disturbances on control performance, while improving control precision compared with data-driven methods.
2. The present invention designs an autonomous flight control system based on a micro indoor unmanned helicopter, which well balances flight control performance, flight functionality and the payload limit of the aircraft; the reinforcement learning altitude control algorithm is applied within this system.
3. The stochastic approximation algorithm adopted in the control-weight iteration prevents the reward function from falling into a locally optimal region during training, as gradient descent does; it can search for the optimal solution over a wider range, while the weights are repeatedly re-examined during optimization to find suboptimal weights better suited to the helicopter system.
Description of the drawings:
Fig. 1 is the logic block diagram of the control-weight update algorithm of the present invention;
Fig. 2 is the experimental platform of the present invention;
Fig. 3 is the altitude error curve of the unmanned helicopter under PID control in the simulation environment;
Fig. 4 is the altitude error curve of the unmanned helicopter under reinforcement learning control in the simulation environment;
Fig. 5 is the altitude curve of the unmanned helicopter in the altitude-hold experiment under reinforcement learning control;
Fig. 6 is the collective pitch control input curve of the unmanned helicopter in the altitude-hold experiment under reinforcement learning control.
Specific embodiments
In existing reinforcement learning control cases, the controlled object is mostly a medium-sized helicopter, and control is mostly performed outdoors. The present invention intends to provide a continuous control method based on offline reinforcement learning that realizes altitude hold for a small unmanned helicopter in an indoor positioning environment in the presence of external disturbances. The technical solution adopted by the present invention is: first construct the helicopter model based on a Markov sequence and the reward function, then obtain the optimized controller parameters by iterative training with a stochastic approximation method, and finally apply the trained controller to the real helicopter system for verification. The method comprises the following steps:
Step 1) defines the Markov decision process:
Markov methods are widely applied in many sequential decision problems. Their main concepts are the decision moment, the system state, the action, the return value and the transition probability. Concretely, choosing an action in a certain system state produces a return, and the transition probability determines the state at the next moment; the decision maker must select an optimal policy in some way. The process can be expressed as a six-tuple (S, J, A, {P_sa(·)}, γ, R), where S denotes the set of all possible states of the environment; J is the objective function of the decision; A denotes the action set of the action space; P_sa: S × A → S' denotes the state transition function, meaning that for the state s(t) ∈ S at each moment and action a(t) ∈ A, taking that action in that state yields a probability distribution over the next state s(t+1), with t denoting the current moment; γ is the discount factor, ranging between 0 and 1; and R is the reward function obtained by taking the corresponding action. The essence of the Markov decision process is that the transition probability from the current state to the next state and the return value depend only on the current state and the action about to be taken, and are independent of historical states and actions. Therefore, once P_sa(·) and R are known for every state, the optimal policy can be obtained by the method of dynamic programming.
Step 2) flight control hardware design of the micro helicopter:
The design requirements are first analyzed according to the experimental tasks and the environmental conditions, mainly as follows. 1. Since UAV control demands high real-time performance, the main control chip needs a high system clock frequency. 2. Since attitude and position control are required, an attitude sensor with high measurement accuracy must be used. 3. Since the helicopter must obtain its own position information in an indoor environment based on the OptiTrack motion capture system, a zigbee data transmission module must be carried. 4. Since the online attitude information of the helicopter must be collected as offline training data, a data transmission module is required to communicate with the ground station. 5. To save the flight log data, an on-board EEPROM chip is designed for data storage. 6. The payload requirement and endurance of the micro helicopter system are considered in the design: chips of small volume and weight are selected as far as possible and a corresponding flight control board is designed.
Taking the above demands into account, the final main control board measures 35 mm × 35 mm and uses a 100-pin Freescale K60 chip with a 100 MHz clock. A single MPU6500 chip serves as the attitude sensor, connected directly to the main control chip over SPI; its precision and sampling rate meet the actual demand and its volume is small. A zigbee module with an embedded antenna is chosen for communication, greatly reducing the payload. The EEPROM module is an AT24C256 with 256 Kbit of memory. Two XC6206P332 voltage regulator chips generate 3.3 V, powering the receiver and the other sensors separately. In addition, two external UART serial ports are designed, and one external I2C port is reserved for connecting a barometer. Finally, five external 5 V headers are designed for the input control signals of three servos and two motors. The complete machine weighs 94 g, which satisfies the take-off payload requirement of the helicopter.
Step 3) data processing and modeling:
Because of the nonlinear relation between helicopter lift and main-rotor speed, blade vibration, the disturbance of the surrounding airflow generated by the main rotor, and a series of other factors, the altitude channel fluctuates more strongly than the horizontal attitude channels. This leaves controller designs based on an altitude dynamics model with a considerable error relative to reality. On the other hand, although in practical implementations the altitude channel can reach fairly good control precision with a PID controller, the parameter tuning process is complicated and rich experience is needed to decide the parameter ranges and trends, making the development cycle slow. The optimization strategy of offline reinforcement learning can solve the above problems well.
In the reinforcement-learning altitude controller, helicopter control is treated as a Markov random sequence. The state of the system is chosen as s(t) = [r_z(t), ṙ_z(t), r̈_z(t)]^T, where r_z(t), ṙ_z(t) and r̈_z(t) denote in turn the altitude, the vertical velocity and the vertical acceleration, and the subscript z denotes the vertical direction. The offline data are collected by manually emulating an altitude-hold control process: three groups of flight data, each two minutes long at a desired altitude, are recorded, comprising the three state variables and the corresponding vertical servo control input. After collection, the data are processed: each pair of current state s(t) and control input u(t) is stored in the matrix X ∈ R^(m×(n_s+n_u+1)), where t denotes the current discrete-time instant, m is the number of training samples, n_s is the state dimension, n_u is the control dimension, and the added 1 represents the intercept coefficient of the fitting formula; the system output, i.e. the next state s(t+1), is stored in the matrix Y ∈ R^(m×n_s). The specific expressions are given in formulas (1) and (2):
Locally weighted linear regression (LWLR) is then carried out to map the current state s(t) and control input u(t) to the next state s(t+1). A weight matrix W ∈ R^(m×m) is defined as a diagonal matrix; the most common Gaussian kernel is used here, so W is diagonal with entries given by formula (3):
ω_ii = exp(-(x^(i) - x)(x^(i) - x)^T / (2τ²))   (3)
where x^(i) is the i-th row of the prior-data matrix X, x is the state feature vector [1 s(t)^T u(t)^T] at the current moment, and the parameter τ is the bandwidth of the Gaussian kernel, which determines the range over which the prior data influence the weight ω_ii. After this processing, following the derivation of the least-squares regression-fit error, the data-driven system model is established as in formula (4):
s(t+1) = x θ + v, with θ = (X^T W X)^(-1) X^T W Y   (4)
Here v is the noise compensation term relative to the real system, i.e. it accounts for other factors not considered in the modeling process, such as ambient wind disturbance. In principle a random variable should be fitted for each state channel, with its parameters obtained by maximum likelihood estimation. In actual experiments, however, the system performed no better after the random variable was added, and was even inferior to the system model without the random signal. A possible reason is that the data collection process already contains deviations; adding a random variable would further increase the offset beyond the system's expected value. Therefore v = 0 is taken here.
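The maximum likelihood fit mentioned above can be sketched as follows, under assumptions: a synthetic scalar channel stands in for the flight data, and the Gaussian noise parameters are estimated from the residuals of an ordinary least-squares fit. This only illustrates how the per-channel fit of v would look, since the patent ultimately discards it by setting v = 0:

```python
import numpy as np

# Fit a linear model, then take the MLE of a Gaussian noise term from the
# residuals: the sample mean and the (biased, 1/m) standard deviation.

rng = np.random.default_rng(2)
m = 500
s = rng.uniform(-1, 1, m)
u = rng.uniform(-1, 1, m)
s_next = 0.8 * s + 0.3 * u + 0.05 * rng.standard_normal(m)  # true noise std 0.05

X = np.column_stack([np.ones(m), s, u])
theta, *_ = np.linalg.lstsq(X, s_next, rcond=None)
resid = s_next - X @ theta

mu_mle = resid.mean()                       # MLE of the Gaussian mean
sigma_mle = np.sqrt(np.mean(resid ** 2))    # MLE of the standard deviation
```

Because the fit includes an intercept, the residual mean is zero by construction; only the spread carries information, which is consistent with the observation that adding the fitted noise mostly injects extra offset into the model.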
Step 4) continuous robust control law design:
The reward function at the current moment is determined first. A quadratic form of the state variables is adopted to construct the reward function R(t) as in formula (5), where Q is a positive-definite parameter matrix and the tracking error s̃(t), whose expression is given in formula (6), is
s̃(t) = s(t) - [r_z,ref(t), ṙ_z,ref(t), r̈_z,ref(t)]^T   (6)
in which r_z,ref(t), ṙ_z,ref(t) and r̈_z,ref(t) denote the desired altitude, desired velocity and desired acceleration of the helicopter in the vertical direction.
After the reward function has been determined, the control iteration algorithm is designed. The control input u(t) is computed from the reference input and the current state, i.e. u(t) = π(s(t), ω), realized as in formula (7):
The ultimate purpose of applying reinforcement learning is to find the optimal control policy π(s(t), ω) such that the total return R_total attains its maximum; the control-law weight at that point is the desired optimal weight vector ω_best. Two strategies are generally available for the parameter iteration: stochastic gradient descent and stochastic approximation. The former is faster but is easily trapped in local optima; the latter carries some uncertainty in practical operation but breaks out of local optima more easily and finds weight parameters with better performance, so the second iteration strategy is used here. Concretely, the state, the action and the initial control weights are initialized first; the LWLR model obtained from the offline data yields the next state and the reward value at that moment; the next action is computed from the next state and the control weights; this cycle is repeated, recording the return at each state moment and summing the returns generated by the sequence. The control weights are then iteratively updated accordingly, and the search continues until the optimal cumulative reward and the corresponding control weights are found; these weights are the result of training. The detailed update process is shown in Fig. 1.
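The trade-off between the two iteration strategies can be illustrated on a toy return surface; J below is an invented one-dimensional function with one local and one global peak, not the helicopter's actual return, and the search parameters are assumptions:

```python
import numpy as np

# On a multimodal surface, gradient ascent from a poor starting weight stalls
# at the local peak, while a perturbation-based stochastic search can escape.

def J(w):
    """Toy return: global maximum near w = 3 (J ~ 2), local maximum near w = 0 (J ~ 1)."""
    return 2.0 * np.exp(-(w - 3.0) ** 2) + np.exp(-w ** 2)

def gradient_ascent(w, lr=0.05, steps=500):
    """Follow the numerical gradient; converges to the peak of whichever basin w starts in."""
    for _ in range(steps):
        g = (J(w + 1e-5) - J(w - 1e-5)) / 2e-5
        w += lr * g
    return w

rng = np.random.default_rng(3)

def stochastic_search(w, steps=2000, scale=1.0):
    """Keep random perturbations of the best weight whenever they raise the return."""
    best_w, best_j = w, J(w)
    for _ in range(steps):
        w_try = best_w + scale * rng.standard_normal()
        j_try = J(w_try)
        if j_try > best_j:
            best_w, best_j = w_try, j_try
    return best_w

w_gd = gradient_ascent(-0.5)     # stalls near the local peak at w ~ 0
w_sa = stochastic_search(-0.5)   # wide perturbations can reach the global peak
```

The wide perturbation scale is what buys the escape from the local basin, at the cost of the "uncertainty in practical operation" that the text notes for stochastic approximation.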
The present invention intends to provide a control method based on offline reinforcement learning that realizes stable altitude-hold flight of a small unmanned helicopter under system parameter uncertainty and external disturbances. The invention is described in detail below with reference to an experimental verification example. The technical solution adopted by the present invention is: establish the nonlinear mathematical model based on a Markov sequence from the collected historical flight data of the helicopter, then design the expressions of the reward function and the control input, and update the control-input weights with the stochastic approximation iteration strategy so as to achieve the optimal control effect, following the steps described above.
Step 1) defines Markov (Markov) decision process:
Markov approach extensive application, main concept in many Sequence Decision problems have decision moment, system
State, behavior, return value and transition probability.It is specifically exactly to choose an action under a certain state of system to generate a report
It fulfills, and determines the state at next moment by transition probability.Policymaker needs the plan for selecting to optimize by certain mode
Slightly.The process can be expressed as hexa-atomic group of (S, J, A, { a Psa() }, γ, R) wherein S indicates all possible shape of environment to
State set;Objective function when J is decision;A indicates the set of actions in motion space;Psa: S × A → S' indicates state transfer
Function means for each moment state s (t) ∈ S, acts a (t) ∈ A, take the movement in this state, reaches next
The probability distribution of moment state s (t+1), wherein t represents current time;γ is discount factor, and range is between 0 to 1;R is to adopt
The Reward Program for taking corresponding actions to obtain.It is general to be that current state is shifted to NextState for the essence of Markov decision process in fact
Rate and return value are solely dependent upon current state and i.e. actions to be taken, and unrelated with historic state and movement.Therefore it is obtaining
P under each state of cicadasaUnder the conditions of the environmental model of () and R, the method for capableing of applied dynamic programming acquires optimal policy.
Step 2) micro helicopter flies control hardware design:
Experimental duties and environment condition analysis design requirement are primarily based on, mainly there is the following.1. due to unmanned plane control
System real-time control with higher processed requires, therefore main control chip needs to have higher system dominant frequency.2. due to needing
Carry out posture and position control, it is therefore desirable to using the Position and attitude sensor having compared with high measurement accuracy.3. since helicopter needs
Own location information is obtained under the indoor environment based on optitrack positioning capturing system, it is therefore desirable to carry data transmission
Zigbee in module.4. due to needing to collect the online posture information of helicopter as Offline training data, it is therefore desirable to carry number
Transmission module ground station is transmitted.5. in order to save Air Diary data, design on piece eeprom chip carries out data
Storage.6. considering the weight bearing requirement and cruising ability of the micro helicopter system when design, selection volume and weight as far as possible is small
Chip and design corresponding winged control plate.
Consider the above demand, final master control borad designed size is 35mm × 35mm, using the Freescale K60 of 100 pins
Chip, dominant frequency 100MHz;Using single MPU6500 chip as Position and attitude sensor, directly connected by SPI mode with main control chip
It connects, the precision and frequency of the chip meet actual demand, small in size;The zigbee module of selection embedded antenna is communicated, greatly
Weight bearing is reduced greatly.EEPROM module selects AT24C256, possesses 256K memory;Voltage stabilizing chip uses two XC6206P332 cores
Piece generates 3.3V voltage, powers respectively to receiver and other sensors.In addition to this, the external serial ports of UART there are two design, one
A I2C external tapping is ready for use on connection barometer.5 external 5v contact pins are finally designed to be used for 3 steering engines and 2 motor input controls
Signal processed.The weight of complete machine is 94g, can satisfy the weight bearing requirement that helicopter takes off.
Step 3) data processing and modeling:
Due to the non-linear relation of helicopter lift and main rotor speed, blade vibration and surrounding flow that main rotor generates
The series of factors such as interference, so that altitude channel has bigger fluctuation compared to horizontal attitude channel.This makes dynamic based on height
The controller design of mechanical model and the no small error of physical presence.On the other hand, although it is highly logical in practical implementation
Road can reach preferable control precision using PID controller, but it is necessary to have experiences abundant to change for tune ginseng process complexity
Parameter area and trend cause the development cycle slow.Using the optimisation strategy of offline intensified learning, above-mentioned ask can be preferably solved
Topic.
In the reinforcement-learning altitude controller, the helicopter control process is regarded as a Markov random sequence. The system state is chosen as s(t) = [r_z(t), ṙ_z(t), r̈_z(t)]^T, where r_z(t), ṙ_z(t) and r̈_z(t) denote in turn the altitude, the vertical velocity and the vertical acceleration, and the variable z stands for the height direction. The offline data are acquired by manually simulating the altitude-hold control process: two minutes of flight data are collected for each of three desired altitudes, comprising the three state variables and the corresponding control input of the vertical-direction servo. After acquisition, the data are processed: each pair of current state s(t) and control input u(t) is stored in the matrix X ∈ R^(m×(1+n_s+n_u)), where t is the current instant of the discrete system, m is the number of training samples, n_s is the dimension of the state, n_u is the dimension of the control input, and the extra 1 represents the intercept coefficient of the fitting formula. The system output data, i.e. the next state s(t+1), are stored in the matrix Y ∈ R^(m×n_s). The specific expressions are formulas (1) and (2):

X = [1 s(1)^T u(1)^T; 1 s(2)^T u(2)^T; …; 1 s(m)^T u(m)^T]  (1)

Y = [s(2)^T; s(3)^T; …; s(m+1)^T]  (2)
Local weighted linear regression is then carried out to map the current state s(t) and control input u(t) to the state s(t+1) of the next instant. A weight matrix W ∈ R^(m×m) is defined; it is a diagonal matrix, and the most common Gaussian kernel is used here, so its diagonal entries are given by formula (3):

ω_ii = exp( −(x − x^(i))^T (x − x^(i)) / (2τ^2) )  (3)
where x^(i) is the i-th row of the prior-data matrix X, x = [1 s(t)^T u(t)^T] is the extended state vector at the current time, and the parameter τ is the scale (bandwidth) of the Gaussian kernel, which determines the range over which the prior data influence the weight ω_ii. After this processing is completed, following the least-squares derivation of the regression fitting error, the data-driven system model of formula (4) is established:

s(t+1)^T = [1 s(t)^T u(t)^T] Θ + v^T,  with Θ = (X^T W X)^(−1) X^T W Y  (4)
Here v is the noise/bias compensation term with respect to the real system, i.e. it accounts for the factors not considered in the modeling process, such as ambient wind disturbance. In principle a random variable should be fitted for each state channel, with its parameters obtained by the method of maximum-likelihood estimation. In actual experiments, however, adding the random variable brought no improvement, and the system even performed worse than the model without the random signal. A possible reason is that the data-collection process itself contains bias, so adding a random variable would further enlarge the offset beyond the system estimate; therefore v = 0 is taken here.
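The data-driven model of formulas (1)-(4) can be sketched in code. The following is a minimal local weighted linear regression (LWLR) sketch, not taken from the patent: the matrix shapes follow the definitions above, the bandwidth value τ is an assumption, and v = 0 as chosen in the text.

```python
import numpy as np

def lwlr_predict(X, Y, x_query, tau=0.5):
    """Locally weighted linear regression, sketching formulas (3)-(4).

    X       : (m, 1+ns+nu) prior-data matrix, rows [1, s(t)^T, u(t)^T]
    Y       : (m, ns) next-state matrix, rows s(t+1)^T
    x_query : (1+ns+nu,) extended state vector at the current time
    tau     : Gaussian-kernel bandwidth (assumed value)
    """
    # Diagonal Gaussian weights, formula (3)
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * tau ** 2))
    W = np.diag(w)
    # Weighted least squares: Theta = (X^T W X)^{-1} X^T W Y, formula (4) with v = 0
    Theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)
    return x_query @ Theta  # predicted s(t+1)
```

On exactly linear data the weighted fit recovers the underlying model; on real flight data the kernel localizes the fit around the query point.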
Step 4) design of the continuous robust control law:
First, the reward function at the current time must be determined. A quadratic form of the state variables is adopted to construct the reward function R(t), as in formula (5):

R(t) = −s̃(t)^T Q s̃(t)  (5)

where Q is a positive-definite parameter matrix and the state error s̃(t) is expressed by formula (6):

s̃(t) = [r_z(t) − r_z,ref(t), ṙ_z(t) − ṙ_z,ref(t), r̈_z(t) − r̈_z,ref(t)]^T  (6)

where r_z,ref(t), ṙ_z,ref(t) and r̈_z,ref(t) denote the desired altitude, velocity and acceleration of the helicopter in the vertical direction;
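The quadratic reward of formulas (5)-(6) amounts to a negative weighted squared state error, so the maximum reward of 0 is attained exactly at the reference. A minimal sketch (the function name and any concrete Q values are illustrative assumptions, not from the patent):

```python
import numpy as np

def reward(s, s_ref, Q):
    """Quadratic reward of formulas (5)-(6): R(t) = -(s - s_ref)^T Q (s - s_ref).

    s, s_ref : (3,) states [altitude, vertical velocity, vertical acceleration]
    Q        : (3, 3) positive-definite parameter matrix
    """
    e = s - s_ref             # state error s~(t), formula (6)
    return -float(e @ Q @ e)  # non-positive; zero only at the reference
```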
After the reward function has been determined, the control iteration algorithm is designed. The value of the control input u(t) is obtained from the reference input and the current state, i.e. u(t) = π(s(t), ω), realized as in formula (7):
The final purpose of using reinforcement learning is to find the optimal control strategy π(s(t), ω) such that the total return R_total attains its maximum value, that is, to obtain the desired optimal weight vector ω_best of the control-input expression. There are two general parameter-iteration strategies, stochastic gradient descent and stochastic approximation: the former is faster but is easily trapped in local optima; the latter carries some uncertainty in practical operation but more easily breaks out of local optima to find weight parameters with better performance, so the second parameter-iteration strategy is used here. Specifically, the initial values of the state, the action and the control weights are set first, and the LWLR model obtained from the offline data yields the next state and the reward value at that moment. The action of the next instant is then recovered from the next state and the control law; this process is repeated continuously, the return at each state instant is recorded, and the returns generated along the sequence are summed. The control weights are iteratively updated accordingly, and the search continues until the optimal total return and the corresponding control weights are found; these are taken as the training result. The detailed update process is shown in Fig. 1.
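The weight-search loop described above (roll out the learned model, sum the returns, update the weights, keep the best) can be sketched as follows. This is a simplified random-perturbation stand-in for the stochastic-approximation strategy; the policy form, step sizes and iteration counts are assumptions, not values from the patent.

```python
import numpy as np

def total_return(model_step, policy, w, s0, s_ref, Q, T=50):
    """Roll out the learned model for T steps and sum the quadratic rewards."""
    s, ret = s0.copy(), 0.0
    for _ in range(T):
        u = policy(s, w)           # u(t) = pi(s(t), w)
        s = model_step(s, u)       # next state from the (learned) model
        e = s - s_ref
        ret += -float(e @ Q @ e)   # reward of formula (5)
    return ret

def search_weights(model_step, policy, w0, s0, s_ref, Q,
                   iters=300, sigma=0.1, seed=0):
    """Random-perturbation search over control weights: keep a perturbed
    weight vector only when the total return of its rollout improves."""
    rng = np.random.default_rng(seed)
    w_best = w0.copy()
    r_best = total_return(model_step, policy, w_best, s0, s_ref, Q)
    for _ in range(iters):
        w_try = w_best + sigma * rng.normal(size=w_best.shape)
        r_try = total_return(model_step, policy, w_try, s0, s_ref, Q)
        if r_try > r_best:         # accept only improving perturbations
            w_best, r_best = w_try, r_try
    return w_best, r_best
```

On a toy one-dimensional plant the loop quickly finds a stabilizing positive feedback gain, illustrating the roll-out-and-update cycle of Fig. 1 without claiming the patent's exact iteration rule.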
The following is an introduction of the specific experiment platform and the experimental procedure.
1. Brief introduction of the experiment platform
For the present invention, real-time altitude-hold experimental verification was carried out on the full-degree-of-freedom indoor flight experiment platform for small-scale unmanned helicopters designed by this research group. The experiment platform is shown in Fig. 2; the controlled plant is an Align 150X helicopter. The onboard flight-control board uses a Freescale K60 chip as the main control chip, with a clock frequency of 100 MHz. The attitude sensor is an MPU6500 chip, whose roll- and pitch-angle measurement error range is ±0.2° and whose yaw-angle error range is ±0.5°. In addition, a ZigBee chip with an embedded antenna is used to receive the OptiTrack positioning information sent from the ground, and a data-transmission chip sends the fused pose information back to the ground station.
2. Flight experiment verification
To verify the validity and practicality of the controller of the present invention, the unmanned-helicopter attitude flight experiment platform independently designed by this research group was used to carry out simulation and real-time experimental verification of the altitude-hold control. In simulation, it can be seen from Fig. 3 that overshoot occurs during PID control, with a settling time of about 10 seconds. As seen in Fig. 4, the reinforcement-learning controller, operating in the manner of optimal control, directly reduces the error to a very small range without producing overshoot, and steady state is reached within 5 seconds, so the control is faster. In the real-time experimental verification of the reinforcement-learning control, Fig. 5 shows that, with a desired altitude of 120 cm, the altitude control error stays essentially within 15 cm. Fig. 6 shows that the corresponding control input is stable and does not swing over a large range. In conclusion, the altitude error and the control input remain within a good and reasonable range, which verifies the rationality of the reinforcement-learning-based controller presented here.
Claims (3)
1. An altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning, characterized in that a helicopter model and a reward function based on a Markov sequence are first constructed, the optimized controller parameters are then trained iteratively with the stochastic-approximation method, and finally the trained controller is applied to the real helicopter system to implement the control.
2. The altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning as claimed in claim 1, characterized in that the specific steps are as follows:
Step 1) define the Markov decision process:
The Markov process is expressed as a six-tuple (S, J, A, {P_sa(·)}, γ, R), where S denotes the set of all possible states of the environment; J is the objective function of the decision; A denotes the action set of the action space; P_sa: S × A → S' denotes the state-transition function, meaning that for the state s(t) ∈ S at each instant and an action a(t) ∈ A taken in that state, it gives the probability distribution of reaching the next state s(t+1); γ is the discount factor, with range between 0 and 1; R is the reward function obtained by taking the corresponding action; under the condition that the environment model P_sa(·) and R are known for every state, the optimal strategy is obtained by the method of dynamic programming;
Step 2) data processing and modeling:
In the reinforcement-learning altitude controller, the helicopter control process is regarded as a Markov random sequence; the system state is chosen as s(t) = [r_z(t), ṙ_z(t), r̈_z(t)]^T, where r_z(t), ṙ_z(t) and r̈_z(t) denote in turn the altitude, the vertical velocity and the vertical acceleration, and the variable z stands for the height direction; the offline data are acquired by manually simulating the altitude-hold control process, collecting two minutes of flight data for each of three desired altitudes, comprising the three state variables and the corresponding control input of the vertical-direction servo; after acquisition, the data are processed: each pair of current state s(t) and control input u(t) is stored in the data matrix X ∈ R^(m×(1+n_s+n_u)), where t is the current instant of the discrete system, m is the number of training samples, n_s is the dimension of the state, n_u is the dimension of the control input, and the extra 1 represents the intercept coefficient of the fitting formula; the system output data, i.e. the next state s(t+1), are stored in the matrix Y ∈ R^(m×n_s); the specific expressions are formulas (1) and (2):

X = [1 s(1)^T u(1)^T; 1 s(2)^T u(2)^T; …; 1 s(m)^T u(m)^T]  (1)

Y = [s(2)^T; s(3)^T; …; s(m+1)^T]  (2)

Local weighted linear regression is carried out to map the current state s(t) and control input u(t) to the state s(t+1) of the next instant; a weight matrix W ∈ R^(m×m) is defined as a diagonal matrix, and the most common Gaussian kernel is used here, so the diagonal entries are given by formula (3):

ω_ii = exp( −(x − x^(i))^T (x − x^(i)) / (2τ^2) )  (3)

where x^(i) is the i-th row of the prior-data matrix X, x = [1 s(t)^T u(t)^T] is the extended state vector at the current time, and the parameter τ is the scale of the Gaussian kernel, which determines the range over which the prior data influence the weight ω_ii; after this processing is completed, following the least-squares derivation of the regression fitting error, the data-driven system model of formula (4) is established:

s(t+1)^T = [1 s(t)^T u(t)^T] Θ + v^T,  with Θ = (X^T W X)^(−1) X^T W Y  (4)

where v is the noise/bias compensation term with respect to the real system, i.e. the other factors not considered in the modeling process, and v = 0 is taken;
Step 3) design of the continuous robust control law:
First, the reward function at the current time must be determined; a quadratic form of the state variables is adopted to construct the reward function R(t) as in formula (5):

R(t) = −s̃(t)^T Q s̃(t)  (5)

where Q is a positive-definite parameter matrix and the state error s̃(t) is expressed by formula (6):

s̃(t) = [r_z(t) − r_z,ref(t), ṙ_z(t) − ṙ_z,ref(t), r̈_z(t) − r̈_z,ref(t)]^T  (6)

where r_z,ref(t), ṙ_z,ref(t) and r̈_z,ref(t) denote the desired altitude, velocity and acceleration of the helicopter in the vertical direction;
After the reward function has been determined, the control iteration algorithm is designed; the value of the control input u(t) is obtained from the reference input and the current state, i.e. u(t) = π(s(t), ω), realized as in formula (7); the final purpose of using reinforcement learning is to find the optimal control strategy π(s(t), ω) such that the total return value R_total over a period of time attains its maximum, the control-input weight at that point being the desired optimal weight vector ω_best; the stochastic-approximation strategy is used here as the parameter-iteration strategy.
3. The altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning as claimed in claim 1, characterized in that the specific steps of the stochastic-approximation strategy are: first initialize the state, the action and the control-weight initial values; obtain from the LWLR model built on the offline data the next state and the reward value at that moment; recover the action of the next instant from the next-instant state and the control law; repeat this process continuously, record the return at each state instant, and sum the returns generated along the sequence; iteratively update the control weights accordingly, and keep searching until the optimal total return and the corresponding control weights are found, which are taken as the training result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910369215.3A CN110083168A (en) | 2019-05-05 | 2019-05-05 | Small-sized depopulated helicopter based on enhancing study determines high control method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110083168A true CN110083168A (en) | 2019-08-02 |
Family
ID=67418677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910369215.3A Pending CN110083168A (en) | 2019-05-05 | 2019-05-05 | Small-sized depopulated helicopter based on enhancing study determines high control method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110083168A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111026147A (en) * | 2019-12-25 | 2020-04-17 | 北京航空航天大学 | Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning |
CN113049202A (en) * | 2021-03-08 | 2021-06-29 | 中国地震局工程力学研究所 | Local weighted regression correction method and system for acceleration integral displacement |
CN113423060A (en) * | 2021-06-22 | 2021-09-21 | 广东工业大学 | Online optimization method for flight route of unmanned aerial communication platform |
CN113892070A (en) * | 2020-04-30 | 2022-01-04 | 乐天集团股份有限公司 | Learning device, information processing device, and control model for completing learning |
CN114967729A (en) * | 2022-03-28 | 2022-08-30 | 广东工业大学 | Multi-rotor unmanned aerial vehicle height control method and system |
CN113892070B (en) * | 2020-04-30 | 2024-04-26 | 乐天集团股份有限公司 | Learning device, information processing device, and control model for completing learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107357166A (en) * | 2017-04-29 | 2017-11-17 | 天津大学 | The model-free adaption robust control method of small-sized depopulated helicopter |
CN109683624A (en) * | 2019-01-31 | 2019-04-26 | 天津大学 | Nonlinear robust control method for small-sized depopulated helicopter gesture stability |
CN109696830A (en) * | 2019-01-31 | 2019-04-30 | 天津大学 | The reinforcement learning adaptive control method of small-sized depopulated helicopter |
Non-Patent Citations (3)
Title |
---|
GAO W et al.: "Sampled-data-based adaptive optimal output feedback control of a 2-degree-of-freedom helicopter", 《IET CONTROL THEORY & APPLICATIONS》 *
SU LIJUN et al.: "Design of a quadrotor altitude controller based on reinforcement learning", 《Measurement & Control Technology》 *
CAI WENLAN: "Research on control methods for small-scale unmanned helicopters based on reinforcement learning", 《China Master's Theses Full-text Database, Engineering Science and Technology II》 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110083168A (en) | Small-sized depopulated helicopter based on enhancing study determines high control method | |
CN110806759B (en) | Aircraft route tracking method based on deep reinforcement learning | |
CN110502033B (en) | Fixed-wing unmanned aerial vehicle cluster control method based on reinforcement learning | |
CN105607473B (en) | The attitude error Fast Convergent self-adaptation control method of small-sized depopulated helicopter | |
CN109625333A (en) | A kind of space non-cooperative target catching method based on depth enhancing study | |
Nie et al. | Three-dimensional path-following control of a robotic airship with reinforcement learning | |
Lu et al. | Real-time simulation system for UAV based on Matlab/Simulink | |
Moshayedi et al. | The quadrotor dynamic modeling and study of meta-heuristic algorithms performance on optimization of PID controller index to control angles and tracking the route | |
dos Santos et al. | Design of attitude and path tracking controllers for quad-rotor robots using reinforcement learning | |
Zhang et al. | Recurrent neural network-based model predictive control for multiple unmanned quadrotor formation flight | |
CN111221346A (en) | Method for optimizing PID (proportion integration differentiation) control four-rotor aircraft flight by crowd search algorithm | |
CN107065897A (en) | Three Degree Of Freedom helicopter explicit model forecast Control Algorithm | |
CN110135076A (en) | A kind of holder mechanical structure multiple target integrated optimization method based on ISIGHT associative simulation | |
Salamat et al. | Adaptive nonlinear PID control for a quadrotor UAV using particle swarm optimization | |
Kose et al. | Simultaneous design of morphing hexarotor and autopilot system by using deep neural network and SPSA | |
Grauer | A learn-to-fly approach for adaptively tuning flight control systems | |
Zhou et al. | Nonlinear system identification and trajectory tracking control for a flybarless unmanned helicopter: theory and experiment | |
Ferdaus et al. | Fuzzy clustering based modelling and adaptive controlling of a flapping wing micro air vehicle | |
CN116301007A (en) | Intensive task path planning method for multi-quad-rotor unmanned helicopter based on reinforcement learning | |
Flores et al. | Implementation of a neural network for nonlinearities estimation in a tail-sitter aircraft | |
Zhou et al. | Parameter Optimization on FNN/PID compound controller for a three-axis inertially stabilized platform for aerial remote sensing applications | |
CN115407661A (en) | Multi-unmanned aerial vehicle system nonlinear robust tracking control method based on azimuth measurement information | |
CN114757086A (en) | Multi-rotor unmanned aerial vehicle real-time remaining service life prediction method and system | |
Ahsan et al. | Grey box modeling of lateral-directional dynamics of a uav through system identification | |
Xu et al. | UAV swarm communication aware formation control via deep Q network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20190802 |