CN110083168A - Altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning

Altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning

Info

Publication number
CN110083168A
CN110083168A (application CN201910369215.3A)
Authority
CN
China
Prior art keywords
control
state
formula
helicopter
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910369215.3A
Other languages
Chinese (zh)
Inventor
Xian Bin
An Hang
Yang Jinsheng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201910369215.3A priority Critical patent/CN110083168A/en
Publication of CN110083168A publication Critical patent/CN110083168A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion: electric
    • G05B13/0265 - Adaptive control systems, electric, the criterion being a learning criterion
    • G05B13/04 - Adaptive control systems, electric, involving the use of models or simulators
    • G05B13/042 - Adaptive control systems, electric, involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/0088 - Control of position, course or altitude characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • G05D1/04 - Control of altitude or depth
    • G05D1/042 - Control of altitude or depth specially adapted for aircraft
    • G05D1/046 - Control of altitude or depth specially adapted for aircraft to counteract a perturbation, e.g. gust of wind

Abstract

The present invention relates to an intelligent altitude control method for small indoor unmanned helicopters, and aims to propose a continuous control method based on offline reinforcement learning that enables a small-scale unmanned helicopter to hold a fixed altitude in an indoor localization environment in the presence of external disturbances. To this end, the technical solution adopted by the present invention is an altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning: first, a helicopter model based on a Markov sequence and a reward function are constructed; the controller parameters are then optimized by iterative training with a stochastic approximation method; finally, the trained controller is applied to the real helicopter system to implement control. The present invention is mainly applicable to the design and manufacture of small indoor unmanned aerial vehicles.

Description

Altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning
Technical field
The present invention relates to an intelligent altitude control method for small indoor unmanned helicopters, specifically a reinforcement learning control method based on offline flight data.
Background art
When a small-scale unmanned helicopter performs maneuvers such as hovering, low-speed forward flight, and cruise, it must be able to stabilize its own altitude; altitude-hold control is also the foundation of stable helicopter flight and of many other complex control tasks. The design of the altitude controller, however, faces several severe problems. First, altitude holding cannot be accomplished through pitch-angle control alone: when a constant disturbance torque acts on the vertical channel, the flight velocity vector gradually deviates from its original direction, inducing altitude drift. Second, during the transient in which the pitch angle settles, a nonzero mean change of the flight-path angle also changes the flight altitude. Third, in the altitude-hold state the main rotor itself generates a strong airflow, which produces a large drift disturbance on the steady state of the fuselage. Fourth, factors such as the ground effect and the battery level of the aircraft also cause variations in the height direction. Fifth, from the standpoint of control authority, although flight altitude can be controlled through the elevator or through the magnitude of motor thrust, thrust-based altitude control has large inertia and responds slowly.
Therefore, observers and controllers designed from a nominal model deviate considerably from the real system in actual control, and the control performance is poor. Characterizing the system's behavior by directly collecting and analyzing flight data, and designing the controller on that basis, is a comparatively effective idea, and data-driven control is a widely adopted method of this kind. Hou Zhongsheng et al. gave a detailed theoretical exposition and experimental verification of this approach (journal: Acta Automatica Sinica; authors: Xu Jianxin, Hou Zhongsheng; year: 2009; title: "A survey on data-driven control system methods"; pages: 668-675). Although this method can stabilize the system, it places no constraint on the speed and accuracy of error convergence, so the control performance is mediocre. Reinforcement learning policy control, which is likewise based on system data, drives the system toward stability along a globally optimal path by observing the system's reward function, and can therefore achieve more accurate control.
In recent years, reinforcement learning control techniques have found applications in the field of UAV control. Andrew Ng et al. collected offline flight data of helicopter attitude and control inputs and designed an offline reinforcement learning control method. They constructed the Markov model of the system with a local linear method, and the iterative gradient-descent procedure finally converged the altitude controller to optimized control parameters (journal: Communications of the ACM; authors: Coates A, Abbeel P, Ng A Y; year: 2009; title: "Apprenticeship learning for helicopter control"; pages: 97-105). This method requires no prior model information and is an ideal control choice for a complex small-scale unmanned helicopter system.
Summary of the invention
In order to overcome the deficiencies of the prior art, the present invention aims to propose a continuous control method based on offline reinforcement learning that enables a small-scale unmanned helicopter to hold a fixed altitude in an indoor localization environment in the presence of external disturbances. To this end, the technical solution adopted by the present invention, an altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning, is as follows: first construct a helicopter model based on a Markov sequence and a reward function, then obtain optimized controller parameters by iterative training with a stochastic approximation method, and finally apply the trained controller to the real helicopter system to implement control.
The specific steps are as follows:
Step 1) Define the Markov decision process:
The Markov decision process is expressed as a six-tuple (S, J, A, {P_sa(·)}, γ, R), where S denotes the set of all possible states of the environment; J is the objective function of the decision; A denotes the action set of the action space; P_sa: S × A → S' is the state transition function, giving, for each state s(t) ∈ S and action a(t) ∈ A taken in that state, the probability distribution of the next state s(t+1); γ is the discount factor, ranging between 0 and 1; and R is the reward function obtained by taking the corresponding action. With the environment model P_sa(·) and R known in every state, the optimal policy is obtained by applying dynamic programming;
Step 2) Data processing and modeling:
In the altitude controller based on reinforcement learning, helicopter control is treated as a Markov random sequence. The system state is chosen as s(t) = [r_z(t), ṙ_z(t), r̈_z(t)]^T, where r_z(t), ṙ_z(t), r̈_z(t) denote in turn the height, the vertical velocity, and the vertical acceleration, and the variable z denotes the height direction. The offline data are acquired by manually simulating an altitude-hold control process: three groups of flight data, two minutes for each desired height, are collected, comprising the three state variables and the corresponding vertical control input of the servo. After acquisition, the data are processed: each pair of current state s(t) and control input u(t) is stored in the data matrix X ∈ R^{m×(n_s+n_u+1)}, where t denotes the current time of the discrete system, m is the number of training samples, n_s is the state dimension, and n_u is the control dimension; the added 1 represents the intercept coefficient of the fitting formula. The system output data, i.e., the next-time state s(t+1), is stored in a matrix of dimension m × n_s. The specific expressions are given in formulas (1) and (2):
Locally weighted linear regression is then carried out to map the current state s(t) and control input u(t) to the next-time state s(t+1). By definition, the weight matrix W ∈ R^{m×m} is diagonal; the most common Gaussian kernel function is used here, yielding a diagonal matrix whose weight values are expressed by formula (3):
In the formula, x^(i) is the i-th row of the prior-data matrix X, and x is the state feature vector [1 s(t)^T u(t)^T] at the current time; the parameter τ is the bandwidth of the Gaussian kernel and determines over what range the prior data influence the weight ω_ii. After the above processing, the data-driven system model of formula (4) is derived following the least-squares treatment of the regression fitting error:
In the formula, v is the noise bias compensation with respect to the real system, i.e., the other factors not accounted for in modeling; v = 0 is taken;
Step 3) Continuous robust control law design:
First, the reward function at the current time must be determined. The reward function R(t) is constructed as a quadratic function of the state variables, as in formula (5):
where Q is a positive definite parameter matrix and the state-error term is expressed by formula (6):
In the formula, r_{z,ref}(t), ṙ_{z,ref}(t), r̈_{z,ref}(t) denote the desired height, desired velocity, and desired acceleration of the helicopter in the vertical direction;
After the reward function is determined, the control iteration algorithm is designed. The value of the control input u(t) is obtained from the reference input and the current state, i.e., u(t) = π(s(t), ω), realized as in formula (7):
The ultimate purpose of using reinforcement learning is to find the optimal control policy π(s(t), ω) such that the total return R_total over a period of time attains its maximum; the weight of the control expression at that point is the desired optimal weight vector ω_best. Here the stochastic approximation strategy is used as the parameter iteration strategy.
The concrete steps of the stochastic approximation strategy are: first initialize the state, the action, and the initial control weights; from the LWLR model obtained from the offline data, compute the next state and the reward value at that time; from the next state and the control weights, compute the action of the next instant; repeat this cycle, recording the return at each state instant, and sum the returns of the sequence. The control weights are updated iteratively accordingly, and the search continues until the extremal reward and the corresponding control weights are found, which constitute the training result.
The features and beneficial effects of the present invention are:
1. The present invention uses offline flight data generated by the helicopter system and, through a mathematical model designed on a Markov sequence, realizes a reinforcement learning update algorithm for the control weights. This effectively compensates for the influence of helicopter dynamic-model uncertainty and external disturbances on control performance, while improving control accuracy compared with data-driven methods.
2. The present invention designs an autonomous flight control system based on a micro indoor unmanned helicopter, achieving a good balance among flight control performance, flight functions, and the payload limit of the aircraft, and applies the reinforcement learning altitude control algorithm in this system.
3. The present invention refines the implementation of the stochastic approximation algorithm used in the control-weight iteration, so that during training the reward function avoids falling into a locally optimal region, as gradient descent methods do, and the optimal solution can be sought over a wider range; meanwhile, weights found during optimization are repeatedly re-examined to find suboptimal weights better suited to the helicopter system.
Detailed description of the invention:
Fig. 1 is the logic block diagram of the control-weight update algorithm of the present invention;
Fig. 2 is the experimental platform of the present invention;
Fig. 3 is the altitude error curve of the unmanned helicopter under PID control in the simulation environment;
Fig. 4 is the altitude error curve of the unmanned helicopter under reinforcement learning control in the simulation environment;
Fig. 5 is the altitude curve of the unmanned helicopter in the altitude-hold experiment under reinforcement learning control;
Fig. 6 is the collective pitch control input curve of the unmanned helicopter in the altitude-hold experiment under reinforcement learning control.
Specific embodiment
In existing reinforcement learning control cases, the controlled object is mostly a medium-sized helicopter, and the control is mostly performed outdoors. The present invention is intended to provide a continuous control method based on offline reinforcement learning that enables a small-scale unmanned helicopter to hold a fixed altitude in an indoor localization environment in the presence of external disturbances. The technical solution adopted by the present invention is: first construct a helicopter model based on a Markov sequence and a reward function, then obtain optimized controller parameters by iterative training with a stochastic approximation method, and finally apply the trained controller to the real helicopter system for verification. The method comprises the following steps:
Step 1) Define the Markov decision process:
The Markov method is widely applied in many sequential decision problems; its main concepts are the decision instant, the system state, the action, the return value, and the transition probability. Concretely, choosing an action in a given system state produces a return, and the transition probability determines the state at the next instant; the decision maker must select an optimized policy by some means. The process can be expressed as a six-tuple (S, J, A, {P_sa(·)}, γ, R), where S denotes the set of all possible states of the environment; J is the objective function of the decision; A denotes the action set of the action space; P_sa: S × A → S' is the state transition function, giving, for each state s(t) ∈ S and action a(t) ∈ A taken in that state, the probability distribution of the next state s(t+1), where t denotes the current time; γ is the discount factor, ranging between 0 and 1; and R is the reward function obtained by taking the corresponding action. The essence of the Markov decision process is that the transition probability from the current state to the next state and the return value depend only on the current state and the action about to be taken, independent of the history of states and actions. Therefore, with the environment model P_sa(·) and R known in every state, dynamic programming can be applied to obtain the optimal policy.
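As a concrete illustration of this six-tuple, the following Python sketch encodes the transition function P_sa, the reward R, and the discount factor γ, together with the discounted return that the later policy search maximizes. The names and signatures are illustrative, not taken from the patent:

```python
from dataclasses import dataclass
from typing import Callable, Sequence
import numpy as np

@dataclass
class MDP:
    """Six-tuple (S, J, A, {P_sa}, gamma, R): the state and action sets are
    implicit in the function signatures; `transition` plays the role of P_sa
    and `reward` the role of R."""
    transition: Callable[[np.ndarray, np.ndarray], np.ndarray]  # s(t), a(t) -> s(t+1)
    reward: Callable[[np.ndarray], float]                       # R(s)
    gamma: float                                                # discount factor in (0, 1)

def discounted_return(mdp: MDP, states: Sequence[np.ndarray]) -> float:
    """Sum of gamma^t * R(s(t)) over a rollout, the quantity the
    policy search seeks to maximize."""
    return sum(mdp.gamma ** t * mdp.reward(s) for t, s in enumerate(states))
```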
Step 2) Flight control hardware design of the micro helicopter:
The design requirements are first analyzed from the experimental tasks and environmental conditions, mainly as follows. 1. Since UAV control demands high real-time performance, the main control chip needs a relatively high system clock frequency. 2. Since attitude and position control are required, a pose sensor with high measurement accuracy must be used. 3. Since the helicopter must obtain its own position information in an indoor environment based on an OptiTrack motion-capture system, a zigbee data transmission module must be carried. 4. Since the online attitude information of the helicopter must be collected as offline training data, the data transmission module must relay it to the ground station. 5. To save flight log data, an on-board EEPROM chip is designed for data storage. 6. Considering the payload limit and endurance of the micro helicopter system, chips of small volume and weight are selected as far as possible and a corresponding flight control board is designed.
Taking the above requirements into account, the final main control board measures 35 mm × 35 mm and uses a 100-pin Freescale K60 chip clocked at 100 MHz. A single MPU6500 chip serves as the pose sensor, connected directly to the main control chip over SPI; its accuracy and frequency meet the actual demand and it is small in size. A zigbee module with an embedded antenna is chosen for communication, greatly reducing the carried weight. The EEPROM module is an AT24C256 with 256K of memory. Two XC6206P332 regulator chips generate the 3.3 V supply, powering the receiver and the other sensors separately. In addition, two external UART serial ports are designed, and one external I2C port is reserved for connecting a barometer. Finally, five external 5 V headers are designed for the input control signals of the 3 servos and 2 motors. The complete unit weighs 94 g, which satisfies the takeoff payload requirement of the helicopter.
Step 3) Data processing and modeling:
Because of the nonlinear relation between helicopter lift and main rotor speed, and a series of factors such as blade vibration and the disturbance of the surrounding airflow generated by the main rotor, the altitude channel fluctuates more strongly than the horizontal attitude channels. This makes a controller design based on an altitude dynamic model deviate considerably from physical reality. On the other hand, although in practical engineering the altitude channel can reach fairly good control accuracy with a PID controller, the tuning process is complicated and requires rich experience to adjust the parameter ranges and trends, making the development cycle slow. The optimization strategy of offline reinforcement learning can better solve the above problems.
In the altitude controller based on reinforcement learning, helicopter control is treated as a Markov random sequence. The system state is chosen as s(t) = [r_z(t), ṙ_z(t), r̈_z(t)]^T, where r_z(t), ṙ_z(t), r̈_z(t) denote in turn the height, the vertical velocity, and the vertical acceleration, and the variable z denotes the height direction. The offline data are acquired by manually simulating an altitude-hold control process: three groups of flight data, two minutes for each desired height, are collected, comprising the three state variables and the corresponding vertical control input of the servo. After acquisition, the data are processed: each pair of current state s(t) and control input u(t) is stored in the data matrix X ∈ R^{m×(n_s+n_u+1)}, where t denotes the current time of the discrete system, m is the number of training samples, n_s is the state dimension, and n_u is the control dimension; the added 1 represents the intercept coefficient of the fitting formula. The system output data, i.e., the next-time state s(t+1), is stored in a matrix of dimension m × n_s. The specific expressions are given in formulas (1) and (2):
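A minimal sketch of this data arrangement, assuming the row layouts [1, s(t)^T, u(t)^T] and s(t+1) described above (the images of formulas (1)-(2) are not reproduced in this text), could look as follows:

```python
import numpy as np

def build_training_matrices(states: np.ndarray, controls: np.ndarray):
    """Arrange logged flight data as described for formulas (1)-(2).

    states:   (m+1, n_s) array of s(0)..s(m); here n_s = 3 (height,
              vertical velocity, vertical acceleration)
    controls: (m, n_u) array of u(0)..u(m-1); here n_u = 1 (the servo's
              vertical control input)

    Returns X with rows [1, s(t)^T, u(t)^T] (the leading 1 is the intercept
    term mentioned in the text) and S_next with rows s(t+1).
    """
    m = controls.shape[0]
    ones = np.ones((m, 1))                       # intercept column
    X = np.hstack([ones, states[:m], controls])  # shape (m, 1 + n_s + n_u)
    S_next = states[1:m + 1]                     # shape (m, n_s)
    return X, S_next
```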
Locally weighted linear regression is then carried out to map the current state s(t) and control input u(t) to the next-time state s(t+1). By definition, the weight matrix W ∈ R^{m×m} is diagonal; the most common Gaussian kernel function is used here, yielding a diagonal matrix whose weight values are expressed by formula (3):
In the formula, x^(i) is the i-th row of the prior-data matrix X, and x is the state feature vector [1 s(t)^T u(t)^T] at the current time; the parameter τ is the bandwidth of the Gaussian kernel and determines over what range the prior data influence the weight ω_ii. After the above processing, the data-driven system model of formula (4) is derived following the least-squares treatment of the regression fitting error:
In the formula, v is the noise bias compensation with respect to the real system, i.e., the other factors not accounted for in modeling, such as ambient wind disturbance. In principle, each state channel should be fitted with a random variable whose parameters are obtained by maximum likelihood estimation. In actual experiments, however, it was found that adding the random variable brought no improvement; the result was even worse than the system model without the random signal. A possible reason is that deviations exist in the data collection process, and adding a random variable would further increase the offset beyond the system's estimated values. Therefore v = 0 is taken here.
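The locally weighted regression step can be sketched as follows; since the images of formulas (3)-(4) are not reproduced in this text, the standard Gaussian kernel and the closed-form weighted least-squares solution are assumed:

```python
import numpy as np

def lwlr_predict(x: np.ndarray, X: np.ndarray, S_next: np.ndarray,
                 tau: float) -> np.ndarray:
    """Locally weighted linear regression step s(t) -> s(t+1).

    x is the query row [1, s(t)^T, u(t)^T]; X and S_next are the prior-data
    matrices. The weights follow the standard Gaussian kernel
    w_ii = exp(-||x - x_i||^2 / (2 tau^2)), an assumed form of formula (3).
    """
    d2 = np.sum((X - x) ** 2, axis=1)               # squared distances to priors
    w = np.exp(-d2 / (2.0 * tau ** 2))              # diagonal of W
    XtW = X.T * w                                   # X^T W without forming W
    theta = np.linalg.solve(XtW @ X, XtW @ S_next)  # weighted least squares
    return x @ theta                                # predicted s(t+1), with v = 0
```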
Step 4) Continuous robust control law design:
First, the reward function at the current time must be determined. The reward function R(t) is constructed as a quadratic function of the state variables, as in formula (5):
where Q is a positive definite parameter matrix and the state-error term is expressed by formula (6):
In the formula, r_{z,ref}(t), ṙ_{z,ref}(t), r̈_{z,ref}(t) denote the desired height, desired velocity, and desired acceleration of the helicopter in the vertical direction;
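Since the images of formulas (5)-(6) are not reproduced here, the following sketch assumes the common quadratic form R(t) = -e(t)^T Q e(t), with e(t) stacking the height, velocity, and acceleration tracking errors; the sign convention is therefore an assumption:

```python
import numpy as np

def reward(state: np.ndarray, ref: np.ndarray, Q: np.ndarray) -> float:
    """Quadratic tracking reward, an assumed reading of formulas (5)-(6):
    R(t) = -e(t)^T Q e(t), where e(t) stacks the errors of height, vertical
    velocity, and vertical acceleration against the references r_z,ref(t)
    and its first and second derivatives."""
    e = state - ref           # [height error, velocity error, accel error]
    return float(-e @ Q @ e)  # Q positive definite, so R(t) <= 0
```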
After the reward function is determined, the control iteration algorithm is designed. The value of the control input u(t) is obtained from the reference input and the current state, i.e., u(t) = π(s(t), ω), realized as in formula (7):
The ultimate purpose of using reinforcement learning is to find the optimal control policy π(s(t), ω) such that the total return R_total attains its maximum; the weight of the control expression at that point is the desired optimal weight vector ω_best. There are in general two strategies for the parameter iteration, stochastic gradient descent and stochastic approximation: the former is faster but easily falls into local optima; the latter carries some uncertainty in practical operation but can more easily break out of local optima to find weight parameters with better performance, so the second iteration strategy is used here. Specifically, first initialize the state, the action, and the initial control weights; from the LWLR model obtained from the offline data, compute the next state and the reward value at that time; from the next state and the control weights, compute the action of the next instant; repeat this cycle, recording the return at each state instant, and sum the returns generated by the sequence. The control weights are updated iteratively accordingly, and the search continues until the extremal reward and the corresponding control weights are found, which constitute the training result. The detailed update process is shown in Fig. 1.
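The weight-iteration loop of Fig. 1 can be sketched as a random-perturbation search of the following kind; the linear policy form standing in for formula (7) and the concrete update rule are assumptions, since neither is reproduced in this text:

```python
import numpy as np

def train_weights(model, reward_fn, s0, omega0,
                  n_iters=500, step=0.05, horizon=1200, rng=None):
    """Stochastic-approximation-style random search over policy weights.

    model(s, u) is the learned LWLR predictor (e.g. forming
    x = [1, s^T, u^T] and calling lwlr_predict); reward_fn(s) is the
    reward; the policy u = omega^T [1, s^T] is an assumed linear form.
    """
    rng = rng or np.random.default_rng(0)

    def rollout_return(omega):
        """Simulate one episode on the model and sum the rewards."""
        s, total = np.asarray(s0, dtype=float), 0.0
        for _ in range(horizon):
            u = omega @ np.concatenate(([1.0], s))  # pi(s, omega)
            s = model(s, np.atleast_1d(u))          # next state from LWLR model
            total += reward_fn(s)
        return total

    best_omega, best_ret = omega0, rollout_return(omega0)
    for _ in range(n_iters):
        cand = best_omega + step * rng.standard_normal(best_omega.shape)
        ret = rollout_return(cand)
        if ret > best_ret:           # keep the perturbation only if it improves
            best_omega, best_ret = cand, ret
    return best_omega                # trained weight vector omega_best
```

Unlike gradient descent, this search re-examines perturbed weights at every iteration, which is what lets it escape locally optimal regions as described above.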
The following is an introduction to the experimental platform and the experimental procedure.
1. Brief introduction of the experimental platform
The present invention carries out real-time altitude-hold experimental verification on the full-degree-of-freedom indoor flight experiment platform for small-scale unmanned helicopters designed by our research group. The experimental platform is shown in Fig. 2; it uses an Align 150X helicopter as the controlled object. The onboard flight control board uses a Freescale K60 chip as the main control chip, clocked at 100 MHz. The pose sensor uses an MPU6500 chip; the measured roll and pitch angle error range is ±0.2°, and the yaw angle error range is ±0.5°. Meanwhile, a zigbee chip with an embedded antenna receives the OptiTrack positioning information sent by the ground station, and the data transmission chip sends the fused attitude information back to the ground station.
2. Flight experiment verification
To verify the validity and practicality of the controller of the present invention, the simulation and real-time experimental verification of altitude-hold control were carried out on the unmanned helicopter attitude flight experiment platform independently designed and developed by our research group. In simulation, Fig. 3 shows that overshoot occurs under PID control, with a settling time of about 10 seconds. Fig. 4 shows that the reinforcement learning controller, operating in the manner of optimal control, directly reduces the error to a very small range without overshoot and reaches steady state within 5 seconds, so its control is faster. In the experimental verification of reinforcement learning control, Fig. 5 shows that with a desired height of 120 cm, the altitude control error stays essentially within 15 cm. Fig. 6 shows that the corresponding control input is stable and does not produce altitude drops. In conclusion, the altitude error and the control input lie within good, reasonable ranges, verifying the soundness of the reinforcement-learning-based controller presented here.

Claims (3)

1. An altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning, characterized in that: first a helicopter model based on a Markov sequence and a reward function are constructed; then optimized controller parameters are obtained by iterative training with a stochastic approximation method; finally the trained controller is applied to the real helicopter system to implement control.
2. The altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning as claimed in claim 1, characterized in that the specific steps are as follows:
Step 1) Define the Markov decision process:
The Markov decision process is expressed as a six-tuple (S, J, A, {P_sa(·)}, γ, R), where S denotes the set of all possible states of the environment; J is the objective function of the decision; A denotes the action set of the action space; P_sa: S × A → S' is the state transition function, giving, for each state s(t) ∈ S and action a(t) ∈ A taken in that state, the probability distribution of the next state s(t+1); γ is the discount factor, ranging between 0 and 1; and R is the reward function obtained by taking the corresponding action; with the environment model P_sa(·) and R known in every state, the optimal policy is obtained by applying dynamic programming;
Step 2) Data processing and modeling:
In the altitude controller based on reinforcement learning, helicopter control is treated as a Markov random sequence. The system state is chosen as s(t) = [r_z(t), ṙ_z(t), r̈_z(t)]^T, denoting in turn the height, the vertical velocity, and the vertical acceleration, where the variable z denotes the height direction. The offline data are acquired by manually simulating an altitude-hold control process: three groups of flight data, two minutes for each desired height, are collected, comprising the three state variables and the corresponding vertical control input of the servo. After acquisition, the data are processed: each pair of current state s(t) and control input u(t) is stored in the data matrix X ∈ R^{m×(n_s+n_u+1)}, where t denotes the current time of the discrete system, m is the number of training samples, n_s is the state dimension, and n_u is the control dimension; the added 1 represents the intercept coefficient of the fitting formula. The system output data, i.e., the next-time state s(t+1), is stored in a matrix of dimension m × n_s. The specific expressions are given in formulas (1) and (2):
Locally weighted linear regression is then carried out to map the current state s(t) and control input u(t) to the next-time state s(t+1). By definition, the weight matrix W ∈ R^{m×m} is diagonal; the most common Gaussian kernel function is used here, yielding a diagonal matrix whose weight values are expressed by formula (3):
In the formula, x^(i) is the i-th row of the prior-data matrix X, and x is the state feature vector [1 s(t)^T u(t)^T] at the current time; the parameter τ is the bandwidth of the Gaussian kernel and determines over what range the prior data influence the weight ω_ii; after the above processing, the data-driven system model of formula (4) is derived following the least-squares treatment of the regression fitting error:
In the formula, v is the noise bias compensation with respect to the real system, i.e., the other factors not accounted for in modeling; v = 0 is taken;
Step 3) Continuous robust control law design:
First, the reward function at the current time must be determined. The reward function R(t) is constructed as a quadratic function of the state variables, as in formula (5):
where Q is a positive definite parameter matrix and the state-error term is expressed by formula (6):
In the formula, r_{z,ref}(t), ṙ_{z,ref}(t), r̈_{z,ref}(t) denote the desired height, desired velocity, and desired acceleration of the helicopter in the vertical direction;
After the reward function is determined, the control iteration algorithm is designed. The value of the control input u(t) is obtained from the reference input and the current state, i.e., u(t) = π(s(t), ω), realized as in formula (7):
The ultimate purpose of using reinforcement learning is to find the optimal control policy π(s(t), ω) such that the total return R_total over a period of time attains its maximum; the weight of the control expression at that point is the desired optimal weight vector ω_best; here the stochastic approximation strategy is used as the parameter iteration strategy.
3. The altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning as claimed in claim 1, characterized in that the concrete steps of the stochastic approximation strategy are: first initialize the state, the action, and the initial control weights; from the LWLR model obtained from the offline data, compute the next state and the reward value at that time; from the next state and the control weights, compute the action of the next instant; repeat this cycle, recording the return at each state instant, and sum the returns generated by the sequence; the control weights are updated iteratively accordingly, and the search continues until the extremal reward and the corresponding control weights are found, which constitute the training result.
CN201910369215.3A 2019-05-05 2019-05-05 Altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning Pending CN110083168A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910369215.3A CN110083168A (en) 2019-05-05 2019-05-05 Altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910369215.3A CN110083168A (en) 2019-05-05 2019-05-05 Altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN110083168A (en) 2019-08-02

Family

ID=67418677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910369215.3A Pending CN110083168A (en) Altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN110083168A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107357166A (en) * 2017-04-29 2017-11-17 天津大学 The model-free adaption robust control method of small-sized depopulated helicopter
CN109683624A (en) * 2019-01-31 2019-04-26 天津大学 Nonlinear robust control method for small-sized depopulated helicopter gesture stability
CN109696830A (en) * 2019-01-31 2019-04-30 天津大学 The reinforcement learning adaptive control method of small-sized depopulated helicopter

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GAO W et al.: "Sampled-data-based adaptive optimal output feedback control of a 2-degree-of-freedom helicopter", IET Control Theory & Applications *
SU Lijun et al.: "Design of a quadrotor altitude controller based on reinforcement learning", Measurement & Control Technology *
CAI Wenlan: "Research on control methods for small unmanned helicopters based on reinforcement learning", China Master's Theses Full-text Database, Engineering Science and Technology II *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111026147A (en) * 2019-12-25 2020-04-17 北京航空航天大学 Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning
CN111026147B (en) * 2019-12-25 2021-01-08 北京航空航天大学 Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning
CN113892070A (en) * 2020-04-30 2022-01-04 乐天集团股份有限公司 Learning device, information processing device, and control model for completing learning
CN113892070B (en) * 2020-04-30 2024-04-26 乐天集团股份有限公司 Learning device, information processing device, and control model for completing learning
CN113049202A (en) * 2021-03-08 2021-06-29 中国地震局工程力学研究所 Local weighted regression correction method and system for acceleration integral displacement
CN113049202B (en) * 2021-03-08 2022-07-12 中国地震局工程力学研究所 Local weighted regression correction method and system for acceleration integral displacement
CN113423060A (en) * 2021-06-22 2021-09-21 广东工业大学 Online optimization method for flight route of unmanned aerial communication platform
CN113423060B (en) * 2021-06-22 2022-05-10 广东工业大学 Online optimization method for flight route of unmanned aerial communication platform
CN114967729A (en) * 2022-03-28 2022-08-30 广东工业大学 Multi-rotor unmanned aerial vehicle height control method and system

Similar Documents

Publication Publication Date Title
CN110083168A (en) Small-sized depopulated helicopter based on enhancing study determines high control method
CN110806759B (en) Aircraft route tracking method based on deep reinforcement learning
CN110502033B (en) Fixed-wing unmanned aerial vehicle cluster control method based on reinforcement learning
CN105607473B (en) The attitude error Fast Convergent self-adaptation control method of small-sized depopulated helicopter
CN109625333A (en) A kind of space non-cooperative target catching method based on depth enhancing study
Nie et al. Three-dimensional path-following control of a robotic airship with reinforcement learning
Lu et al. Real-time simulation system for UAV based on Matlab/Simulink
Moshayedi et al. The quadrotor dynamic modeling and study of meta-heuristic algorithms performance on optimization of PID controller index to control angles and tracking the route
dos Santos et al. Design of attitude and path tracking controllers for quad-rotor robots using reinforcement learning
Zhang et al. Recurrent neural network-based model predictive control for multiple unmanned quadrotor formation flight
CN111221346A (en) Method for optimizing PID (proportion integration differentiation) control four-rotor aircraft flight by crowd search algorithm
CN107065897A (en) Three Degree Of Freedom helicopter explicit model forecast Control Algorithm
CN110135076A (en) A kind of holder mechanical structure multiple target integrated optimization method based on ISIGHT associative simulation
Salamat et al. Adaptive nonlinear PID control for a quadrotor UAV using particle swarm optimization
Kose et al. Simultaneous design of morphing hexarotor and autopilot system by using deep neural network and SPSA
Grauer A learn-to-fly approach for adaptively tuning flight control systems
Zhou et al. Nonlinear system identification and trajectory tracking control for a flybarless unmanned helicopter: theory and experiment
Ferdaus et al. Fuzzy clustering based modelling and adaptive controlling of a flapping wing micro air vehicle
CN116301007A (en) Intensive task path planning method for multi-quad-rotor unmanned helicopter based on reinforcement learning
Flores et al. Implementation of a neural network for nonlinearities estimation in a tail-sitter aircraft
Zhou et al. Parameter Optimization on FNN/PID compound controller for a three-axis inertially stabilized platform for aerial remote sensing applications
CN115407661A (en) Multi-unmanned aerial vehicle system nonlinear robust tracking control method based on azimuth measurement information
CN114757086A (en) Multi-rotor unmanned aerial vehicle real-time remaining service life prediction method and system
Ahsan et al. Grey box modeling of lateral-directional dynamics of a uav through system identification
Xu et al. UAV swarm communication aware formation control via deep Q network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190802